AI-Readable HTML: What the Scrapers Actually See
A look at the gap between what humans see on a portfolio site and what an AI fetch returns. With receipts.
· portfolio, ai, seo
The first time I ran curl https://my-old-site and piped it through htmlq -t, I felt sick.
The site I’d shipped looked great in Chrome. In the curl output, it was a <div id="root">
and nothing else.
That’s the gap. Humans see one website. AI scrapers — and the LLMs they feed — see a different one. If you’re a developer in 2026 and you care about being findable, surfaced, or summarized accurately, you should know which version is yours.
The test
curl -A "Mozilla/5.0" https://yoursite.com/cv | grep -o '<h2[^>]*>[^<]*</h2>'
If you get nothing back, your résumé is invisible to anything that doesn’t run JavaScript. That category includes most large-model crawlers, RSS readers, archive.org, and a surprising number of corporate proxies.
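The same one-liner generalizes into a quick audit. Here is a sketch (the file name, checks, and thresholds are mine, not a standard) that grades a saved page on two crawler-visible signals: headings present in the raw HTML, and a JSON-LD block.

```shell
#!/bin/sh
# Hypothetical audit: grade a saved HTML file on crawler-visible signals.
# In practice you'd fetch the page first, e.g.:
#   curl -A "Mozilla/5.0" https://yoursite.com/cv > page.html
page="page.html"

# Stand-in fixture so the script is self-contained.
cat > "$page" <<'EOF'
<article><h2>Projects</h2>
<script type="application/ld+json">{"@type":"Person"}</script>
</article>
EOF

# Count <h2> headings in the raw markup -- no JavaScript is executed.
h2s=$(grep -c '<h2' "$page")

# Check for a JSON-LD structured-data block.
if grep -q 'application/ld+json' "$page"; then jsonld=yes; else jsonld=no; fi

echo "h2 headings: $h2s"
echo "json-ld: $jsonld"
```

A zero on either line means that signal simply does not exist for anything that reads your HTML without a browser.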
What changes if you fix it
Six months after rebuilding with static HTML, I started getting referral traffic from
sources I’d never seen before — chat.openai.com, you.com, perplexity.ai. The
phrasing in the referral previews matched my actual page copy. People were asking those
tools who I was, and the tools were citing me, accurately, instead of guessing.
That’s the difference between being legible and being invisible.
What good looks like
- Static HTML on first paint. Anything important shouldn’t need JS.
- Semantic elements: <article>, <section>, <time>, <nav>.
- JSON-LD on every page: Person + WebSite site-wide, BlogPosting and CreativeWork where they apply.
- Tech stacks as text, not logo images. Names as text, not as a hero image.
- A /cv page that’s structured HTML, not just a PDF link.
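For the JSON-LD item, a minimal sketch of a site-wide Person + WebSite payload (the name and URL are placeholders, and the shape is one reasonable layout, not the only valid one), syntax-checked from the shell before you embed it in a <script type="application/ld+json"> tag:

```shell
#!/bin/sh
# Placeholder Person + WebSite JSON-LD; swap in your own name and URL.
jsonld='{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "Person", "name": "Jane Developer", "url": "https://example.com"},
    {"@type": "WebSite", "name": "Jane Developer", "url": "https://example.com"}
  ]
}'

# Validate the JSON before shipping it -- a malformed block is silently
# ignored by parsers, which is worse than no block at all.
echo "$jsonld" | python3 -m json.tool > /dev/null && echo "valid JSON"
```

The same block can be emitted once in your site layout so every page carries it without per-page effort.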
This site is the reference implementation. View source on any page. The first paragraph
of every article is in the raw HTML, not rendered client-side. Every project tag is a <li>, not an <img>.