AI-Readable HTML: What the Scrapers Actually See

A look at the gap between what humans see on a portfolio site and what an AI fetch returns. With receipts.

· portfolio, ai, seo

The first time I ran curl https://my-old-site and piped it through htmlq -t, I felt sick. The site I’d shipped looked great in Chrome. In the curl output, it was a <div id="root"> and nothing else.
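For anyone who hasn't run this themselves: the response from a typical client-rendered SPA looks roughly like this (a reconstruction with a placeholder title and bundle path, not my old site's literal source):

<!DOCTYPE html>
<html>
<head>
  <title>My Portfolio</title>
</head>
<body>
  <!-- Everything a human sees gets assembled into this div by JavaScript -->
  <div id="root"></div>
  <script src="/static/js/bundle.js"></script>
</body>
</html>

A client that never runs the script gets the empty div and nothing else.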

That’s the gap. Humans see one website. AI scrapers — and the LLMs they feed — see a different one. If you’re a developer in 2026 and you care about being findable, surfaced, or summarized accurately, you should know which version is yours.

The test

curl -A "Mozilla/5.0" https://yoursite.com/cv | grep -o '<h2[^>]*>[^<]*</h2>'

If you get nothing back, your résumé is invisible to anything that doesn’t run JavaScript. That category includes most large-model crawlers, RSS readers, archive.org, and a surprising number of corporate proxies.
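Two follow-up checks in the same spirit, assuming you have htmlq installed (yoursite.com is a placeholder):

# All visible text a no-JS client can extract from the page
curl -sA "Mozilla/5.0" https://yoursite.com/cv | htmlq -t body

# Count lines mentioning JSON-LD in the raw HTML; 0 means no structured data ships without JS
curl -sA "Mozilla/5.0" https://yoursite.com/cv | grep -c 'application/ld+json'

If the first command prints nothing and the second prints 0, the page is text-free as far as a non-rendering client is concerned.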

What changes if you fix it

Six months after rebuilding with static HTML, I started getting referral traffic from sources I’d never seen before — chat.openai.com, you.com, perplexity.ai. The phrasing in the referral previews matched my actual page copy. People were asking those tools who I was, and the tools were citing me, accurately, instead of guessing.

That’s the difference between being legible and being invisible.

What good looks like

  • Static HTML on first paint. Anything important shouldn’t need JS.
  • Semantic elements: <article>, <section>, <time>, <nav>.
  • JSON-LD on every page. Person + WebSite site-wide. BlogPosting and CreativeWork where they apply. (A minimal sketch follows this list.)
  • Tech stacks as text, not logo images. Names as text, not as a hero image.
  • A /cv page that’s structured HTML, not just a PDF link.
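For the JSON-LD item above, here's a minimal site-wide sketch. The name, URLs, and job title are placeholders; the two objects share an @id so the WebSite points back at the Person:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Person",
      "@id": "https://yoursite.com/#me",
      "name": "Your Name",
      "url": "https://yoursite.com",
      "jobTitle": "Software Engineer",
      "sameAs": ["https://github.com/yourhandle"]
    },
    {
      "@type": "WebSite",
      "url": "https://yoursite.com",
      "name": "Your Name",
      "publisher": { "@id": "https://yoursite.com/#me" }
    }
  ]
}
</script>

Drop it in the <head> of every page; on article pages, add a BlogPosting object with headline, datePublished, and an author pointing at the same @id.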

This site is the reference implementation. View source on any page. The first paragraph of every article is in the raw HTML, not rendered client-side. Every project tag is a <li>, not an <img>.
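To make that concrete, the skeleton of an article page here looks roughly like this (a sketch; the date and list items are placeholders, not this page's literal markup):

<article>
  <h1>AI-Readable HTML: What the Scrapers Actually See</h1>
  <time datetime="2026-01-15">January 2026</time>
  <p>The first paragraph, as real text in the document, readable by anything that can fetch it.</p>
  <section>
    <h2>The test</h2>
    <p>…</p>
  </section>
  <ul>
    <li>HTML</li>
    <li>JSON-LD</li>
  </ul>
</article>

Every piece of that survives curl, grep, and a crawler that never evaluates a script.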