AI-Readable HTML: What the Scrapers Actually See

A look at the gap between what humans see on a portfolio site and what an AI fetch returns. With receipts.

· portfolio, ai, seo

The first time I ran curl https://my-old-site and piped it through htmlq -t, I felt sick. The site I’d shipped looked great in Chrome. In the curl output, it was a <div id="root"> and nothing else.
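For anyone who hasn't run this themselves: the response from a typical client-rendered SPA looks roughly like this (a reconstruction with a placeholder title and bundle path, not my old site's literal source):

<!DOCTYPE html>
<html>
<head>
  <title>My Portfolio</title>
</head>
<body>
  <!-- Everything a human sees gets assembled into this div by JavaScript -->
  <div id="root"></div>
  <script src="/static/js/bundle.js"></script>
</body>
</html>

A client that never runs the script gets the empty div and nothing else.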

That’s the gap. Humans see one website. AI scrapers — and the LLMs they feed — see a different one. If you’re a developer in 2026 and you care about being findable, surfaced, or summarized accurately, you should know which version is yours.

The test

curl -A "Mozilla/5.0" https://yoursite.com/cv | grep -o '<h2[^>]*>[^<]*</h2>'

If you get nothing back, your résumé is invisible to anything that doesn’t run JavaScript. That category includes most large-model crawlers, RSS readers, archive.org, and a surprising number of corporate proxies.
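Two follow-up checks in the same spirit, assuming you have htmlq installed (yoursite.com is a placeholder):

# All visible text a no-JS client can extract from the page
curl -sA "Mozilla/5.0" https://yoursite.com/cv | htmlq -t body

# Count lines mentioning JSON-LD in the raw HTML; 0 means no structured data ships without JS
curl -sA "Mozilla/5.0" https://yoursite.com/cv | grep -c 'application/ld+json'

If the first command prints nothing and the second prints 0, the page is text-free as far as a non-rendering client is concerned.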

What changes if you fix it

Six months after rebuilding with static HTML, I started getting referral traffic from sources I’d never seen before — chat.openai.com, you.com, perplexity.ai. The phrasing in the referral previews matched my actual page copy. People were asking those tools who I was, and the tools were citing me, accurately, instead of guessing.

That’s the difference between being legible and being invisible.

What good looks like

  • Static HTML on first paint. Anything important shouldn’t need JS.
  • Semantic elements: <article>, <section>, <time>, <nav>.
  • JSON-LD on every page. Person + WebSite site-wide. BlogPosting and CreativeWork where they apply. (A minimal sketch follows this list.)
  • Tech stacks as text, not logo images. Names as text, not as a hero image.
  • A /cv page that’s structured HTML, not just a PDF link.
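For the JSON-LD item above, here's a minimal site-wide sketch. The name, URLs, and job title are placeholders; the two objects share an @id so the WebSite points back at the Person:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Person",
      "@id": "https://yoursite.com/#me",
      "name": "Your Name",
      "url": "https://yoursite.com",
      "jobTitle": "Software Engineer",
      "sameAs": ["https://github.com/yourhandle"]
    },
    {
      "@type": "WebSite",
      "url": "https://yoursite.com",
      "name": "Your Name",
      "publisher": { "@id": "https://yoursite.com/#me" }
    }
  ]
}
</script>

Drop it in the <head> of every page; on article pages, add a BlogPosting object with headline, datePublished, and an author pointing at the same @id.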

This site is the reference implementation. View source on any page. The first paragraph of every article is in the raw HTML, not rendered client-side. Every project tag is a <li>, not an <img>.
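To make that concrete, the skeleton of an article page here looks roughly like this (a sketch; the date and list items are placeholders, not this page's literal markup):

<article>
  <h1>AI-Readable HTML: What the Scrapers Actually See</h1>
  <time datetime="2026-01-15">January 2026</time>
  <p>The first paragraph, as real text in the document, readable by anything that can fetch it.</p>
  <section>
    <h2>The test</h2>
    <p>…</p>
  </section>
  <ul>
    <li>HTML</li>
    <li>JSON-LD</li>
  </ul>
</article>

Every piece of that survives curl, grep, and a crawler that never evaluates a script.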