The "Ghost in the Machine" Problem: How to Feed Your JS-Heavy Site to AI (Without Lifting a Finger)

Hydrated React and Next.js sites can look empty to LLM crawlers. Here’s how a mirror, DNS, and robots.txt can route bots to clean Markdown—without refactoring your frontend.

In the modern web, we’ve built a bit of a paradox. We use powerful frameworks like React and Next.js to create “hydrated” experiences—apps that feel fluid and alive. But to an AI crawler (like OpenAI’s GPTBot or Anthropic’s ClaudeBot), these sites often look like a haunted house: plenty of structure, but nobody’s home.

The content only “appears” once a browser executes complex JavaScript. Many LLM crawlers fetch the initial HTML and never run that JavaScript, whether out of laziness or tight resource budgets. They see a blank page or a loading spinner, and then they leave.

If your site is jsheavy.com, your brilliant content is effectively invisible to the models that power the world’s AI answers.

Enter AI Exorcist and their “Mirror” service. Here is how we fixed the hydration trap for jsheavy.com using the absolute bare minimum of effort: a single DNS record and a text file.

The strategy: the “laissez-faire” integration

Jsheavy.com didn’t want to rewrite their entire frontend architecture to support Server-Side Rendering (SSR). Instead, they let AI Exorcist do the heavy lifting.

AI Exorcist uses a headless browser (Playwright) to visit jsheavy.com, wait for the JS to hydrate, scrape the rendered content, and convert it into clean, AI-friendly Markdown. But the magic isn’t just in the rendering—it’s in the redirection and attribution.
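To make the render-then-convert idea concrete, here is a minimal sketch, not AI Exorcist’s actual pipeline. It assumes Playwright for Python is installed for the rendering half. The `MarkdownExtractor` class and the `html_to_markdown` and `render` names are hypothetical, and the converter only understands headings, paragraphs, and list items.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, list items only."""

    PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}
    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished Markdown blocks
        self.current = None   # text pieces of the block being read, if any
        self.prefix = ""
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.PREFIXES:
            self.current, self.prefix = [], self.PREFIXES[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.PREFIXES and self.current is not None:
            # Collapse runs of whitespace left over from the HTML source.
            text = " ".join("".join(self.current).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.current = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current is not None:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    """Convert rendered HTML into blank-line-separated Markdown blocks."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


def render(url: str) -> str:
    """Load a page in headless Chromium and return the hydrated HTML."""
    # Third-party dependency, imported lazily so the converter works without it.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for network activity to settle
        html = page.content()
        browser.close()
    return html
```

Calling `html_to_markdown(render("https://jsheavy.com/"))` would chain the two halves. A production converter would also need links, tables, code blocks, and images, which this sketch ignores.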

Step 1: The identity claim (DNS & CNAME)

First, jsheavy.com created a CNAME record: ai.jsheavy.com pointing to jsheavy.aiexorcist.com.

Why this matters

By using a subdomain (ai.jsheavy.com), the content stays under the company’s “brand umbrella.” When an LLM cites the source, it sees jsheavy.com in the URL. To make this work, AI Exorcist handles the TLS certificate for that subdomain automatically and relies on SNI (Server Name Indication) to present it whenever a client asks for ai.jsheavy.com, ensuring the bot sees a valid “ID card” when it connects.
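If you want to confirm the “ID card” yourself, a sketch like the following could help. It assumes the mirror answers on port 443 with a certificate that names the subdomain; how that certificate gets issued is not described here. The function names are hypothetical. The `server_hostname` argument is what the client sends as SNI, which is how a server hosting many mirrors can pick the right certificate.

```python
import socket
import ssl


def hostname_matches(hostname: str, pattern: str) -> bool:
    """Compare a hostname with one certificate name.

    A '*' is accepted only as the entire leftmost label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def certificate_names(hostname: str, port: int = 443, timeout: float = 5.0) -> list[str]:
    """Connect with SNI set to `hostname` and return the DNS names in the certificate."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        # server_hostname is sent in the TLS handshake as the SNI value.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
```

For example, `any(hostname_matches("ai.jsheavy.com", n) for n in certificate_names("ai.jsheavy.com"))` should be true once the CNAME and certificate are in place. If the certificate is invalid, the handshake raises `ssl.SSLCertVerificationError` before any names are returned.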

Step 2: The secret map (robots.txt)

Jsheavy.com didn’t need to change a single line of application code. They just updated their robots.txt file to give AI bots a different set of instructions than traditional search crawlers.

# Regular search engines see the main site
User-agent: Googlebot
Disallow: /private/
Sitemap: https://jsheavy.com/sitemap.xml

# LLM crawlers get the "VIP Markdown Lounge"
User-agent: GPTBot
User-agent: ClaudeBot
Sitemap: https://ai.jsheavy.com/mirror-sitemap.xml

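Listing the mirror’s sitemap next to the AI user-agents steers those bots toward the readable Markdown version before they even try to struggle with the main site’s JavaScript. One caveat: under the sitemaps protocol, a Sitemap line applies to the whole file rather than to the user-agent group it sits beside, so the grouping signals intent, and any crawler that reads the file can discover both sitemaps.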
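A quick way to sanity-check a robots.txt like the one above is Python’s standard `urllib.robotparser`. This is only a sketch: the `ROBOTS_TXT` sample and the helper names are assumptions for illustration. It also shows that the parser reports every Sitemap line in the file, whichever user-agent group it sits next to.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Sample file mirroring the article's example (comments omitted).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Sitemap: https://jsheavy.com/sitemap.xml

User-agent: GPTBot
User-agent: ClaudeBot
Sitemap: https://ai.jsheavy.com/mirror-sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def mirror_sitemaps(parser: RobotFileParser, mirror_host: str) -> list[str]:
    """Return the advertised sitemap URLs that live on the mirror host."""
    sitemaps = parser.site_maps() or []  # site_maps() returns None when there are none
    return [url for url in sitemaps if urlsplit(url).hostname == mirror_host]


rules = load_rules(ROBOTS_TXT)
print(rules.site_maps())                         # both sitemap URLs, in file order
print(mirror_sitemaps(rules, "ai.jsheavy.com"))  # only the mirror's sitemap
print(rules.can_fetch("Googlebot", "https://jsheavy.com/private/x"))
```

Whether a given crawler prefers the mirror sitemap is up to that crawler. The file can only advertise it.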

Step 3: The crawler’s journey

Here is what happens when GPTBot arrives at your domain:

Discovery: The bot checks jsheavy.com/robots.txt and sees the custom Sitemap at ai.jsheavy.com.
Navigation: It heads over to the mirror. Because of the CNAME and SSL setup, the bot feels “safe”—it believes it is still talking to an official jsheavy.com server.
Consumption: It downloads a clean Markdown file. No JS, no hydration, no spinners. Just the data.
Attribution: This is the critical part. Even though the bot is on a mirror, AI Exorcist injects two “authority” signals:
- The technical signal: A <link rel="canonical" href="https://jsheavy.com/page"> tag in the HTML. (A raw Markdown response has no HTML head, so there the equivalent would be an HTTP Link header carrying rel="canonical".) This tells the bot’s database: “The owner of this knowledge is the main site.”
- The content signal: A header inside the Markdown text: Source: https://jsheavy.com. Because the line travels with the text itself, the brand name sits in the model’s context window whenever the page is read or quoted.
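The two signals can be sketched in a few lines. This is an illustration under assumptions, not AI Exorcist’s code: `mirror_response` is a hypothetical helper, and it expresses the canonical reference as an HTTP `Link` header because a Markdown body has nowhere to put a `<link>` tag.

```python
def mirror_response(markdown: str, origin_url: str) -> tuple[dict[str, str], str]:
    """Build headers and body for one mirrored page.

    Both attribution signals point at origin_url, the page on the main site.
    """
    headers = {
        "Content-Type": "text/markdown; charset=utf-8",
        # HTTP form of rel="canonical", usable when the body has no HTML <head>.
        "Link": f'<{origin_url}>; rel="canonical"',
    }
    # Content signal: the source line is the first thing in the text itself.
    body = f"Source: {origin_url}\n\n{markdown.strip()}\n"
    return headers, body


headers, body = mirror_response(
    "# Pricing\n\nPlans start at $5.\n",
    "https://jsheavy.com/pricing",
)
print(headers["Link"])
print(body)
```

Keeping the source line inside the body means the attribution survives even if a crawler stores only the text and discards the response headers.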

The result: total visibility, zero refactoring

By delegating the “exorcism” of their JavaScript to a specialized service, jsheavy.com achieved:

Every rendered page exposed to AI crawlers as readable Markdown.
Correct attribution so they get the traffic and the “expert” status.
Zero downtime or changes to their existing tech stack.

If your site is currently a “black box” to AI, you don’t need a new frontend team. You just need a better map for the bots to follow.

Quick technical recap for the devs

jsheavy.com: Update robots.txt and add a CNAME record. (Total time: ~5 minutes.)
AI Exorcist: Renders pages via Playwright, serves Markdown, manages SSL for the CNAME, and ensures canonical tags point back to the origin. (Total time: automated.)
