Free llms.txt validator
Validates your /llms.txt and /llms-full.txt against the llmstxt.org spec, then checks the rest of your AI-agent surface — bot accessibility, structured data, JS-rendering. Be discoverable when users ask ChatGPT or Claude for recommendations.
Free tool · No signup · No credit card · Results in 60 seconds
What is llms.txt and why does it matter
llms.txt is to AI agents what robots.txt + sitemap.xml + meta description used to be to search engines. Proposed by Jeremy Howard in 2024 and rapidly adopted by AI-first companies through 2025-2026, it gives LLMs a structured, condensed map of your site so they can answer user questions about your content accurately. Without llms.txt, AI agents either crawl HTML (slow, lossy, JS-blocked) or skip your site entirely. Search results may move from Google to ChatGPT, Claude and Perplexity — and llms.txt is rapidly becoming the canonical way to be visible there.
of the top 1M websites had an llms.txt as of late 2025 — first-mover advantage is wide open.
GPTBot, ClaudeBot, Google-Extended, PerplexityBot, anthropic-ai, CCBot, Applebot-Extended, Bytespider — all need access.
to validate your llms.txt structure, test all 8 AI bot user-agents and audit Schema.org coverage.
What our llms.txt validator tests
Beyond just spec validation — we audit your full AI-agent discoverability surface.
/llms.txt presence and HTTP status
We fetch yourdomain.com/llms.txt and validate it returns 200 OK, content-type text/plain or text/markdown, and isn't behind auth or a CDN challenge.
Severity: high
Spec compliance: H1, blockquote intro, sections
Per llmstxt.org, the file must start with a # Title H1, followed by a > blockquote giving a brief overview, then ## H2 sections containing link lists. We parse the markdown structure and flag deviations.
Severity: high
Section structure: Required vs Optional
The spec recommends key sections (Documentation, Examples) plus an Optional section for less-critical links. We validate that sections are properly named and that the link lists are well-formed markdown.
Severity: medium
Link reachability and link descriptions
Every link in your llms.txt should resolve to a real URL with a meaningful description. We HEAD-check every link and flag 404s, redirects, and missing/empty descriptions.
Severity: medium
/llms-full.txt presence (extended spec)
Optional but increasingly common: a /llms-full.txt with denser content for agents that need deeper context. We check it exists, is well-structured and isn't a duplicate of /llms.txt.
Severity: low
AI bot user-agent accessibility (live HTTP)
We make 8 actual HTTP requests as GPTBot, ClaudeBot, Google-Extended, PerplexityBot, anthropic-ai, CCBot, Applebot-Extended and Bytespider. 200 OK = pass. 403, Cloudflare challenge or empty body = fail.
Severity: critical
robots.txt rules for AI bots
Many sites block AI bots in robots.txt accidentally — sometimes via a single overly broad Disallow: /. We parse your robots.txt and check explicit rules for each AI bot.
Severity: high
Schema.org structured data coverage
AI agents rely on JSON-LD to understand context. We check for Organization + WebSite minimum, plus Product + Offer (if e-com), Article + author (if blog) or SoftwareApplication (if SaaS).
Severity: high
JavaScript-only rendering detection
Most AI agents have weak or no JS support. If your visible text comes from JS hydration, AI sees blank pages. We measure visible text in raw HTML vs after-render HTML.
Severity: high
OpenGraph completeness
AI link-preview generation falls back to OpenGraph when llms.txt is absent. og:title, og:description, og:image, og:url, og:type — we check all five.
Severity: medium
Behind the scenes
We fetch yourdomain.com/llms.txt with a normal User-Agent and parse it as markdown using a CommonMark parser. The parser walks the AST and validates that the document matches the structure proposed at llmstxt.org: H1 title, blockquote summary, H2 sections each containing a markdown bulleted list where each bullet is a link with a description.
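The structure check described above can be approximated in a few lines. This is a minimal sketch, not the validator's actual implementation: it scans lines with a regular expression instead of walking a CommonMark AST, and the function name `validate_llms_txt` and the problem messages are hypothetical.

```python
import re

# A bullet that is a markdown link, optionally followed by ": description".
LINK_ITEM = re.compile(r"^[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def validate_llms_txt(text: str) -> list[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]

    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line is not an H1 title")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("H1 is not followed by a blockquote summary")

    section = None
    seen_section = False
    for ln in lines[2:]:
        if ln.startswith("## "):
            section = ln[3:].strip()
            seen_section = True
        elif section and ln.startswith(("-", "*")):
            match = LINK_ITEM.match(ln)
            if not match:
                problems.append(f"malformed link item in '{section}': {ln}")
            elif not (match.group(3) or "").strip():
                problems.append(
                    f"link without description in '{section}': {match.group(2)}"
                )

    if not seen_section:
        problems.append("no H2 sections found")
    return problems
```

A line-based scan like this misses edge cases a real markdown parser handles (nested lists, multi-line blockquotes), so treat it as a quick pre-flight check only.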
Then we make 8 separate HTTPS requests to your homepage with each AI agent's exact User-Agent string (e.g. 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot'). We record status code, response time, body size and any redirect chain. Cloudflare's anti-bot Managed Challenge is a common failure mode — it returns 403 with an HTML challenge page that AI bots can't solve.
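The per-bot probe can be sketched as follows, using only the Python standard library. The pass/fail rules follow the description above; the `CHALLENGE_MARKERS` strings are heuristic assumptions about what a challenge page contains, and `classify` and `probe` are hypothetical names. The list holds short product tokens, not the full User-Agent strings, and some of them (Google-Extended, for example) act mainly as robots.txt control tokens, so a real test would substitute each vendor's documented string.

```python
import urllib.error
import urllib.request

# Short product tokens from the text; replace with full documented UA strings.
BOT_TOKENS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot",
    "anthropic-ai", "CCBot", "Applebot-Extended", "Bytespider",
]

# Assumed substrings that hint at an anti-bot challenge page (heuristic only).
CHALLENGE_MARKERS = ("just a moment", "cf-chl", "challenge-platform")


def classify(status: int, body: str) -> str:
    """Map a status code and body to pass / empty / challenge / blocked / other."""
    if status in (401, 403, 429, 503):
        lowered = body.lower()
        if any(marker in lowered for marker in CHALLENGE_MARKERS):
            return "challenge"
        return "blocked"
    if status == 200:
        return "pass" if body.strip() else "empty"
    return "other"


def probe(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Fetch url with the given User-Agent and classify the response."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return classify(response.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return classify(err.code, err.read().decode("utf-8", "replace"))
```

Calling `probe("https://yourdomain.com/", token)` for each entry in `BOT_TOKENS` gives a rough per-bot result; keeping `classify` separate from the network call makes the rules easy to adjust.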
For Schema.org coverage we render the page (with our own headless engine that supports modern Next.js / Nuxt / Remix) and parse JSON-LD blocks. We don't just check 'has @type Organization' — we validate required properties (name, url, logo for Organization; offers for Product; etc).
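The property check can be illustrated on HTML that has already been rendered. This sketch uses Python's standard `html.parser` and `json` modules rather than a headless engine, and it only encodes the two required-property sets named above; `JsonLdExtractor` and `missing_properties` are hypothetical names.

```python
import json
from html.parser import HTMLParser

# Required properties as listed in the text; other types are omitted here.
REQUIRED = {"Organization": ("name", "url", "logo"), "Product": ("offers",)}


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # invalid JSON-LD is skipped in this sketch


def missing_properties(html: str) -> dict:
    """Return {type: [missing required properties]} for known types on the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    report = {}
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            kind = item.get("@type")
            if isinstance(kind, str) and kind in REQUIRED:
                report[kind] = [p for p in REQUIRED[kind] if p not in item]
    return report
```

An empty list for a type means every required property in this table was present; a type absent from the result means no such block was found.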
The full AuditCore AI-Readiness audit goes 30+ checks deep with app-type detection (e-commerce vs blog vs SaaS gets different test profiles) and the unique bot-vs-browser pricing diff for online stores. The free validator covers the 10 most common reasons sites fail AI-agent visibility.
Frequently asked questions
What's the difference between llms.txt and llms-full.txt?
llms.txt is the navigation file: a brief overview + sectioned links to key pages. llms-full.txt is the content file: longer-form text intended to give an LLM enough context to answer detailed questions about your product without crawling further. Both are optional; large sites typically ship both, smaller sites ship only llms.txt.
Is llms.txt actually used? Or is it just hype?
Adoption is growing. Several AI companies, including Anthropic, publish llms.txt files for their own documentation, and tools like Cursor, Continue.dev and Claude Code can use such files to ground their answers. Crawler-side support is not uniformly documented, so check each provider's current documentation. Adoption is still early (<3% of top sites), which leaves room for early adopters.
Will llms.txt replace robots.txt + sitemap.xml?
No, they serve different purposes. robots.txt controls crawl access, sitemap.xml lists URLs for indexing, llms.txt provides STRUCTURED CONTEXT for LLMs. You should have all three. Crawlers (Googlebot, Bingbot) keep using robots/sitemap; AI agents (GPTBot, ClaudeBot) increasingly prefer llms.txt for context.
What if I don't want AI bots crawling my site?
Block them in robots.txt with a 'User-agent: GPTBot' line followed by 'Disallow: /' (and the same for the other 7 bots). But if your business depends on discoverability (SaaS, e-commerce, content sites, agencies), blocking AI bots in 2026 is roughly equivalent to blocking Google in 2010. Customers ask AI for recommendations now.
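To see how such rules behave, and how a single broad Disallow: / affects every bot, the sketch below generates the block rules and evaluates them with Python's standard `urllib.robotparser`. The helper names `blocking_rules` and `allowed_bots` are hypothetical, and the standard-library parser may differ in detail from how each crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot",
    "anthropic-ai", "CCBot", "Applebot-Extended", "Bytespider",
]


def blocking_rules(bots=AI_BOTS) -> str:
    """Build one 'User-agent / Disallow: /' group per bot."""
    return "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in bots) + "\n"


def allowed_bots(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {bot: True/False} for whether each bot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


if __name__ == "__main__":
    print(blocking_rules())
    # A blanket rule for every crawler also blocks all AI bots.
    print(allowed_bots("User-agent: *\nDisallow: /\n"))
```

Running `allowed_bots` against your own robots.txt text is a quick way to confirm whether a block is intentional or accidental.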
I'm using Cloudflare. Is that blocking AI bots?
Possibly. Cloudflare's 'Bot Fight Mode' and the newer 'Block AI Bots' toggle are both default-on for some plans. Check Dashboard > Security > Bots. Our checker explicitly tests live access as each AI agent so you find this problem in 60 seconds rather than wondering why your traffic disappeared from ChatGPT.
What should be in my llms.txt?
Per the spec: an H1 with your site name, a blockquote with a 1-3 sentence summary, then sections of bulleted links. Common sections: 'Documentation', 'Examples', 'API Reference', 'Optional'. The Optional section is for less-critical pages. Keep the total file under 2KB if possible, because LLMs work with limited context.
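As an illustration of that layout, the sketch below renders a small llms.txt from structured data and checks it against the 2KB guideline. The company name, URLs and descriptions are placeholders, and `render_llms_txt` is a hypothetical helper, not part of any published tool.

```python
def render_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an H1, a blockquote summary, and H2 sections of described links."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for name, links in sections.items():
        lines.append(f"## {name}")
        lines.append("")
        for label, url, description in links:
            lines.append(f"- [{label}]({url}): {description}")
        lines.append("")
    return "\n".join(lines)


# Placeholder content for illustration only.
doc = render_llms_txt(
    "Example Co",
    "Example Co builds invoicing software for small teams.",
    {
        "Documentation": [
            ("Quick start", "https://example.com/docs/start", "Install and first run"),
        ],
        "Optional": [
            ("Changelog", "https://example.com/changelog", "Release history"),
        ],
    },
)

# True when the rendered file stays under the 2KB guideline.
size_ok = len(doc.encode("utf-8")) < 2048

if __name__ == "__main__":
    print(doc)
```

Generating the file from data keeps link descriptions mandatory by construction, since every entry is a (label, URL, description) triple.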
Where do I host llms.txt?
Host it at the root: yourdomain.com/llms.txt. The spec names the root path as the location (with a subpath as an optional alternative), so the root is the safest choice. If you have multiple subdomains, host one per subdomain. For SPAs and static sites, just drop the file in your /public or /static folder.
Does Schema.org actually matter for AI agents?
Yes, and increasingly so. While LLMs can extract some context from natural-language HTML, structured JSON-LD gives them deterministic answers about your business — your name, location, products, prices, opening hours. Sites with good Schema.org get cited correctly when AI summarizes; sites without get vague or wrong answers.
Related free tools
Full AI-Readiness Checker
30+ checks across robots, schema, JS rendering, bot-vs-browser pricing.
Try it free
robots.txt Tester
Test 14 crawlers (Googlebot + AI bots) against your robots.txt.
Try it free
Security Headers Checker
10 HTTP security headers — bot accessibility starts with reachable endpoints.
Try it free