Free robots.txt tester
Test your robots.txt against 14 crawlers, including Googlebot, Bingbot, and AI bots such as GPTBot, ClaudeBot, and PerplexityBot, to find rules that accidentally block traffic. Most other testers only check Googlebot. We catch what they miss.
Free tool · No signup · No credit card · Results in 60 seconds
One typo can block your whole AI traffic channel
robots.txt is three decades old, deceptively simple, and surprisingly hard to get right. A single Disallow: / in the wrong group can silently de-rank you for months. If your User-agent: * group is restrictive, a missing group for GPTBot (User-agent: GPTBot with Allow: /) can cut you out of ChatGPT recommendations. And in the AI era the bot list keeps growing, while most testers haven't been updated to include the new bots. We have.
14 crawlers we test against, including all major AI agents most testers don't know about.
1 typo is all it takes to accidentally block traffic. Common offender: 'Disallow: /' inside the wrong User-agent block.
60 seconds to fetch your robots.txt, parse it, simulate all 14 crawlers, and check the sitemap declaration.
What we test
Beyond syntax — we simulate each crawler's parsing logic to catch real-world misses.
robots.txt presence and HTTP status
Fetches yourdomain.com/robots.txt and checks for a 200 OK status with a text/plain content type. Many sites accidentally return 200 OK on /robots.txt with an HTML 404 page as the body, a silent failure that leaves crawlers with no usable rules.
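A minimal sketch of the presence check described above, assuming the response has already been fetched. The function name classify_robots_response and its result labels are hypothetical and only illustrate the idea; they are not the tester's actual code.

```python
def classify_robots_response(status: int, content_type: str, body: str) -> str:
    """Classify a fetched /robots.txt response (labels are illustrative)."""
    if status >= 500:
        return "server_error"
    if status >= 400:
        return "missing"
    if status != 200:
        return "unexpected_status"

    # A 200 OK whose body is an HTML page is the "soft 404" case:
    # the status looks fine, but the body contains no robots.txt rules.
    head = body.lstrip()[:200].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "soft_404"

    # Ignore parameters such as "; charset=utf-8" when comparing the type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        return "wrong_type"
    return "ok"
```

Sniffing the start of the body before trusting the content type is a design choice in this sketch: it catches an HTML error page even when the server labels it text/plain.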
Severity: high
Syntax validation per RFC 9309
Whitespace-tolerant parsing matching the official spec. Catches typos like 'Disalow:' and unrecognized directives that crawlers silently ignore.
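One way to catch misspelled directives is to compare each directive name against a known list and suggest the nearest match. The sketch below assumes a five-item KNOWN list and a similarity cutoff of 0.7; both are illustrative choices, and lint_directives is a hypothetical name.

```python
import difflib
from typing import Optional

# Directive names this sketch treats as valid (an assumption, not a full list).
KNOWN = ("user-agent", "allow", "disallow", "sitemap", "crawl-delay")

def lint_directives(text: str) -> list[tuple[int, str, Optional[str]]]:
    """Return (line number, unrecognized name, suggested fix or None)."""
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if ":" not in line:
            findings.append((lineno, line, None))  # not a "name: value" line
            continue
        name = line.split(":", 1)[0].strip().lower()
        if name in KNOWN:
            continue
        guess = difflib.get_close_matches(name, KNOWN, n=1, cutoff=0.7)
        findings.append((lineno, name, guess[0] if guess else None))
    return findings
```

For a file containing 'Disalow: /admin' on line 2, this reports that line with 'disallow' as the suggestion, while an unrelated directive such as 'Noindex' is reported with no suggestion.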
Severity: medium
Googlebot crawl simulation
Walks rules in spec-compliant order: longest-match precedence, Allow beats Disallow at equal length, and fallback to the User-agent: * group only when no more specific group matches. Reports allowed/blocked status per common path.
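The precedence rules can be sketched as follows for a single group. This is an illustration under stated assumptions, not Googlebot's actual code: rules arrive as ("allow" | "disallow", pattern) pairs, pattern length is counted in characters, and the names is_allowed and _pattern_to_regex are hypothetical.

```python
import re

def _pattern_to_regex(pattern: str):
    # '*' matches any run of characters; a trailing '$' anchors the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence; Allow wins when lengths are equal."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if _pattern_to_regex(pattern).match(path) is None:
            continue
        length = len(pattern)
        if length > best_len or (length == best_len and kind == "allow"):
            best_len = length
            allowed = (kind == "allow")
    return allowed
```

With the rules Disallow: / and Allow: /public/, the path /public/page is allowed because the Allow pattern is longer, while /admin is blocked by the only matching rule.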
Severity: high
Bingbot, DuckDuckBot, Yandex, Baidu
Major non-Google search engines, each with subtle parsing differences. Bingbot specifically does NOT follow Crawl-delay rules over 30s — your '300s' delay is being ignored.
Severity: medium
AI bots: GPTBot, ChatGPT-User
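OpenAI uses separate bots for separate jobs, including GPTBot (training crawl) and ChatGPT-User (live user lookups). Many sites write rules for GPTBot but forget ChatGPT-User, which then falls back to the User-agent: * rules; if those are restrictive, users see 'cannot access this URL' messages in ChatGPT.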
Severity: high
AI bots: ClaudeBot, anthropic-ai, Claude-Web
Anthropic uses three bots for different functions. A User-agent: ClaudeBot group with Disallow: / blocks ClaudeBot, Anthropic's main crawler; anthropic-ai is the older training crawler; Claude-Web handles live user browsing.
Severity: high
AI bots: Google-Extended
Google's separate opt-out for AI training (Gemini). Different from Googlebot — blocking Googlebot also blocks Google-Extended, but allowing Googlebot doesn't auto-allow Google-Extended.
Severity: high
PerplexityBot, Applebot, Applebot-Extended
Perplexity AI's crawler and Apple's two bots (Applebot for Siri, Applebot-Extended for AI training). Often missing from default robots.txt files.
Severity: medium
Bytespider, CCBot, Diffbot, AmazonBot
Other commonly encountered crawlers. Bytespider (TikTok/ByteDance) is aggressive, and most sites end up rate-limiting it. CCBot belongs to Common Crawl, whose corpus is widely used as LLM training data.
Severity: low
Sitemap declaration
Robots.txt should reference your sitemap.xml location. We check the Sitemap: directive and verify the URL is reachable.
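A small sketch of the first half of this check: collecting Sitemap: values and confirming each is an absolute http(s) URL. The function names are hypothetical, and the reachability request itself is left out so the example runs offline.

```python
from urllib.parse import urlparse

def sitemap_urls(robots_txt: str) -> list[str]:
    """Collect the values of every Sitemap: line, in file order."""
    urls = []
    for raw in robots_txt.splitlines():
        # Simplification: '#' starts a comment, so a URL containing '#' is cut short.
        line = raw.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

def is_absolute_http_url(url: str) -> bool:
    """True when the URL has an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A relative value such as /sitemap.xml is collected by sitemap_urls but fails is_absolute_http_url, which makes it easy to flag before any network request is attempted.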
Severity: medium
Crawl-delay sanity check
Excessive Crawl-delay (more than 10 seconds) can starve crawl budget. Most modern crawlers either ignore it or cap it at 30s. We flag values likely to cause problems.
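As a rough sketch, the sanity check could scan for Crawl-delay lines and flag values above a threshold. The 10-second default mirrors the figure above; the function name and message wording are hypothetical.

```python
def crawl_delay_findings(robots_txt: str, threshold: float = 10.0) -> list[tuple[int, str]]:
    """Return (line number, message) for each suspicious Crawl-delay line."""
    findings = []
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "crawl-delay":
            continue
        try:
            seconds = float(value.strip())
        except ValueError:
            findings.append((lineno, "not a number"))
            continue
        if seconds > threshold:
            findings.append((lineno, f"{seconds:g}s exceeds {threshold:g}s"))
    return findings
```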
Severity: low
Inside the tester
We fetch your robots.txt with a normal User-Agent (Mozilla/5.0 AuditCore-Robots-Test/1.0) and verify content-type. Many sites accidentally serve their 404 page or login page on /robots.txt — looks like 200 OK to a casual checker but breaks every crawler.
We then parse the file using a Google-compatible parser (longest-match precedence, Allow beats Disallow at equal length, fallback to User-agent: * when a crawler has no group of its own). We test 24 common URL paths (including /, /sitemap.xml, /admin, /api, /search, /private/, and /static/) against each of the 14 crawlers and produce an allow/block matrix.
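The matrix step can be sketched end to end in a few functions. This is a simplified illustration, not the tester's parser: it assumes exact (case-insensitive) user-agent token matching, plain prefix rules with no * or $ wildcards, and hypothetical function names throughout.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercase user-agent token to its (kind, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []   # agents of the group being read
    reading_agents = False    # True while consecutive User-agent lines continue
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        if not sep:
            continue
        name, value = name.strip().lower(), value.strip()
        if name == "user-agent":
            if not reading_agents:
                current = []  # a new group starts here
            reading_agents = True
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif name in ("allow", "disallow"):
            reading_agents = False
            for agent in current:
                groups[agent].append((name, value))
    return groups

def rules_for(groups: dict[str, list[tuple[str, str]]], crawler: str) -> list[tuple[str, str]]:
    # A crawler with its own group uses only that group; '*' is the fallback.
    token = crawler.lower()
    return groups[token] if token in groups else groups.get("*", [])

def prefix_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, prefix in rules:
        if prefix and path.startswith(prefix):
            if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
                best_len, allowed = len(prefix), kind == "allow"
    return allowed

def matrix(robots_txt: str, crawlers: list[str], paths: list[str]) -> dict[str, dict[str, bool]]:
    """Build {crawler: {path: allowed}} for every crawler/path pair."""
    groups = parse_groups(robots_txt)
    return {c: {p: prefix_allowed(rules_for(groups, c), p) for p in paths} for c in crawlers}
```

Given a file with 'User-agent: *' plus 'Disallow: /admin' and a separate 'User-agent: GPTBot' group containing 'Allow: /', the matrix marks /admin as blocked for Googlebot but allowed for GPTBot, because the GPTBot group replaces the * group instead of merging with it.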
For AI bots specifically we run an additional live test: we make an actual HTTP request as that User-Agent and verify your origin returns 200 OK rather than 403 / Cloudflare challenge / WAF block. robots.txt only describes intent — your CDN/WAF can override it. The full AuditCore audit pairs this with the AI-Readiness Scanner for the complete picture.
Frequently asked questions
Does robots.txt actually block crawlers? I've heard it's just a request.
Robots.txt is voluntary — it asks crawlers not to visit. Major search engines (Google, Bing, Yandex, DuckDuckGo) and major AI bots (GPTBot, ClaudeBot, PerplexityBot) honor it. Malicious scrapers and some aggressive marketing crawlers ignore it. For real blocking, use HTTP-level WAF rules or authentication.
What's the difference between Disallow: / and Disallow: /*?
It depends on the parser. Under RFC 9309 they are equivalent: every path starts with /, and * matches any sequence of characters, including none. Google's parser treats them identically. Some older crawlers parse Disallow: /* as 'a slash followed by anything', which gives the same outcome but is an obscure edge case. Use Disallow: /, which is clearer and universally supported.
Should I block AI bots?
Depends on your business. If your value is 'be discoverable when users ask AI for recommendations' — SaaS, e-commerce, agencies, content sites — blocking AI bots in 2026 hurts you. If your value is your unique content and AI is scraping it without attribution (some publishers, paywalled content, original journalism) — blocking can make sense. Most sites should allow.
I have 'User-agent: * Disallow: /admin'. Why is it not blocking GPTBot from /admin?
Trick question — it IS blocking GPTBot. User-agent: * applies to crawlers that don't have their own block. The trap is the OPPOSITE: if you have 'User-agent: GPTBot Allow: /', that block REPLACES the * block for GPTBot (per spec, no merging). So GPTBot can crawl /admin even though * is disallowed. Add an explicit GPTBot Disallow: /admin if needed.
How do I block ChatGPT but allow OpenAI search?
OpenAI runs three bots: GPTBot (training), ChatGPT-User (live browsing for ChatGPT users), and OAI-SearchBot (their search index). Block GPTBot if you don't want your content used for training; block ChatGPT-User if you don't want chat browsing; block OAI-SearchBot if you don't want to appear in search results. They're independent.
Why doesn't my Crawl-delay: 30 work?
Bingbot caps at 30s. Yandex respects up to 60s. Googlebot ignores Crawl-delay entirely (use Google Search Console crawl rate setting). Most AI bots don't support it. For real rate limiting, configure your CDN or origin rate-limit by User-Agent.
Can robots.txt have comments?
Yes. Anything from # to the end of a line is a comment, so a comment can sit on its own line or after a directive. Useful for documenting why a path is blocked. We parse and ignore them; they don't affect bot behavior.
What happens if my robots.txt returns 503 or 5xx?
Major crawlers treat 5xx as a temporary issue and retry. Sustained 5xx (multiple days) is treated as 'allow all' by some, 'disallow all' by others. Either way, broken robots.txt is bad. We test status codes and flag any non-200.
Related free tools
llms.txt Validator
Beyond robots.txt: validate the new llms.txt spec for AI agent context.
Full AI-Readiness Checker
30+ checks across robots, schema, JS rendering, bot-vs-browser pricing.
Security Headers Checker
Test 10 HTTP security headers — HSTS, CSP, X-Frame-Options.