Question 1

Does robots.txt actually block crawlers? I've heard it's just a request.

Accepted Answer

Robots.txt is voluntary — it asks crawlers not to visit. Major search engines (Google, Bing, Yandex, DuckDuckGo) and major AI bots (GPTBot, ClaudeBot, PerplexityBot) honor it. Malicious scrapers and some aggressive marketing crawlers ignore it. For real blocking, use HTTP-level WAF rules or authentication.

Question 2

What's the difference between Disallow: / and Disallow: /*?

Accepted Answer

It depends on the parser. Under RFC 9309, and in Google's parser, the two are equivalent: every path starts with /, and * matches any sequence of characters. The original 1994 convention had no wildcards, so an older crawler may read Disallow: /* as a literal prefix and match almost nothing. Use Disallow: / because it is clearer and universally supported.
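To make the parser difference concrete, here is a minimal sketch of two matching styles. The helper names rfc9309_match and literal_prefix_match are hypothetical, not functions from any real crawler. The first approximates RFC 9309 pattern matching (* as a wildcard, a trailing $ as an end anchor); the second models one possible legacy behavior, a parser with no wildcard support.

```python
import re


def rfc9309_match(pattern: str, path: str) -> bool:
    """Approximate RFC 9309 matching: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where the pattern had '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None


def literal_prefix_match(pattern: str, path: str) -> bool:
    """Assumed legacy behavior: the pattern is a plain prefix, '*' has no special meaning."""
    return path.startswith(pattern)


paths = ["/", "/admin", "/blog/post.html"]

# With wildcard support, '/' and '/*' cover the same paths.
modern_slash = [rfc9309_match("/", p) for p in paths]
modern_star = [rfc9309_match("/*", p) for p in paths]

# Without wildcard support, '/*' only matches paths that literally start with '/*'.
legacy_slash = [literal_prefix_match("/", p) for p in paths]
legacy_star = [literal_prefix_match("/*", p) for p in paths]

print(modern_slash, modern_star, legacy_slash, legacy_star)
```

In this sketch, Disallow: / behaves the same under both matchers, which is the practical argument for preferring it.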

Question 3

Should I block AI bots?

Accepted Answer

Depends on your business. If your value is 'be discoverable when users ask AI for recommendations' — SaaS, e-commerce, agencies, content sites — blocking AI bots in 2026 hurts you. If your value is your unique content and AI is scraping it without attribution (some publishers, paywalled content, original journalism) — blocking can make sense. Most sites should allow.

Question 4

I have 'User-agent: * Disallow: /admin'. Why is it not blocking GPTBot from /admin?

Accepted Answer

Trick question — it IS blocking GPTBot. User-agent: * applies to crawlers that don't have their own block. The trap is the OPPOSITE: if you have 'User-agent: GPTBot Allow: /', that block REPLACES the * block for GPTBot (per spec, no merging). So GPTBot can crawl /admin even though * is disallowed. Add an explicit GPTBot Disallow: /admin if needed.
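The replacement behavior can be checked with Python's standard urllib.robotparser. This is a sketch, not a model of how GPTBot itself parses the file: the URL and the name OtherBot are placeholders, and Python's parser applies the first matching rule in a group (RFC 9309 uses the longest match), so the Disallow line is placed before Allow: / to give the same result under both.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string, with no network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The trap: the GPTBot group replaces the '*' group for GPTBot.
TRAP = """\
User-agent: *
Disallow: /admin

User-agent: GPTBot
Allow: /
"""

# The fix: repeat the Disallow inside the GPTBot group.
FIXED = """\
User-agent: *
Disallow: /admin

User-agent: GPTBot
Disallow: /admin
Allow: /
"""

ADMIN_URL = "https://example.com/admin"  # placeholder URL

trap = parse_robots(TRAP)
fixed = parse_robots(FIXED)

trap_gptbot = trap.can_fetch("GPTBot", ADMIN_URL)      # GPTBot uses only its own group
trap_other = trap.can_fetch("OtherBot", ADMIN_URL)     # falls back to the '*' group
fixed_gptbot = fixed.can_fetch("GPTBot", ADMIN_URL)    # now blocked explicitly

print(trap_gptbot, trap_other, fixed_gptbot)
```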

Question 5

How do I block ChatGPT but allow OpenAI search?

Accepted Answer

OpenAI runs three bots: GPTBot (training), ChatGPT-User (live browsing for ChatGPT users), and OAI-SearchBot (their search index). Block GPTBot if you don't want your content used for training; block ChatGPT-User if you don't want ChatGPT browsing your pages on a user's behalf; block OAI-SearchBot if you don't want to appear in OpenAI's search results. The three are independent, so each needs its own User-agent group.
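As one reading of the question (no training, no chat browsing, search allowed), the file below gives each bot its own group. The check uses Python's standard urllib.robotparser as a stand-in for how a compliant crawler would read it; the page URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# One group per OpenAI bot; each group is evaluated independently.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

PAGE = "https://example.com/pricing"  # placeholder URL
access = {
    bot: parser.can_fetch(bot, PAGE)
    for bot in ("GPTBot", "ChatGPT-User", "OAI-SearchBot")
}

for bot, allowed in access.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```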

Question 6

Why doesn't my Crawl-delay: 30 work?

Accepted Answer

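Support varies by crawler. Bingbot caps at 30s. Yandex respects up to 60s. Googlebot ignores Crawl-delay entirely (the Search Console crawl rate setting was retired in early 2024; Google's guidance for slowing Googlebot is to return 503 or 429 temporarily). Most AI bots don't support it. For real rate limiting, configure per-User-Agent limits at your CDN or origin.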
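For origin-side limiting, a minimal sketch of a sliding-window limiter keyed by User-Agent follows. The class name UserAgentRateLimiter and its parameters are hypothetical, and a production setup would more likely use the CDN's or web server's built-in rate limiting. User-Agent strings are also trivially spoofed, so this only slows down bots that identify themselves honestly.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class UserAgentRateLimiter:
    """Allow at most max_requests per window_seconds for each User-Agent string."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_agent: str) -> bool:
        now = self._clock()
        hits = self._hits[user_agent.lower()]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the caller would typically respond with HTTP 429
        hits.append(now)
        return True


# Usage with a fake clock: 2 requests per 30 seconds per User-Agent.
fake_now = [0.0]
limiter = UserAgentRateLimiter(2, 30.0, clock=lambda: fake_now[0])
first = limiter.allow("GPTBot")
second = limiter.allow("GPTBot")
third = limiter.allow("GPTBot")       # over the limit inside the window
other = limiter.allow("Bingbot")      # a different User-Agent has its own budget
fake_now[0] = 30.0
after_window = limiter.allow("GPTBot")
print(first, second, third, other, after_window)
```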

Question 7

Can robots.txt have comments?

Accepted Answer

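Yes. A # starts a comment, either on its own line or after a directive on the same line; everything from the # to the end of the line is ignored. Comments are useful for documenting why a path is blocked. We parse and ignore them; they don't affect bot behavior.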
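A quick way to confirm that comments are inert is to parse a commented file with Python's standard urllib.robotparser, used here only as an example of a parser that strips comments. The paths, URLs, and bot name are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Staging mirrors production and should never be crawled.
User-agent: *
Disallow: /staging  # trailing comments are stripped too
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The rule applies exactly as if the comments were not there.
staging_allowed = parser.can_fetch("AnyBot", "https://example.com/staging/index.html")
public_allowed = parser.can_fetch("AnyBot", "https://example.com/blog/")
print(staging_allowed, public_allowed)
```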

Question 8

What happens if my robots.txt returns 503 or 5xx?

Accepted Answer

Major crawlers treat 5xx as a temporary issue and retry. Sustained 5xx (multiple days) is treated as 'allow all' by some, 'disallow all' by others. Either way, broken robots.txt is bad. We test status codes and flag any non-200.
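For reference, RFC 9309 groups robots.txt fetch results by status class. The sketch below encodes that grouping; the names RobotsPolicy and policy_for_status are hypothetical, and as noted above real crawlers diverge from it, particularly for sustained 5xx responses. It does not describe how the tester itself is implemented.

```python
from enum import Enum


class RobotsPolicy(Enum):
    PARSE_RULES = "parse the returned rules"
    FOLLOW_REDIRECT = "follow the redirect and evaluate the target"
    ALLOW_ALL = "file unavailable: the crawler may access any resource"
    DISALLOW_ALL = "server unreachable: the crawler must assume complete disallow"


def policy_for_status(status: int) -> RobotsPolicy:
    """Map an HTTP status for /robots.txt to the RFC 9309 status-class handling."""
    if 200 <= status < 300:
        return RobotsPolicy.PARSE_RULES
    if 300 <= status < 400:
        return RobotsPolicy.FOLLOW_REDIRECT
    if 400 <= status < 500:
        return RobotsPolicy.ALLOW_ALL
    if 500 <= status < 600:
        return RobotsPolicy.DISALLOW_ALL
    raise ValueError(f"unexpected HTTP status: {status}")


for code in (200, 301, 404, 503):
    print(code, policy_for_status(code).value)
```

The 4xx case is the one that surprises people: a missing or forbidden robots.txt is read as permission to crawl, while a 5xx can stop crawling of the whole site until the file is reachable again.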

Free robots.txt tester

One typo can block your whole AI traffic channel

What we test

robots.txt presence and HTTP status

Syntax validation per RFC 9309 (Robots Exclusion Protocol)

Googlebot crawl simulation

Bingbot, DuckDuckBot, Yandex, Baidu

AI bots — GPTBot, ChatGPT-User

AI bots — ClaudeBot, anthropic-ai, Claude-Web

AI bots — Google-Extended

PerplexityBot, Applebot, Applebot-Extended

Bytespider, CCBot, Diffbot, AmazonBot

Sitemap declaration

Crawl-delay sanity check
