whitelist the good bots, block the costly ai scrapers
Tick the crawlers you want, switch off the AI scrapers you don't, add your own rules, and download a clean, standards-compliant robots.txt. Everything below is live — no account needed.
config · this device
configs are saved in this browser. go pro to sync them to your account across every device.
whitelist good bots
search, social, and archive crawlers worth keeping.
Googlebot · google search
Bingbot · bing search
DuckDuckBot · duckduckgo search
Applebot · siri & spotlight
Internet Archive (ia_archiver) · wayback machine
Facebook (facebookexternalhit) · link previews
Twitterbot · link previews
LinkedInBot · link previews
Slackbot · link previews
Pinterestbot · rich pins
block ai scrapers
a curated list of training and answer-engine crawlers.
GPTBot · model training
OAI-SearchBot · chatgpt search
ClaudeBot · model training
anthropic-ai · model training
CCBot · training dataset
Google-Extended · gemini training
Applebot-Extended · apple intelligence
Meta-ExternalAgent · model training
PerplexityBot · answer engine
Bytespider · model training
Amazonbot · alexa & training
cohere-ai · model training
Diffbot · knowledge graph
ImagesiftBot · image dataset
YouBot · answer engine
Timpibot · search dataset
custom rules
no custom rules. add one to allow or disallow a specific path (e.g. disallow /admin).
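For readers who want to reproduce this kind of file outside the page, here is a minimal Python sketch of how ticked crawlers and custom rules could be turned into robots.txt text. The function name `build_robots_txt` and the choice to place custom rules under a `User-agent: *` group are assumptions made for illustration; the page does not say how robot.guard emits custom rules.

```python
def build_robots_txt(allowed, blocked, custom_rules=()):
    """Build robots.txt text from crawler lists (hypothetical helper).

    allowed, blocked: iterables of user-agent tokens, e.g. "Googlebot".
    custom_rules: iterable of (directive, path) pairs, e.g. ("Disallow", "/admin").
    Assumption: custom rules are grouped under "User-agent: *".
    """
    lines = ["# allowed crawlers"]
    for agent in allowed:
        lines += [f"User-agent: {agent}", "Allow: /", ""]

    lines.append("# blocked ai scrapers")
    for agent in blocked:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]

    rules = list(custom_rules)
    if rules:
        lines += ["# custom rules", "User-agent: *"]
        for directive, path in rules:
            if directive not in ("Allow", "Disallow"):
                raise ValueError(f"unsupported directive: {directive}")
            if not path.startswith("/"):
                raise ValueError(f"path must start with '/': {path}")
            lines.append(f"{directive}: {path}")
        lines.append("")

    # Drop trailing blank lines and end the file with a single newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        allowed=["Googlebot", "Bingbot"],
        blocked=["GPTBot", "CCBot"],
        custom_rules=[("Disallow", "/admin")],
    ))
```

Each crawler gets its own group, so removing a name from either list removes its group without affecting the others.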
# robots.txt — generated by robot.guard
# robotguard.ogbuilds.ai

# allowed crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: facebookexternalhit
Allow: /

User-agent: Twitterbot
Allow: /

User-agent: LinkedInBot
Allow: /

# blocked ai scrapers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /
download the file and place it at your site root (yoursite.com/robots.txt). robots.txt is only a request that compliant crawlers honour; nothing enforces it, so pair it with a firewall for bots that ignore it.
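Before uploading, it can help to sanity-check how a compliant parser reads the file. The sketch below uses Python's standard `urllib.robotparser` on a shortened sample of the rules; the helper name `is_allowed` and the `ExampleCrawler` agent are hypothetical. Python's parser is only one interpretation of the rules (it matches agent names loosely), so treat the result as a rough check rather than a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# A shortened sample of a generated file, used only for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path="/"):
    """Return True if Python's robots.txt parser would let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "GPTBot", "ClaudeBot", "ExampleCrawler"):
        print(agent, is_allowed(ROBOTS_TXT, agent))
```

Because the sample has no `User-agent: *` group, an agent that is not listed (such as the hypothetical `ExampleCrawler`) is treated as allowed, which is the default when no rule applies to it.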