AI crawlers have become a common presence across the web, scanning websites and collecting large volumes of content. While some serve legitimate purposes, others are overly aggressive, consuming server resources and capturing content that site owners may want to keep private. To stay in control and protect valuable content, you need effective measures for managing and blocking unwanted AI crawler access.
Launch provides two ways to help you control AI crawler access: the robots.txt file and Launch Edge Functions.
The robots.txt file provides crawl instructions for compliant bots. You can use it to disallow specific User-Agent strings from accessing certain parts of your site.
Here’s a sample robots.txt to block common AI crawlers:
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: ai-crawler
Disallow: /

User-agent: *
Disallow: /private-directory/
```

Note: Some bots may ignore the robots.txt file and continue crawling your site. To strictly block AI crawlers, you can enforce the restriction at the edge using Launch Edge Functions.
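To illustrate how a compliant crawler interprets these directives, here is a simplified matching sketch. The `rules` object and `isDisallowed` helper are hypothetical illustrations, not part of Launch; real parsers follow RFC 9309 and additionally handle wildcards, Allow directives, and longest-match precedence.

```javascript
// Simplified illustration of robots.txt Disallow matching.
// A crawler looks for a group matching its User-Agent token,
// falling back to the '*' group, then checks path prefixes.
const rules = {
  'gptbot': ['/'],              // fully disallowed
  '*': ['/private-directory/'], // default group for all other bots
};

function isDisallowed(userAgent, path) {
  const group = rules[userAgent.toLowerCase()] || rules['*'];
  return group.some(prefix => path.startsWith(prefix));
}

console.log(isDisallowed('GPTBot', '/blog/post'));                 // → true
console.log(isDisallowed('SomeOtherBot', '/blog/post'));           // → false
console.log(isDisallowed('SomeOtherBot', '/private-directory/x')); // → true
```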
Because robots.txt is purely advisory and bots can ignore it, you can use Launch Edge Functions to detect and block suspicious User-Agent strings in real time.
Example Launch Edge Function
```javascript
const KNOWN_BOTS = [
  'claudebot',
  'gptbot',
  'googlebot', // note: blocking this also stops Google Search indexing
  'bingbot',   // note: blocking this also stops Bing Search indexing
  'ahrefsbot',
  'yandexbot',
  'semrushbot',
  'mj12bot',
  'facebookexternalhit',
  'twitterbot',
  // more bots can be added here
];

export default function handler(request) {
  // Match the request's User-Agent (case-insensitively) against the list.
  const userAgent = (request.headers.get('user-agent') || '').toLowerCase();
  const isBot = KNOWN_BOTS.some(bot => userAgent.includes(bot));
  if (isBot) {
    // Reject known bots before the request reaches your origin.
    return new Response('Forbidden: AI crawlers are not allowed.', { status: 403 });
  }
  // Pass all other traffic through to the origin.
  return fetch(request);
}
```

While the robots.txt file helps communicate your site's crawling preferences, runtime protections like Launch Edge Functions offer more reliable control, especially in an AI-driven environment where bot behavior is increasingly unpredictable.
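Before deploying, you can sanity-check the User-Agent matching logic locally. This is a minimal sketch assuming Node 18+, which provides the Fetch API `Request` global; the `classify` helper and the shortened bot list are illustrative, not a Launch API.

```javascript
// Local check of the same User-Agent matching logic used in the
// edge function (Node 18+ supplies the Request/Headers globals).
const KNOWN_BOTS = ['claudebot', 'gptbot', 'ccbot', 'amazonbot'];

function classify(request) {
  const userAgent = (request.headers.get('user-agent') || '').toLowerCase();
  return KNOWN_BOTS.some(bot => userAgent.includes(bot));
}

const botReq = new Request('https://example.com/', {
  headers: { 'user-agent': 'Mozilla/5.0 (compatible; GPTBot/1.0)' },
});
const humanReq = new Request('https://example.com/', {
  headers: { 'user-agent': 'Mozilla/5.0 (Windows NT 10.0) Firefox/126.0' },
});

console.log(classify(botReq));   // → true  (would receive a 403)
console.log(classify(humanReq)); // → false (would be passed through)
```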