How to Block AI Bots on AWS: Complete 2026 Guide
AWS is an infrastructure platform — your bot blocking strategy depends on which AWS services you use. This guide covers the most common stacks: CloudFront + S3 (static sites), CloudFront + EC2/ECS (server-rendered apps), and API Gateway + Lambda (serverless APIs).
Choose your blocking layer
- CloudFront Response Headers Policy: X-Robots-Tag, no code, console config
- CloudFront Functions: hard 403, lightweight JS, sub-ms, cheapest
- Lambda@Edge (viewer request): hard 403, full Node.js, async calls, more powerful
- AWS WAF Bot Control: managed bot rules, no code, updated automatically
- Lambda authorizer (API Gateway): hard 403 for REST APIs without CloudFront
Layer 1: robots.txt
For static sites on S3 + CloudFront, upload robots.txt as an S3 object with Content-Type: text/plain. CloudFront caches and serves it from edge locations globally.
S3 upload via AWS CLI
# 1. Create robots.txt (blocks the listed AI crawlers, allows everyone else)
cat <<'EOF' > robots.txt
User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Diffbot
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: Amazonbot
User-agent: omgili
Disallow: /
EOF
# 2. Upload robots.txt to S3 with the correct content type
aws s3 cp robots.txt s3://your-bucket/robots.txt \
  --content-type "text/plain" \
  --cache-control "public, max-age=86400"
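To keep the S3 object and any app-served copy identical, you can generate robots.txt from a single list of tokens. A minimal sketch: `buildRobotsTxt` and `AI_BOT_TOKENS` are hypothetical names, and the list simply mirrors the one above. It emits one group with several `User-agent` lines sharing a single `Disallow: /`.

```javascript
// Hypothetical helper: builds robots.txt from one list of UA tokens
const AI_BOT_TOKENS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'Google-Extended', 'CCBot',
  'Bytespider', 'Applebot-Extended', 'PerplexityBot', 'Diffbot',
  'cohere-ai', 'FacebookBot', 'Amazonbot', 'omgili',
];

function buildRobotsTxt(tokens) {
  // Default group: everyone else may crawl
  const lines = ['User-agent: *', 'Allow: /', ''];
  // One group: several User-agent lines share the Disallow rule below
  for (const token of tokens) {
    lines.push(`User-agent: ${token}`);
  }
  lines.push('Disallow: /');
  // End the file with a newline
  return lines.join('\n') + '\n';
}

// Usage: write the file, then upload it with `aws s3 cp` as shown above.
// require('fs').writeFileSync('robots.txt', buildRobotsTxt(AI_BOT_TOKENS));
```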
For server-rendered apps (EC2/ECS/Lambda)
Serve robots.txt from your application as a static route. Example for Express.js on EC2/ECS, or on Lambda behind a Function URL or API Gateway (on Lambda, Express needs an adapter such as serverless-http):
// Express.js: serve robots.txt before any other middleware
const express = require('express');
const app = express();
app.get('/robots.txt', (req, res) => {
res.type('text/plain');
res.send([
'User-agent: *',
'Allow: /',
'',
'User-agent: GPTBot',
'User-agent: ClaudeBot',
'User-agent: anthropic-ai',
'User-agent: Google-Extended',
'User-agent: CCBot',
'User-agent: Bytespider',
'User-agent: Applebot-Extended',
'User-agent: PerplexityBot',
'User-agent: Diffbot',
'User-agent: cohere-ai',
'User-agent: FacebookBot',
'User-agent: Amazonbot',
'User-agent: omgili',
'Disallow: /',
].join('\n') + '\n');
});
Layer 2: noai meta tag
The noai and noimageai meta directives are set in your application's HTML, not in AWS infrastructure. Use your framework's meta API (Next.js Metadata, Express res.render with a template variable, etc.). AWS has no native mechanism to inject HTML meta tags — this is handled at the application layer.
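Whatever framework renders the page, the injection step itself is plain string work. A minimal sketch, assuming you have the full HTML document as a string; `injectNoAiMeta` is a hypothetical helper, not an AWS or framework API. Unlike a single `'</head>'` replace, it matches the closing tag case-insensitively and skips pages that already carry a noai robots meta tag.

```javascript
// Hypothetical helper: adds the noai robots meta tag at the application layer
const NOAI_META = '<meta name="robots" content="noai, noimageai">';

function injectNoAiMeta(html) {
  // Idempotent: skip pages that already have a robots meta tag containing noai
  // (assumes name= comes before content=, as in the tag above)
  if (/<meta[^>]+name=["']robots["'][^>]*noai/i.test(html)) {
    return html;
  }
  const headClose = /<\/head>/i;
  if (!headClose.test(html)) {
    return html; // no </head>: leave the page unchanged
  }
  // Insert before the first closing head tag, preserving its original casing
  return html.replace(headClose, (match) => NOAI_META + match);
}
```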
If your app runs on Lambda (serverless SSR)
// Lambda handler — inject meta into HTML response
export const handler = async (event) => {
const html = renderPage(event); // your SSR function
// Inject noai meta before </head>
const patched = html.replace(
'</head>',
'<meta name="robots" content="noai, noimageai"></head>'
);
return {
statusCode: 200,
headers: {
'Content-Type': 'text/html',
'X-Robots-Tag': 'noai, noimageai',
},
body: patched,
};
};
Layer 3: X-Robots-Tag via CloudFront Response Headers Policy
CloudFront Response Headers Policies add headers to every response at the CDN edge before it reaches the client. No code, no Lambda, no origin changes — configured entirely in the AWS console or via IaC.
AWS Console setup
- Go to CloudFront → Policies → Response headers
- Click Create response headers policy
- Under Custom headers → Add header:
  - Header name: X-Robots-Tag
  - Value: noai, noimageai
  - Override origin: Yes
- Save and associate the policy with your distribution's cache behaviors
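Once the policy is associated, you can confirm the header actually reaches clients. A minimal sketch for Node.js 18+ (global `fetch` and `Headers`); `hasNoAiDirectives` and `checkUrl` are hypothetical helper names, and the URL in the usage comment is a placeholder.

```javascript
// Hypothetical check: does a response carry both noai directives?
const REQUIRED = ['noai', 'noimageai'];

function hasNoAiDirectives(headers) {
  // Headers.get is case-insensitive and returns null when the header is absent
  const raw = headers.get('x-robots-tag');
  if (!raw) return false;
  // Directives are comma-separated; normalize spacing and case
  const directives = raw.split(',').map(d => d.trim().toLowerCase());
  return REQUIRED.every(d => directives.includes(d));
}

async function checkUrl(url) {
  // HEAD avoids downloading the body
  const res = await fetch(url, { method: 'HEAD' });
  return hasNoAiDirectives(res.headers);
}

// Usage (placeholder URL):
// checkUrl('https://your-distribution.example/').then(console.log);
```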
Terraform
# Terraform — CloudFront Response Headers Policy
resource "aws_cloudfront_response_headers_policy" "ai_bot_headers" {
name = "ai-bot-noai-headers"
custom_headers_config {
items {
header = "X-Robots-Tag"
value = "noai, noimageai"
override = true
}
}
}
resource "aws_cloudfront_distribution" "site" {
# ... your existing config ...
default_cache_behavior {
# ... your existing behavior ...
response_headers_policy_id = aws_cloudfront_response_headers_policy.ai_bot_headers.id
}
}
Layer 4: Hard 403 blocking
Two options for hard blocking at CloudFront: CloudFront Functions (lightweight, sub-millisecond, cheapest) and Lambda@Edge (full Node.js runtime, more powerful but higher latency and cost). For simple UA string matching, use CloudFront Functions.
Option A: CloudFront Functions (recommended)
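CloudFront Functions use a lightweight JavaScript runtime: no npm, no Node.js built-ins, but string operations work fine for UA matching. CloudFront Functions are a global resource, so there is no region to choose; publish the function and associate it with your distribution as a viewer-request trigger.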
// CloudFront Function — viewer-request trigger
// Runtime: cloudfront-js-2.0
// CloudFront Functions are global: no region selection needed
var AI_BOTS = [
'gptbot', 'chatgpt-user', 'oai-searchbot',
'claudebot', 'anthropic-ai', 'claude-web',
'google-extended', 'ccbot', 'bytespider',
'applebot-extended', 'perplexitybot', 'diffbot',
'cohere-ai', 'facebookbot', 'amazonbot',
'omgili', 'omgilibot', 'iaskspider', 'youbot',
];
var EXEMPT_PATHS = ['/robots.txt', '/sitemap.xml', '/favicon.ico'];
function handler(event) {
var request = event.request;
var uri = request.uri;
// Allow exempt paths — bots need robots.txt
for (var i = 0; i < EXEMPT_PATHS.length; i++) {
if (uri.startsWith(EXEMPT_PATHS[i])) {
return request;
}
}
// Get User-Agent header (CloudFront Functions use lowercase header names)
var ua = '';
if (request.headers['user-agent']) {
ua = request.headers['user-agent'].value.toLowerCase();
}
// Check for known AI bots
for (var j = 0; j < AI_BOTS.length; j++) {
if (ua.indexOf(AI_BOTS[j]) !== -1) {
return {
statusCode: 403,
statusDescription: 'Forbidden',
headers: {
'content-type': { value: 'text/plain' },
},
body: 'Forbidden',
};
}
}
return request;
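}
CloudFront User-Agent header note
Viewer-request functions (CloudFront Functions and Lambda@Edge alike) see the User-Agent header exactly as the viewer sent it, so no cache policy change is needed for UA matching. Cache and origin request policies only control what CloudFront forwards to your origin: if User-Agent is not forwarded, the origin receives "Amazon CloudFront" instead. Avoid adding User-Agent to the cache key just for bot blocking, because it splits the cache into one entry per distinct UA string.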
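Because the function is plain JavaScript with a single `handler(event)` entry point, you can exercise the matching logic locally with a hand-built viewer-request event before publishing. The sketch below uses a trimmed copy of the function above (shortened bot list) and a mock event containing only the fields the handler reads; `viewerRequest` and `statusFor` are hypothetical test helpers. To test against the real runtime after creating the function, the AWS CLI also offers `aws cloudfront test-function`.

```javascript
// Trimmed copy of the viewer-request function above (shortened bot list)
var AI_BOTS = ['gptbot', 'claudebot', 'ccbot', 'bytespider'];
var EXEMPT_PATHS = ['/robots.txt', '/sitemap.xml', '/favicon.ico'];

function handler(event) {
  var request = event.request;
  for (var i = 0; i < EXEMPT_PATHS.length; i++) {
    if (request.uri.startsWith(EXEMPT_PATHS[i])) return request;
  }
  var ua = request.headers['user-agent']
    ? request.headers['user-agent'].value.toLowerCase()
    : '';
  for (var j = 0; j < AI_BOTS.length; j++) {
    if (ua.indexOf(AI_BOTS[j]) !== -1) {
      return {
        statusCode: 403,
        statusDescription: 'Forbidden',
        headers: { 'content-type': { value: 'text/plain' } },
        body: 'Forbidden',
      };
    }
  }
  return request;
}

// Mock event: headers are lowercase keys mapping to { value: '...' }
function viewerRequest(uri, userAgent) {
  var headers = {};
  if (userAgent) headers['user-agent'] = { value: userAgent };
  return { request: { method: 'GET', uri: uri, headers: headers } };
}

// Returning the request means "pass through"; a statusCode means "respond now"
function statusFor(uri, userAgent) {
  var result = handler(viewerRequest(uri, userAgent));
  return result.statusCode || 'pass';
}

console.log(statusFor('/', 'Mozilla/5.0 (compatible; GPTBot/1.1)')); // → 403
console.log(statusFor('/robots.txt', 'GPTBot/1.1'));                 // → pass
console.log(statusFor('/', 'Mozilla/5.0 (Windows NT 10.0)'));        // → pass
```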
Deploy via AWS CLI
# 1. Create the CloudFront Function
aws cloudfront create-function \
  --name block-ai-bots \
  --function-config Comment="Block AI training crawlers",Runtime=cloudfront-js-2.0 \
  --function-code fileb://block-ai-bots.js \
  --region us-east-1
# 2. Publish it
aws cloudfront publish-function \
  --name block-ai-bots \
  --if-match $(aws cloudfront describe-function --name block-ai-bots --query 'ETag' --output text) \
  --region us-east-1
# 3. Associate with your distribution (update your distribution config)
# Add to DefaultCacheBehavior or CacheBehaviors:
# FunctionAssociations:
#   Items:
#     - EventType: viewer-request
#       FunctionARN: arn:aws:cloudfront::ACCOUNT:function/block-ai-bots
Option B: Lambda@Edge (Node.js — more powerful)
Use Lambda@Edge when you need async calls (external blocklist lookup, DynamoDB query, etc.) or richer logic. Must be deployed in us-east-1.
// Lambda@Edge — viewer request trigger
// Runtime: nodejs20.x
// Deploy in us-east-1
const AI_BOTS = [
'gptbot', 'chatgpt-user', 'oai-searchbot',
'claudebot', 'anthropic-ai', 'claude-web',
'google-extended', 'ccbot', 'bytespider',
'applebot-extended', 'perplexitybot', 'diffbot',
'cohere-ai', 'facebookbot', 'amazonbot',
'omgili', 'omgilibot', 'iaskspider', 'youbot',
];
const EXEMPT_PATHS = ['/robots.txt', '/sitemap.xml', '/favicon.ico'];
exports.handler = async (event) => {
const request = event.Records[0].cf.request;
const uri = request.uri;
// Exempt paths
if (EXEMPT_PATHS.some(p => uri.startsWith(p))) {
return request;
}
// Get User-Agent (Lambda@Edge receives original headers)
const uaHeader = request.headers['user-agent'];
const ua = uaHeader ? uaHeader[0].value.toLowerCase() : '';
const isAIBot = AI_BOTS.some(bot => ua.includes(bot));
if (isAIBot) {
return {
status: '403',
statusDescription: 'Forbidden',
headers: {
'content-type': [{ key: 'Content-Type', value: 'text/plain' }],
},
body: 'Forbidden',
};
}
return request;
};
AWS WAF Bot Control (managed rules)
AWS WAF Bot Control is a managed rule group that uses AWS-maintained signatures, so it updates automatically without code changes. It also uses behavioral signals beyond User-Agent (IP reputation, request patterns), making it harder to evade.
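Cost: ~$10/month for the Bot Control rule group + ~$1/million requests at the COMMON inspection level (TARGETED costs more per request), on top of standard AWS WAF web ACL and request charges. Likely overkill for small sites, but worth considering at scale.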
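The Terraform below starts Bot Control in count mode. Before switching to blocking, it helps to see which user agents Bot Control labeled. A minimal sketch, assuming you have enabled AWS WAF logging and have the log records as newline-delimited JSON; the field names (`labels[].name`, `httpRequest.headers[]` with `name`/`value`) and the label prefix are assumptions based on the AWS WAF log format, so check them against a sample of your own logs. `tallyBotControlHits` is a hypothetical helper.

```javascript
// Assumed label namespace for Bot Control; verify against your logs
const BOT_CONTROL_PREFIX = 'awswaf:managed:aws:bot-control:';

function userAgentOf(record) {
  const headers = (record.httpRequest && record.httpRequest.headers) || [];
  const h = headers.find(x => String(x.name).toLowerCase() === 'user-agent');
  return h ? h.value : '(none)';
}

// Counts requests carrying any Bot Control label, grouped by User-Agent
function tallyBotControlHits(ndjson) {
  const counts = new Map();
  for (const line of ndjson.split('\n')) {
    if (!line.trim()) continue; // skip blank lines
    const record = JSON.parse(line);
    const labels = record.labels || [];
    const flagged = labels.some(l => String(l.name).startsWith(BOT_CONTROL_PREFIX));
    if (!flagged) continue;
    const ua = userAgentOf(record);
    counts.set(ua, (counts.get(ua) || 0) + 1);
  }
  // Most frequent user agents first
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Usage (hypothetical file name):
// console.table(tallyBotControlHits(require('fs').readFileSync('waf-logs.ndjson', 'utf8')));
```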
Terraform — WAF with Bot Control
# Terraform — WAF Web ACL with Bot Control
resource "aws_wafv2_web_acl" "main" {
name = "ai-bot-block"
scope = "CLOUDFRONT" # CLOUDFRONT scope: the AWS provider must target us-east-1
default_action {
allow {}
}
# AWS managed Bot Control rule group
rule {
name = "BotControl"
priority = 1
statement {
managed_rule_group_statement {
vendor_name = "AWS"
name = "AWSManagedRulesBotControlRuleSet"
# Optional: enable targeted inspection (more accurate, higher cost)
managed_rule_group_configs {
aws_managed_rules_bot_control_rule_set {
inspection_level = "COMMON" # or "TARGETED"
}
}
}
}
override_action {
count {} # Use count{} first to audit, then switch to none{} to block
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "BotControl"
sampled_requests_enabled = true
}
}
visibility_config {
cloudwatch_metrics_enabled = true
metric_name = "WebACL"
sampled_requests_enabled = true
}
}
# Associate with CloudFront distribution
resource "aws_cloudfront_distribution" "site" {
# ...
web_acl_id = aws_wafv2_web_acl.main.arn
}
API Gateway: blocking without CloudFront
For Lambda-backed REST APIs on API Gateway without CloudFront, check the User-Agent at the start of your Lambda handler or use a Lambda authorizer.
Lambda handler early return (simplest)
// Lambda function handler — check UA before processing
const AI_BOTS = ['gptbot', 'claudebot', 'anthropic-ai', 'google-extended',
'ccbot', 'bytespider', 'applebot-extended', 'perplexitybot',
'diffbot', 'cohere-ai', 'facebookbot', 'amazonbot', 'omgili'];
exports.handler = async (event) => {
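// Header casing varies: REST APIs pass the client's casing, HTTP APIs lowercase it
const headers = event.headers ?? {};
const ua = (headers['user-agent'] ?? headers['User-Agent'] ?? '').toLowerCase();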
const isAIBot = AI_BOTS.some(bot => ua.includes(bot));
if (isAIBot) {
return {
statusCode: 403,
headers: { 'Content-Type': 'text/plain' },
body: 'Forbidden',
};
}
// ... your actual handler logic
};
CloudFront Functions vs Lambda@Edge vs WAF
| Feature | CF Functions | Lambda@Edge | WAF Bot Control |
|---|---|---|---|
| Runtime | Lightweight JS | Node.js/Python | Managed (no code) |
| Latency | <1ms | 1-5ms | <1ms |
| Cost | $0.10/M invocations | $0.60/M requests + duration | $10/mo + $1/M (plus WAF base charges) |
| UA string access | Full viewer headers | Full viewer headers | Built-in |
| Auto-updates | Manual | Manual | Yes (AWS-maintained) |
| Async calls | No | Yes | N/A |
| Best for | Simple bot blocking | Complex logic | No-code, scale |
FAQ
What is the difference between Lambda@Edge and CloudFront Functions for bot blocking?
CloudFront Functions run at every edge location with sub-millisecond latency and cost ~$0.10/million requests. They use a lightweight JS runtime — sufficient for UA string matching. Lambda@Edge runs full Node.js at regional caches, costs more, but supports async calls. For simple UA matching, use CloudFront Functions.
Should I use CloudFront Response Headers Policy or Lambda@Edge to add X-Robots-Tag?
Use CloudFront Response Headers Policy — no code, console configuration, applies at CDN edge. Lambda@Edge for header injection is overkill unless you need conditional logic (different headers per path).
Does CloudFront Functions have access to the full User-Agent header?
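Yes. A viewer-request CloudFront Function receives the User-Agent header as the viewer sent it. Cache and origin request policies only affect what CloudFront forwards to the origin (which otherwise sees "Amazon CloudFront"), so there is no need to add User-Agent to the cache key for bot blocking.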
How is AWS WAF Bot Control different from a Lambda@Edge bot block?
WAF Bot Control uses AWS-maintained signatures that update automatically. Beyond UA matching, it also uses IP reputation and, at the TARGETED inspection level, browser fingerprinting. A CloudFront Function or Lambda@Edge with a hardcoded list requires manual updates. Bot Control costs ~$10/month base plus per-request charges. For most sites, a CloudFront Function with a good UA list is sufficient and cheaper.
Where do I deploy a Lambda@Edge function for CloudFront?
Lambda@Edge functions must be deployed in us-east-1 regardless of where your origin or users are; CloudFront replicates them globally. Associate the function as a viewer-request trigger: it runs before CloudFront checks the cache or contacts the origin.
Can I block AI bots at API Gateway without CloudFront?
Yes. Use a Lambda authorizer (REQUEST type) that checks User-Agent and returns a Deny policy for known AI bots. Or check UA at the start of your Lambda handler and return 403 early.