How is this different from Google Analytics?

Google Analytics shows you traffic. Shadow shows you traffic, AI bot activity, what AI platforms say about your brand, AND tells you what to do about all of it. It's analytics + AI intelligence + action steps in one tool.

Do I need to install anything?

For basic monitoring (bot detection, AI perception, readiness score) — nope, just enter your URL. For full visitor analytics (clicks, behavior, sessions), add one script tag. One-click integrations for Vercel, Shopify, WordPress, and more.

Will it slow down my site?

No. The script is under 5KB and loads async. Zero impact on page speed or Core Web Vitals. External monitoring has literally no impact — it watches from the outside.

What AI bots does Shadow detect?

All of them. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, Amazonbot, and dozens more. The Shadow Network means new bots get identified across all users instantly.

What do you mean by "actionable steps"?

Shadow doesn't just show you graphs. It says things like: "ChatGPT has your pricing wrong — add structured data to /pricing to fix it" or "Your bounce rate on /features is 68% — here's why and what to change." Specific, do-it-today recommendations.

Can Shadow block bots?

Shadow is a telescope, not a shield. It shows you who's visiting and what AI says about you. It generates block rules and robots.txt configs you can apply — but it doesn't intercept traffic.

Is Shadow GDPR compliant?

Yes. Shadow never collects PII. IP addresses are hashed after classification. No cookies on your visitors. All Shadow Network data is anonymized. GDPR compliant by design.


How to Block AI Bots on AWS: Complete 2026 Guide

AWS is an infrastructure platform — your bot blocking strategy depends on which AWS services you use. This guide covers the most common stacks: CloudFront + S3 (static sites), CloudFront + EC2/ECS (server-rendered apps), and API Gateway + Lambda (serverless APIs).

Choose your blocking layer

CloudFront Response Headers Policy: X-Robots-Tag header; no code, console config

CloudFront Functions: hard 403; lightweight JS, sub-ms, cheapest

Lambda@Edge (viewer request): hard 403; full Node.js, async calls, more powerful

AWS WAF Bot Control: managed bot rules; no code, updated automatically

Lambda authorizer (API Gateway): hard 403 for REST APIs without CloudFront

Layer 1: robots.txt

For static sites on S3 + CloudFront, upload robots.txt as an S3 object with Content-Type: text/plain. CloudFront caches and serves it from edge locations globally.

S3 upload via AWS CLI

# 1. Create robots.txt locally
cat <<'EOF' > robots.txt
User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Diffbot
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: Amazonbot
User-agent: omgili
Disallow: /
EOF

# 2. Upload it to S3 with the correct content type
aws s3 cp robots.txt s3://your-bucket/robots.txt \
  --content-type "text/plain" \
  --cache-control "public, max-age=86400"

# 3. Optional: invalidate the cached copy so edges serve the new file now
#    (replace YOUR_DISTRIBUTION_ID with your distribution's ID)
aws cloudfront create-invalidation \
  --distribution-id YOUR_DISTRIBUTION_ID \
  --paths "/robots.txt"
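To sanity-check which crawlers a robots.txt actually shuts out before you upload it, here is a minimal sketch of RFC 9309-style group matching. It is deliberately simplified: it only understands User-agent, Allow, and Disallow lines with plain path prefixes (no `*` or `$` wildcards), and `parseRobots` / `isAllowed` are hypothetical helper names, not part of any library.

```javascript
// Simplified robots.txt matcher (RFC 9309 group rules, prefix paths only).
function parseRobots(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((key === 'allow' || key === 'disallow') && current) {
      // An empty Disallow means "nothing disallowed", so it adds no rule.
      if (value !== '') current.rules.push({ allow: key === 'allow', path: value });
      lastWasAgent = false;
    }
  }
  return groups;
}

// Is `path` allowed for the crawler whose product token is `productToken`?
function isAllowed(robotsTxt, productToken, path) {
  const groups = parseRobots(robotsTxt);
  const token = productToken.toLowerCase();
  // Use the groups naming this crawler; fall back to the * group.
  let matched = groups.filter(g => g.agents.includes(token));
  if (matched.length === 0) matched = groups.filter(g => g.agents.includes('*'));
  const rules = matched.flatMap(g => g.rules);
  // Longest matching path wins; on a tie, Allow wins.
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    if (!best || rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.allow)) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}
```

With the file above, `isAllowed(text, 'GPTBot', '/pricing')` returns `false` while `isAllowed(text, 'Googlebot', '/pricing')` returns `true`, because Googlebot falls through to the `User-agent: *` group.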

For server-rendered apps (EC2/ECS/Lambda)

Serve robots.txt from your application as a static route. Example for Express.js on EC2/ECS, or on Lambda behind a Lambda Function URL or API Gateway (Lambda needs an Express adapter such as serverless-http):

// Express.js: serve robots.txt. Register this route before any
// bot-blocking middleware so crawlers can always fetch it.
const express = require('express');
const app = express(); // or reuse your existing app instance

app.get('/robots.txt', (req, res) => {
  res.type('text/plain');
  res.send([
    'User-agent: *',
    'Allow: /',
    '',
    'User-agent: GPTBot',
    'User-agent: ClaudeBot',
    'User-agent: anthropic-ai',
    'User-agent: Google-Extended',
    'User-agent: CCBot',
    'User-agent: Bytespider',
    'User-agent: Applebot-Extended',
    'User-agent: PerplexityBot',
    'User-agent: Diffbot',
    'User-agent: cohere-ai',
    'User-agent: FacebookBot',
    'User-agent: Amazonbot',
    'User-agent: omgili',
    'Disallow: /',
    '', // trailing newline
  ].join('\n'));
});
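Because the crawler list is maintained by hand in more than one place (the S3 file, the Express route, the blocklists in later layers), it is easy to let the copies drift apart. One option is to generate robots.txt from a single array. A minimal sketch; `AI_CRAWLERS` and `buildRobotsTxt` are hypothetical names, and the list simply mirrors the tokens used in this guide:

```javascript
// Single source of truth for robots.txt tokens (mirrors this guide's list).
const AI_CRAWLERS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'Google-Extended', 'CCBot',
  'Bytespider', 'Applebot-Extended', 'PerplexityBot', 'Diffbot',
  'cohere-ai', 'FacebookBot', 'Amazonbot', 'omgili',
];

// Allow everyone by default, then one group that lists every AI crawler
// and disallows the whole site. Ends with a trailing newline.
function buildRobotsTxt(crawlers) {
  const lines = ['User-agent: *', 'Allow: /', ''];
  for (const name of crawlers) lines.push(`User-agent: ${name}`);
  lines.push('Disallow: /', '');
  return lines.join('\n');
}

const robotsTxt = buildRobotsTxt(AI_CRAWLERS);
// Express: app.get('/robots.txt', (req, res) => res.type('text/plain').send(robotsTxt));
// S3: require('fs').writeFileSync('robots.txt', robotsTxt), then run the aws s3 cp step.
```

Generating both outputs from one array means adding a new crawler is a one-line change.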

Layer 2: noai meta tag

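The noai and noimageai meta directives are set in your application's HTML, not in AWS infrastructure. Use your framework's meta API (Next.js Metadata, Express res.render with a template variable, etc.). AWS has no native mechanism to inject HTML meta tags, so this is handled at the application layer. Keep in mind that noai and noimageai are non-standard directives that crawlers honor voluntarily: treat them as a signal of intent, not enforcement.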

If your app runs on Lambda (serverless SSR)

// Lambda handler — inject meta into HTML response
export const handler = async (event) => {
  const html = renderPage(event);  // your SSR function

  // Inject noai meta before </head>
  const patched = html.replace(
    '</head>',
    '<meta name="robots" content="noai, noimageai"></head>'
  );

  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'text/html',
      'X-Robots-Tag': 'noai, noimageai',
    },
    body: patched,
  };
};
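The `html.replace('</head>', ...)` call above is case-sensitive and injects the tag again on every pass, even when the page already carries it. A slightly more defensive sketch, assuming your renderer returns a plain HTML string; `injectNoAiMeta` and `NOAI_META` are hypothetical names:

```javascript
// Meta tag to insert (same content as the Lambda example).
const NOAI_META = '<meta name="robots" content="noai, noimageai">';

// Insert NOAI_META before </head>, matching the closing tag
// case-insensitively. Leaves the document unchanged when there is no
// </head>, or when a meta tag in <head> already carries a noai directive.
function injectNoAiMeta(html) {
  const close = /<\/head\s*>/i.exec(html);
  if (!close) return html;
  const head = html.slice(0, close.index);
  if (/<meta[^>]+content=["'][^"']*\bnoai\b/i.test(head)) return html;
  return head + NOAI_META + html.slice(close.index);
}
```

Only the `<head>` portion is checked for an existing directive, so the word "noai" appearing in body text does not suppress the injection, and calling the function twice produces the same output as calling it once.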

Layer 3: X-Robots-Tag via CloudFront Response Headers Policy

CloudFront Response Headers Policies add headers to every response at the CDN edge before it reaches the client. No code, no Lambda, no origin changes — configured entirely in the AWS console or via IaC.

AWS Console setup

Go to CloudFront → Policies → Response headers
Click Create response headers policy
Under Custom headers → Add header:
Header name: X-Robots-Tag
Value: noai, noimageai
Override origin: Yes
Save and associate the policy with your distribution's cache behaviors

Terraform

# Terraform — CloudFront Response Headers Policy
resource "aws_cloudfront_response_headers_policy" "ai_bot_headers" {
  name = "ai-bot-noai-headers"

  custom_headers_config {
    items {
      header   = "X-Robots-Tag"
      value    = "noai, noimageai"
      override = true
    }
  }
}

resource "aws_cloudfront_distribution" "site" {
  # ... your existing config ...

  default_cache_behavior {
    # ... your existing behavior ...
    response_headers_policy_id = aws_cloudfront_response_headers_policy.ai_bot_headers.id
  }
}
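After the policy is associated, you can confirm that the header actually reaches clients. A small Node.js 18+ sketch using the built-in `fetch`; `hasRobotsDirectives` and `checkXRobotsTag` are hypothetical helper names, and `https://example.com/` stands in for your distribution's domain:

```javascript
// True if every required directive appears in an X-Robots-Tag value such
// as "noai, noimageai" (case-insensitive). Values scoped to a specific
// user agent (e.g. "googlebot: noindex") are not handled here.
function hasRobotsDirectives(headerValue, required) {
  const present = new Set(
    (headerValue || '').split(',').map(d => d.trim().toLowerCase())
  );
  return required.every(d => present.has(d.toLowerCase()));
}

// Sends a real HEAD request and reports what came back. Repeated
// X-Robots-Tag headers are joined with ", " by Headers.get().
async function checkXRobotsTag(url, required = ['noai', 'noimageai']) {
  const res = await fetch(url, { method: 'HEAD' });
  const value = res.headers.get('x-robots-tag');
  return { status: res.status, value, ok: hasRobotsDirectives(value, required) };
}

// Usage (not run automatically):
// checkXRobotsTag('https://example.com/').then(console.log);
```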

Layer 4: Hard 403 blocking

Two options for hard blocking at CloudFront: CloudFront Functions (lightweight, sub-millisecond, cheapest) and Lambda@Edge (full Node.js runtime, more powerful but higher latency and cost). For simple UA string matching, use CloudFront Functions.

Option A: CloudFront Functions (recommended)

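CloudFront Functions use a lightweight JavaScript runtime (no npm packages and only a few built-in modules), but string operations work fine for UA matching. They are a global CloudFront resource, so there is no region to pick; associate the function with your cache behavior as a viewer-request trigger. Note that Google-Extended and Applebot-Extended are robots.txt control tokens, not User-Agent strings sent by a crawler, so matching them in a UA list has no effect; robots.txt is the way to opt out of those.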

// CloudFront Function — viewer-request trigger
// Runtime: cloudfront-js-2.0
// CloudFront Functions are global: no region to select

var AI_BOTS = [
  'gptbot', 'chatgpt-user', 'oai-searchbot',
  'claudebot', 'anthropic-ai', 'claude-web',
  'google-extended', 'ccbot', 'bytespider',
  'applebot-extended', 'perplexitybot', 'diffbot',
  'cohere-ai', 'facebookbot', 'amazonbot',
  'omgili', 'omgilibot', 'iaskspider', 'youbot',
];

var EXEMPT_PATHS = ['/robots.txt', '/sitemap.xml', '/favicon.ico'];

function handler(event) {
  var request = event.request;
  var uri = request.uri;

  // Allow exempt paths — bots need robots.txt
  for (var i = 0; i < EXEMPT_PATHS.length; i++) {
    if (uri.startsWith(EXEMPT_PATHS[i])) {
      return request;
    }
  }

  // Get User-Agent header (CloudFront Functions use lowercase header names)
  var ua = '';
  if (request.headers['user-agent']) {
    ua = request.headers['user-agent'].value.toLowerCase();
  }

  // Check for known AI bots
  for (var j = 0; j < AI_BOTS.length; j++) {
    if (ua.indexOf(AI_BOTS[j]) !== -1) {
      return {
        statusCode: 403,
        statusDescription: 'Forbidden',
        headers: {
          'content-type': { value: 'text/plain' },
        },
        body: 'Forbidden',
      };
    }
  }

  return request;
}
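If the list grows, the UA check can also be written as a single case-insensitive regular expression instead of a loop. A sketch of that alternative, written ES5-style so it should also fit the CloudFront Functions runtime; `buildBotPattern` and `isAIBot` are hypothetical names, and each token is escaped so characters such as `.` are matched literally:

```javascript
// Compile the bot list into one case-insensitive alternation, escaping
// regex metacharacters so each token matches as a literal substring.
function buildBotPattern(tokens) {
  var escaped = tokens.map(function (t) {
    return t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  });
  return new RegExp(escaped.join('|'), 'i');
}

var AI_BOT_PATTERN = buildBotPattern([
  'gptbot', 'chatgpt-user', 'oai-searchbot',
  'claudebot', 'anthropic-ai', 'claude-web',
  'google-extended', 'ccbot', 'bytespider',
  'applebot-extended', 'perplexitybot', 'diffbot',
  'cohere-ai', 'facebookbot', 'amazonbot',
  'omgili', 'omgilibot', 'iaskspider', 'youbot',
]);

// Same decision as the loop in the handler: true for a known AI bot UA.
function isAIBot(userAgent) {
  return AI_BOT_PATTERN.test(userAgent || '');
}
```

The pattern is built once at load time, and because it has no `g` flag, repeated `test()` calls carry no `lastIndex` state between requests.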

CloudFront User-Agent header note

A viewer-request function receives the headers exactly as the viewer sent them, including the full User-Agent string, so no cache policy change is needed for UA matching. Cache and origin request policies only control what CloudFront forwards to your origin (if User-Agent is not forwarded, the origin sees "Amazon CloudFront"). Avoid adding User-Agent to the cache key just for bot blocking: it fragments the cache and lowers your hit ratio.

Deploy via AWS CLI

# 1. Create the CloudFront Function
aws cloudfront create-function \
  --name block-ai-bots \
  --function-config Comment="Block AI training crawlers",Runtime=cloudfront-js-2.0 \
  --function-code fileb://block-ai-bots.js \
  --region us-east-1

# 2. Publish it
aws cloudfront publish-function \
  --name block-ai-bots \
  --if-match $(aws cloudfront describe-function --name block-ai-bots --query 'ETag' --output text) \
  --region us-east-1

# 3. Associate with your distribution (update your distribution config)
# Add to DefaultCacheBehavior or CacheBehaviors:
# FunctionAssociations:
#   Items:
#     - EventType: viewer-request
#       FunctionARN: arn:aws:cloudfront::ACCOUNT:function/block-ai-bots
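Once the function is associated and the distribution has finished deploying, a quick smoke test from outside confirms the behavior. A Node.js 18+ sketch using the built-in `fetch`; the expectations mirror the function above (bots get 403 except on exempt paths, browsers pass through), while `https://example.com` and the sample User-Agent strings are placeholders:

```javascript
// Placeholder User-Agent strings for a bot and an ordinary browser.
const BOT_UA = 'Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)';
const BROWSER_UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0 Safari/537.36';

// Expected status per request, matching the CloudFront Function's rules.
const CASES = [
  { path: '/', ua: BOT_UA, expect: 403 },
  { path: '/robots.txt', ua: BOT_UA, expect: 200 }, // exempt path
  { path: '/', ua: BROWSER_UA, expect: 200 },
];

// Returns the results whose observed status differs from the expectation.
function findFailures(results) {
  return results.filter(r => r.status !== r.expect);
}

// Issues each request for real (network calls) and returns the failures.
async function runSmokeTest(baseUrl) {
  const results = [];
  for (const c of CASES) {
    const res = await fetch(new URL(c.path, baseUrl), {
      headers: { 'User-Agent': c.ua },
    });
    results.push({ ...c, status: res.status });
  }
  return findFailures(results);
}

// Usage (not run automatically):
// runSmokeTest('https://example.com').then(f => console.log(f.length ? f : 'all passed'));
```

A 404 on `/robots.txt` shows up as a failure too, which is useful: it usually means the file was never uploaded.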

Option B: Lambda@Edge (Node.js — more powerful)

Use Lambda@Edge when you need async calls (external blocklist lookup, DynamoDB query, etc.) or richer logic. Must be deployed in us-east-1.

// Lambda@Edge — viewer request trigger
// Runtime: nodejs20.x
// Deploy in us-east-1

const AI_BOTS = [
  'gptbot', 'chatgpt-user', 'oai-searchbot',
  'claudebot', 'anthropic-ai', 'claude-web',
  'google-extended', 'ccbot', 'bytespider',
  'applebot-extended', 'perplexitybot', 'diffbot',
  'cohere-ai', 'facebookbot', 'amazonbot',
  'omgili', 'omgilibot', 'iaskspider', 'youbot',
];

const EXEMPT_PATHS = ['/robots.txt', '/sitemap.xml', '/favicon.ico'];

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const uri = request.uri;

  // Exempt paths
  if (EXEMPT_PATHS.some(p => uri.startsWith(p))) {
    return request;
  }

  // Get User-Agent (Lambda@Edge receives original headers)
  const uaHeader = request.headers['user-agent'];
  const ua = uaHeader ? uaHeader[0].value.toLowerCase() : '';

  const isAIBot = AI_BOTS.some(bot => ua.includes(bot));

  if (isAIBot) {
    return {
      status: '403',
      statusDescription: 'Forbidden',
      headers: {
        'content-type': [{ key: 'Content-Type', value: 'text/plain' }],
      },
      body: 'Forbidden',
    };
  }

  return request;
};

AWS WAF Bot Control (managed rules)

AWS WAF Bot Control is a managed rule group that uses AWS-maintained signatures — it updates automatically without code changes. It also uses behavioral signals beyond User-Agent (IP reputation, request patterns) making it harder to evade.

Cost: ~$10/month base + $1/million requests. Overkill for most sites but worth it at scale.

Terraform — WAF with Bot Control

# Terraform — WAF Web ACL with Bot Control
resource "aws_wafv2_web_acl" "main" {
  name  = "ai-bot-block"
  scope = "CLOUDFRONT"   # Create with an AWS provider configured for us-east-1

  default_action {
    allow {}
  }

  # AWS managed Bot Control rule group
  rule {
    name     = "BotControl"
    priority = 1

    statement {
      managed_rule_group_statement {
        vendor_name = "AWS"
        name        = "AWSManagedRulesBotControlRuleSet"

        # Optional: enable targeted inspection (more accurate, higher cost)
        managed_rule_group_configs {
          aws_managed_rules_bot_control_rule_set {
            inspection_level = "COMMON"  # or "TARGETED"
          }
        }
      }
    }

    override_action {
      count {}  # Use count{} first to audit, then switch to none{} to block
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "BotControl"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "WebACL"
    sampled_requests_enabled   = true
  }
}

# Associate with CloudFront distribution
resource "aws_cloudfront_distribution" "site" {
  # ...
  web_acl_id = aws_wafv2_web_acl.main.arn
}

API Gateway — blocking without CloudFront

For Lambda-backed REST APIs on API Gateway without CloudFront, check the User-Agent at the start of your Lambda handler or use a Lambda authorizer.

Lambda handler early return (simplest)

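// Lambda function handler: check UA before processing
const AI_BOTS = ['gptbot', 'claudebot', 'anthropic-ai', 'google-extended',
  'ccbot', 'bytespider', 'applebot-extended', 'perplexitybot',
  'diffbot', 'cohere-ai', 'facebookbot', 'amazonbot', 'omgili'];

// REST APIs keep the client's header casing; HTTP APIs lowercase names.
// Look up User-Agent case-insensitively to cover both.
function getUserAgent(headers) {
  const h = headers ?? {};
  const key = Object.keys(h).find(k => k.toLowerCase() === 'user-agent');
  return key ? String(h[key]).toLowerCase() : '';
}

exports.handler = async (event) => {
  const ua = getUserAgent(event.headers);
  const isAIBot = AI_BOTS.some(bot => ua.includes(bot));

  if (isAIBot) {
    return {
      statusCode: 403,
      headers: { 'Content-Type': 'text/plain' },
      body: 'Forbidden',
    };
  }

  // ... your actual handler logic
};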

CloudFront Functions vs Lambda@Edge vs WAF

Feature	CF Functions	Lambda@Edge	WAF Bot Control
Runtime	Lightweight JS	Node.js/Python	Managed (no code)
Latency	<1ms	1-5ms	<1ms
Cost	$0.10/M req	$0.60/M req + duration	$10/mo + $1/M
UA string access	Full viewer headers	Full viewer headers	Built-in
Auto-updates	Manual	Manual	Yes (AWS-maintained)
Async calls	No	Yes	N/A (managed)
Best for	Simple bot blocking	Complex logic	No-code, scale
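To turn the table's prices into a monthly figure, multiply request volume (in millions) by the per-million rate and add any fixed fee. A small sketch using only the table's approximate numbers; it deliberately ignores Lambda@Edge duration charges, free tiers, and the separate Web ACL and rule fees that WAF adds, so treat the results as rough lower bounds.

```javascript
// Approximate prices from the comparison table (USD).
const PRICING = {
  cloudfrontFunctions: { perMillion: 0.10, monthly: 0 },
  lambdaAtEdge: { perMillion: 0.60, monthly: 0 },   // excludes duration charges
  wafBotControl: { perMillion: 1.00, monthly: 10 }, // excludes Web ACL/rule fees
};

// Estimated monthly cost per option, rounded to cents.
function estimateMonthlyCost(requestsPerMonth) {
  const millions = requestsPerMonth / 1e6;
  const out = {};
  for (const [name, p] of Object.entries(PRICING)) {
    out[name] = Math.round((p.monthly + p.perMillion * millions) * 100) / 100;
  }
  return out;
}

// Example: 50 million requests per month.
console.log(estimateMonthlyCost(50e6));
```

For 50 million requests a month this works out to roughly $5 for CloudFront Functions, $30 for Lambda@Edge (before duration), and $60 for WAF Bot Control.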

FAQ

What is the difference between Lambda@Edge and CloudFront Functions for bot blocking?

CloudFront Functions run at every edge location with sub-millisecond latency and cost ~$0.10/million requests. They use a lightweight JS runtime — sufficient for UA string matching. Lambda@Edge runs full Node.js at regional caches, costs more, but supports async calls. For simple UA matching, use CloudFront Functions.

Should I use CloudFront Response Headers Policy or Lambda@Edge to add X-Robots-Tag?

Use CloudFront Response Headers Policy — no code, console configuration, applies at CDN edge. Lambda@Edge for header injection is overkill unless you need conditional logic (different headers per path).

Do CloudFront Functions have access to the full User-Agent header?

Yes. A viewer-request function sees the headers as the viewer sent them, including the full User-Agent, with no cache policy changes. Cache and origin request policies only affect what is forwarded to the origin, and adding User-Agent to the cache key would fragment your cache.

How is AWS WAF Bot Control different from a Lambda@Edge bot block?

WAF Bot Control uses AWS-maintained signatures that update automatically, and its targeted inspection level adds browser fingerprinting and behavioral signals beyond UA matching. A CloudFront Function or Lambda@Edge function with a hardcoded list requires manual updates. WAF costs ~$10/month base plus per-request fees. For most sites, a CloudFront Function with a good UA list is sufficient and cheaper.

Where do I deploy a Lambda@Edge function for CloudFront?

Lambda@Edge functions must be deployed in us-east-1, wherever your origin or viewers are. CloudFront replicates them globally. Associate a published version (not $LATEST) as a viewer request trigger: it runs before CloudFront checks the cache or contacts the origin.

Can I block AI bots at API Gateway without CloudFront?

Yes. Use a Lambda authorizer (REQUEST type) that checks User-Agent and returns a Deny policy for known AI bots. Or check UA at the start of your Lambda handler and return 403 early.
