How to Block AI Bots in Supabase Edge Functions: Complete 2026 Guide
Supabase Edge Functions run on Deno at the network edge using the standard Fetch API. Blocking AI bots is a single header check before any database call — zero Supabase reads for blocked requests. The _shared/ directory lets you define the UA list once and import it across every function in your project.
Deno-native — no npm install needed
Supabase Edge Functions are TypeScript-first. Your bot-blocking helper is a plain .ts file — no build step, no npm package. Import with a relative path: import { isAiBot } from '../_shared/ai-bots.ts'.
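The resulting layout looks like this (the function names below match the examples in this guide; rename them to suit your project):

```text
supabase/functions/
├── _shared/
│   └── ai-bots.ts      # UA list + isAiBot() helper, bundled into each function at deploy time
├── my-api/
│   └── index.ts        # imports '../_shared/ai-bots.ts'
└── robots/
    └── index.ts        # serves robots.txt as a string constant
```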
Protection layers
The steps below stack four defences: a User-Agent check inside each function, a robots.txt served from a dedicated function, noai directives via the X-Robots-Tag header and meta tag, and an optional dynamic block-list stored in a Supabase table.
Step 1 — Shared AI bot helper (_shared/ai-bots.ts)
Create supabase/functions/_shared/ai-bots.ts. Supabase bundles the _shared/ directory at deploy time — it is not deployed as its own function. Import it with a relative path from any function.
// supabase/functions/_shared/ai-bots.ts
export const AI_BOTS = [
// OpenAI
'gptbot', 'chatgpt-user', 'oai-searchbot',
// Anthropic
'claudebot', 'claude-web',
// Common Crawl / CCBot
'ccbot',
// Bytedance
'bytespider',
// Meta
'meta-externalagent',
// Perplexity
'perplexitybot',
// Google AI
'google-extended', 'googleother',
// Cohere
'cohere-ai',
// Amazon
'amazonbot',
// Diffbot
'diffbot',
// AI2
'ai2bot',
// DeepSeek
'deepseekbot',
// Mistral
'mistralai-user',
// xAI
'xai-bot',
// You.com
'youbot',
// DuckDuckGo AI
'duckassistbot',
// Webzio
'webzio',
] as const;
export function isAiBot(userAgent: string | null): boolean {
if (!userAgent) return false;
const ua = userAgent.toLowerCase();
return AI_BOTS.some((bot) => ua.includes(bot));
}
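A quick way to sanity-check the helper is a Deno test next to the shared file. This is a minimal sketch: the test file name and the jsr:@std/assert import are assumptions, not part of the Supabase setup. Run it with deno test supabase/functions/_shared/.

```typescript
// supabase/functions/_shared/ai-bots.test.ts (hypothetical test file)
import { assertEquals } from 'jsr:@std/assert';
import { isAiBot } from './ai-bots.ts';

Deno.test('flags known AI crawlers', () => {
  assertEquals(isAiBot('Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)'), true);
  assertEquals(isAiBot('Mozilla/5.0 (compatible; ClaudeBot/1.0)'), true);
});

Deno.test('allows regular browsers and missing UA', () => {
  assertEquals(isAiBot('Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0'), false);
  assertEquals(isAiBot(null), false);
});
```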
Step 2 — Block AI bots in your edge function
The check runs at the top of Deno.serve() before any database access. A 403 response costs zero Supabase reads.
// supabase/functions/my-api/index.ts
import { isAiBot } from '../_shared/ai-bots.ts';
const ROBOTS_TXT = `User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /`;
Deno.serve(async (req: Request) => {
// 1. Serve robots.txt (if this function handles the /robots.txt path)
if (new URL(req.url).pathname.endsWith('/robots.txt')) {
return new Response(ROBOTS_TXT, {
headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
}
// 2. Block AI bots — before any DB call or business logic
if (isAiBot(req.headers.get('user-agent'))) {
return new Response('Forbidden', { status: 403 });
}
// 3. Handle CORS preflight
if (req.method === 'OPTIONS') {
return new Response(null, {
headers: {
'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Headers': 'authorization, x-client-info, apikey, content-type',
},
});
}
// 4. Your normal business logic
const data = { message: 'Hello, architect.' };
return new Response(JSON.stringify(data), {
headers: {
'Content-Type': 'application/json',
// 5. X-Robots-Tag on all successful responses
'X-Robots-Tag': 'noai, noimageai',
},
});
});
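To verify the block locally, run supabase functions serve and hit the function with a spoofed User-Agent. The sketch below assumes the default local gateway URL and that JWT verification is enabled, so it sends the local anon key; adjust both for your setup.

```typescript
// test-bot-block.ts: run with `deno run --allow-net --allow-env test-bot-block.ts`
const ENDPOINT = 'http://localhost:54321/functions/v1/my-api'; // default local gateway URL (assumption)
const ANON_KEY = Deno.env.get('SUPABASE_ANON_KEY') ?? '<local-anon-key>';

for (const ua of ['GPTBot/1.0 (+https://openai.com/gptbot)', 'Mozilla/5.0 (Windows NT 10.0) Chrome/125.0']) {
  const res = await fetch(ENDPOINT, {
    headers: { 'User-Agent': ua, Authorization: `Bearer ${ANON_KEY}` },
  });
  console.log(`${ua} -> ${res.status}`); // expect 403 for GPTBot, 200 for the browser UA
  await res.body?.cancel(); // discard the body so the connection is released
}
```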
Step 3 — Dedicated robots function
Supabase Edge Functions have no filesystem at runtime — you cannot read a robots.txt file. Embed it as a string constant and serve it from a dedicated robots function. Route GET /robots.txt to this function via your CDN, Vercel rewrite, or Nginx proxy rule.
// supabase/functions/robots/index.ts
// Deploy and route GET /robots.txt → this function via your proxy/CDN
const ROBOTS_TXT = `User-agent: *
Allow: /
# AI training bots — blocked
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: AmazonBot
Disallow: /
User-agent: Diffbot
Disallow: /`;
Deno.serve((_req: Request) => {
return new Response(ROBOTS_TXT, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'public, max-age=86400',
},
});
});
Deploy: supabase functions deploy robots --no-verify-jwt — robots.txt must be publicly readable without a Supabase JWT. Set --no-verify-jwt for this function only.
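After deploying, confirm the function really is public, meaning it returns 200 with no Authorization header. A minimal check, with <project-ref> as a placeholder for your project:

```typescript
// verify-robots.ts: run with `deno run --allow-net verify-robots.ts`
const res = await fetch('https://<project-ref>.supabase.co/functions/v1/robots');
console.log(res.status);       // expect 200 even without an Authorization header
console.log(await res.text()); // should print the robots.txt body
```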
Step 4 — Dynamic block-list from a Supabase table
For rule updates without a redeploy, store UA patterns in a blocked_bots table and cache them per isolate instance. The module-level cache avoids a DB hit on every request; a TTL forces a re-fetch after 5 minutes.
// supabase/functions/my-api/index.ts — dynamic block-list from Supabase table
import { createClient } from 'npm:@supabase/supabase-js@2';
import { AI_BOTS } from '../_shared/ai-bots.ts'; // fallback
const supabaseAdmin = createClient(
Deno.env.get('SUPABASE_URL')!,
Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!,
);
// Module-level cache — persists for the lifetime of this isolate instance
let cachedBotList: string[] | null = null;
let cacheExpiresAt = 0;
const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
async function getBotList(): Promise<string[]> {
const now = Date.now();
if (cachedBotList && now < cacheExpiresAt) return cachedBotList;
try {
const { data, error } = await supabaseAdmin
.from('blocked_bots')
.select('ua_pattern');
if (error || !data?.length) {
// Fall back to the hardcoded list on DB error or empty table (not cached; retried next request)
return [...AI_BOTS];
}
cachedBotList = data.map((r: { ua_pattern: string }) => r.ua_pattern.toLowerCase());
cacheExpiresAt = now + CACHE_TTL_MS;
return cachedBotList;
} catch {
return [...AI_BOTS];
}
}
Deno.serve(async (req: Request) => {
const ua = req.headers.get('user-agent')?.toLowerCase() ?? '';
const botList = await getBotList();
if (botList.some((pattern) => ua.includes(pattern))) {
return new Response('Forbidden', { status: 403 });
}
// ... rest of your handler
return new Response(JSON.stringify({ ok: true }), {
headers: { 'Content-Type': 'application/json', 'X-Robots-Tag': 'noai, noimageai' },
});
});
Create the table:
create table blocked_bots (
id serial primary key,
ua_pattern text not null unique,
created_at timestamptz default now()
);
insert into blocked_bots (ua_pattern) values
('gptbot'), ('claudebot'), ('ccbot'), ('bytespider'),
('google-extended'), ('perplexitybot'), ('amazonbot');
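To add a pattern later without redeploying, insert a row from a one-off script or migration. The sketch below is one possible approach (run it locally or in CI, never in the browser, since it uses the service-role key); running functions pick the new row up after the 5-minute cache TTL or on the next cold start.

```typescript
// add-bot.ts: hypothetical one-off script, run with
// `deno run --allow-net --allow-env add-bot.ts`
import { createClient } from 'npm:@supabase/supabase-js@2';

const admin = createClient(
  Deno.env.get('SUPABASE_URL')!,
  Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!,
);

const { error } = await admin.from('blocked_bots').insert({ ua_pattern: 'newcrawler' });
if (error) console.error(error.message);
```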
Step 5 — noai meta tag + X-Robots-Tag for SSR functions
For edge functions that return HTML, add both the meta tag and the response header. Crawlers that slip past your UA check will still see the directive.
// supabase/functions/ssr-page/index.ts — X-Robots-Tag for every HTML response
import { isAiBot } from '../_shared/ai-bots.ts';
Deno.serve((req: Request) => {
if (isAiBot(req.headers.get('user-agent'))) {
return new Response('Forbidden', { status: 403 });
}
const html = `<!DOCTYPE html>
<html>
<head>
<!-- noai meta tag for crawlers that reach this far -->
<meta name="robots" content="noai, noimageai">
</head>
<body>...</body>
</html>`;
return new Response(html, {
headers: {
'Content-Type': 'text/html; charset=utf-8',
// Belt-and-suspenders: header + meta tag
'X-Robots-Tag': 'noai, noimageai',
},
});
});
Supabase Edge vs Cloudflare Workers vs Vercel Edge vs Lambda@Edge
| Feature | Supabase Edge | CF Workers | Vercel Edge | Lambda@Edge |
|---|---|---|---|---|
| Runtime | Deno (TypeScript native) | V8 isolate (no Node/Deno) | V8 isolate (Next.js middleware) | Node.js / Python (heavier) |
| UA block pattern | req.headers.get("user-agent") | request.headers.get("user-agent") | req.headers.get("user-agent") | event.Records[0].cf.request.headers["user-agent"] |
| robots.txt | Dedicated function or Storage bucket | Workers Assets or explicit route | public/ directory (static) | S3 origin or Lambda@Edge route |
| Path interception | Proxy/CDN rewrite needed for /robots.txt | Workers Routes — any path | middleware.ts — any path | CloudFront behaviours |
| Shared helpers | _shared/ directory | utils/ + wrangler.toml | lib/ imported from middleware | Lambda Layers |
| Dynamic rules | Supabase table (first-class) | Workers KV or D1 | Upstash Redis or external DB | DynamoDB or SSM Parameter Store |
| Cold start latency | ~50–150 ms (Deno isolate) | <5 ms (V8 isolate) | <5 ms (V8 isolate) | ~100–500 ms (Node.js) |
| Deploy command | supabase functions deploy | wrangler deploy | vercel deploy (automatic) | aws lambda update-function-code |
Quick reference
- Read the UA: req.headers.get('user-agent')
- Block the request: new Response('Forbidden', { status: 403 })
- Signal no AI use: headers.set('X-Robots-Tag', 'noai, noimageai')
- Share the bot list: import { isAiBot } from '../_shared/ai-bots.ts'
- Deploy a function: supabase functions deploy <name>
- Deploy a public robots.txt: supabase functions deploy robots --no-verify-jwt
- Test locally: supabase functions serve

FAQ
How do I block AI bots in a Supabase Edge Function?
Check the User-Agent at the start of Deno.serve(): const ua = req.headers.get('user-agent')?.toLowerCase() ?? ''. Match against an AI_BOTS array with AI_BOTS.some(b => ua.includes(b)). Return new Response('Forbidden', { status: 403 }) before any database call. Define AI_BOTS at module scope — initialised once per cold start, not per request.
How do I serve robots.txt from Supabase Edge Functions?
Supabase Edge Functions have no filesystem at runtime. Embed robots.txt as a string constant and serve from a dedicated robots function. Route GET /robots.txt to it via your CDN or reverse proxy. Deploy with --no-verify-jwt so crawlers can access it without authentication. Alternatively, upload robots.txt to a public Supabase Storage bucket and serve from the CDN URL directly.
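If you prefer the Storage route, a one-off upload script might look like the sketch below; the bucket name public-assets is an assumption, and the bucket must be created as public first.

```typescript
// upload-robots.ts: hypothetical one-off script using supabase-js v2
import { createClient } from 'npm:@supabase/supabase-js@2';

const admin = createClient(
  Deno.env.get('SUPABASE_URL')!,
  Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!,
);

const ROBOTS_TXT = 'User-agent: GPTBot\nDisallow: /\n';

const { error } = await admin.storage
  .from('public-assets')
  .upload('robots.txt', new Blob([ROBOTS_TXT], { type: 'text/plain' }), {
    contentType: 'text/plain',
    upsert: true,
  });
if (error) console.error(error.message);
// Served at: https://<project-ref>.supabase.co/storage/v1/object/public/public-assets/robots.txt
```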
How do I share the AI bot list across multiple edge functions?
Create supabase/functions/_shared/ai-bots.ts with the UA array and an isAiBot() helper. Import it with import { isAiBot } from '../_shared/ai-bots.ts'. Supabase bundles the _shared/ directory at deploy time — it is not a deployable function itself. Keep the shared file lean: only the UA list and the check function, no Supabase client imports.
How is Supabase Edge different from Cloudflare Workers for bot blocking?
Both use V8-based edge runtimes with the Fetch API. Key differences: Supabase Edge runs on Deno (TypeScript native, JSR imports, Deno.serve()). Cloudflare Workers use a V8-only environment with export default { fetch }. Workers can intercept any URL path via Workers Routes; Supabase functions live at /functions/v1/<name> and require a proxy rewrite to intercept /robots.txt. Supabase functions have first-class Postgres access; Workers use KV or D1.
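For comparison, the same UA check in a Cloudflare Worker uses a module-style fetch handler rather than Deno.serve(). This is a minimal sketch with the bot list inlined, not a full Workers setup:

```typescript
// Cloudflare Workers equivalent, shown for comparison only
export default {
  async fetch(request: Request): Promise<Response> {
    const ua = request.headers.get('user-agent')?.toLowerCase() ?? '';
    if (['gptbot', 'claudebot', 'ccbot', 'bytespider'].some((b) => ua.includes(b))) {
      return new Response('Forbidden', { status: 403 });
    }
    return new Response('OK');
  },
};
```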
Can I update block rules without redeploying my edge function?
Yes — store UA patterns in a blocked_bots Supabase table and query it with the service-role client at the start of the handler. Cache the result in a module-level variable with a TTL (e.g. 5 minutes) so subsequent requests in the same isolate instance skip the DB query. Cold starts re-fetch. New isolate instances (after a re-deploy or after the instance is evicted) will pick up the latest rules on their first request.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.