How to Block AI Bots on Sails.js (Node.js): Complete 2026 Guide
Sails.js is the Node.js MVC framework built on Express — it adds policies (access control functions that run before controller actions) on top of Express middleware. Unlike raw Express app.use(), Sails policies are configured per-controller and per-action in config/policies.js.
Policies vs HTTP middleware — use both
HTTP middleware (in config/http.js) runs at the Express layer — before routing, before policies. Use it for the earliest, broadest blocking. Policies (in api/policies/) run after routing but before controller actions — use them for per-controller or per-action control. For AI bot blocking, either works; HTTP middleware is marginally more efficient.
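The ordering described above can be illustrated with a toy model. This is only a sketch of the idea, not Sails internals: runPipeline, blockBots, and the stage names are hypothetical and exist purely for illustration.

```javascript
// Toy model of a request pipeline; NOT how Sails is implemented internally.
// Each stage is an Express-style function (req, res, next).
function runPipeline(stages, req) {
  const trace = [];
  const res = { blocked: false };
  let i = 0;
  function next() {
    const stage = stages[i++];
    if (!stage) return; // end of the chain
    trace.push(stage.name);
    stage.fn(req, res, next);
  }
  next();
  return { trace, blocked: res.blocked };
}

// Hypothetical blocker: stops the chain by not calling next().
const blockBots = (req, res, next) => {
  const ua = (req.headers['user-agent'] || '').toLowerCase();
  if (ua.includes('gptbot')) {
    res.blocked = true;
    return;
  }
  next();
};
const passThrough = (req, res, next) => next();

const stages = [
  { name: 'httpMiddleware', fn: blockBots },
  { name: 'router', fn: passThrough },
  { name: 'policy', fn: passThrough },
  { name: 'action', fn: passThrough },
];

const bot = runPipeline(stages, { headers: { 'user-agent': 'GPTBot/1.0' } });
const human = runPipeline(stages, { headers: { 'user-agent': 'Mozilla/5.0' } });
console.log(bot.trace);   // only the first stage ran
console.log(human.trace); // all four stages ran
```

Blocking in the first stage means the router, policy, and action stages never run for a bot request, which is the efficiency argument for HTTP middleware.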
Protection layers
Layer 1: robots.txt
Place robots.txt in the assets/ directory. Sails compiles assets into .tmp/public/ and serves them through the www static middleware at the Express layer. Policies only guard controller actions, so they never run for this file:
# assets/robots.txt
User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: cohere-ai
User-agent: Bytespider
User-agent: Amazonbot
User-agent: PerplexityBot
User-agent: YouBot
User-agent: Diffbot
User-agent: DeepSeekBot
User-agent: MistralBot
User-agent: xAI-Bot
User-agent: AI2Bot
Disallow: /
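If you would rather generate this file from the same list of crawler names your blocking code uses, a small helper can do it. The sketch below is one possible approach; buildRobotsTxt is a hypothetical name, not part of Sails, and the three agents are just a shortened sample.

```javascript
// Hypothetical helper: builds robots.txt text from a list of crawler tokens.
// All listed agents share one group with a single "Disallow: /" rule.
function buildRobotsTxt(blockedAgents) {
  const lines = ['User-agent: *', 'Allow: /'];
  if (blockedAgents.length > 0) {
    lines.push(''); // blank line separates the two groups
    for (const agent of blockedAgents) {
      lines.push(`User-agent: ${agent}`);
    }
    lines.push('Disallow: /');
  }
  return lines.join('\n') + '\n';
}

const robotsTxt = buildRobotsTxt(['GPTBot', 'ClaudeBot', 'CCBot']);
console.log(robotsTxt);
```

The returned string could be written to assets/robots.txt in a build step (for example with fs.writeFileSync), so the file and the blocking list cannot drift apart.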
Sails uses assets/ (compiled to .tmp/public/), whereas a plain Express app conventionally serves static files from public/. Static files bypass policies entirely.
Approach 1: Policy (idiomatic Sails)
Create api/policies/isNotAiBot.js. Call next() to continue or return a response to block:
// api/policies/isNotAiBot.js
module.exports = function isNotAiBot(req, res, next) {
  // Lowercase substrings matched against the User-Agent header
  const AI_BOTS = [
    'gptbot', 'chatgpt-user', 'claudebot', 'anthropic-ai',
    'ccbot', 'cohere-ai', 'bytespider', 'amazonbot',
    'applebot-extended', 'perplexitybot', 'youbot', 'diffbot',
    'google-extended', 'deepseekbot', 'mistralbot', 'xai-bot',
    'ai2bot', 'oai-searchbot', 'duckassistbot',
  ];

  // Set noai meta for templates
  res.locals.robots = 'noai, noimageai';

  const ua = (req.headers['user-agent'] || '').toLowerCase();
  const isBot = AI_BOTS.some(bot => ua.includes(bot));
  if (isBot) {
    return res.status(403).send('Forbidden: AI crawlers are not permitted.');
  }

  // Set X-Robots-Tag on all legitimate responses
  res.set('X-Robots-Tag', 'noai, noimageai');
  return next();
};
Register globally in config/policies.js:
// config/policies.js
module.exports.policies = {
  // Apply to ALL controller actions globally
  '*': ['isNotAiBot'],
  // Exempt specific controllers if needed:
  // 'HealthController': { '*': true }, // true = no policies
};
Controller-scoped policies
To block only on API controllers, apply per-controller instead of globally:
// config/policies.js
module.exports.policies = {
  // Only block actions under the api/ controller folder (or an ApiController)
  'api/*': ['isNotAiBot'],
  // Or per-action:
  'ArticleController': {
    'find': ['isNotAiBot'],
    'findOne': ['isNotAiBot'],
    'create': ['isAuthenticated'], // different policy, defined separately
  },
  // Public pages: no bot blocking
  'PageController': { '*': true },
};
Approach 2: HTTP middleware (Express layer)
For the earliest possible blocking (before Sails routing), add a custom middleware in config/http.js:
// config/http.js
module.exports.http = {
  middleware: {
    // Add your custom middleware to the order array
    order: [
      'aiBotBlocker', // ← BEFORE bodyParser, session, router
      'cookieParser',
      'session',
      'bodyParser',
      'compress',
      'poweredBy',
      'router',
      'www',
      'favicon',
    ],

    aiBotBlocker: function (req, res, next) {
      const AI_BOTS = [
        'gptbot', 'chatgpt-user', 'claudebot', 'anthropic-ai',
        'ccbot', 'cohere-ai', 'bytespider', 'amazonbot',
        'applebot-extended', 'perplexitybot', 'youbot', 'diffbot',
        'google-extended', 'deepseekbot', 'mistralbot', 'xai-bot',
        'ai2bot', 'oai-searchbot', 'duckassistbot',
      ];
      const EXEMPT_PATHS = ['/robots.txt', '/sitemap.xml', '/favicon.ico'];

      // Exempt paths bypass blocking
      if (EXEMPT_PATHS.includes(req.path)) {
        return next();
      }

      const ua = (req.headers['user-agent'] || '').toLowerCase();
      if (AI_BOTS.some(bot => ua.includes(bot))) {
        return res.status(403).send('Forbidden: AI crawlers are not permitted.');
      }

      // X-Robots-Tag on all legitimate responses
      res.set('X-Robots-Tag', 'noai, noimageai');
      return next();
    },
  },
};
Place aiBotBlocker first in the order array, before bodyParser and session. This rejects bots before any body parsing or session creation (same reason CakePHP uses prepend()).
Layer 2: noai meta tag
res.locals passes data to views in Sails (EJS, Pug, etc.). Set it in the policy or middleware, read in your layout:
<!-- views/layouts/layout.ejs -->
<head>
  <meta name="robots" content="<%= locals.robots || 'noai, noimageai' %>">
</head>
<!-- Override per-action in a controller: -->
<!-- res.locals.robots = 'index, follow'; -->
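As a rough illustration of the per-action override mentioned in the template comment, a classic Sails controller action could set res.locals.robots before rendering. The controller name, action name, and view path below are hypothetical and only show the shape of the idea.

```javascript
// Hypothetical api/controllers/PageController.js
const PageController = {
  // Classic (req, res) action that opts this page back into normal indexing.
  about: function (req, res) {
    // Overrides the 'noai, noimageai' default set earlier by the policy or middleware
    res.locals.robots = 'index, follow';
    return res.view('pages/about'); // view path is an assumption for this sketch
  },
};

module.exports = PageController;
```

Because policies and HTTP middleware run before the action, the value assigned here is the one the layout reads. Note that this only changes the meta tag: the X-Robots-Tag header set by the policy or middleware is still sent unless you override it as well.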
Sails.js vs Express vs NestJS: Node.js comparison
Sails.js: policy (sends a 403 response or calls next())
// api/policies/isNotAiBot.js
// AI_BOTS is the same lowercase list shown in the full policy above
module.exports = function (req, res, next) {
  const ua = (req.headers['user-agent'] || '').toLowerCase();
  if (AI_BOTS.some(bot => ua.includes(bot))) {
    return res.status(403).send('Forbidden');
  }
  return next(); // continue to action
};
Express: app.use() middleware
// Direct Express middleware
// AI_BOTS is the same lowercase list used in the Sails examples
app.use((req, res, next) => {
  const ua = (req.headers['user-agent'] || '').toLowerCase();
  if (AI_BOTS.some(bot => ua.includes(bot))) {
    return res.status(403).send('Forbidden');
  }
  next();
});
NestJS: Guard with @UseGuards()
import { CanActivate, ExecutionContext, ForbiddenException, Injectable } from '@nestjs/common';

// AI_BOTS is the same lowercase list used in the Sails examples
@Injectable()
export class AiBotGuard implements CanActivate {
  canActivate(context: ExecutionContext): boolean {
    const req = context.switchToHttp().getRequest();
    const ua = (req.headers['user-agent'] || '').toLowerCase();
    if (AI_BOTS.some(b => ua.includes(b))) {
      throw new ForbiddenException('AI crawlers blocked');
    }
    return true;
  }
}
Testing
Use supertest with sails.lift() in your Mocha test setup:
// test/integration/policies/isNotAiBot.test.js
// Assumes the app exposes GET /api/articles and returns 200 for normal clients
const sails = require('sails');
const request = require('supertest');

describe('AI Bot Blocking', () => {
  // Mocha hooks: lift the app once, lower it when the suite finishes
  before((done) => sails.lift({ log: { level: 'silent' } }, done));
  after((done) => sails.lower(done));

  it('blocks AI bots with 403', async () => {
    await request(sails.hooks.http.app)
      .get('/api/articles')
      .set('User-Agent', 'GPTBot/1.0')
      .expect(403);
  });

  it('allows normal browsers', async () => {
    await request(sails.hooks.http.app)
      .get('/api/articles')
      .set('User-Agent', 'Mozilla/5.0 (compatible)')
      .expect(200)
      // supertest header assertion, so no separate assertion library is needed
      .expect('X-Robots-Tag', 'noai, noimageai');
  });

  it('serves robots.txt to bots', async () => {
    await request(sails.hooks.http.app)
      .get('/robots.txt')
      .set('User-Agent', 'GPTBot/1.0')
      .expect(200);
  });
});
AI bot User-Agent strings (2026)
Access via req.headers['user-agent'] — lowercase with .toLowerCase() before matching with .includes().
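The matching step can be pulled into a small helper so the policy and the HTTP middleware share one implementation. This is a minimal sketch: isAiBot and AI_BOT_TOKENS are hypothetical names, and the token list is shortened for brevity.

```javascript
// Shortened, illustrative token list; use your full list in practice.
const AI_BOT_TOKENS = ['gptbot', 'claudebot', 'ccbot', 'perplexitybot'];

// Returns true when the User-Agent contains any known AI crawler token.
// A missing header is treated as an empty string, so it never matches.
function isAiBot(userAgent) {
  const ua = (userAgent || '').toLowerCase();
  return AI_BOT_TOKENS.some((token) => ua.includes(token));
}

console.log(isAiBot('Mozilla/5.0 (compatible; GPTBot/1.0)')); // true
console.log(isAiBot('Mozilla/5.0 (Windows NT 10.0)'));        // false
console.log(isAiBot(undefined));                              // false
```

Substring matching is simple but can be spoofed in either direction, since the User-Agent header is entirely client-controlled.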