How to Block AI Bots in Go Hertz
Hertz is ByteDance's high-performance HTTP framework for Go, built on netpoll for epoll/kqueue-based I/O instead of net/http. Middleware signature: func(ctx context.Context, c *app.RequestContext). c.Request.Header.Get("User-Agent") returns an empty string when absent — never nil. c.AbortWithStatus(403) stops the handler chain, but code after Abort still executes — always return immediately after aborting. Patterns are similar to Gin but types are incompatible — Hertz uses its own protocol layer with zero-copy header parsing.
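The "code after Abort still executes" point can be illustrated without Hertz. Below is a minimal sketch of an index-based handler chain. The chain type, abortIndex, and run are hypothetical names for this toy model, not Hertz's real internals; it only assumes that aborting works by moving an index past the end of the handler list.

```go
package main

import "fmt"

// chain is a toy stand-in for a request context (not Hertz's real type).
// It holds the handlers and the index of the one currently running.
type chain struct {
	handlers []func(*chain)
	index    int
	log      []string
}

// abortIndex is a sentinel larger than any realistic chain length.
const abortIndex = 1 << 30

// Next runs the remaining handlers in order.
func (c *chain) Next() {
	c.index++
	for c.index < len(c.handlers) {
		c.handlers[c.index](c)
		c.index++
	}
}

// Abort only moves the index past the end. It cannot return on the
// caller's behalf, so the calling function keeps running.
func (c *chain) Abort() { c.index = abortIndex }

// run executes a blocking middleware followed by a final handler and
// returns the events that were recorded.
func run(withReturn bool) []string {
	c := &chain{index: -1}
	c.handlers = []func(*chain){
		func(c *chain) {
			c.log = append(c.log, "blocked")
			c.Abort()
			if withReturn {
				return
			}
			// Reached only when the return is forgotten.
			c.log = append(c.log, "pass-through branch ran")
			c.Next()
		},
		func(c *chain) { c.log = append(c.log, "handler ran") },
	}
	c.Next()
	return c.log
}

func main() {
	fmt.Println(run(true))  // prints [blocked]
	fmt.Println(run(false)) // prints [blocked pass-through branch ran]
}
```

In both runs the final handler is skipped, but without the return the middleware's own pass-through code still executes after the abort.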
1. Bot detection
Pure Go, no dependencies. strings.ToLower for case-folding, strings.Contains for substring matching. Safe on empty strings without a nil-check.
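Before the full listing, here is a small sketch of the matching behaviour on its own. matchesAny is a hypothetical helper, and the User-Agent strings are illustrative, not copied from any crawler's documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAny reports whether ua contains any of the patterns, ignoring case.
// Hypothetical helper: patterns are assumed to be lowercase and non-empty.
func matchesAny(ua string, patterns []string) bool {
	lower := strings.ToLower(ua)
	for _, p := range patterns {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"gptbot", "claudebot"}

	// Mixed-case crawler token still matches after case-folding.
	fmt.Println(matchesAny("Mozilla/5.0 (compatible; GPTBot/1.0)", patterns)) // prints true

	// An ordinary browser-style string matches nothing.
	fmt.Println(matchesAny("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0", patterns)) // prints false

	// A missing header arrives as "", which is safe to lowercase and search.
	fmt.Println(matchesAny("", patterns)) // prints false
}
```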
```go
// ai_bot_detector.go: AI bot detection, no external dependencies
package middleware

import "strings"

// aiBotPatterns contains known AI crawler User-Agent substrings.
// All lowercase; compared against a lowercased User-Agent.
var aiBotPatterns = []string{
	"gptbot",
	"chatgpt-user",
	"claudebot",
	"anthropic-ai",
	"ccbot",
	// Note: Google-Extended is a robots.txt control token. Google does not
	// send it as a User-Agent string, so this entry is unlikely to match
	// real traffic; the robots.txt rule is what applies to it.
	"google-extended",
	"cohere-ai",
	"meta-externalagent",
	"bytespider",
	"omgili",
	"diffbot",
	"imagesiftbot",
	"magpie-crawler",
	"amazonbot",
	"dataprovider",
	"netcraft",
}

// IsAIBot returns true if the User-Agent matches a known AI crawler.
//
// Hertz's c.Request.Header.Get() returns an empty string when the
// header is absent, the same behavior as net/http's Header.Get().
// The early return below is only a shortcut: strings.ToLower("") is ""
// and matches no pattern, so no nil-check is needed.
func IsAIBot(userAgent string) bool {
	if userAgent == "" {
		return false
	}
	lower := strings.ToLower(userAgent)
	for _, pattern := range aiBotPatterns {
		if strings.Contains(lower, pattern) {
			return true
		}
	}
	return false
}
```
2. Middleware and server setup
h.Use() registers global middleware. c.AbortWithStatus() prevents downstream handlers from running. Always return after aborting — Abort does not exit the function.
```go
// main.go: Hertz server with AI bot blocking middleware
// Install: go get github.com/cloudwego/hertz
package main

import (
	"context"

	"github.com/cloudwego/hertz/pkg/app"
	"github.com/cloudwego/hertz/pkg/app/server"
	"github.com/cloudwego/hertz/pkg/common/utils"
	"github.com/cloudwego/hertz/pkg/protocol/consts"

	"yourmodule/middleware"
)

// ── Middleware ─────────────────────────────────────────────────────────────
//
// Hertz middleware signature: func(ctx context.Context, c *app.RequestContext)
//
// - c.Request.Header.Get("User-Agent") returns string ("" when absent)
// - c.AbortWithStatus(403) stops the handler chain; remaining handlers are skipped
// - c.Next(ctx) passes to the next handler in the chain
// - c.Header("X-Robots-Tag", "noai, noimageai") sets a response header
//
// IMPORTANT: c.Abort() does NOT return from the function. Code after
// c.AbortWithStatus() still executes. Always "return" after aborting.
func AiBotBlocker() app.HandlerFunc {
	return func(ctx context.Context, c *app.RequestContext) {
		path := string(c.Request.URI().Path())

		// Always allow robots.txt so crawlers discover Disallow rules.
		if path == "/robots.txt" {
			c.Next(ctx)
			return
		}

		// Get User-Agent: returns "" when absent, never nil.
		// Hertz uses []byte internally but Get() returns string.
		ua := c.Request.Header.Get("User-Agent")

		if middleware.IsAIBot(ua) {
			// Block: set X-Robots-Tag, abort with 403, then return.
			c.Header("Content-Type", "text/plain")
			c.Header("X-Robots-Tag", "noai, noimageai")
			c.AbortWithStatus(consts.StatusForbidden)
			return // MUST return: code after Abort still executes
		}

		// Pass: set X-Robots-Tag on the response, then continue chain.
		// Headers set before c.Next() appear on the response regardless
		// of what downstream handlers do (unless they overwrite them).
		c.Header("X-Robots-Tag", "noai, noimageai")
		c.Next(ctx)
	}
}

// ── Handlers ──────────────────────────────────────────────────────────────

const robotsTxt = `User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
`

func main() {
	h := server.Default(server.WithHostPorts("0.0.0.0:8080"))

	// Global middleware: applied to every route.
	h.Use(AiBotBlocker())

	h.GET("/robots.txt", func(ctx context.Context, c *app.RequestContext) {
		c.Header("Content-Type", "text/plain")
		c.String(consts.StatusOK, robotsTxt)
	})

	h.GET("/", func(ctx context.Context, c *app.RequestContext) {
		c.JSON(consts.StatusOK, utils.H{"message": "ok"})
	})

	h.Spin()
}
```
3. Route-group middleware
Use h.Group() to scope middleware to a path prefix. Only routes registered on the group get bot blocking — public routes remain unaffected.
```go
// Route-group middleware: protect only /api/* routes.
//
// Hertz route groups work like Gin: group.Use() applies middleware
// only to routes registered on that group.
//
// indexHandler, aboutHandler, robotsHandler, dataHandler and usersHandler
// are assumed to be app.HandlerFunc values defined elsewhere.
func main() {
	h := server.Default(server.WithHostPorts("0.0.0.0:8080"))

	// Public routes: no bot blocking
	h.GET("/", indexHandler)
	h.GET("/about", aboutHandler)
	h.GET("/robots.txt", robotsHandler)

	// API routes: bot blocking applied
	api := h.Group("/api")
	api.Use(AiBotBlocker()) // only /api/* gets blocked
	{
		api.GET("/data", dataHandler)
		api.GET("/users", usersHandler)
	}

	h.Spin()
}
```
4. JSON error response
c.AbortWithStatusJSON() sets the status code and marshals a JSON body in one call — useful for API endpoints that should return structured error responses.
```go
// Abort with a JSON error body instead of empty 403.
//
// c.AbortWithMsg() sets both status and body.
// c.AbortWithStatusJSON() sets status + JSON body (convenience).
func AiBotBlockerJSON() app.HandlerFunc {
	return func(ctx context.Context, c *app.RequestContext) {
		ua := c.Request.Header.Get("User-Agent")

		if middleware.IsAIBot(ua) {
			c.Header("X-Robots-Tag", "noai, noimageai")
			// AbortWithStatusJSON marshals the body as JSON automatically.
			c.AbortWithStatusJSON(consts.StatusForbidden, utils.H{
				"error":   "forbidden",
				"message": "AI crawlers are not permitted",
			})
			return
		}

		c.Header("X-Robots-Tag", "noai, noimageai")
		c.Next(ctx)
	}
}
```
Key points
- Header.Get() returns "", never nil: c.Request.Header.Get("User-Agent") returns an empty string when the header is absent. Internally Hertz stores headers as []byte but Get() converts to string. Safe to pass directly to strings.ToLower.
- Abort does not return: c.AbortWithStatus() sets an internal index to skip remaining handlers, but your function keeps executing. Always write return on the next line. Forgetting this means the pass-through branch (setting headers, calling c.Next) runs after the abort, the same behavior as Gin.
- Not net/http compatible: Hertz uses its own protocol layer (netpoll + zero-copy parsing), not net/http. You cannot use http.Handler or http.HandlerFunc middleware directly. Porting from Gin is straightforward (same API patterns, different types); porting from stdlib or Echo requires a wrapper.
- c.Next(ctx) takes context.Context: Unlike Gin's c.Next(), Hertz requires the Go context.Context parameter. This enables cancellation propagation, deadline awareness, and tracing context throughout the middleware chain.
- server.Default() includes recovery middleware: server.Default() pre-registers panic recovery. server.New() does not. If you use server.New(), add recovery middleware manually or a panic in a handler will crash the process.
- c.Request.URI().Path() returns []byte: Convert to string with string(c.Request.URI().Path()) for path comparisons. Hertz keeps paths as byte slices internally for zero-copy performance.
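The last point, comparing a []byte path, can be sketched with the standard library alone. isRobotsPath and robotsPath are hypothetical names; in a Hertz middleware the argument would be the value of c.Request.URI().Path().

```go
package main

import (
	"bytes"
	"fmt"
)

// robotsPath is the path to exempt, kept as bytes so it can be compared
// against a []byte path without converting either side.
var robotsPath = []byte("/robots.txt")

// isRobotsPath is a hypothetical helper that compares the raw path bytes.
func isRobotsPath(path []byte) bool {
	return bytes.Equal(path, robotsPath)
}

func main() {
	fmt.Println(isRobotsPath([]byte("/robots.txt"))) // prints true
	fmt.Println(isRobotsPath([]byte("/api/data")))   // prints false

	// Converting to string first, as the middleware above does, gives
	// the same answer.
	fmt.Println(string([]byte("/robots.txt")) == "/robots.txt") // prints true
}
```

Either form works for a single comparison; converting to string once is usually more readable when the path is checked against several values.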
Framework comparison — Go HTTP middleware models
| Framework | Protocol layer | Block request | Header access |
|---|---|---|---|
| Hertz | netpoll (own protocol) | c.AbortWithStatus(403) | c.Request.Header.Get() → "" |
| Gin | net/http | c.AbortWithStatus(403) | c.GetHeader() → "" |
| Fiber | fasthttp (own protocol) | return c.SendStatus(403) | c.Get("User-Agent") → "" |
| Echo | net/http | Return echo.NewHTTPError(403) | c.Request().Header.Get() → "" |
Hertz and Fiber both bypass net/http for performance (netpoll and fasthttp respectively), trading standard library compatibility for throughput. Gin and Echo are built on net/http, so standard http.Handler code can be reused with them through thin adapters. For bot detection (pure string matching, no I/O), the performance difference is negligible; the choice depends on your existing stack.
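For comparison with the net/http side of the table, the same blocking logic can be sketched as a plain standard-library middleware and exercised with httptest. blockAIBots and statusFor are hypothetical names, and the two-entry pattern list is a stand-in for the full list from section 1.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// blockAIBots is a hypothetical net/http counterpart to the Hertz blocker.
// Patterns are assumed to be lowercase.
func blockAIBots(patterns []string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Robots-Tag", "noai, noimageai")

		// robots.txt stays reachable so crawlers can read the Disallow rules.
		if r.URL.Path != "/robots.txt" {
			ua := strings.ToLower(r.Header.Get("User-Agent"))
			for _, p := range patterns {
				if strings.Contains(ua, p) {
					http.Error(w, "Forbidden", http.StatusForbidden)
					return // net/http has no Abort: not calling next is the block
				}
			}
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one GET through the middleware and returns the status code.
func statusFor(ua, path string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := blockAIBots([]string{"gptbot", "claudebot"}, ok)

	req := httptest.NewRequest(http.MethodGet, path, nil)
	if ua != "" {
		req.Header.Set("User-Agent", ua)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("GPTBot/1.0", "/"))           // prints 403
	fmt.Println(statusFor("Mozilla/5.0", "/"))          // prints 200
	fmt.Println(statusFor("GPTBot/1.0", "/robots.txt")) // prints 200
}
```

The detection code is identical across frameworks; only the way a request is rejected differs, which is what the "Block request" column summarizes.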