How to Block AI Bots in Common Lisp Hunchentoot

Hunchentoot is the most widely used Common Lisp web server: a threaded HTTP server with a CLOS-based acceptor architecture, easy handler dispatch, and special variables for request and reply state. This guide shows two approaches to bot blocking. The simpler one sets *before-request-hook* to a zero-argument function that runs before every request. The more idiomatic Lisp approach defines a custom acceptor class and specialises acceptor-dispatch-request via CLOS. Both use the same short-circuit: set (hunchentoot:return-code*) to hunchentoot:+http-forbidden+, then call (hunchentoot:abort-request-handler), which performs a non-local exit (a THROW to a catch tag that Hunchentoot establishes around request handling) and ends processing as if the handler had returned. Header access uses (hunchentoot:header-in* :user-agent); the * suffix marks the variant whose request argument is optional and defaults to the current *request* special variable.

1. Bot detection

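The detector is a plain Common Lisp function with no library dependencies. SEARCH performs a literal substring search and returns the start index of the match or NIL, so its result works directly as a truth value. SOME stops at the first pattern that matches. string-downcase is applied once, before the iteration, so every pattern comparison is effectively case-insensitive.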

;;; bot-utils.lisp — AI bot detection, no dependencies
(in-package :my-app)

;;; All lowercase — matched against (string-downcase ua)
(defparameter *ai-bot-patterns*
  '("gptbot"
    "chatgpt-user"
    "claudebot"
    "anthropic-ai"
    "ccbot"
    "google-extended"
    "cohere-ai"
    "meta-externalagent"
    "bytespider"
    "omgili"
    "diffbot"
    "imagesiftbot"
    "magpie-crawler"
    "amazonbot"
    "dataprovider"
    "netcraft")
  "Lowercase substrings to match against the User-Agent header.")

(defun ai-bot-p (ua)
  "Return true (the match index) if UA contains a known AI crawler pattern, else NIL."
  (when (and ua (not (string= ua "")))
    (let ((lower (string-downcase ua)))
      ;; SEARCH returns the start index or NIL — literal substring, no regex
      (some (lambda (pattern) (search pattern lower)) *ai-bot-patterns*))))
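
The checks below are a small self-contained sketch of how the detector behaves. The trimmed pattern list, the sample-ai-bot-p name, and the User-Agent strings are illustrative assumptions, not values taken from a real crawler log. The last assertion shows why SEARCH works as a test even for a match at position 0: in Common Lisp only NIL is false.

```lisp
;;; Sketch: a trimmed copy of the detector with a few sanity checks.
;;; *sample-patterns* and sample-ai-bot-p are hypothetical names for this demo.
(defparameter *sample-patterns* '("gptbot" "claudebot" "ccbot"))

(defun sample-ai-bot-p (ua)
  "Return true if UA contains one of *SAMPLE-PATTERNS*, ignoring case."
  (when (and ua (plusp (length ua)))
    (let ((lower (string-downcase ua)))
      (some (lambda (pattern) (search pattern lower)) *sample-patterns*))))

;; A mixed-case crawler token anywhere in the string is detected.
(assert (sample-ai-bot-p "Mozilla/5.0 (compatible; GPTBot/1.0)"))
;; An ordinary browser string is not.
(assert (not (sample-ai-bot-p "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0")))
;; Missing or empty headers are treated as "not a bot".
(assert (not (sample-ai-bot-p nil)))
(assert (not (sample-ai-bot-p "")))
;; A match at index 0 returns 0, which is still true in Common Lisp.
(assert (eql 0 (sample-ai-bot-p "CCBot/2.0")))
```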

2. *before-request-hook* — global filter

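Set hunchentoot:*before-request-hook* to a function of zero arguments. It runs in the request context, so *request* and *reply* are bound. Set (return-code*) to +http-forbidden+ before calling abort-request-handler: the abort unwinds out of request handling immediately, so nothing after it runs and the reply must already carry the status code. The custom acceptor in section 4 relies only on acceptor-dispatch-request, which is part of Hunchentoot's documented API, so it is the safer choice if your installed version does not export this hook variable.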

;;; server.lisp — Hunchentoot server with *before-request-hook*
(in-package :my-app)

;;; ── *before-request-hook* approach ──────────────────────────────────────────
;;; Set to a zero-argument function called before every request dispatch.
;;; Runs inside the request context — *request* and *reply* are bound.
;;; Return value is ignored; use abort-request-handler to short-circuit.

(defun check-ai-bot ()
  "Block AI crawlers before request dispatch."
  ;; Path guard: let robots.txt through.
  ;; Hunchentoot calls this before dispatch — static files not yet considered.
  (let ((path (hunchentoot:script-name* hunchentoot:*request*)))
    (when (string= path "/robots.txt")
      (return-from check-ai-bot)))  ; pass through

  ;; header-in* reads an incoming request header from *request*.
  ;; Header names are keywords, so :user-agent matches any capitalisation
  ;; the client sends. Returns NIL when the header is absent.
  (let ((ua (or (hunchentoot:header-in* :user-agent) "")))
    (when (ai-bot-p ua)
      ;; Set X-Robots-Tag on the blocked response.
      ;; setf header-out writes to the outgoing reply (*reply* special var).
      (setf (hunchentoot:header-out "X-Robots-Tag") "noai, noimageai")
      ;; Set the HTTP status code before aborting.
      ;; +http-forbidden+ = 403
      (setf (hunchentoot:return-code*) hunchentoot:+http-forbidden+)
      ;; abort-request-handler performs a non-local exit (a THROW) out of
      ;; request handling; its argument becomes the response body.
      (hunchentoot:abort-request-handler "Forbidden"))))

;;; Register the hook globally — fires for every request on every acceptor.
(setf hunchentoot:*before-request-hook* #'check-ai-bot)
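
The Lisp reader signals an error when it meets a package-qualified symbol such as hunchentoot:*before-request-hook* that the package does not export, so a typo or a version difference in a hook name stops server.lisp from loading at all. As a hedged sketch, the hypothetical helper below (exported-symbol-p is not part of Hunchentoot) checks a name at the REPL before you depend on it.

```lisp
;;; Sketch: check whether a package exports a symbol before relying on it.
;;; exported-symbol-p is a hypothetical helper, not a Hunchentoot function.
(defun exported-symbol-p (name package-designator)
  "Return T if the string NAME names an external symbol of the package.
Returns NIL if the package is not loaded or the symbol is not exported."
  (let ((package (find-package package-designator)))
    (when package
      (eq (nth-value 1 (find-symbol name package)) :external))))

;; Works for any package; symbol names are upper case by default.
(assert (exported-symbol-p "CAR" :common-lisp))
(assert (not (exported-symbol-p "NO-SUCH-SYMBOL" :common-lisp)))

;; After (ql:quickload :hunchentoot), check the names used in this guide:
;; (exported-symbol-p "*BEFORE-REQUEST-HOOK*" :hunchentoot)
;; (exported-symbol-p "ACCEPTOR-DISPATCH-REQUEST" :hunchentoot)
```

If the first check returns NIL, the acceptor-based filter in section 4 is the variant to use.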

3. *after-request-hook* — X-Robots-Tag on passing responses

*after-request-hook* fires after the handler runs for requests that were not aborted. Together with the before-hook, every response gets X-Robots-Tag with no duplication.

;;; Add *after-request-hook* for X-Robots-Tag on passing responses.
;;; Fires after the handler runs — for requests that were not aborted.

(defun add-robots-header ()
  "Add X-Robots-Tag to all passing responses."
  (setf (hunchentoot:header-out "X-Robots-Tag") "noai, noimageai"))

(setf hunchentoot:*after-request-hook* #'add-robots-header)

;;; Note: *before-request-hook* and *after-request-hook* are complementary:
;;; - Blocked requests: X-Robots-Tag set in check-ai-bot (before abort)
;;; - Passing requests: X-Robots-Tag set in add-robots-header (after handler)
;;; This covers all responses without duplication.

4. Custom acceptor — CLOS method specialisation

Define a subclass of easy-acceptor and specialise acceptor-dispatch-request on it. This is more idiomatic Common Lisp: the filter applies only to acceptors of this class, dispatch goes through ordinary CLOS method combination, and no global variable is mutated. call-next-method invokes the easy-acceptor dispatch for requests that pass the check.

;;; Alternative: CLOS acceptor override — per-acceptor bot blocking.
;;; More idiomatic Common Lisp; gives per-instance control.
;;; Use this when you have multiple acceptors and want to scope the filter.

(defclass bot-blocking-acceptor (hunchentoot:easy-acceptor)
  ()
  (:documentation "Acceptor that blocks AI crawlers before dispatch."))

;;; Specialise acceptor-dispatch-request on our custom class.
;;; Called for every request — fires before handlers.
(defmethod hunchentoot:acceptor-dispatch-request
    ((acceptor bot-blocking-acceptor) request)
  (let ((path (hunchentoot:script-name* request))
        (ua   (or (hunchentoot:header-in* :user-agent) "")))

    ;; Path guard
    (unless (string= path "/robots.txt")
      (when (ai-bot-p ua)
        (setf (hunchentoot:header-out "X-Robots-Tag") "noai, noimageai")
        (setf (hunchentoot:return-code*)
              hunchentoot:+http-forbidden+)
        (hunchentoot:abort-request-handler "Forbidden"))))

  ;; Pass through: inject X-Robots-Tag, then call the next method (dispatch).
  (setf (hunchentoot:header-out "X-Robots-Tag") "noai, noimageai")
  (call-next-method))
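
Both the hook function and the acceptor method repeat the same path guard and User-Agent test. One possible refactoring, sketched here under the assumption that you want to unit-test the decision without a running server, is to move it into a pure function. should-block-p is a hypothetical name; in the application the detector argument would be #'ai-bot-p.

```lisp
;;; Sketch: the blocking decision as a pure function.
;;; should-block-p is a hypothetical name, not part of Hunchentoot.
(defun should-block-p (path user-agent detector)
  "Return T when a request for PATH with USER-AGENT should get a 403.
DETECTOR is a function of one string that returns true for unwanted agents."
  (and (not (string= path "/robots.txt"))   ; always let robots.txt through
       (funcall detector (or user-agent "")) ; NIL header counts as empty
       t))

;; A stand-in detector for the demo; the real one would be #'ai-bot-p.
(let ((detector (lambda (ua) (search "gptbot" (string-downcase ua)))))
  (assert (should-block-p "/" "GPTBot/1.0" detector))
  (assert (not (should-block-p "/robots.txt" "GPTBot/1.0" detector)))
  (assert (not (should-block-p "/" "Mozilla/5.0" detector)))
  (assert (not (should-block-p "/" nil detector))))
```

Inside the acceptor method the call would then read (should-block-p (hunchentoot:script-name* request) (hunchentoot:header-in* :user-agent request) #'ai-bot-p), leaving only the header, status code, and abort calls in the Hunchentoot-specific code.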

5. Route handlers — define-easy-handler

define-easy-handler registers a handler at a URI path. The before-hook fires before any handler is called — blocked requests never reach these functions.

;;; handlers.lisp — Easy handlers (routes)
(in-package :my-app)

;;; define-easy-handler registers a handler at a URI path.
;;; The before-hook fires before this handler is called.

(hunchentoot:define-easy-handler (index :uri "/") ()
  (setf (hunchentoot:content-type*) "application/json")
  ;; Double quotes inside a Lisp string literal must be escaped.
  "{\"message\": \"Hello\"}")

(hunchentoot:define-easy-handler (api-data :uri "/api/data") ()
  (setf (hunchentoot:content-type*) "application/json")
  "{\"data\": \"value\"}")

(hunchentoot:define-easy-handler (robots-txt :uri "/robots.txt") ()
  (setf (hunchentoot:content-type*) "text/plain")
  "User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /")
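
The robots.txt literal above repeats one stanza per crawler. If you would rather keep the crawler names in a single list, the body can be generated with FORMAT. This is a sketch only: robots-txt-body and *blocked-agents* are hypothetical names, and the list simply mirrors the four agents in the handler above.

```lisp
;;; Sketch: build the robots.txt body from a list of crawler names.
;;; robots-txt-body and *blocked-agents* are hypothetical names.
(defparameter *blocked-agents*
  '("GPTBot" "ClaudeBot" "CCBot" "Google-Extended"))

(defun robots-txt-body (agents)
  "Return robots.txt text that allows every agent except those in AGENTS."
  ;; ~{ ... ~} iterates over AGENTS; ~% emits a newline.
  (format nil "User-agent: *~%Allow: /~%~{~%User-agent: ~A~%Disallow: /~%~}"
          agents))

;; Each listed agent gets its own Disallow stanza.
(assert (search (format nil "User-agent: GPTBot~%Disallow: /")
                (robots-txt-body *blocked-agents*)))
;; With no agents, only the catch-all stanza remains.
(assert (string= (robots-txt-body '())
                 (format nil "User-agent: *~%Allow: /~%")))
```

The robots-txt handler could then return (robots-txt-body *blocked-agents*) instead of the string literal.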

6. Start the server

;;; main.lisp — start the Hunchentoot server
(in-package :my-app)

;;; Register hooks
(setf hunchentoot:*before-request-hook* #'check-ai-bot)
(setf hunchentoot:*after-request-hook*  #'add-robots-header)

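;;; Create and start an easy-acceptor on port 8080.
;;; easy-acceptor uses easy-handler dispatch (define-easy-handler).
;;; start-server and stop-server are the symbols exported from package.lisp.
(defvar *server* nil
  "The running acceptor, or NIL when the server is stopped.")

(defun start-server (&key (port 8080))
  "Start the server on PORT unless one is already running."
  (unless *server*
    (setf *server*
          (hunchentoot:start
           (make-instance 'hunchentoot:easy-acceptor :port port))))
  *server*)

(defun stop-server ()
  "Stop the running server, if any."
  (when *server*
    (hunchentoot:stop *server*)
    (setf *server* nil)))

;;; To use the custom CLOS acceptor instead, replace
;;; 'hunchentoot:easy-acceptor with 'bot-blocking-acceptor in start-server.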

7. ASDF system definition

;;; my-app.asd — ASDF system definition
(asdf:defsystem #:my-app
  :description "Hunchentoot web application with AI bot blocking"
  :author "Example"
  :license "MIT"
  :version "0.1.0"
  :depends-on (#:hunchentoot)
  :components ((:file "package")
               (:file "bot-utils"  :depends-on ("package"))
               (:file "server"     :depends-on ("bot-utils"))
               (:file "handlers"   :depends-on ("server"))
               (:file "main"       :depends-on ("handlers"))))

;;; package.lisp
(defpackage #:my-app
  (:use #:common-lisp)
  (:export #:start-server #:stop-server))

Key points

Framework comparison — Lisp and functional web servers

| Framework | Hook / middleware | Block call | UA header |
| --- | --- | --- | --- |
| Hunchentoot (CL) | *before-request-hook* | setf return-code* +http-forbidden+; abort-request-handler | (header-in* :user-agent) |
| Clojure Ring | middleware function | return {:status 403 :body "..."} | (get-in req [:headers "user-agent"]) |
| Erlang Cowboy | init/2 callback | {stop, Reply, State} | cowboy_req:header(<<"user-agent">>, Req) |
| Gleam Wisp | middleware function | return wisp.response_403() | request.get_header(req, "user-agent") |

Hunchentoot's abort sets it apart from the other frameworks in this comparison: instead of returning a response value or raising an exception, abort-request-handler performs a non-local exit with THROW to a catch tag that Hunchentoot establishes around request handling, and its argument is used as the response body. The naming convention (*before-request-hook*, header-in*, return-code*) is idiomatic Common Lisp: *earmuffs* for dynamic variables, and a * suffix for accessors whose request or reply argument defaults to the current one.
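
In the Hunchentoot source, abort-request-handler is a THROW to a catch tag wrapped around request handling. As a rough model of that short-circuit, the sketch below imitates the shape in plain Common Lisp. sketch-process, sketch-abort, and the sketch-handler-done tag are hypothetical names chosen for this example, not Hunchentoot's own.

```lisp
;;; Sketch: a catch/throw model of an aborting request handler.
;;; All names here are hypothetical; this is not Hunchentoot code.
(defun sketch-abort (&optional result)
  "Unwind to the nearest SKETCH-HANDLER-DONE catch, returning RESULT from it."
  (throw 'sketch-handler-done result))

(defun sketch-process (handler)
  "Call HANDLER; a SKETCH-ABORT inside it makes this function return early."
  (catch 'sketch-handler-done
    (funcall handler)))

;; Normal return: the handler's value is the response body.
(assert (string= "Hello" (sketch-process (lambda () "Hello"))))

;; Aborted: code after SKETCH-ABORT never runs.
(assert (string= "Forbidden"
                 (sketch-process (lambda ()
                                   (sketch-abort "Forbidden")
                                   "never reached"))))
```

Because the exit is a THROW rather than a signalled error, handler-case clauses in your own handlers do not intercept it, while unwind-protect cleanup forms still run.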

Dependencies

# Install Quicklisp (CL package manager) if not already installed:
# curl -O https://beta.quicklisp.org/quicklisp.lisp
# sbcl --load quicklisp.lisp --eval '(quicklisp-quickstart:install)' \
#      --eval '(ql:add-to-init-file)'   # so later sbcl sessions can use ql:

# Load Hunchentoot in SBCL (Steel Bank Common Lisp):
# (ql:quickload :hunchentoot)

# Or add to my-app.asd and load with:
# (ql:quickload :my-app)

# Run the application:
# sbcl --load my-app.asd \
#      --eval "(ql:quickload :my-app)" \
#      --eval "(my-app:start-server)"

# Other CL implementations: CCL, ECL, ABCL (JVM-based), CLISP
# Hunchentoot works on all major implementations.