Repo: github.com/midimurphdesigns/kev-o-ai-search
Live: kev-o.kevinmurphywebdev.com
Read the full story: Building Kev-O
Kev-O is a grounded chatbot that answers questions about my work using only the public corpus I've written. Blog posts, project case studies, resume, About page, the READMEs of my open-source repos. He cites his receipts. He refuses to invent. 212 chunks across 4 sources, 3 surfaces sharing one brain, ~$0.005 per turn with cached system prompt.
How it's built
Next.js 16 App Router on Vercel, TypeScript strict, Vercel AI SDK (streamText plus useChat) on @ai-sdk/anthropic with Claude Sonnet 4.6.

Retrieval is hybrid: a BM25 candidate pool over the full corpus (ported from fedbench, k1=1.5, b=0.75, ~3ms in-memory), then a Voyage voyage-rerank-2.5 cross-encoder rerank that narrows the 20 lexical candidates to the top 6 semantic winners. For inline punch-ins, the page body itself is forged as a synthetic passage at position 0, so Kev-O is most likely to cite the article the visitor is reading. The corpus is built at deploy time via the mdx-corpus primitive I extracted from this build.

The hosted surface is hardened with:
- A $5/UTC-day spend cap, charged post-stream against actual reported token usage, not an estimate.
- A 15-requests/hour-per-IP rate limit via Upstash Redis.
- An owner-bypass route that fails closed if the admin key is unset, compares keys with Node's timingSafeEqual, returns 404 (not 401) on a wrong key, and rate-limits attempts at 5/hour before the key check, so an attacker exhausts their budget regardless of guess outcome.
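The retrieval shape reduces to a small amount of code. This is a minimal in-memory sketch, not the repo's implementation: `Chunk` and the `RerankFn` callback are hypothetical names, and the Voyage call is abstracted behind a function so the BM25-only fallback path is visible.

```typescript
interface Chunk { id: string; text: string }

// Hypothetical stand-in for the Voyage rerank call: returns one relevance
// score per document.
type RerankFn = (query: string, docs: string[]) => Promise<number[]>;

function tokenize(s: string): string[] {
  return s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Classic BM25 over in-memory chunks, with the tuning from the text
// (k1 = 1.5, b = 0.75). Returns the top-k lexical candidates.
function bm25Rank(query: string, chunks: Chunk[], k = 20): Chunk[] {
  const k1 = 1.5, b = 0.75;
  const docs = chunks.map((c) => tokenize(c.text));
  const N = docs.length;
  const avgdl = docs.reduce((s, d) => s + d.length, 0) / N;
  const df = new Map<string, number>();
  for (const d of docs) {
    for (const t of new Set(d)) df.set(t, (df.get(t) ?? 0) + 1);
  }
  const scores = docs.map((d) => {
    const tf = new Map<string, number>();
    for (const t of d) tf.set(t, (tf.get(t) ?? 0) + 1);
    let score = 0;
    for (const q of tokenize(query)) {
      const n = df.get(q) ?? 0;
      if (n === 0) continue;
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      const f = tf.get(q) ?? 0;
      score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * d.length) / avgdl));
    }
    return score;
  });
  return chunks
    .map((c, i) => ({ c, s: scores[i] }))
    .sort((x, y) => y.s - x.s)
    .slice(0, k)
    .map((x) => x.c);
}

// BM25 pool of 20, rerank down to 6; fall back to BM25 order when the
// rerank API is missing or errors (graceful degradation).
async function retrieve(query: string, chunks: Chunk[], rerank?: RerankFn): Promise<Chunk[]> {
  const pool = bm25Rank(query, chunks, 20);
  if (!rerank) return pool.slice(0, 6);
  try {
    const scores = await rerank(query, pool.map((c) => c.text));
    return pool
      .map((c, i) => ({ c, s: scores[i] }))
      .sort((x, y) => y.s - x.s)
      .slice(0, 6)
      .map((x) => x.c);
  } catch {
    return pool.slice(0, 6); // rerank down: still answer, just lexically
  }
}
```

The fallback is the point of the abstraction: the pipeline degrades to pure BM25 rather than failing the turn.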
Three surfaces, one brain
The chat lives in three places because the visitor's intent is different in each. The subdomain is a standalone full-page conversation, the URL hiring managers share. The Command-K palette puts Kev-O at the top of the global keyboard surface on every page of the main site. The inline punch-ins sit at the foot of every curated blog post and project case study with the page already as ground truth. All three call the same /api/kev-o endpoint; the subdomain is a thin proxy. One brain, three surfaces, zero divergence by construction.
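Mechanically, "one brain, three surfaces" means every surface builds the same request against the same endpoint. A hedged sketch of a shared client helper; the endpoint path is from the text, while the field names and helper itself are hypothetical:

```typescript
interface KevOMessage { role: "user" | "assistant"; content: string }

// One request shape for all three surfaces. Inline punch-ins pass the slug
// of the page they are embedded in so the API can ground on it; the
// subdomain and Command-K palette omit it. (Field names are assumptions.)
function buildKevORequest(
  messages: KevOMessage[],
  pageSlug?: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: "/api/kev-o", // the single shared endpoint
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(pageSlug ? { messages, pageSlug } : { messages }),
    },
  };
}
```

Because every surface goes through one request builder and one endpoint, divergence between the three experiences is prevented by construction rather than by discipline.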
Artifacts worth reading
- The retrieval pipeline. BM25 plus Voyage rerank with graceful fallback when the rerank API is missing.
- The BM25 implementation. Ported from fedbench, decoupled from its corpus-path machinery, retuned for in-memory chunks.
- The owner-bypass route. The security posture I'm proudest of in the build (it lives on the main-site repo, which is the canonical reference).
- The extracted mdx-corpus primitive. The part I'd most recommend reading because it's the design judgment under load.
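The owner-bypass posture listed above can be sketched in a few lines. Every name here is an assumption (the canonical route lives on the main-site repo); the sketch only shows the three properties the text calls out: fail closed on a missing key, constant-time comparison, and rate limiting before the key check with a 404 on failure.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Fail closed: if no admin key is configured, nothing ever matches.
function ownerKeyMatches(provided: string, adminKey: string | undefined): boolean {
  if (!adminKey) return false;
  // Hash both sides so timingSafeEqual compares equal-length buffers and
  // stays constant-time regardless of the guess's length.
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(adminKey).digest();
  return timingSafeEqual(a, b);
}

// Route shape: rate-limit BEFORE the key check so every attempt spends
// budget, and return 404 (not 401) so a wrong key looks like a missing
// route. `limiter` stands in for the 5/hour-per-IP Upstash check.
async function handleOwnerBypass(
  req: { headers: Map<string, string>; ip: string },
  limiter: (ip: string) => Promise<boolean>,
  adminKey: string | undefined,
): Promise<{ status: number }> {
  if (!(await limiter(req.ip))) return { status: 429 };
  const key = req.headers.get("x-admin-key") ?? "";
  if (!ownerKeyMatches(key, adminKey)) return { status: 404 };
  return { status: 200 };
}
```

Ordering the limiter first is the interesting choice: an attacker burns one of their five hourly attempts whether the guess is right or wrong, so the route can't be brute-forced cheaply even if the 404 leaks nothing.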
The trade-offs
Three surfaces is more code than one. It's also the thing that lets the bot meet the visitor where they are: deep-evaluating the case study they just read, browsing for a quick answer, sharing the URL with a colleague. The shared /api/kev-o endpoint and the inline-punch-in page-context grounding are how I prevent that surface count from becoming three slightly different experiences. The cost shape says the same thing: BM25 is free, rerank costs a fraction of a cent, generation dominates. Optimizing retrieval further would be optimizing the wrong axis.
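The inline-punch-in grounding mentioned above, the page body forged as a synthetic passage at position 0, reduces to a small pure function. A hedged sketch with hypothetical names and id scheme:

```typescript
interface Passage { id: string; text: string; synthetic?: boolean }

// Pin the page the visitor is reading at position 0 of the retrieved set,
// dropping any retrieved chunk from the same page to avoid duplicates.
// Position 0 is what makes the bot most likely to cite the current article.
function withPageContext(
  page: { slug: string; body: string },
  retrieved: Passage[],
): Passage[] {
  const synthetic: Passage = { id: `page:${page.slug}`, text: page.body, synthetic: true };
  return [synthetic, ...retrieved.filter((p) => p.id !== `page:${page.slug}`)];
}
```

This is the cheap half of the trade-off: keeping the three surfaces coherent costs one pure function per surface-specific behavior, not a forked pipeline.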