
Open-source and personal projects · 2026

kev-o — grounded RAG chatbot trained on my own writing

Public open-source grounded chatbot answering questions about my work using only my public corpus (blog, case studies, resume, OSS READMEs). Hybrid retrieval — a BM25 candidate pool plus a Voyage cross-encoder rerank — feeds Claude Sonnet 4.6, streamed through the Vercel AI SDK. Three surfaces share one brain: a standalone subdomain, a global Command-K palette, and inline punch-ins at the foot of every curated entry. Hardened with a per-IP rate limit, a daily USD spend cap, and an owner-bypass route with timing-safe key comparison and a per-IP attempt cap.

Author · Applied AI · TypeScript · Next.js 16 · Vercel AI SDK · Anthropic SDK · Voyage rerank · BM25 · Upstash · Tailwind v4

Repo: github.com/midimurphdesigns/kev-o-ai-search

Live: kev-o.kevinmurphywebdev.com

Read the full story: Building Kev-O

Kev-O is a grounded chatbot that answers questions about my work using only the public corpus I've written. Blog posts, project case studies, resume, About page, the READMEs of my open-source repos. He cites his receipts. He refuses to invent. 212 chunks across 4 sources, 3 surfaces sharing one brain, ~$0.005 per turn with cached system prompt.

How it's built

Next.js 16 App Router on Vercel, TypeScript strict, Vercel AI SDK (streamText plus useChat) on @ai-sdk/anthropic with Claude Sonnet 4.6. Hybrid retrieval: a BM25 candidate pool over the full corpus (ported from fedbench, k1=1.5, b=0.75, ~3ms in-memory), then a Voyage voyage-rerank-2.5 cross-encoder rerank narrowing 20 lexical candidates to the top 6 semantic winners.

For inline punch-ins, the page body itself is forged as a synthetic passage at position 0, so Kev-O is most likely to cite the article the visitor is reading. The corpus is built at deploy time via the mdx-corpus primitive I extracted from this build.

The hosted surface is hardened with a $5/UTC-day spend cap (charged post-stream against actual reported token usage, not estimates), a 15-requests-per-hour-per-IP rate limit via Upstash Redis, and an owner-bypass route that fails closed if the admin key is unset, uses Node's timingSafeEqual for comparison, returns 404 (not 401) on wrong keys, and rate-limits attempts at 5/hour before the key check, so an attacker exhausts their budget regardless of guess outcome.
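The two-stage shape described above can be sketched in a few dozen lines. This is an illustrative reconstruction, not the repo's actual code: the `Chunk` shape, function names, and the injected `rerank` callback are assumptions, though the k1/b values and the 20-to-6 narrowing match the description.

```typescript
// Hybrid retrieval sketch: BM25 over in-memory chunks picks a lexical
// candidate pool, then an (injected, possibly absent) cross-encoder rerank
// narrows it, falling back gracefully to the BM25 order.

interface Chunk { id: string; text: string }

const K1 = 1.5;  // BM25 term-frequency saturation
const B = 0.75;  // BM25 length normalization

function tokenize(s: string): string[] {
  return s.toLowerCase().split(/\W+/).filter(Boolean);
}

function bm25TopK(query: string, chunks: Chunk[], k: number): Chunk[] {
  const docs = chunks.map((c) => tokenize(c.text));
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / docs.length;
  const N = docs.length;
  // Document frequency per term, counted once per document.
  const df = new Map<string, number>();
  for (const d of docs) for (const t of new Set(d)) df.set(t, (df.get(t) ?? 0) + 1);

  const scored = chunks.map((chunk, i) => {
    const d = docs[i];
    const tf = new Map<string, number>();
    for (const t of d) tf.set(t, (tf.get(t) ?? 0) + 1);
    let score = 0;
    for (const q of tokenize(query)) {
      const n = df.get(q);
      if (!n) continue; // query term absent from corpus
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      const f = tf.get(q) ?? 0;
      score += (idf * f * (K1 + 1)) / (f + K1 * (1 - B + (B * d.length) / avgLen));
    }
    return { chunk, score };
  });
  return scored.sort((a, b) => b.score - a.score).slice(0, k).map((s) => s.chunk);
}

// 20 lexical candidates in, top 6 out; BM25 order survives if rerank fails.
async function retrieve(
  query: string,
  chunks: Chunk[],
  rerank?: (q: string, cands: Chunk[]) => Promise<Chunk[]>,
): Promise<Chunk[]> {
  const pool = bm25TopK(query, chunks, 20);
  if (!rerank) return pool.slice(0, 6);
  try {
    return (await rerank(query, pool)).slice(0, 6);
  } catch {
    return pool.slice(0, 6); // graceful fallback when the rerank API is missing
  }
}
```

The fallback is the interesting design choice: retrieval degrades to pure lexical ranking rather than erroring, so a rerank outage costs quality, not availability.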

Three surfaces, one brain

The chat lives in three places because the visitor's intent is different in each. The subdomain is a standalone full-page conversation, the URL hiring managers share. The Command-K palette puts Kev-O at the top of the global keyboard surface on every page of the main site. The inline punch-ins sit at the foot of every curated blog post and project case study with the page already as ground truth. All three call the same /api/kev-o endpoint; the subdomain is a thin proxy. One brain, three surfaces, zero divergence by construction.
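The "thin proxy" claim above is the whole trick: the subdomain route forwards the chat body to the one canonical endpoint and adds nothing. A minimal sketch of that shape, with hypothetical names (the real route is in the repo):

```typescript
// Sketch of a zero-logic proxy: every surface converges on the same
// endpoint, so behavior cannot diverge. The interface and helper name
// are illustrative, not the repo's actual code.

interface ForwardedRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// The canonical brain; the subdomain, palette, and punch-ins all hit this.
const KEV_O_ENDPOINT = "/api/kev-o";

function buildProxyRequest(chatBody: string): ForwardedRequest {
  return {
    url: KEV_O_ENDPOINT,
    method: "POST",
    headers: { "content-type": "application/json" },
    body: chatBody, // passed through untouched: the proxy adds no logic
  };
}
```

Because the proxy carries no logic of its own, "zero divergence by construction" holds: there is no second code path to drift.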

Artifacts worth reading

  • The retrieval pipeline. BM25 plus Voyage rerank with graceful fallback when the rerank API is missing.
  • The BM25 implementation. Ported from fedbench, decoupled from its corpus-path machinery, retuned for in-memory chunks.
  • The owner-bypass route. The security posture I'm proudest of in the build (lives on the main-site repo; the canonical reference).
  • The extracted mdx-corpus primitive. The part I'd most recommend reading, because it shows the design judgment under load.
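The owner-bypass posture in that list reduces to one small function plus ordering discipline in the route. Here's a hedged sketch of the comparison half, using Node's real `timingSafeEqual`; the function name and the hash-to-fixed-length trick are my illustration, not necessarily how the repo does it:

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Fail-closed, constant-time admin key check. Hashing both sides to a
// fixed 32-byte digest lets timingSafeEqual run even when the guess has
// a different length (timingSafeEqual throws on unequal-length buffers).
function checkAdminKey(provided: string, expected: string | undefined): boolean {
  if (!expected) return false; // key unset in the environment: fail closed
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b); // constant-time, no early exit on mismatch
}

// In the route handler (sketch):
//   1. rate-limit the IP *before* calling checkAdminKey, so every guess,
//      right or wrong, spends one of the 5 attempts per hour;
//   2. on mismatch, return 404 rather than 401, so the route does not
//      advertise that there is anything here to brute-force.
```

The ordering matters more than the primitive: rate-limiting after the key check would let an attacker probe for free until they got a hit.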

The trade-offs

Three surfaces is more code than one. It's also the thing that lets the bot meet the visitor where they are: deep-evaluating the case study they just read, browsing for a quick answer, sharing the URL with a colleague. The shared /api/kev-o endpoint and the inline-punch-in page-context grounding are how I prevent that surface count from becoming three slightly different experiences. The cost shape says the same thing: BM25 is free, rerank costs a fraction of a cent, generation dominates. Optimizing retrieval further would be optimizing the wrong axis.
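That cost-shape argument can be made concrete with a back-of-envelope model. Every price below is a placeholder assumption, not Anthropic's or Voyage's actual rate card; the point is the ratio between components, not the absolute numbers.

```typescript
// Hypothetical per-turn cost model. Rates are illustrative assumptions
// chosen only to show the shape: BM25 free, rerank tiny, generation dominant.
const PRICE = {
  cachedInputPerMTok: 0.3, // assumed discount for the cached system prompt
  inputPerMTok: 3.0,       // assumed $/1M fresh input tokens
  outputPerMTok: 15.0,     // assumed $/1M output tokens
  rerankPerMTok: 0.05,     // assumed rerank $/1M tokens; BM25 itself is free
};

interface Turn {
  cachedIn: number;   // cached system-prompt tokens
  freshIn: number;    // retrieved passages + user question
  out: number;        // generated answer tokens
  rerankToks: number; // tokens scored by the cross-encoder
}

function turnCostUSD(t: Turn): number {
  return (
    (t.cachedIn / 1e6) * PRICE.cachedInputPerMTok +
    (t.freshIn / 1e6) * PRICE.inputPerMTok +
    (t.out / 1e6) * PRICE.outputPerMTok +
    (t.rerankToks / 1e6) * PRICE.rerankPerMTok
  );
}
```

Run with plausible turn sizes, generation is the largest line item by a wide margin, which is exactly why shaving milliseconds or fractions of a cent off retrieval would be optimizing the wrong axis.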

ask kev-o