demos · 10 live

Try the work.

Hosted artifacts you can run in the browser. Open source, MIT licensed, no signup or keys on the demos themselves.

Featured · 3 essential

Durable AI commerce backend
loom
Four workflows demonstrate the patterns that make agent-driven money movement safe in production: durable sleep, exactly-once side effects, saga compensation, webhook drift reconciliation, and agentic spending authority bounded by a deterministic gate.
Built on Vercel Workflows for durable execution. Cart abandonment runs a durable sleep, drafts a re-engagement email with Haiku, and sends it through an idempotent receiver: a repeated idempotency key returns deduplicated:true, so duplicates never reach the customer. Dynamic checkout calls Opus for a structured AgentDecision via generateObject and Zod, then a five-line deterministic gate compares the requested amount against an env-driven ceiling before applying anything. Shipping monitor runs saga compensation with namespaced cancel keys. The Stripe webhook drift demo shows receive-then-consume-later with a per-visitor consumer cursor: events stay in the store, the cursor advances, and dimmed events visualize the multi-consumer pattern. A Sirens eval harness runs 10 adversarial scenarios in CI. A failure-injection harness proves zero duplicate sends under fault. Hardened with a per-IP rate limit, a daily USD cap, and per-visitor cookie scoping.
TypeScript · Next.js 16 · Vercel Workflow SDK · Vercel AI SDK · Anthropic SDK · Stripe · Zod · Upstash
loom.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
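
A minimal sketch of two loom ideas described above: an idempotent receiver that reports deduplicated: true on a repeated key, and a deterministic gate that checks a requested amount against a ceiling. The names sendOnce and gate are hypothetical, and the in-memory Map is an assumption standing in for whatever durable store the demo actually uses.

```typescript
// Hypothetical in-memory store; a real receiver would need a durable one.
const seen = new Map<string, string>();

type SendResult = { deduplicated: boolean; messageId: string };

// Idempotent receiver: the first call with a key performs the send,
// later calls with the same key return the stored result instead.
function sendOnce(idempotencyKey: string, send: () => string): SendResult {
  const existing = seen.get(idempotencyKey);
  if (existing !== undefined) {
    return { deduplicated: true, messageId: existing };
  }
  const messageId = send();
  seen.set(idempotencyKey, messageId);
  return { deduplicated: false, messageId };
}

// Deterministic gate: the model proposes, plain code decides.
// In practice ceilingUsd would be read from an environment variable.
function gate(requestedUsd: number, ceilingUsd: number): "apply" | "reject" {
  if (!Number.isFinite(requestedUsd) || requestedUsd < 0) return "reject";
  return requestedUsd <= ceilingUsd ? "apply" : "reject";
}
```

The gate takes no model output other than the number itself, so its verdict is the same on every run for the same inputs.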
Multi-agent debugging concierge
forge
Paste a stack trace. Four specialist subagents fan out in parallel, then a coordinator merges their structured findings into ranked, calibration-weighted hypotheses.
Parallel fan-out via Promise.all with pLimit-bounded concurrency. A two-pass agent pattern (generateText, then generateObject) gives reliable structured output from a tool-using loop. Also included: durable, resumable session state; preemptive abort plumbed end-to-end via AbortSignal, with cross-instance signaling through Upstash; a Brier-score calibration log that downweights overconfident lanes at merge time; a five-scenario CLI eval harness with a graded IoU rubric and n-runs aggregation; and Anthropic prompt-caching breakpoints with honest below-threshold disclosure. Hardened with a per-IP rate limit and a daily USD spend cap.
TypeScript · Next.js 16 · Vercel AI SDK · Anthropic SDK · p-limit · Zod · Upstash · Tailwind v4
forge.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
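
To make the calibration idea concrete, here is a rough sketch of a Brier score over a lane's past forecasts and one possible way to turn it into a merge weight. The Brier score itself is standard; the laneWeight mapping (1 minus the score) and the empty-log default are assumptions for illustration, not forge's actual formula.

```typescript
type Forecast = { confidence: number; correct: boolean };

// Brier score: mean squared gap between stated confidence and outcome.
// 0 is perfect; always answering 0.5 scores 0.25.
function brierScore(history: Forecast[]): number {
  if (history.length === 0) return 0.25; // assumption: empty log treated like a coin flip
  const total = history.reduce(
    (sum, f) => sum + (f.confidence - (f.correct ? 1 : 0)) ** 2,
    0,
  );
  return total / history.length;
}

// Hypothetical weighting: a lane's merge weight shrinks as its Brier score grows.
function laneWeight(history: Forecast[]): number {
  return Math.max(0, 1 - brierScore(history));
}
```

A lane that says 0.9 and is wrong twice scores about 0.81 and gets a weight near 0.19, while a lane that says 0.9 when right and 0.1 when wrong scores about 0.01 and keeps nearly full weight.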
AI-native product catalog
anchor
Every product has a human page and three statically-cached LLM-facing endpoints. The /.well-known/agents.json descriptor publishes endpoints + auth + pricing negotiation. Agents transact through an eight-check delegated-authority pipeline.
Ten fictional specialty-coffee products. The agent surface splits into /agent/markdown (citation opening + JSON-LD fence + prose), /agent/json (Schema.org Product/Offer only), and /agent/plain (prose only). All 30 routes prerendered at build, served from the edge cache in ~5ms. The proxy layer classifies User-Agent on every fetch and queues a Redis write via after() so cache hits stay instrumented. The /api/agent/checkout endpoint runs eight ordered checks (signature, expiry, agent binding, scope, nonce, idempotency, inventory, charge) and revalidateTag invalidates only the affected product entry on a sale, leaving the other nine cached. A live AEO dashboard surfaces 24h / 7d / 30d totals, bot-mix breakdown, per-product counts, and a live tail. A /docs/rendering page documents every route's render mode so the build is self-documenting. A dev-only render inspector overlays colored boundaries on every component classifying its mode and shows a counterfactual savings panel vs a fully-dynamic baseline.
TypeScript · Next.js 16 · React 19 · Vercel AI SDK · Anthropic SDK · Zod · Upstash · Tailwind v4
anchor.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
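
The ordered checkout checks can be sketched as a pipeline that stops at the first failure. The eight check names and their order come from the description above; the types, the runPipeline function, and the boolean-returning check signature are hypothetical.

```typescript
// Order taken from the description; everything else here is illustrative.
const CHECK_ORDER = [
  "signature",
  "expiry",
  "agent binding",
  "scope",
  "nonce",
  "idempotency",
  "inventory",
  "charge",
] as const;

type CheckName = (typeof CHECK_ORDER)[number];
type Checks<Req> = Record<CheckName, (req: Req) => boolean>;
type PipelineResult = { ok: true } | { ok: false; failedAt: CheckName };

// Runs the checks in order and short-circuits on the first one that fails,
// so later checks (including the charge) never run for a rejected request.
function runPipeline<Req>(req: Req, checks: Checks<Req>): PipelineResult {
  for (const name of CHECK_ORDER) {
    if (!checks[name](req)) return { ok: false, failedAt: name };
  }
  return { ok: true };
}
```

Putting the charge last means every cheaper check has already passed before money moves.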

More demos · 7 live

LangGraph triage agent
issuegraph
Watch a LangGraph state machine triage GitHub issues live: classify, route, quality-guard, and pause for human approval when its own confidence is low.
The page runs the real graph server-side and streams each node to the browser as it executes. Classification comes from a LangChain chain with Zod-structured output including a confidence score. Conditional edges route to one of four specialist drafters, an LLM guard loops weak drafts back for a bounded redraft, and a confidence gate calls interrupt() when the classifier is unsure, checkpointing to Redis and waiting for you to approve or reject. Evals run a labeled golden set through the graph with LangSmith, scoring category accuracy deterministically and draft quality with an LLM judge, then check confidence honesty with a Brier score and reliability table. Preset-only input, per-IP rate limits, and a daily budget cap the spend.
TypeScript · Next.js 16 · LangGraph · LangChain · LangSmith · Anthropic SDK · Zod · Upstash · Redis
issuegraph.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
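
As an illustration of the reliability table mentioned above, the sketch below buckets predictions by stated confidence and compares each bucket's average confidence with its observed accuracy. The function name, row shape, and equal-width binning are assumptions; issuegraph's own evals may bin differently.

```typescript
type Prediction = { confidence: number; correct: boolean };
type ReliabilityRow = {
  lower: number;
  upper: number;
  count: number;
  meanConfidence: number;
  accuracy: number;
};

// Groups predictions into equal-width confidence bins. A well-calibrated
// classifier has meanConfidence close to accuracy in every populated bin.
function reliabilityTable(preds: Prediction[], binCount = 5): ReliabilityRow[] {
  const bins = Array.from({ length: binCount }, () => ({ count: 0, confSum: 0, hits: 0 }));
  for (const p of preds) {
    // Clamp so that confidence 1.0 lands in the last bin.
    const idx = Math.min(binCount - 1, Math.max(0, Math.floor(p.confidence * binCount)));
    bins[idx].count += 1;
    bins[idx].confSum += p.confidence;
    bins[idx].hits += p.correct ? 1 : 0;
  }
  return bins
    .map((b, i) => ({
      lower: i / binCount,
      upper: (i + 1) / binCount,
      count: b.count,
      meanConfidence: b.count > 0 ? b.confSum / b.count : 0,
      accuracy: b.count > 0 ? b.hits / b.count : 0,
    }))
    .filter((row) => row.count > 0); // empty bins are omitted
}
```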
In-browser data agent
tablesalt
Drop a CSV, ask a question, and the agent renders the answer as the right kind of UI: a chart, a stat card, a table, or a list.
DuckDB-WASM runs every query in the browser so the file never uploads. The agent streams a four-step reasoning trace before the SQL lands. A live eval scoreboard runs 12 hand-labelled questions against the model on demand and shows per-case accuracy and dollar cost.
Next.js 16 · React 19 · TypeScript · Tailwind v4 · Vercel AI SDK · Vercel AI Gateway · DuckDB-WASM · streamfield
tablesalt.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
npm library
streamfield
A tiny React library for displaying a structured AI response while it is still being generated, without the fields flickering and snapping into place.
Diffs successive partial-object snapshots from the Vercel AI SDK's streamObject and exposes a pending / streaming / complete state per field. About 150 lines of source. The playground renders the same partial with and without the library so you can see the difference.
TypeScript · React · tsup · Vercel AI SDK · Vercel · Next.js 16
streamfield.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
Also on npm: npm install streamfield
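
The per-field state idea can be sketched without React as a pure function over two successive partial snapshots. This is not streamfield's API: the fieldStates name, its signature, and the heuristic that a field unchanged since the previous snapshot counts as complete are all assumptions made for illustration.

```typescript
type FieldState = "pending" | "streaming" | "complete";
type Snapshot = Record<string, unknown>;

// Compares the previous and current partial objects field by field.
// - once the stream is done, every field is complete
// - a field with no value yet is pending
// - a field whose value changed since the last snapshot is streaming
// - a field whose value did not change is assumed complete (heuristic)
function fieldStates(
  keys: string[],
  prev: Snapshot,
  next: Snapshot,
  done: boolean,
): Record<string, FieldState> {
  const states: Record<string, FieldState> = {};
  for (const key of keys) {
    if (done) {
      states[key] = "complete";
    } else if (next[key] === undefined) {
      states[key] = "pending";
    } else {
      const changed = JSON.stringify(prev[key]) !== JSON.stringify(next[key]);
      states[key] = changed ? "streaming" : "complete";
    }
  }
  return states;
}
```

A UI could use these states to reserve space for pending fields and only animate the one that is currently streaming.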
LLM eval harness
fedbench
An open-source evaluation harness that measures hallucination, citation accuracy, and refusal discipline in grounded Q&A.
BM25 retrieval over two side-by-side public corpora (Medicare + OSHA, 21 verified Q&A pairs), Sonnet-4.6 grounded answers, deterministic citation check, Opus-4.7 LLM-as-judge. The demo is a replay viewer: explore every pair with its retrieved chunks, the agent's answer, and the judge verdict.
TypeScript · Bun · Anthropic SDK · MCP · BM25 · LLM-as-judge
fedbench.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
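
For readers unfamiliar with BM25, here is a compact sketch of the scoring function. It is a generic textbook version, not fedbench's code: the tokenizer is deliberately simplistic, and k1 = 1.2 and b = 0.75 are common defaults assumed here rather than values taken from the project.

```typescript
// Lowercases and splits on non-alphanumerics; a real pipeline would do more.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

type Ranked = { index: number; score: number };

// Scores every document against the query and returns them best-first.
function bm25Rank(query: string, docs: string[], k1 = 1.2, b = 0.75): Ranked[] {
  const tokenized = docs.map(tokenize);
  const avgLen =
    tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(1, docs.length);
  const queryTerms = Array.from(new Set(tokenize(query)));

  const ranked = tokenized.map((doc, index) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length; // term frequency in this doc
      if (tf === 0) continue;
      const df = tokenized.filter((d) => d.includes(term)).length; // docs containing term
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      // Length normalization: long documents need more matches to score as high.
      const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    return { index, score };
  });
  return ranked.sort((x, y) => y.score - x.score);
}
```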
MCP server / agent tool design
fieldops-mcp
A Model Context Protocol server that exposes a small-business field-services dispatcher workflow as six agent tools.
Six tools, six structurally distinct shapes — read, search, mutation with typed errors, composition, aggregation, and human escalation. The interesting work is choosing what tools to expose, not the plumbing.
TypeScript · Bun · MCP TS SDK · Zod · Claude Desktop
fieldops-mcp.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
A showcase, not a live MCP host: an interactive tour of real captured exchanges. MCP itself runs client-side inside an LLM host; the page explains why.
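
One of the six shapes, mutation with typed errors, can be illustrated with a discriminated union. The tool below (assigning a technician to a job), its error codes, and its state shape are all hypothetical and are not taken from fieldops-mcp; the sketch also omits the MCP SDK wiring entirely.

```typescript
// Hypothetical error variants an agent can branch on by `code`.
type AssignError =
  | { code: "JOB_NOT_FOUND"; jobId: string }
  | { code: "TECH_UNAVAILABLE"; techId: string; nextFreeAt: string };

type AssignResult =
  | { ok: true; jobId: string; techId: string }
  | { ok: false; error: AssignError };

// Hypothetical dispatcher state, passed in so the function stays pure.
type DispatchState = { jobs: Set<string>; busyUntil: Map<string, string> };

function assignTechnician(state: DispatchState, jobId: string, techId: string): AssignResult {
  if (!state.jobs.has(jobId)) {
    return { ok: false, error: { code: "JOB_NOT_FOUND", jobId } };
  }
  const nextFreeAt = state.busyUntil.get(techId);
  if (nextFreeAt !== undefined) {
    return { ok: false, error: { code: "TECH_UNAVAILABLE", techId, nextFreeAt } };
  }
  return { ok: true, jobId, techId };
}

// Turns a typed error into a message; the switch covers every variant.
function describeError(error: AssignError): string {
  switch (error.code) {
    case "JOB_NOT_FOUND":
      return `No job with id ${error.jobId}.`;
    case "TECH_UNAVAILABLE":
      return `Technician ${error.techId} is busy until ${error.nextFreeAt}.`;
  }
}
```

Returning a typed error instead of throwing gives the calling model something specific to recover from, such as retrying with a different technician.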
Sub-agent orchestration
grant-pilot
A multi-turn agent that helps a small business or nonprofit find federal grants and drafts an application skeleton.
A planner dispatches three specialist sub-agents (discovery, eligibility, drafter) over live federal grants and entity-registration data. Each sub-agent has its own fallback ladder and structured-failure routing. The hosted demo is limited by a 5-intent allowlist, a daily budget cap, and a per-IP rate limit.
TypeScript · Bun · Anthropic SDK · Next.js 16 · Vercel · Upstash
grant-pilot.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
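
A fallback ladder can be sketched as a list of strategies tried in order, with each failure recorded in a structured form rather than thrown. The Rung and LadderResult types and the runLadder function are hypothetical; grant-pilot's actual rungs and failure schema are not described here.

```typescript
type Rung<T> = { name: string; run: () => Promise<T> };
type Failure = { rung: string; reason: string };
type LadderResult<T> =
  | { ok: true; value: T; rung: string }
  | { ok: false; failures: Failure[] };

// Tries each rung in order and returns the first success. If every rung
// fails, the caller gets the full list of failures to route on.
async function runLadder<T>(rungs: Rung<T>[]): Promise<LadderResult<T>> {
  const failures: Failure[] = [];
  for (const rung of rungs) {
    try {
      const value = await rung.run();
      return { ok: true, value, rung: rung.name };
    } catch (err) {
      failures.push({
        rung: rung.name,
        reason: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return { ok: false, failures };
}
```

A planner could put a live API call on the first rung and a cached or reduced-scope answer on a lower one, then report which rung actually produced the result.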
Grounded RAG chatbot
kev-o
Ask anything about my work. He cites his receipts, and he won't make stuff up.
Grounded chatbot that answers exclusively from my public corpus: blog posts, project case studies, resume, and the READMEs of my open-source repos. Hybrid retrieval (BM25 candidate pool plus Voyage cross-encoder rerank) feeds Claude Sonnet 4.6 streamed through the Vercel AI SDK. Three surfaces share one brain: this standalone subdomain, a global Command-K palette on every page, and inline punch-ins at the foot of curated entries. Hardened with a per-IP rate limit, a daily USD spend cap, and an owner-bypass route with timing-safe key comparison.
TypeScript · Next.js 16 · Vercel AI SDK · Anthropic SDK · Voyage rerank · BM25 · Upstash · Tailwind v4
kev-o.kevinmurphywebdev.com → | Source on GitHub → | Read the post →
Also embedded on every page of this site (try Cmd-K) and at the foot of every blog post and case study.
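
A timing-safe key comparison can be sketched with Node's built-in crypto module. The keysMatch helper is a hypothetical name, and hashing both inputs first is one common approach assumed here (it gives timingSafeEqual the equal-length buffers it requires); the site's actual bypass route may differ.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hashes both values to fixed-length digests, then compares them in
// constant time. A plain === on strings can leak how many leading
// characters matched through response timing.
function keysMatch(provided: string, expected: string): boolean {
  const a = createHash("sha256").update(provided).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```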
