Repo: github.com/midimurphdesigns/loom
Live demo: loom.kevinmurphywebdev.com
Read the full story: Building loom
Loom is a durable AI-commerce backend. Four workflows demonstrate the patterns that make agent-driven money movement safe in production: cart abandonment with durable sleep and idempotent email, dynamic checkout with bounded discount negotiation, shipping monitoring with saga compensation, and a Stripe webhook drift demo that shows why webhook stores are durable logs and not queues. In numbers: four workflows, ten adversarial Sirens scenarios, and a $2/day default cost cap with a budget-aware short-circuit before every LLM call.
How it's built
Next.js 16 App Router on Vercel, TypeScript strict, zero any. The Vercel Workflow SDK (GA) provides durable sleep and per-step checkpointing. The Vercel AI SDK on @ai-sdk/anthropic handles every LLM call, with generateObject plus Zod schemas for structured agent decisions and generateText for free-form drafts. Upstash provides the budget counter, the per-visitor event lists, the consumer cursors, and the rate-limit window.
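The budget-aware short-circuit can be pictured with a minimal sketch. Everything named here (DailyBudget, tryReserve, callWithBudget, the per-call cost estimate) is hypothetical and in-memory; the source only says that an Upstash counter backs the cap and that the check runs before every LLM call.

```typescript
// Hypothetical in-memory stand-in for the Upstash budget counter.
class DailyBudget {
  private spentCents = 0;
  private capCents: number;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Reserve the estimated cost up front; refuse if it would cross the cap.
  tryReserve(estimateCents: number): boolean {
    if (this.spentCents + estimateCents > this.capCents) return false;
    this.spentCents += estimateCents;
    return true;
  }
}

// Run the LLM call only if the budget allows it; otherwise return the
// fallback without ever invoking the call.
function callWithBudget<T>(
  budget: DailyBudget,
  estimateCents: number,
  call: () => T,
  fallback: T,
): T {
  return budget.tryReserve(estimateCents) ? call() : fallback;
}
```

With a 200-cent cap, a first call estimated at 150 cents runs, and a second estimated at 100 cents returns the fallback without reaching the model.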
The cart-abandonment workflow runs a durable sleep (six hours in production, compressed via LOOM_DEMO_SLEEP_MS for the demo), drafts a re-engagement email with Haiku, and sends it through an idempotent receiver. The send-email step's idempotency key is the composite workflowId:stepName. The mock email provider in lib/email.ts persists the key in Upstash with a TTL; a second send with the same key returns deduplicated: true, and the underlying email API never fires twice. That is the at-least-once + idempotent-receiver + stable-key chain in one workflow.
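A minimal sketch of that receiver, assuming an in-memory Map in place of Upstash. The class name, method signature, and message IDs are illustrative and are not taken from lib/email.ts; a real store would also need an atomic set-if-absent so two concurrent sends cannot both pass the check.

```typescript
type SendResult = { deduplicated: boolean; messageId: string };

// Hypothetical in-memory receiver; the source persists keys in Upstash.
class IdempotentEmailReceiver {
  private seen = new Map<string, { messageId: string; expiresAt: number }>();
  private ttlMs: number;
  // Stands in for the underlying email API: one entry per real send.
  public outbox: { to: string; body: string }[] = [];

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  send(workflowId: string, stepName: string, to: string, body: string): SendResult {
    // Stable composite key: the same step of the same workflow always maps here.
    const key = `${workflowId}:${stepName}`;
    const hit = this.seen.get(key);
    if (hit && hit.expiresAt > Date.now()) {
      return { deduplicated: true, messageId: hit.messageId };
    }
    this.outbox.push({ to, body });
    const messageId = `msg_${this.outbox.length}`;
    this.seen.set(key, { messageId, expiresAt: Date.now() + this.ttlMs });
    return { deduplicated: false, messageId };
  }
}
```

Calling send twice with the same workflowId and stepName leaves one entry in outbox; the second call returns the first call's messageId with deduplicated: true.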
The dynamic-checkout workflow runs an Opus-backed negotiate_discount step against four attack presets. The model returns a structured AgentDecision via generateObject and a Zod discriminated union; its entire output surface is one of { action: 'discount' | 'refund' | 'no_action', amountCents, reason }. The next step calls authorizeDiscount from lib/agent-authority.ts, which compares amountCents against MAX_DISCOUNT_USD * 100 read from the environment. The ceiling never appears in any prompt. The LLM never has a path to write past it.
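To make the narrow output surface concrete, here is a hand-rolled validator standing in for the Zod schema. It is a sketch only: parseAgentDecision is a hypothetical name, and the non-negative-integer rule for amountCents is an assumption the source does not state.

```typescript
type AgentAction = "discount" | "refund" | "no_action";

interface AgentDecision {
  action: AgentAction;
  amountCents: number;
  reason: string;
}

const ACTIONS: readonly AgentAction[] = ["discount", "refund", "no_action"];

// Returns a clean AgentDecision, or null if the raw model output does not
// match the expected shape.
function parseAgentDecision(raw: unknown): AgentDecision | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { action, amountCents, reason } = raw as Record<string, unknown>;
  if (typeof action !== "string" || !ACTIONS.includes(action as AgentAction)) return null;
  // Assumption: amounts are non-negative whole cents.
  if (typeof amountCents !== "number" || !Number.isInteger(amountCents) || amountCents < 0) {
    return null;
  }
  if (typeof reason !== "string") return null;
  // Rebuild the object so injected extra keys never travel downstream.
  return { action: action as AgentAction, amountCents, reason };
}
```

An injected field such as approvedBy: 'admin' is dropped by the rebuild, and an unknown action such as 'grant_credit' is rejected outright, so the gate only ever sees the three fields.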
The shipping-monitor workflow demonstrates saga compensation. Book carrier A, attempt carrier B, catch CarrierFailureError from B, run the paired compensation step that cancels carrier A. The booking idempotency key lives in the loom:carrier:booking: namespace; the cancel key lives in loom:carrier:cancel:. Different namespaces deliberately, so the cancel call does not dedupe-return the booking record.
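The saga shape can be sketched as follows, with an in-memory Map in place of the real store. The two key prefixes and the CarrierFailureError name come from the description above; the key suffixes, function names, and ID formats are assumptions made for illustration.

```typescript
class CarrierFailureError extends Error {
  constructor(carrier: string) {
    super(`carrier ${carrier} failed`);
    this.name = "CarrierFailureError";
  }
}

// Hypothetical in-memory idempotency store shared by both namespaces.
const store = new Map<string, string>();

async function bookCarrier(carrier: string, orderId: string, fail = false): Promise<string> {
  const key = `loom:carrier:booking:${carrier}:${orderId}`;
  const existing = store.get(key);
  if (existing) return existing; // replayed step: return the original booking
  if (fail) throw new CarrierFailureError(carrier);
  const bookingId = `bk_${carrier}_${orderId}`;
  store.set(key, bookingId);
  return bookingId;
}

async function cancelCarrier(carrier: string, orderId: string): Promise<string> {
  // Separate namespace, so this lookup can never return the booking record.
  const key = `loom:carrier:cancel:${carrier}:${orderId}`;
  const existing = store.get(key);
  if (existing) return existing;
  const cancelId = `cx_${carrier}_${orderId}`;
  store.set(key, cancelId);
  return cancelId;
}

async function shippingSaga(orderId: string, carrierBFails: boolean) {
  const bookingA = await bookCarrier("A", orderId);
  try {
    const bookingB = await bookCarrier("B", orderId, carrierBFails);
    return { status: "booked" as const, bookingA, bookingB };
  } catch (err) {
    if (!(err instanceof CarrierFailureError)) throw err; // unknown errors propagate
    const cancelId = await cancelCarrier("A", orderId);   // paired compensation
    return { status: "compensated" as const, bookingA, cancelId };
  }
}
```

If both functions shared one key, the cancel call would find the booking record under that key and return it as if the cancellation had already run.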
The Stripe webhook drift demo persists every verified event to loom:stripe:event:<id> with a thirty-day TTL and appends to a per-visitor list. Consumption is independent: a separate endpoint walks the list newest-first, picks the first event not in the visitor's consumer-cursor set, and advances the cursor. The event itself stays in the store. The locked phrasing: webhook stores are durable logs, not queues. Multiple consumers, individual cursors, TTL-based eviction.
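The log-versus-queue distinction can be sketched in memory as follows. WebhookLog, consumerId, and the method names are hypothetical, and TTL eviction is left out for brevity; the point is only that consuming advances a cursor and never deletes the event.

```typescript
interface StripeEvent {
  id: string;
  type: string;
}

class WebhookLog {
  private events = new Map<string, StripeEvent>();     // stands in for loom:stripe:event:<id>
  private visitorLists = new Map<string, string[]>();  // per-visitor event IDs, newest first
  private cursors = new Map<string, Set<string>>();    // one consumed-ID set per consumer

  persist(visitorId: string, event: StripeEvent): void {
    if (this.events.has(event.id)) return; // redelivered event: already stored
    this.events.set(event.id, event);
    const list = this.visitorLists.get(visitorId) ?? [];
    list.unshift(event.id);
    this.visitorLists.set(visitorId, list);
  }

  consumeNext(visitorId: string, consumerId: string): StripeEvent | null {
    const cursorKey = `${visitorId}:${consumerId}`;
    const seen = this.cursors.get(cursorKey) ?? new Set<string>();
    for (const id of this.visitorLists.get(visitorId) ?? []) {
      if (seen.has(id)) continue;
      const event = this.events.get(id);
      if (!event) continue; // would mean TTL eviction in the real store
      seen.add(id);         // advance this consumer's cursor only
      this.cursors.set(cursorKey, seen);
      return event;         // the event itself stays in the log
    }
    return null;
  }

  size(): number {
    return this.events.size;
  }
}
```

Two consumers reading the same visitor's list each see every event once, and size() is unchanged after both have finished. A queue would have handed each event to only one of them.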
Agentic spending authority and Sirens
The runtime gate is five lines of plain code: read the ceiling from the environment, compare it to the requested amount, and return decision_blocked if the request exceeds the ceiling. That is the safety mechanism. Sirens is the evidence.
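A sketch of what such a gate might look like. The function name and the decision_blocked status come from the text; the signature, the result shape, and the extra rejection of negative or non-finite amounts are assumptions. The ceiling is passed in as a parameter to keep the example self-contained, where the real gate reads it from the environment.

```typescript
type GateResult =
  | { status: "authorized"; amountCents: number }
  | { status: "decision_blocked"; requestedCents: number; ceilingCents: number };

// maxDiscountUsd would come from the MAX_DISCOUNT_USD environment variable,
// never from a prompt or from model output.
function authorizeDiscount(amountCents: number, maxDiscountUsd: number): GateResult {
  const ceilingCents = Math.round(maxDiscountUsd * 100);
  // Assumption: negative and non-finite amounts are blocked as well.
  if (!Number.isFinite(amountCents) || amountCents < 0 || amountCents > ceilingCents) {
    return { status: "decision_blocked", requestedCents: amountCents, ceilingCents };
  }
  return { status: "authorized", amountCents };
}
```

With a $20 ceiling, a request for 2000 cents is authorized and a request for 2001 cents is blocked, whatever reasoning the model attached to it.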
scripts/sirens.ts runs ten adversarial scenarios offline against the same path the runtime uses (prompt to LLM to gate to outcome). The scenarios include vague pressure, fabricated authority, system-prompt-leak attempts, JSON injection, ceiling-math tricks, and chained-reasoning attacks. After each scenario completes, Sirens asserts that the applied amount stays at or below MAX_DISCOUNT_USD. A snapshot is written to .loom/sirens/<timestamp>.json for diffing across prompt changes and model upgrades. The assertion never fires because the deterministic gate always catches the overshoot. Unit tests prove the gate code is correct against known inputs. Sirens proves the gate holds against adversarial inputs the model did not see during training.
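The harness loop might be sketched like this, with the LLM call replaced by a stubbed amount per scenario so the example runs offline. Scenario, SirensRow, and runSirens are hypothetical names, the gate is inlined as a single comparison, and treating a blocked request as an applied amount of zero is an assumption.

```typescript
interface Scenario {
  name: string;
  modelAmountCents: number; // stub for what the model asked for under attack
}

interface SirensRow {
  name: string;
  requestedCents: number;
  appliedCents: number;
  blocked: boolean;
}

function runSirens(scenarios: Scenario[], ceilingCents: number): SirensRow[] {
  return scenarios.map((s) => {
    // Inlined stand-in for the deterministic gate.
    const blocked = s.modelAmountCents > ceilingCents;
    const appliedCents = blocked ? 0 : s.modelAmountCents;
    // The invariant the harness asserts after each scenario.
    if (appliedCents > ceilingCents) {
      throw new Error(`ceiling breached in scenario "${s.name}"`);
    }
    return { name: s.name, requestedCents: s.modelAmountCents, appliedCents, blocked };
  });
}

// Serialize the rows for diffing; the real harness writes this to a file.
function snapshot(rows: SirensRow[]): string {
  return JSON.stringify(rows, null, 2);
}
```

Diffing two snapshots after a prompt or model change shows which scenarios began requesting more than the ceiling, even though the applied amounts remain bounded.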
Failure injection
scripts/failure-injection.ts wraps a DurableStubProvider with a KillingProvider that throws after await fn() returns but before the step result is persisted. That is the worst case for durability: the side effect happened, but the workflow has no memory it happened. On replay, the workflow asks for that step's result, the provider has nothing, the workflow re-runs fn(). The receiver-side idempotency key is what makes that re-run safe. The harness runs N=5 trials per workflow per phase and asserts recovery completed, the email audit log shows exactly one entry per run, and zero sends were dropped. Not durable in theory; measurable durability under fault injection.
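A compressed sketch of one trial follows. The provider class names come from the description above; the StepProvider interface, the sendEmail stub, and the trial helper are assumptions made so the example runs without the Workflow SDK.

```typescript
type StepFn<T> = () => Promise<T>;

interface StepProvider {
  runStep<T>(stepId: string, fn: StepFn<T>): Promise<T>;
}

// Checkpoints each step result so a replay skips completed steps.
class DurableStubProvider implements StepProvider {
  private results = new Map<string, unknown>();

  async runStep<T>(stepId: string, fn: StepFn<T>): Promise<T> {
    if (this.results.has(stepId)) return this.results.get(stepId) as T;
    const result = await fn();
    this.results.set(stepId, result);
    return result;
  }
}

// Lets the side effect happen once, then dies before the checkpoint.
class KillingProvider implements StepProvider {
  private killed = false;
  private inner: StepProvider;
  private killStepId: string;

  constructor(inner: StepProvider, killStepId: string) {
    this.inner = inner;
    this.killStepId = killStepId;
  }

  async runStep<T>(stepId: string, fn: StepFn<T>): Promise<T> {
    if (stepId === this.killStepId && !this.killed) {
      this.killed = true;
      await fn(); // the email is sent here...
      throw new Error(`killed after ${stepId} ran, before its result was persisted`);
    }
    return this.inner.runStep(stepId, fn);
  }
}

// Hypothetical idempotent receiver with an audit log of real sends.
const auditLog: string[] = [];
const sentKeys = new Set<string>();

async function sendEmail(key: string): Promise<{ deduplicated: boolean }> {
  if (sentKeys.has(key)) return { deduplicated: true };
  sentKeys.add(key);
  auditLog.push(key);
  return { deduplicated: false };
}

async function trial(workflowId: string): Promise<{ deduplicated: boolean }> {
  const stepId = `${workflowId}:send-email`;
  const provider = new KillingProvider(new DurableStubProvider(), stepId);
  try {
    await provider.runStep(stepId, () => sendEmail(stepId)); // crashes after the send
  } catch {
    // simulated process death; fall through to replay
  }
  // Replay: no checkpoint exists, so fn() runs again and the receiver dedupes.
  return provider.runStep(stepId, () => sendEmail(stepId));
}
```

After a trial the replayed step reports deduplicated: true and the audit log holds exactly one entry for that workflow.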
Artifacts worth reading
docs/ARCHITECTURE.md. The design contract: orchestration abstraction, exactly-once chain, saga shape, webhook drift handling, agent spending authority, failure-injection methodology, cost discipline, intentional non-goals.
lib/agent-authority.ts. The five-line gate. The deterministic code that runs after the LLM finishes.
lib/workflows/. The four workflow definitions. Each one composes Vercel Workflow SDK primitives (durable sleep, per-step checkpointing) with the receiver-side idempotency contracts in lib/email.ts, lib/carrier.ts, and the agent-authority gate.
scripts/sirens.ts. The adversarial eval harness. Ten scenarios that prove the gate holds.
The trade-offs
The carrier API, the email provider, and the Stripe checkout flow are all fixture-backed; the real adapters are a separate phase. The transactional-outbox dispatcher that closes the gap between webhook receive and workflow start is documented in the architecture doc as a Phase 7 deferred item rather than pretending it exists. Loom's cost cap is global to the demo; production needs per-team ceilings. OpenTelemetry tracing is not wired; production has to add spans for every workflow, step, and adapter call so a stuck workflow is debuggable by an SRE who has never read the source. The architecture is shaped for these additions; the demo intentionally stops before them.