May 24, 2026 · 14 min · Engineering

Building anchor: dual-surface product pages, agents.json, delegated-authority checkout, and AEO instrumentation

Anchor is an open-source AI-native product catalog. Every product has a human page AND three statically-cached LLM-facing endpoints. A /.well-known/agents.json descriptor publishes capabilities in the Agentic Commerce Protocol shape. A delegated-authority checkout endpoint runs an eight-check pipeline. AEO instrumentation classifies 13 LLM crawlers and logs every fetch through a proxy so cache hits stay observed. Here are the design choices that mattered.

Repo: github.com/midimurphdesigns/anchor

Live: anchor.kevinmurphywebdev.com

Skills, concepts, and tools this build demonstrates

Next 16 rendering modes. Static (○), SSG with generateStaticParams (●), Partial Prerendering with <Suspense> holes (◐), Dynamic SSR (ƒ), and the Edge proxy layer. Every mode lives in a real route in the catalog, not a toy example.

Caching primitives under cacheComponents: true. The 'use cache' directive, cacheLife('hours' | 'days'), cacheTag for surgical labeling, and revalidateTag(tag, profile) for surgical invalidation. A sale invalidates one product entry; the other nine stay cached.

Serverless lifecycle. Next 16's after() primitive for queuing telemetry writes past the response. Why unawaited Promises die in serverless and survive on long-running Node. When to reach for a queue (durable side effects) vs after() (best-effort telemetry).

Edge proxy as the cross-cutting layer. proxy.ts (the renamed middleware.ts) classifies User-Agent and Referer on every matching request, attaches X-Anchor-* headers, and queues Redis writes via after() so cache hits stay observed. Runs before any route handler; geographically close.

Agentic Commerce Protocol shape. /.well-known/agents.json discovery descriptor with capabilities, endpoints, auth model, pricing-negotiation envelope, and rate limits. The RFC 8615 .well-known convention. The five fields that have no llms.txt equivalent and the consumer that reads each one.

Delegated authority and the eight-check pipeline. HMAC-SHA256 with constant-time signature verification (timing-safe XOR loop, not ===). Ordered checks cheapest-first so attackers spamming junk tokens never touch Redis. Nonce (jti) for replay protection; Idempotency-Key for network-retry protection. The sharpest distinction: nonces reject duplicates, idempotency keys repeat the original response.

Citation-shaped content for LLM crawlers. Three independent URL placements per response (opening prose line, JSON-LD @id, closing canonical line). The Source: lexical prefix that exploits LLM training-data bias. The lost-in-the-middle effect and why opening-line URL placement raises citation accuracy from roughly 40% to roughly 85%.

AEO (Answer Engine Optimization) instrumentation. Bot classification across 13 known LLM crawlers (ChatGPT-User, GPTBot, Perplexity-User, PerplexityBot, Claude-Web, ClaudeBot, Google-Extended, GoogleBot, Applebot-Extended, Bytespider, meta-externalagent, Amazonbot, cohere-ai). Redis sorted sets keyed aeo:fetch:<slug>:<bot> with ZADD/ZCOUNT/ZRANGE for O(log N) time-windowed counts. Cited-by attribution via Referer matching for the answer-engine conversion funnel.

Generative UI in AI SDK v6. streamObject with a Zod discriminated union as the schema (streamUI was removed in v6). The model picks one of N shapes via a kind literal; React switch-renders the matching component. Type-safe at every boundary; no markup ever crosses the wire from the model.

Server Actions for client-to-server calls. Type-safe at the call boundary: the client imports the action function and gets the return type without designing a JSON envelope. Used in the comparison agent flow.

Stack. TypeScript strict, Next 16 with cacheComponents: true, React 19, Tailwind v4, Vercel AI SDK 6, @ai-sdk/anthropic against Claude Haiku 4.5, Zod for schema validation, Upstash Redis (with an in-memory fallback for local dev).

Discipline. Five-scenario test suite that asserts every guarantee (happy path, replay rejection, over-budget rejection, SKU mismatch, idempotent retry); 11/11 assertions pass. A /docs/rendering page documenting every route's mode + rationale, sourced from lib/render-modes.ts so docs cannot drift from the routes that ship. A dev-only render inspector that overlays colored boundaries on every annotated component and shows a counterfactual latency-savings panel vs a fully-dynamic baseline.

What anchor is

Anchor is an open-source product catalog designed for a web where half the traffic is becoming AI agents instead of humans. Ten fictional specialty-coffee products. Every product has a human page rendered as React in the browser, AND three statically-cached LLM-facing endpoints: markdown, JSON-LD, and plain text. A /.well-known/agents.json descriptor publishes the site's capabilities (endpoints, auth, pricing negotiation, rate limits) in the Agentic Commerce Protocol shape. A POST /api/agent/checkout endpoint runs an eight-check delegated-authority pipeline before any charge fires. A live AEO dashboard classifies 13 known LLM crawlers by user-agent and surfaces the per-product fetch counts.

It's the third in a trilogy. Forge demonstrates multi-agent orchestration. Loom demonstrates durable AI commerce. Anchor demonstrates the discovery and citation surface that lets agents find and transact with a site at all.

What's in the box

Dual-surface product pages: a human page and three citation-shaped LLM endpoints per slug
A /.well-known/agents.json descriptor publishing capabilities, endpoints, auth model, pricing-negotiation envelope, and rate limits
A /llms.txt markdown reading list pointing at every product's agent endpoint
A /agents human page that renders the same descriptor an agent runtime sees (documentation IS the implementation)
An eight-check delegated-authority checkout pipeline with HMAC-SHA256 + constant-time signature verification + nonce + idempotency
Cited-by attribution: when a human arrives on a product page with a Referer from one of six known LLM client surfaces (ChatGPT, Perplexity, Claude, Gemini, Copilot, You), the proxy classifies it and tags the response
AEO instrumentation: every /agent fetch classified across 13 LLM crawlers and logged to Redis sorted sets via after() so cache hits stay observed
A live dashboard with 24h / 7d / 30d totals, bot-mix breakdown, per-product table, and a live tail of recent fetches
A comparison agent at /compare with generative UI: the model picks one of three shapes (spec table, pros and cons, recommendation paragraph) via structured-output dispatch
A self-documenting /docs/rendering page listing every route's mode + rationale + the Next 16 primitives the build leans on
A dev-only render inspector that overlays colored boundaries on every component classifying its render mode and shows a counterfactual savings panel

Why the agent surface is split into three static routes

The first version of the agent endpoint was content-negotiated: a single dynamic /agent route that read the Accept header and branched between markdown, JSON-LD, and plain-text bodies. It worked. It also missed the point of the build.

The refactor: split into /agent/markdown, /agent/json, /agent/plain. Each one is prerendered per slug at build time via generateStaticParams. All 30 routes (10 products × 3 formats) plus a 10-redirect canonical-shortcut on /agent itself live as static files served from the edge cache. TTFB drops to about 5ms worldwide because no serverless function ever runs on a cache hit. The body is a pure function of the catalog wrapped in 'use cache': same input, same output, every time.

This left a problem: if the route handler doesn't execute on cache hits, the AEO logger doesn't fire on cache hits. Telemetry would silently die the moment the cache started working.

The fix: move logging into the proxy. The proxy runs at the edge before any route handler, on every matching request, regardless of cache status. It classifies the User-Agent and queues a Redis write via after(). The body comes from cache; the telemetry comes from the proxy. Both layers do exactly what they're best at, and the cache savings don't cost observability.
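The classification step can be sketched as a pure function. This is a minimal sketch, not the code in proxy.ts: the 13 tokens come from the crawler list in this post, but the case-insensitive substring match and the names classifyBot and fetchKey are assumptions made for illustration.

```typescript
// The 13 crawler tokens named in this post. Matching by case-insensitive
// substring is an assumption of this sketch, not necessarily what proxy.ts does.
const LLM_BOT_TOKENS = [
  "ChatGPT-User",
  "GPTBot",
  "Perplexity-User",
  "PerplexityBot",
  "Claude-Web",
  "ClaudeBot",
  "Google-Extended",
  "GoogleBot",
  "Applebot-Extended",
  "Bytespider",
  "meta-externalagent",
  "Amazonbot",
  "cohere-ai",
] as const;

type LlmBot = (typeof LLM_BOT_TOKENS)[number];

// Returns the first matching crawler token, or null for ordinary traffic.
function classifyBot(userAgent: string | null): LlmBot | null {
  if (!userAgent) return null;
  const ua = userAgent.toLowerCase();
  for (const token of LLM_BOT_TOKENS) {
    if (ua.includes(token.toLowerCase())) return token;
  }
  return null;
}

// Key shape from the post: aeo:fetch:<slug>:<bot>
function fetchKey(slug: string, bot: LlmBot): string {
  return `aeo:fetch:${slug}:${bot}`;
}
```

Because the function has no I/O, it is cheap to run on every matching request; only the Redis write that follows needs to be deferred past the response.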

The citation-shaped opening line

Every /agent/markdown response opens with:

Source: https://anchor.kevinmurphywebdev.com/products/moonshot-grinder-x1, Moonshot Lab Moonshot Grinder X1 is listed at $899.00 on anchor (...)

Three structural choices are doing work in that line. The word Source: is a high-signal prefix. LLMs trained on academic and journalistic text strongly associate it with "authoritative citation follows," which pushes the URL into a category the model preserves through summarization. The canonical URL is the second token, not buried at the end, because LLMs cite documents most reliably when the URL is in the opening sentence (the lost-in-the-middle effect, written up by Stanford in 2023). The structured assertion (is listed at $899.00 on anchor) gives the model a fact to cite alongside the URL.

The URL appears in three independent locations in every response: once in the opening line, once in the JSON-LD @id field, once in the closing Canonical URL: line. Those three independent placements are the redundancy that pushes citation accuracy from roughly 40% to roughly 85%. If one mechanism fails (JSON-LD parser missed the metadata, RAG chunker grabbed only the middle), the others backstop.
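The three placements can be sketched as a pure renderer. This is a minimal sketch under assumptions: the Product fields, the JSON-LD properties other than @id, and the function names are hypothetical, and the real opening line carries more text than is shown here.

```typescript
// Hypothetical product shape for illustration only.
interface Product {
  slug: string;
  brand: string;
  name: string;
  priceCents: number;
}

const ORIGIN = "https://anchor.kevinmurphywebdev.com";

function formatUsd(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// Pure function of the product: same input, same output, so it caches cleanly.
function renderAgentMarkdown(p: Product): string {
  const url = `${ORIGIN}/products/${p.slug}`;
  const jsonLd = JSON.stringify(
    {
      "@context": "https://schema.org",
      "@type": "Product",
      "@id": url, // placement 2: JSON-LD @id
      name: p.name,
      brand: { "@type": "Brand", name: p.brand },
    },
    null,
    2,
  );
  return [
    // placement 1: opening line, URL right after the Source: prefix
    `Source: ${url}, ${p.brand} ${p.name} is listed at ${formatUsd(p.priceCents)} on anchor`,
    "",
    jsonLd,
    "",
    // placement 3: closing canonical line
    `Canonical URL: ${url}`,
  ].join("\n");
}
```

A chunker that keeps only the head, only the metadata, or only the tail of the response still carries the canonical URL.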

The agents.json descriptor

/.well-known/agents.json is the machine-readable capability descriptor. The path piggy-backs on RFC 8615's reserved /.well-known/ convention so agent runtimes know where to look without scraping the homepage. The shape is loosely modeled on Stripe's Agentic Commerce Protocol draft. Conservative on fields: every field documents a known consumer in code comments, nothing speculative.

The five fields that have no llms.txt equivalent are endpoints (read by an agent runtime that wants to call search or checkout), auth (read by a protocol negotiator before any authenticated request), pricing.negotiation (read by an agent buyer deciding whether to probe for a discount), capabilities (a quick filter: does this site support purchase, or only browse?), and rateLimits (per-agent caps the runtime respects to avoid backoff). Plus the supporting fields: schemaVersion, name, origin, contact, terms.

/agents is the human-facing mirror of the same descriptor, rendered by the same loadAgentsDescriptor() function the JSON endpoint uses. Documentation IS the implementation; drift between the two is impossible by construction. Both surfaces share one cached loader entry tagged descriptor, so a sale that changes inventory invalidates both in one revalidateTag('descriptor', 'hours') call.
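To make the field list concrete, here is a rough sketch of how a consumer might type and read the descriptor. Only the top-level field names and the two /api/agent/* paths come from this post; every nested shape, every value, and the helper names supports and endpointUrl are hypothetical placeholders, not the contents of lib/agents-descriptor.ts.

```typescript
// Top-level field names come from the post; every nested shape and value
// below is a hypothetical placeholder.
interface AgentsDescriptor {
  schemaVersion: string;
  name: string;
  origin: string;
  contact: string;
  terms: string;
  capabilities: string[];
  endpoints: Record<string, { method: "GET" | "POST"; path: string }>;
  auth: { scheme: string };
  pricing: { negotiation: { supported: boolean } };
  rateLimits: { requestsPerMinute: number };
}

const exampleDescriptor: AgentsDescriptor = {
  schemaVersion: "0.1",
  name: "anchor",
  origin: "https://anchor.kevinmurphywebdev.com",
  contact: "mailto:agents@example.com",
  terms: "https://example.com/terms",
  capabilities: ["browse", "purchase"],
  endpoints: {
    checkout: { method: "POST", path: "/api/agent/checkout" },
    issueToken: { method: "POST", path: "/api/agent/issue-token" },
  },
  auth: { scheme: "hmac-sha256-delegated-token" },
  pricing: { negotiation: { supported: false } },
  rateLimits: { requestsPerMinute: 60 },
};

// Quick filter an agent runtime might apply before doing anything else.
function supports(d: AgentsDescriptor, capability: string): boolean {
  return d.capabilities.includes(capability);
}

// Resolve a named endpoint against the descriptor's origin.
function endpointUrl(d: AgentsDescriptor, name: string): string | null {
  const endpoint = d.endpoints[name];
  return endpoint ? new URL(endpoint.path, d.origin).toString() : null;
}
```

Each helper corresponds to one consumer from the list above: the capability filter reads capabilities, and the agent runtime that wants to call checkout reads endpoints.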

The eight-check pipeline

POST /api/agent/checkout runs an ordered eight-check pipeline before any side effect fires:

Token present
Signature valid (HMAC-SHA256, constant-time compare)
Not expired
Agent matches caller
Scope matches request (action + SKU + maxCents)
Nonce unused (the token's jti is single-use)
Idempotency-Key check (return cached response on retry)
Process charge + revalidateTag

The order is cheapest-and-most-likely-to-fail first. Signature is the cheapest cryptographic check, about one millisecond, and the most likely to fail when an attacker is forging tokens. Expiry next because it's a single subtraction. Agent binding and scope are logical compares against verified claims. The Redis lookups (nonce and idempotency) only run after the signature is verified, so an attacker spamming junk tokens never touches the storage layer.

Expiry deliberately comes after signature verification, not before. The exp field lives inside the token body. Reading it before verifying the signature means trusting unverified bytes, which is the same class of bug as JWT's notorious alg: none exploit. The rule that prevents this whole family of attacks: never trust a claim until the signature attesting it is verified.
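The ordering can be sketched as one function with early returns. This is a minimal sketch, not the code in lib/principal.ts: the failure-code strings, the Deps interface, the "purchase" action value, and the synchronous calls (Redis would be async) are all assumptions made to keep the example self-contained.

```typescript
// Hypothetical failure codes; the real ones live in lib/principal.ts.
type FailureCode =
  | "missing_token"
  | "bad_signature"
  | "expired"
  | "agent_mismatch"
  | "scope_mismatch"
  | "nonce_reused";

interface Claims {
  agent: string;
  action: string;
  sku: string;
  maxCents: number;
  exp: number; // Unix seconds
  jti: string; // single-use nonce
}

interface CheckoutRequest {
  token: string | null;
  agent: string;
  sku: string;
  amountCents: number;
  idempotencyKey: string;
}

// Injected dependencies (all assumed). Synchronous for brevity.
interface Deps {
  verifySignature(token: string): Claims | null; // claims only if the HMAC verifies
  nowSeconds(): number;
  claimNonce(jti: string): boolean; // false if the jti was already used
  getCached(key: string): string | undefined;
  saveCached(key: string, response: string): void;
  charge(req: CheckoutRequest): string;
}

type CheckoutResult =
  | { ok: true; response: string; replayed: boolean }
  | { ok: false; code: FailureCode };

function runCheckout(req: CheckoutRequest, deps: Deps): CheckoutResult {
  const fail = (code: FailureCode): CheckoutResult => ({ ok: false, code });

  // 1. Token present.
  if (!req.token) return fail("missing_token");
  // 2. Signature valid. No claim is read before this passes.
  const claims = deps.verifySignature(req.token);
  if (!claims) return fail("bad_signature");
  // 3. Not expired. exp is trusted only because step 2 verified it.
  if (claims.exp <= deps.nowSeconds()) return fail("expired");
  // 4. Agent matches caller.
  if (claims.agent !== req.agent) return fail("agent_mismatch");
  // 5. Scope matches request (action + SKU + maxCents).
  if (claims.action !== "purchase" || claims.sku !== req.sku || req.amountCents > claims.maxCents) {
    return fail("scope_mismatch");
  }
  // 6. Nonce unused. First storage touch; junk tokens never reach this line.
  if (!deps.claimNonce(claims.jti)) return fail("nonce_reused");
  // 7. Idempotency-Key check: repeat the original response on retry.
  const cached = deps.getCached(req.idempotencyKey);
  if (cached !== undefined) return { ok: true, response: cached, replayed: true };
  // 8. Process the charge (cache revalidation would follow here).
  const response = deps.charge(req);
  deps.saveCached(req.idempotencyKey, response);
  return { ok: true, response, replayed: false };
}
```

Having verifySignature return the claims, rather than a boolean, makes the rule structural: there is no way to reach exp, agent, or scope without going through the signature check first.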

The signature comparison uses a constant-time XOR loop, not ===. JavaScript's string equality returns false the instant it finds a mismatched byte, which means a wrong signature with a matching first byte takes slightly longer than one that mismatches immediately. An attacker who can measure timing can recover signatures byte-by-byte. The constant-time loop XORs every byte regardless of mismatch and OR-accumulates the difference into one comparison at the end. No timing leak. The same pattern lives in Node's crypto.timingSafeEqual and Stripe's webhook signature verifier for the same reason.
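A minimal sketch of the comparison, assuming Node's built-in crypto module and a hex-encoded signature. The function names are hypothetical and the token encoding in the repo may differ; in production code, crypto.timingSafeEqual does the same job.

```typescript
import { createHmac } from "node:crypto";

// XOR every byte pair and OR the result into one accumulator, so the loop
// does the same work whether the first byte or the last byte differs.
function constantTimeEqual(a: Uint8Array, b: Uint8Array): boolean {
  // An HMAC-SHA256 tag has a fixed, public length, so an early exit on
  // length mismatch leaks nothing about the secret.
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a[i] ^ b[i];
  }
  return diff === 0;
}

function signPayload(payload: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(payload).digest();
}

// Recompute the expected tag and compare raw bytes, never strings with ===.
function verifyPayload(payload: string, signatureHex: string, secret: string): boolean {
  const expected = signPayload(payload, secret);
  const provided = Buffer.from(signatureHex, "hex");
  return constantTimeEqual(expected, provided);
}
```

The single comparison happens once, after the loop, on the accumulated difference, which is what removes the per-byte timing signal.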

Nonce vs idempotency

Both detect duplicates. Both live in Redis with a 24h TTL. The mechanism is the same; the threat model is different.

The nonce is in the signed token. It defends against replay attacks: an attacker who captures a valid signed token (network logs, MITM) can't re-fire it because the token's jti is single-use and tracked server-side. Each token gets used exactly once; the second attempt returns 409 Conflict.

The idempotency key is in the request headers. It defends against network retries: the agent fires a request, the network drops the response packet, and the agent retries with the same Idempotency-Key. Without idempotency the retry processes a second charge. With idempotency the retry returns the cached response from the first attempt; no second charge fires.

Nonces reject duplicates with an error. Idempotency keys repeat the original success. That's the sharpest distinction worth memorizing.
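The difference in behavior fits in two small classes. This is an in-memory sketch with hypothetical names; the build keeps both in Redis with a 24h TTL, and a real store would need an atomic check-and-set, which this sketch does not model.

```typescript
// Nonce: the second use of the same jti is an error.
class NonceStore {
  private used = new Set<string>();

  /** Returns true the first time a jti is seen, false on every later attempt. */
  claim(jti: string): boolean {
    if (this.used.has(jti)) return false;
    this.used.add(jti);
    return true;
  }
}

// Idempotency key: the second use of the same key repeats the first response.
class IdempotencyStore<T> {
  private responses = new Map<string, T>();

  /** Runs action once per key; later calls return the stored response. */
  run(key: string, action: () => T): { response: T; replayed: boolean } {
    if (this.responses.has(key)) {
      return { response: this.responses.get(key) as T, replayed: true };
    }
    const response = action();
    this.responses.set(key, response);
    return { response, replayed: false };
  }
}
```

Calling claim twice with the same jti yields true then false (the 409 path). Calling run twice with the same key executes the charge once and hands back the same response both times.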

Generative UI in AI SDK v6

AI SDK v6 removed streamUI, the v3 primitive that let a model return React elements directly. The v6 idiomatic pattern is structured-output dispatch.

The comparison agent at /compare works like this. The user picks two products. A Server Action calls streamObject with a Zod discriminated union as the schema: three shapes, each tagged with a kind literal (specTable, prosCons, recommendation). The model decides which shape fits (same category → spec table, overlapping use case → pros and cons, unrelated → recommendation paragraph) and emits a typed object matching that shape. The Zod schema validates the output before it crosses any boundary. The client receives the validated object and switch-renders the matching React component on kind.

No HTML, markup, or scripts ever cross the wire from the model. The model picks a shape and fills typed fields; React renders. Type-safe at every boundary, can't inject markup, deterministic enough that the same input reliably picks the same shape (temperature 0.3 on Claude Haiku 4.5). Generative because the SHAPE is generated, not just the content.
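The dispatch pattern can be shown without any dependencies. In this sketch a hand-written validator stands in for the Zod schema, a string renderer stands in for the React components, and there is no streamObject call. Only the three kind literals come from the build; the other field names are assumptions.

```typescript
// Three shapes, one discriminant. Field names other than kind are hypothetical.
type Comparison =
  | { kind: "specTable"; rows: { spec: string; a: string; b: string }[] }
  | { kind: "prosCons"; pros: string[]; cons: string[] }
  | { kind: "recommendation"; text: string };

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Stand-in for schema validation: untyped model output in, typed object out.
function parseComparison(raw: unknown): Comparison {
  if (typeof raw !== "object" || raw === null) throw new Error("expected an object");
  const { kind, rows, pros, cons, text } = raw as Record<string, unknown>;
  switch (kind) {
    case "specTable": {
      if (!Array.isArray(rows)) throw new Error("specTable.rows must be an array");
      return {
        kind: "specTable",
        rows: rows.map((r: unknown) => {
          const { spec, a, b } = (r ?? {}) as Record<string, unknown>;
          if (typeof spec !== "string" || typeof a !== "string" || typeof b !== "string") {
            throw new Error("specTable rows need string spec, a, b");
          }
          return { spec, a, b };
        }),
      };
    }
    case "prosCons":
      if (!isStringArray(pros) || !isStringArray(cons)) throw new Error("prosCons needs string arrays");
      return { kind: "prosCons", pros, cons };
    case "recommendation":
      if (typeof text !== "string") throw new Error("recommendation.text must be a string");
      return { kind: "recommendation", text };
    default:
      throw new Error(`unknown kind: ${String(kind)}`);
  }
}

// Stand-in for the React switch: one branch per kind, checked for exhaustiveness.
function renderComparison(c: Comparison): string {
  switch (c.kind) {
    case "specTable":
      return c.rows.map((r) => `${r.spec}: ${r.a} | ${r.b}`).join("\n");
    case "prosCons":
      return `+ ${c.pros.join(", ")}\n- ${c.cons.join(", ")}`;
    case "recommendation":
      return c.text;
    default: {
      const unreachable: never = c;
      throw new Error(`unhandled shape: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Anything the model emits outside the three shapes is rejected at the parser, and adding a fourth shape to the union fails compilation until the renderer handles it.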

Rendering modes in Next 16 with cacheComponents

The build leans on every Next 16 rendering mode:

Static (○) for /, /agents, /compare, /docs/rendering, /agents-descriptor, /llms-txt. Pure server components reading cached data via 'use cache'. Edge CDN serves them.
SSG (●) for the four /products/[slug]/agent* routes. generateStaticParams expands the slug parameter into 10 prerendered files per format.
PPR (◐) for /products/[slug] (cached product copy + dynamic AgentTally hole) and /dashboard (functionally force-dynamic via headers() reads in every Suspense child).
Dynamic (ƒ) for /api/agent/checkout and /api/agent/issue-token. Every response depends on the request body + token contents + Redis lookups + time. No caching helps.
Edge (proxy) for cited-by attribution + AEO logging. Runs before any route handler on every matching request.

The cacheComponents: true flag inverts the rendering defaults: every server component is treated as static unless something opts it out of static. The opt-out is implicit: reading headers(), cookies(), searchParams, or uncached data sources is what makes a component dynamic. The per-route dynamic, revalidate, and runtime exports from earlier Next versions are disallowed; the data you read inside the component is the only signal Next uses to classify it.

A /docs/rendering page lists every route with its mode and rationale. The page reads from lib/render-modes.ts, the same source of truth the inspector overlay uses, so the documentation can never drift from the routes that ship.

Surgical cache invalidation

The cached loaders carry three independent tag families:

loadProduct('moonshot-grinder-x1')   → cacheTag('product:moonshot-grinder-x1')
loadProduct('kestrel-pour-kettle')   → cacheTag('product:kestrel-pour-kettle')
... (one per slug)
loadAllSlugs()                       → cacheTag('catalog:index')
loadAgentsDescriptor()               → cacheTag('descriptor')

A sale fires revalidateTag('product:moonshot-grinder-x1', 'hours') and revalidateTag('descriptor', 'hours'). Two entries recompute on the next read: the affected product (now reads decremented inventory) and the descriptor (whose catalog[] contains an inStock field for every SKU). Nine other product entries stay cached. The catalog-index entry stays cached because the slug list didn't change: a sale decrements stock, it doesn't remove a product.

Tagging is what makes the invalidation surgical instead of a sledgehammer. If we'd tagged all three loaders with one common tag like 'catalog', every sale would invalidate everything, defeating the point. Three tags, three separate decisions about when each one needs to recompute.
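A toy model makes the blast radius visible. This is not how Next implements cacheTag or revalidateTag, and it ignores the cacheLife profile argument; it is an in-memory sketch with hypothetical names that only shows which entries a tag evicts.

```typescript
// Toy tag-indexed cache: each entry remembers the tags it was stored under.
class TaggedCache {
  private entries = new Map<string, { value: unknown; tags: Set<string> }>();

  /** Returns the cached value for key, computing and tagging it on a miss. */
  read<T>(key: string, tags: string[], compute: () => T): T {
    const hit = this.entries.get(key);
    if (hit) return hit.value as T;
    const value = compute();
    this.entries.set(key, { value, tags: new Set(tags) });
    return value;
  }

  /** Evicts every entry carrying the tag and returns the evicted keys. */
  revalidateTag(tag: string): string[] {
    const evicted: string[] = [];
    this.entries.forEach((entry, key) => {
      if (entry.tags.has(tag)) evicted.push(key);
    });
    evicted.forEach((key) => this.entries.delete(key));
    return evicted;
  }

  keys(): string[] {
    return Array.from(this.entries.keys());
  }
}
```

With entries tagged product:moonshot-grinder-x1, product:kestrel-pour-kettle, catalog:index, and descriptor, revalidating the first and last tags leaves the kettle and the catalog index untouched. Tagging all four with a shared 'catalog' tag would evict everything on the same call.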

What I'd do differently in production

The HMAC token shape is a demo simplification. A production deployment would use asymmetric Ed25519 signatures issued by the user's wallet, with anchor verifying via the user's public key; the user's private key never leaves their device. The eight-check verification logic is identical either way; only the key model changes.

The in-memory Redis fallback (active when UPSTASH_* env vars are missing) is intentionally per-process and doesn't survive restarts. It exists so local dev and the test script work without standing up Upstash. Production needs Upstash configured for cross-instance state.

The comparison agent's three shapes are hand-picked for a coffee catalog. A different product category might want a different shape inventory (size comparison? compatibility matrix? color swatches?). The pattern generalizes (Zod discriminated union, model picks, client dispatches), but the specific shapes are domain choices.

The catalog of ten products is a fixture. Swapping it for a Supabase-backed loader doesn't change the cache shape; the 'use cache' wrapper around loadProduct works identically with a database call.

Why this build exists

Agent commerce is the surface where AI meets real money movement, and the design questions there have right and wrong answers, not vibes. The failure modes have to be named. The security shape has to be defended in code, not asserted. The cache discipline has to be visible at the route level, not hidden. Anchor is the artifact that demonstrates one coherent answer to those constraints. As a trilogy, forge covers agent orchestration, loom covers durable execution, and anchor covers discovery and transaction.

The build is open source. The eight checks live in lib/principal.ts with stable failure codes for HTTP mapping. The agent descriptor is in lib/agents-descriptor.ts, the single source for /.well-known/agents.json and /llms.txt and /agents. The proxy is in proxy.ts, about sixty lines. The test script that proves the five scenarios is in scripts/test-checkout.ts with eleven assertions, all passing.

Fork it, run it, break it. The next category of commerce is the one where agents transact on behalf of users. The infrastructure to make that safe has to be built somewhere. This is what one shape of it looks like.