Building a production MCP server on Cloudflare Workers (with auth + Stripe)
Most "build an MCP server" tutorials in May 2026 show you npx create-mcp
and a Hello, world tool. That works for a demo. It does not work for
anything you'd expose to the public internet, because the moment your server can do
anything useful (call an LLM, query a database, hit a paid API), random people will
drain your bill.
A production MCP server in 2026 needs five things the templates skip: authentication, per-user quotas, billing, observability, and edge deployment. This is the architecture we use for ask-meridian.uk, a live MCP server in the official MCP Registry, and it costs $0/month in hosting until you cross 100k requests/day.
The architecture in one diagram
Claude Code / Cursor / Cline / Windsurf
                 │ stdio JSON-RPC
                 ▼
┌────────────────────────────────────────┐
│ npm-installable shim                   │   ~5 KB JavaScript on the user's
│ (forwards every call over HTTPS)       │   laptop, fetched via npx -y
└─────────────┬──────────────────────────┘
              │ HTTPS POST + Authorization: Bearer …
              ▼
┌────────────────────────────────────────┐
│ Cloudflare Worker / Pages Function     │   validates bearer, checks
│                                        │   quota, calls upstream LLM
└─────┬───────────────────┬──────────────┘
      ▼                   ▼
Cloudflare KV        Stripe (Checkout +
  key:HASH             Customer Portal +
  monthly:HASH:YM      webhooks → /api/stripe/webhook)
  free:IP:DAY
Three boxes, all free or near-free. Cloudflare Workers free tier covers 100k requests/day. KV covers 100k reads/day + 1k writes/day. Stripe is 2.9% + 30¢ per successful charge: no fixed monthly cost, no per-API-call cost.
Why a fat backend + thin shim
The pattern that wins in 2026 is what we call fat backend + thin shim.
The user installs a small npm package locally; that package speaks MCP JSON-RPC on
stdio (because that's what their client wants) and forwards every tools/call
to your real backend over HTTPS.
Three reasons this beats running everything locally on the user's machine:
- Zero secrets on the user's disk. API keys for Anthropic, Groq, etc. live on your server, never in the user's environment.
- Instant updates. When you ship a new tool or fix a bug, you redeploy the Worker; every user gets the new version without doing anything.
- Centralized observability. Every call goes through your backend. You see usage, you can rate-limit, you can ship feature flags.
The auth flow
Stripe → Webhook → KV. When a user pays, Stripe's webhook hits
/api/stripe/webhook. We generate a 32-byte random key
(mrd_live_…), store its SHA-256 hash in KV under
key:HASH, and stash the plain-text key under
session:STRIPE_SESSION_ID with a 30-minute TTL.
The user lands on /api/stripe/claim?session_id=… after checkout. We pull
the plain-text key once, delete the session entry, and show the key in an HTML page.
They save it (we tell them they only see it once). From then on, every request carries
Authorization: Bearer mrd_live_… and we look up the SHA-256 in KV.
Why hash? Because if your KV ever gets dumped (insider threat, misconfiguration, SDK bug), the leaked data shouldn't include working credentials. You only ever store the SHA-256.
Per-IP rate limiting on the free tier
For anonymous traffic (no bearer header), we cap at 5-10 requests/day per IP. The pattern is one read + one write to KV per request:
const ip = request.headers.get('cf-connecting-ip') || 'unknown'
// One counter per IP per UTC day, e.g. free:203.0.113.9:2026-05-01
const dkey = `free:${ip}:${new Date().toISOString().slice(0, 10)}`
const used = parseInt((await env.MCP_KV.get(dkey)) || '0', 10)
// json() is a small helper that wraps a JSON Response
if (used >= LIMIT) return json({ error: 'free tier exhausted' }, { status: 429 })
// 25 h TTL: the key outlives its UTC day, then expires on its own
await env.MCP_KV.put(dkey, String(used + 1), { expirationTtl: 90000 })
KV doesn't support atomic increments. At 5/day per IP that doesn't matter โ racing under-counts by 1 occasionally is fine. At 5/sec it would matter; for that you'd use Durable Objects.
One non-obvious gotcha: residential IPv6. ISPs hand out /64 prefixes; the host bits change every few hours. Per-IPv6-address counting under-counts. The fix is to compute the /64 prefix and use that as the rate-limit key. We do this in _ip.js, in about 30 lines.
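A sketch of that bucketing, assuming the input is what cf-connecting-ip gives you (the real _ip.js may normalize differently):

```javascript
// Collapse an IPv6 address to its /64 prefix so the whole household shares one
// rate-limit bucket; IPv4 addresses pass through unchanged.
function rateLimitKeyForIp(ip) {
  if (!ip.includes(':')) return ip // IPv4: use the address as-is
  // Expand the "::" shorthand so we always have 8 hextets, then keep the
  // first 4 (the /64 network prefix).
  const [head, tail = ''] = ip.split('::')
  const h = head ? head.split(':') : []
  const t = tail ? tail.split(':') : []
  const full = [...h, ...Array(8 - h.length - t.length).fill('0'), ...t]
  return full.slice(0, 4).join(':') + '::/64'
}
```

Feed the result into the `free:${key}:${day}` scheme above instead of the raw address.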
The cost-of-goods problem
You're charging $29/mo. Your real cost is the LLM API call you make for the user. If a user can run up $50 of LLM cost on their $29 plan, you lose money. The whole game is keeping COGS < revenue per user.
Three levers worth pulling, in order:
- Cache aggressively. Same task → same response, 99% of the time. Hash the input + parameters, cache the LLM output for 24 hours. Repeat queries cost you ~$0.0001 instead of $0.005.
- Use Cloudflare AI Gateway. Free observability + caching layer in front of any LLM provider. Set cache_ttl=86400 and identical upstream requests dedupe at the gateway, even when your KV cache misses.
- Route to the cheapest provider that meets the quality bar. Workers AI for the bulk (free tier covers a lot), Groq for "needs to be fast" (~$0.0005/call), Anthropic Sonnet only when the task actually needs it (~$0.012/call).
SSE streaming to make 30-second waits feel like 5
An LLM call to Anthropic Sonnet on a complex task can take 20-30 seconds. Users stare at a spinner and abort at 8 seconds. The fix is Server-Sent Events: as the LLM streams tokens to your Worker, you stream progress events back to the user.
The wire format is simple: event: progress\ndata: {...}\n\n. The
hard parts are (1) parsing OpenAI-format SSE chunks from the upstream, (2)
throttling progress events so you don't flood the client, (3) gracefully handling
cancellation when the user closes their tab. About 80 lines of TypeScript total.
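The framing and throttling halves are pure functions, which keeps them easy to test in isolation. A sketch (names are ours, not the guide's):

```javascript
// One SSE frame: "event: progress\ndata: {...}\n\n"
function formatSse(event, data) {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`
}

// Let at most one event through per interval; callers simply drop the rest.
class Throttle {
  constructor(intervalMs) {
    this.intervalMs = intervalMs
    this.last = -Infinity
  }
  allow(nowMs) {
    if (nowMs - this.last < this.intervalMs) return false
    this.last = nowMs
    return true
  }
}
```

In the Worker you would write each allowed frame into the writable side of a TransformStream whose readable side is the Response body, and treat a write failure as the user having closed the tab.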
Want every line of code, working, in your hands today?
The full step-by-step guide ships the complete Worker (~400 lines), the npm-publishable stdio shim (~150 lines), the Stripe webhook handler with signature verification, the SSE streaming pipeline, the AI Gateway wiring, and a copy-paste MCP Registry submission, plus the exact production gotchas in chapter 9 that you'd otherwise hit live.
Get the guide → $29 →
What we got wrong (so you don't)
Three things that bit us building Meridian, all of which the guide covers:
- KV propagation latency. Brand-new buyers got 401s on their first protected call because the key:HASH entry hadn't propagated yet from the webhook handler's edge to the user's edge. Fix: in the claim endpoint, set the entry with explicit cacheTtl on the local edge.
- Webhook idempotency. Stripe retries failed webhooks for up to 3 days. If your handler creates a key on every retry, you have 4 keys for the same customer. Always dedupe by the event's id.
- The 1 MB Worker bundle limit. The official stripe npm package is 700 KB. The openai SDK is 400 KB. Together you're over. Solution: 25 lines of raw fetch() against Stripe's REST API does everything we need.
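The raw-fetch replacement can be sketched in a handful of lines: Stripe's REST API takes bearer auth and form-encoded bodies, with brackets for nested params (helper names here are ours, and only the calls you actually use get wrapped):

```javascript
// Stripe bodies are application/x-www-form-urlencoded; nested params use
// bracket syntax like metadata[plan]. URLSearchParams percent-encodes the
// brackets, which Stripe accepts.
function formEncode(params) {
  return new URLSearchParams(params).toString()
}

// e.g. stripePost('/checkout/sessions', sk, { mode: 'subscription', ... })
async function stripePost(path, secretKey, params) {
  const res = await fetch(`https://api.stripe.com/v1${path}`, {
    method: 'POST',
    headers: {
      authorization: `Bearer ${secretKey}`,
      'content-type': 'application/x-www-form-urlencoded',
    },
    body: formEncode(params),
  })
  if (!res.ok) throw new Error(`stripe ${res.status}: ${await res.text()}`)
  return res.json()
}
```

Webhook signature verification still has to be hand-rolled (HMAC-SHA256 over the Stripe-Signature header via Web Crypto), but that is also a few dozen lines, not 700 KB.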
The end state
A live MCP server that takes real money via Stripe, has per-user quotas, runs on Cloudflare's free tier, and is listed in the official MCP Registry. Total infrastructure cost at ~50 paying users: $0/month. Revenue: $1,450/month.
Meridian itself is the proof: try the live demo at ask-meridian.uk, browse the source at github.com/LuuOW/meridian-mcp. The patterns above are exactly what's in the public repo.
Ship yours in 30 minutes →
Working CF Worker template + stdio shim + 60-page guide for $29. MIT-licensed code, commercial use OK.
Build Your Own MCP Server → $29 →