LLM-authored skills + a classical orbital classifier: why we route this way
Most AI skill-routing systems in 2026 do one of two things. Either they retrieve from a static curated corpus (vector search → top-K → present), or they run the whole task through an LLM end-to-end and trust the model's ranking.
Meridian does neither. We have Llama-3.3-70B author fresh skills per task, then run them through a deterministic JS classifier that assigns each one a celestial class (planet, moon, trojan, asteroid, comet, or irregular) based on physics-style features derived from the skill body.
This sounds quirky. Here's why we did it.
The static-corpus problem
Static-corpus RAG works well when the corpus is rich and the user query falls inside its known territory. It fails when:
- The corpus has gaps. Every "we should add a skill for X" is a backlog item that slows your iteration.
- The corpus drifts stale. Your "best practices for Y" skill from 2024 doesn't cover the 2026 patterns.
- The user's wording diverges from the curator's. Now you need synonyms, query rewriting, hybrid lexical + semantic search, etc.
For an open-domain skill router, where any developer might ask anything, a static corpus is a perpetual maintenance burden. So we don't have one. Every result is generated on demand.
The end-to-end LLM problem
The opposite extreme: skip retrieval, just give the task to a smart model and ask it to "list 5 relevant skills." This is what most "AI agent" demos do.
It has its own failure mode: nondeterminism. Run the same query twice, get different ranks. Tweak temperature, get different orderings. There's no signal that ties the result to anything verifiable.
What Meridian does instead
The pipeline:
- LLM authors candidates. Llama-3.3-70B generates 5-8 fresh skill specs per task: full markdown bodies, ~600 chars each, with named tools, decision rules, and anti-patterns. A 1-shot exemplar in the prompt anchors the format.
- Classical classifier scores them. A pure-JS function derives physics-style features from each skill body: mass, scope, independence, cross-domain, fragmentation, drag, dep_ratio. It picks an argmax over six per-class scores and assigns a celestial class.
- Lexical + semantic ranking. Token-overlap routing score against the user task, multiplied by a class-specific boost (planets get more weight than asteroids), modulated by Lagrange-point versatility. Optionally re-ranked by cosine similarity from bge-m3 embeddings. A sketch of this scoring step follows the list.
- Top-K returned. Each with its full SKILL.md body, classification, and decision rule.
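To make the ranking step concrete, here's a minimal sketch of that scoring function in plain JS. The boost weights, the versatility term, and the field/helper names are illustrative placeholders, not Meridian's tuned values.

```js
// Illustrative sketch of the ranking step. Boost weights, the versatility
// term, and the skill field names are placeholders, not production values.
const CLASS_BOOST = {
  planet: 1.5, moon: 1.2, trojan: 1.15,
  comet: 1.0, asteroid: 0.8, irregular: 0.5,
};

const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Lexical overlap between the user task and the skill title + body.
function routingScore(task, skill) {
  const taskTokens = new Set(tokenize(task));
  const skillTokens = new Set(tokenize(skill.title + " " + skill.body));
  let overlap = 0;
  for (const t of taskTokens) if (skillTokens.has(t)) overlap++;
  return overlap / Math.max(taskTokens.size, 1);
}

// Deterministic rank: lexical score x class boost x Lagrange versatility.
function rankSkills(task, skills) {
  return skills
    .map((s) => ({
      ...s,
      score:
        routingScore(task, s) *
        (CLASS_BOOST[s.class] ?? 1.0) *
        (1 + 0.25 * (s.features.lagrangeVersatility ?? 0)),
    }))
    .sort((a, b) => b.score - a.score);
}

// Optional semantic re-rank: cosine similarity over bge-m3 embeddings,
// applied to the top of the lexical ranking when vectors are available.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}
```

The key property is that nothing in this step touches a model: given the same candidate set and task string, the order is always the same.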
The properties this gives us:
- Fresh per task. No corpus to maintain.
- Deterministic ranking. Given the same generated set, the classifier always produces the same order.
- Explainable. Every result includes the physics features used to classify it, so you can show users why one skill outranked another (example after this list).
- Composable. The orbital metaphor makes it natural to talk about skill relationships ("this trojan orbits that planet"). It's a UI/UX primitive, not just a scoring function.
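To make the "explainable" point concrete, here's a hypothetical shape for one returned item. The slug, values, and decision rule are made up for illustration; the feature names match the classifier's.

```js
// Hypothetical shape of a single routed result; values are illustrative.
const exampleResult = {
  slug: "postgres-migration-rollback",
  class: "moon",
  routingScore: 0.62,
  features: {
    mass: 0.71,
    scope: 0.34,
    independence: 0.48,
    cross_domain: 0.12,
    fragmentation: 0.09,
    drag: 0.22,
    dep_ratio: 0.4,
  },
  decisionRule: "Use when a schema change must be reversible within one deploy window.",
  body: "<full SKILL.md markdown>",
};
```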
Why "celestial" categories at all
Honestly: because it's a memorable taxonomy. Six names that map to six functional roles a skill can play in a task:
- Planet: the central skill, broad scope, anchors the task
- Moon: orbits a planet, narrower, supports it
- Trojan: sits at a Lagrange point between two planets, bridges domains
- Asteroid: small, focused, often an anti-pattern or failure-mode skill
- Comet: eccentric orbit, high cross-domain affinity, useful rarely but powerfully
- Irregular: doesn't fit; useful as a flag
The naming is aesthetic but the partition is functional: the classifier picks based on real features, not vibes. Each skill gets a class, and the UI presents them as orbiting bodies (literally: there's an interactive WebGL galaxy view at /miniapp/).
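Here's a stripped-down sketch of what that classification step looks like. The per-class scoring functions are simplified stand-ins for the hand-tuned heuristics, not the production weights.

```js
// Simplified stand-ins for the hand-tuned per-class heuristics.
const CLASS_SCORERS = {
  planet:    (f) => f.mass * 2 + f.scope - f.dep_ratio,
  moon:      (f) => f.dep_ratio * 2 + f.mass - f.scope,
  trojan:    (f) => f.cross_domain * 2 + f.independence,
  asteroid:  (f) => (1 - f.mass) + f.fragmentation,
  comet:     (f) => f.cross_domain + f.drag * 2,
  irregular: (f) => f.fragmentation * 2 - f.independence,
};

// Argmax over the six per-class scores. Ties break on declaration order,
// so a fixed feature vector always lands in the same class.
function classify(features) {
  let best = { cls: "irregular", score: -Infinity };
  for (const [cls, scorer] of Object.entries(CLASS_SCORERS)) {
    const score = scorer(features);
    if (score > best.score) best = { cls, score };
  }
  return best.cls;
}
```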
Where this breaks down
The honest cases where Meridian's approach is worse than alternatives:
- Latency. Generating 5-8 skill bodies via Llama-3.3-70B takes 5-30 seconds; static-corpus RAG returns in 50ms. We mitigate with KV caching (24h TTL) + AI Gateway dedup (see the cache sketch after this list), but cold queries are slow.
- Cost. Each fresh-cache call burns 2-3k LLM tokens. On the Workers AI free tier this is fine; at scale we'd need to lean more aggressively on the cache + bge-m3 pre-filter.
- Quality variance. The LLM occasionally writes garbage skills. The classifier filters by slug pattern + minimum body length but can't filter for "is this actually useful." Pre-Pareto launch we hand-curated; now we trust the model + accept some noise.
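For context on those mitigations, here's a rough sketch of the cache + validation layer in a Worker. It assumes a KV binding named SKILL_CACHE and a generateSkills() helper, and the length threshold is illustrative; it is not the code from the guide.

```js
// Illustrative Worker-side cache + validation layer. Assumes a KV binding
// named SKILL_CACHE and a generateSkills() helper; thresholds are examples.
const DAY_SECONDS = 24 * 60 * 60;
const SLUG_RE = /^[a-z0-9]+(-[a-z0-9]+)*$/;
const MIN_BODY_CHARS = 200;

async function cachedSkills(env, task) {
  const key = `skills:${await sha256Hex(task)}`;
  const hit = await env.SKILL_CACHE.get(key, "json");
  if (hit) return hit; // warm path: KV latency instead of 5-30s generation

  const raw = await generateSkills(env, task); // cold path: LLM call
  const skills = raw.filter(
    (s) => SLUG_RE.test(s.slug) && s.body.length >= MIN_BODY_CHARS
  );

  await env.SKILL_CACHE.put(key, JSON.stringify(skills), {
    expirationTtl: DAY_SECONDS, // 24h TTL, matching the mitigation above
  });
  return skills;
}

async function sha256Hex(text) {
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(text)
  );
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```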
Where it goes next
The current classifier is heuristic; every "physics signature" is a JS function we hand-tuned. The natural next step is making the physics actually do work: replacing those heuristics with a real n-body simulation where each skill is a body in an embedding-projected space, with mass derived from semantic weight, and Lagrange points as actual cluster attractors that the solver finds.
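In toy form, that direction might look something like the step function below: skills as point masses in a projected embedding plane, attracting each other each tick until clusters form. This is just the shape of the idea, not Meridian code.

```js
// Toy sketch of the n-body direction, not working Meridian code. Each body
// is { x, y, vx, vy, mass } with position from an embedding projection and
// mass from semantic weight.
const G = 0.01;   // gravitational constant, arbitrary units
const DT = 0.05;  // integration step

function nBodyStep(bodies) {
  // Accumulate accelerations from all other bodies, then integrate.
  for (const a of bodies) {
    let ax = 0, ay = 0;
    for (const b of bodies) {
      if (a === b) continue;
      const dx = b.x - a.x, dy = b.y - a.y;
      const r2 = dx * dx + dy * dy + 1e-6; // softening to avoid blow-ups
      const accel = (G * b.mass) / r2;
      const r = Math.sqrt(r2);
      ax += accel * (dx / r);
      ay += accel * (dy / r);
    }
    a.vx += ax * DT;
    a.vy += ay * DT;
  }
  for (const a of bodies) {
    a.x += a.vx * DT;
    a.y += a.vy * DT;
  }
}
```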
That's the bet our research repo qrouter is exploring: quantum natural-language retrieval via DisCoCat tensor diagrams + Born-rule overlap. Live demo at qrouter.ask-meridian.uk.
Want to ship this kind of architecture for your own MCP service?
The full guide ships every line of code that powers Meridian's routing pipeline: Worker setup, KV cache, AI Gateway, Vectorize semantic re-rank, SSE streaming. $29 on Gumroad.
Build Your Own MCP Server ($29) →