LLM-authored skills + a classical orbital classifier: why we route this way
Most AI skill-routing systems in 2026 do one of two things. Either they retrieve from a static curated corpus (vector search → top-K → present), or they run the whole task through an LLM end-to-end and trust the model's ranking.
Meridian does neither. We have Llama-3.3-70B author fresh skills per task, then run them through a deterministic JS classifier that assigns each one a celestial class (planet, moon, trojan, asteroid, comet, or irregular) based on physics-style features derived from the skill body.
This sounds quirky. Here's why we did it.
The static-corpus problem
Static-corpus RAG works well when the corpus is rich and the user query falls inside its known territory. It fails when:
- The corpus has gaps. Every "we should add a skill for X" is a backlog item that slows your iteration.
- The corpus drifts stale. Your "best practices for Y" skill from 2024 doesn't cover the 2026 patterns.
- The user's wording diverges from the curator's. Now you need synonyms, query rewriting, hybrid lexical + semantic search, etc.
For an open-domain skill router, where any developer might ask anything, a static corpus is a perpetual maintenance burden. So we don't have one. Every result is generated on demand.
The end-to-end LLM problem
The opposite extreme: skip retrieval, just give the task to a smart model and ask it to "list 5 relevant skills." This is what most "AI agent" demos do.
It has its own failure mode: nondeterminism. Run the same query twice, get different ranks. Tweak temperature, get different orderings. There's no signal that ties the result to anything verifiable.
What Meridian does instead
The pipeline:
- LLM authors candidates. Llama-3.3-70B generates 5-8 fresh skill specs per task: full markdown bodies, ~600 chars each, with named tools, decision rules, and anti-patterns. A 1-shot exemplar in the prompt anchors the format.
- Classical classifier scores them. A pure-JS function derives physics-style features from each skill body: mass, scope, independence, cross-domain, fragmentation, drag, dep_ratio. It picks an argmax over six per-class scores and assigns a celestial class.
- Lexical + semantic ranking. Token-overlap routing score against the user task, multiplied by a class-specific boost (planets get more weight than asteroids), modulated by Lagrange-point versatility. Optionally re-ranked by cosine similarity from bge-m3 embeddings. A sketch of this scoring step follows the list.
- Top-K returned. Each with its full SKILL.md body, classification, and decision rule.
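To make the ranking step concrete, here's a minimal sketch of that scoring function in plain JS. The boost weights, the versatility term, and the field/helper names are illustrative placeholders, not Meridian's tuned values.

```js
// Illustrative sketch of the ranking step. Boost weights, the versatility
// term, and the skill field names are placeholders, not production values.
const CLASS_BOOST = {
  planet: 1.5, moon: 1.2, trojan: 1.15,
  comet: 1.0, asteroid: 0.8, irregular: 0.5,
};

const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Lexical overlap between the user task and the skill title + body.
function routingScore(task, skill) {
  const taskTokens = new Set(tokenize(task));
  const skillTokens = new Set(tokenize(skill.title + " " + skill.body));
  let overlap = 0;
  for (const t of taskTokens) if (skillTokens.has(t)) overlap++;
  return overlap / Math.max(taskTokens.size, 1);
}

// Deterministic rank: lexical score x class boost x Lagrange versatility.
function rankSkills(task, skills) {
  return skills
    .map((s) => ({
      ...s,
      score:
        routingScore(task, s) *
        (CLASS_BOOST[s.class] ?? 1.0) *
        (1 + 0.25 * (s.features.lagrangeVersatility ?? 0)),
    }))
    .sort((a, b) => b.score - a.score);
}

// Optional semantic re-rank: cosine similarity over bge-m3 embeddings,
// applied to the top of the lexical ranking when vectors are available.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}
```

The key property is that nothing in this step touches a model: given the same candidate set and task string, the order is always the same.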
The properties this gives us:
- Fresh per task. No corpus to maintain.
- Deterministic ranking. Given the same generated set, the classifier always produces the same order.
- Explainable. Every result includes the physics features used to classify it, so you can show users why one skill outranked another (example after this list).
- Composable. The orbital metaphor makes it natural to talk about skill relationships ("this trojan orbits that planet"). It's a UI/UX primitive, not just a scoring function.
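To make the "explainable" point concrete, here's a hypothetical shape for one returned item. The slug, values, and decision rule are made up for illustration; the feature names match the classifier's.

```js
// Hypothetical shape of a single routed result; values are illustrative.
const exampleResult = {
  slug: "postgres-migration-rollback",
  class: "moon",
  routingScore: 0.62,
  features: {
    mass: 0.71,
    scope: 0.34,
    independence: 0.48,
    cross_domain: 0.12,
    fragmentation: 0.09,
    drag: 0.22,
    dep_ratio: 0.4,
  },
  decisionRule: "Use when a schema change must be reversible within one deploy window.",
  body: "<full SKILL.md markdown>",
};
```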
Why "celestial" categories at all
Honestly: because it's a memorable taxonomy. Six names that map to six functional roles a skill can play in a task:
- Planet: the central skill, broad scope, anchors the task
- Moon: orbits a planet, narrower, supports it
- Trojan: sits at a Lagrange point between two planets, bridges domains
- Asteroid: small, focused, often an anti-pattern or failure-mode skill
- Comet: eccentric orbit, high cross-domain affinity, useful rarely but powerfully
- Irregular: doesn't fit; useful as a flag
The naming is aesthetic but the partition is functional: the classifier picks based on real features, not vibes. Each skill gets a class, and the UI presents them as orbiting bodies (literally: there's an interactive WebGL galaxy view at /miniapp/).
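Here's a stripped-down sketch of what that classification step looks like. The per-class scoring functions are simplified stand-ins for the hand-tuned heuristics, not the production weights.

```js
// Simplified stand-ins for the hand-tuned per-class heuristics.
const CLASS_SCORERS = {
  planet:    (f) => f.mass * 2 + f.scope - f.dep_ratio,
  moon:      (f) => f.dep_ratio * 2 + f.mass - f.scope,
  trojan:    (f) => f.cross_domain * 2 + f.independence,
  asteroid:  (f) => (1 - f.mass) + f.fragmentation,
  comet:     (f) => f.cross_domain + f.drag * 2,
  irregular: (f) => f.fragmentation * 2 - f.independence,
};

// Argmax over the six per-class scores. Ties break on declaration order,
// so a fixed feature vector always lands in the same class.
function classify(features) {
  let best = { cls: "irregular", score: -Infinity };
  for (const [cls, scorer] of Object.entries(CLASS_SCORERS)) {
    const score = scorer(features);
    if (score > best.score) best = { cls, score };
  }
  return best.cls;
}
```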
Where this breaks down
The honest cases where Meridian's approach is worse than alternatives:
- Latency. Generating 5-8 skill bodies via Llama-3.3-70B takes 5-30 seconds; static-corpus RAG returns in 50ms. We mitigate with KV caching (24h TTL) + AI Gateway dedup (see the cache sketch after this list), but cold queries are slow.
- Cost. Each fresh-cache call burns 2-3k LLM tokens. On the Workers AI free tier this is fine; at scale we'd need to lean more aggressively on the cache + bge-m3 pre-filter.
- Quality variance. The LLM occasionally writes garbage skills. The classifier filters by slug pattern + minimum body length but can't filter for "is this actually useful." Pre-Pareto launch we hand-curated; now we trust the model + accept some noise.
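For context on those mitigations, here's a rough sketch of the cache + validation layer in a Worker. It assumes a KV binding named SKILL_CACHE and a generateSkills() helper, and the length threshold is illustrative; it is not the code from the guide.

```js
// Illustrative Worker-side cache + validation layer. Assumes a KV binding
// named SKILL_CACHE and a generateSkills() helper; thresholds are examples.
const DAY_SECONDS = 24 * 60 * 60;
const SLUG_RE = /^[a-z0-9]+(-[a-z0-9]+)*$/;
const MIN_BODY_CHARS = 200;

async function cachedSkills(env, task) {
  const key = `skills:${await sha256Hex(task)}`;
  const hit = await env.SKILL_CACHE.get(key, "json");
  if (hit) return hit; // warm path: KV latency instead of 5-30s generation

  const raw = await generateSkills(env, task); // cold path: LLM call
  const skills = raw.filter(
    (s) => SLUG_RE.test(s.slug) && s.body.length >= MIN_BODY_CHARS
  );

  await env.SKILL_CACHE.put(key, JSON.stringify(skills), {
    expirationTtl: DAY_SECONDS, // 24h TTL, matching the mitigation above
  });
  return skills;
}

async function sha256Hex(text) {
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(text)
  );
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```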
Where it goes next
The current classifier is heuristic; every "physics signature" is a JS function we hand-tuned. The natural next step is making the physics actually do work: replacing those heuristics with a real n-body simulation where each skill is a body in an embedding-projected space, with mass derived from semantic weight, and Lagrange points as actual cluster attractors that the solver finds.
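In toy form, that direction might look something like the step function below: skills as point masses in a projected embedding plane, attracting each other each tick until clusters form. This is just the shape of the idea, not Meridian code.

```js
// Toy sketch of the n-body direction, not working Meridian code. Each body
// is { x, y, vx, vy, mass } with position from an embedding projection and
// mass from semantic weight.
const G = 0.01;   // gravitational constant, arbitrary units
const DT = 0.05;  // integration step

function nBodyStep(bodies) {
  // Accumulate accelerations from all other bodies, then integrate.
  for (const a of bodies) {
    let ax = 0, ay = 0;
    for (const b of bodies) {
      if (a === b) continue;
      const dx = b.x - a.x, dy = b.y - a.y;
      const r2 = dx * dx + dy * dy + 1e-6; // softening to avoid blow-ups
      const accel = (G * b.mass) / r2;
      const r = Math.sqrt(r2);
      ax += accel * (dx / r);
      ay += accel * (dy / r);
    }
    a.vx += ax * DT;
    a.vy += ay * DT;
  }
  for (const a of bodies) {
    a.x += a.vx * DT;
    a.y += a.vy * DT;
  }
}
```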
That's the bet our research repo qrouter is exploring: quantum natural-language retrieval via DisCoCat tensor diagrams + Born-rule overlap. Live demo at qrouter.ask-meridian.uk.
Want to ship this kind of architecture for your own MCP service?
The full guide ships every line of code that powers Meridian's routing pipeline: Worker setup, KV cache, AI Gateway, Vectorize semantic re-rank, SSE streaming. $29 on Gumroad.
Build Your Own MCP Server ($29) →