Try-Works

An index of projects, products, and ideas.

First principles of model routing

2026-07-03

Here are a few principles of model routing that I've developed while building role-model, which is a model router, a routing protocol and a Pi extension. They should be useful to anyone who uses a model router or builds their own.

1. Keep models distinct
I sometimes see people pairing the latest GPT model with the latest Opus model, each assigned a different role in a coding workflow. This is not wrong per se, but it isn't optimal from a routing perspective.

People have different experiences of and preferences for the GPT and Opus models, but both are generalist models with a bent for coding, in the same performance and cost tiers.

This makes routing between them difficult. It is already hard to estimate the difficulty of a request well enough to match it with a model. Once that is done, deciding which of the two should get it is even harder when they are neck and neck in every area.

Instead of routing between two frontier models, it is better to route between one frontier model and another model that excels on at least one side of the triangle of constraints: speed, quality and cost.

For example, pair GPT 5.5 with DeepSeek V4 Pro in a router and extend your subscription quota by sending medium and easy requests to DeepSeek. It is significantly cheaper and also significantly less capable on very difficult tasks, so routing decisions are easier to make.
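A minimal sketch of such a two-model router, assuming some upstream classifier already gives each request a difficulty estimate between 0 and 1. The `max_difficulty` thresholds and per-token prices below are hypothetical placeholders, not measured figures for either model, and `route` is an illustrative helper rather than role-model's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_mtok: float   # hypothetical blended price, USD per million tokens
    max_difficulty: float  # hardest request (0..1) we trust this model with


# Hypothetical pool: one frontier model plus one much cheaper, weaker model.
# All numbers are placeholders, not real prices or capability scores.
FRONTIER = Model("GPT 5.5", cost_per_mtok=10.0, max_difficulty=1.0)
BUDGET = Model("DeepSeek V4 Pro", cost_per_mtok=1.0, max_difficulty=0.6)
POOL = (FRONTIER, BUDGET)


def route(difficulty: float) -> Model:
    """Pick the cheapest model trusted with the request's estimated difficulty."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    # The frontier model accepts everything, so this list is never empty.
    eligible = [m for m in POOL if difficulty <= m.max_difficulty]
    return min(eligible, key=lambda m: m.cost_per_mtok)


print(route(0.3).name)  # DeepSeek V4 Pro
print(route(0.9).name)  # GPT 5.5
```

Because the two models differ sharply in both cost and capability, a single threshold does the work. With two near-identical frontier models there is no obvious place to put that threshold, which is exactly the difficulty of routing between peers.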

2. Keep the model pool small
This point follows from the previous one. You might think that growing the pool with more models to route between is a good thing.

As an example, I could configure role-model to use GPT 5.5, Kimi 2.7, DeepSeek V4 Pro and DeepSeek V4 Flash, and even a couple of smaller GPT models.

But if the models do not have distinct characteristics, adding more of them just makes routing decisions harder. In practice, this pool is likely to end up routing between only two models: GPT 5.5 for its performance, and whichever of the others gets selected for smaller tasks and is then kept there because its cache is warm.

So unless a model is clearly different from the ones you already have, do not add it to the pool. Limit the pool to two models by default, and only add more when you can clearly define the role of each. Does adding a model improve speed, quality or cost? If not, don't add it.
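The admission rule can be sketched as a small check, assuming each model already has a profile with a quality score, a throughput figure and a price. A candidate earns a place only if it would become the best in the pool on at least one of the three axes, and the pool only grows past two models when you state a role for the newcomer. The `Profile` fields, the numbers, the role name and `should_add` are all hypothetical.

```python
from dataclasses import dataclass

DEFAULT_MAX_POOL = 2  # default pool size suggested above


@dataclass(frozen=True)
class Profile:
    name: str
    quality: float         # higher is better, e.g. a relative benchmark score
    tokens_per_sec: float  # higher is better
    cost_per_mtok: float   # lower is better


def improves_pool(pool: list[Profile], candidate: Profile) -> bool:
    """True if the candidate beats every current model on speed, quality or cost."""
    if not pool:
        return True
    return (
        candidate.quality > max(m.quality for m in pool)
        or candidate.tokens_per_sec > max(m.tokens_per_sec for m in pool)
        or candidate.cost_per_mtok < min(m.cost_per_mtok for m in pool)
    )


def should_add(pool: list[Profile], candidate: Profile, role: str = "") -> bool:
    """Admit a distinct model; beyond the default size, also require a stated role."""
    if not improves_pool(pool, candidate):
        return False
    return len(pool) < DEFAULT_MAX_POOL or bool(role)


# Hypothetical profiles; the numbers are placeholders.
pool = [
    Profile("frontier", quality=0.9, tokens_per_sec=50, cost_per_mtok=10.0),
    Profile("budget", quality=0.7, tokens_per_sec=80, cost_per_mtok=1.0),
]
middling = Profile("middling", quality=0.8, tokens_per_sec=60, cost_per_mtok=3.0)
flash = Profile("flash", quality=0.5, tokens_per_sec=150, cost_per_mtok=0.3)

print(should_add(pool, middling))                         # False: best at nothing
print(should_add(pool, flash))                            # False: no role stated
print(should_add(pool, flash, role="fast autocomplete"))  # True
```

The "middling" model is a reasonable model on its own, but it sits between the two existing ones on every axis, so it only adds another neck-and-neck decision.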

3. Use relative, real-world benchmarks
Some routers simply route on model metadata such as cost, while others add external benchmark scores from sources like Artificial Analysis to that metadata. This is better than nothing, but not ideal: the benchmarks may not provide a granular enough performance profile, may not reflect your real workloads, may not be strictly relative to one another, and may lack data for some models.

Models also perform differently on different remote endpoints and on different local systems, and their performance may change over time.

To get clear profile data for the models in your pool, it is best to run benchmarks inside your router. Tag each individual test with the capabilities (tool use, vision, etc.), tasks or roles it exercises, then score the models side by side, relative to each other, to build a richer routing profile.
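A minimal sketch of relative scoring over tagged tests. It assumes you already have a raw score per model per test (how those are produced depends on your harness) and divides each score by the best score on that test, so a profile entry of 1.0 means "best in the pool" for that tag. `BenchTest`, `relative_profile` and the tag names are illustrative, not role-model's format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchTest:
    test_id: str
    tags: frozenset[str]  # capabilities, tasks or roles, e.g. {"tool_use"}


def relative_profile(tests, raw_scores):
    """raw_scores[model][test_id] is an absolute score where higher is better.

    Returns profile[model][tag]: the model's mean score on tests with that tag,
    each score expressed relative to the best model on the same test.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for test in tests:
        scores = {m: s[test.test_id] for m, s in raw_scores.items() if test.test_id in s}
        best = max(scores.values(), default=0.0)
        if best <= 0:
            continue  # nobody scored, so there is nothing to compare against
        for model, score in scores.items():
            for tag in test.tags:
                collected[model][tag].append(score / best)
    return {
        model: {tag: sum(vals) / len(vals) for tag, vals in tags.items()}
        for model, tags in collected.items()
    }


tests = [
    BenchTest("t1", frozenset({"tool_use"})),
    BenchTest("t2", frozenset({"tool_use", "role:coder"})),
    BenchTest("t3", frozenset({"vision"})),
]
# Hypothetical raw scores; model B has no vision result.
raw = {"A": {"t1": 0.8, "t2": 0.6, "t3": 0.9}, "B": {"t1": 0.4, "t2": 0.6}}
profile = relative_profile(tests, raw)
print(profile["B"]["tool_use"])  # 0.75
```

One caveat of purely relative scores: a test that only one model has run scores 1.0 by default, so the comparison is only meaningful when every model in the pool runs the same tests.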

4. Evaluate past decisions to enrich routing data
Benchmarks should only be seen as a starting point. When we route a request, we are in effect trying to predict the future: which model will perform best on this request, given parameters including cost and speed.

These predictions should be revisited by building user-specific evaluations from past requests and running them as a benchmark across the model pool. This is the best signal you can get on how models perform and how to route between them.
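One way to sketch this, assuming you log each past request with tags and can replay it against every model in the pool. `run_and_grade` stands in for whatever replay-and-grading step you use (a test suite, a rubric, a judge model) and is hypothetical, as are `PastRequest`, `replay_eval` and the blending `weight`. Replay scores are made relative per request, in the same spirit as a side-by-side benchmark, and then folded into the existing profile as a weighted average.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PastRequest:
    request_id: str
    tags: frozenset[str]  # capabilities or roles the request exercised


def replay_eval(past_requests, models, run_and_grade):
    """run_and_grade(model, request) -> score in [0, 1] (hypothetical hook).

    Returns per-model, per-tag mean scores relative to the best model on each request.
    """
    collected = {m: defaultdict(list) for m in models}
    for req in past_requests:
        scores = {m: run_and_grade(m, req) for m in models}
        best = max(scores.values())
        if best <= 0:
            continue  # every model failed this request; no relative signal
        for model, score in scores.items():
            for tag in req.tags:
                collected[model][tag].append(score / best)
    return {m: {t: sum(v) / len(v) for t, v in tags.items()} for m, tags in collected.items()}


def update_profile(prior, replay_scores, weight=0.3):
    """Blend replay results into a benchmark-derived profile.

    weight is how much the newest replay counts (a hypothetical tuning knob).
    """
    updated = {m: dict(tags) for m, tags in prior.items()}
    for model, tags in replay_scores.items():
        for tag, score in tags.items():
            old = updated.setdefault(model, {}).get(tag)
            updated[model][tag] = score if old is None else (1 - weight) * old + weight * score
    return updated


# Hypothetical grades for two logged requests.
grades = {("A", "r1"): 1.0, ("B", "r1"): 0.5, ("A", "r2"): 0.8, ("B", "r2"): 0.8}
past = [PastRequest("r1", frozenset({"tool_use"})), PastRequest("r2", frozenset({"tool_use"}))]
replay = replay_eval(past, ["A", "B"], lambda m, r: grades[(m, r.request_id)])
prior = {"A": {"tool_use": 1.0}, "B": {"tool_use": 0.5}}
print(update_profile(prior, replay, weight=0.5)["B"]["tool_use"])  # 0.625
```

Here model B looked weak on the benchmark prior but matched A on one of the user's real requests, so its tool-use score moves up from 0.5 toward the replay result.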

Telemetry also tells us about an endpoint's stability, request turnaround time, and other things that catalog metadata does not capture.
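A minimal sketch of per-endpoint telemetry, assuming the router can observe whether each call succeeded and how long it took. It tracks an error rate as a proxy for stability and a median turnaround time; the class, method names and endpoint label are hypothetical, and a real router would probably also want percentiles over a sliding time window.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Optional


@dataclass
class EndpointTelemetry:
    endpoint: str
    latencies_s: list[float] = field(default_factory=list)  # successful calls only
    failures: int = 0

    def record_success(self, latency_s: float) -> None:
        self.latencies_s.append(latency_s)

    def record_failure(self) -> None:
        self.failures += 1

    @property
    def error_rate(self) -> float:
        """Share of calls that failed; a rough proxy for endpoint stability."""
        total = len(self.latencies_s) + self.failures
        return self.failures / total if total else 0.0

    @property
    def median_turnaround_s(self) -> Optional[float]:
        """Median latency of successful calls, or None before any have been seen."""
        return median(self.latencies_s) if self.latencies_s else None


# Hypothetical endpoint label and measurements.
t = EndpointTelemetry("provider-a/deepseek-v4-pro")
for latency in (1.0, 3.0, 2.0):
    t.record_success(latency)
t.record_failure()
print(t.error_rate, t.median_turnaround_s)  # 0.25 2.0
```

These numbers can then feed back into routing, for example by skipping an endpoint whose error rate crosses a threshold you choose, or by preferring the faster of two endpoints serving the same model.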

If you’ve made it this far, you may want to check out role-model and set up your own pool of models for routing:

https://github.com/try-works/role-model