Here are a few principles for model routing that I've developed while building
role-model: a model router, a routing protocol and a Pi extension. These principles should be useful for anyone who uses a model router or builds their own.
1. Keep models distinct
I sometimes see people using the latest GPT model with the latest Opus model, each assigned different roles in a coding workflow. This is not wrong per se, but it also isn't optimal from a routing perspective.
While people have different experiences of and preferences for the GPT and Opus models, they are both generalist models with a bent for coding, in the same performance and cost tiers.
This makes routing between them difficult. Estimating the difficulty of a request well enough to match it with a model is hard already, and deciding which model should get it is even harder when the candidates are neck and neck in every area.
Instead of routing between two frontier models, it is better to route between one frontier model and another model that excels on at least one side of the triangle of constraints: speed, quality and cost.
For example, use a router together with GPT 5.5 to extend the quota of your subscription by routing medium and easy requests to DeepSeek V4 Pro. It is significantly cheaper and also significantly less capable on very difficult tasks, so routing decisions between the two are easier to make.
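To make the example concrete, here is a minimal sketch of a two-model router. It assumes the difficulty of a request has already been estimated, which is the hard part and is not shown. The `Difficulty` tiers, the `route` function and the model identifier strings are hypothetical names for illustration, not role-model's actual API.

```python
from enum import IntEnum


class Difficulty(IntEnum):
    """Hypothetical difficulty tiers; estimating them is out of scope here."""
    EASY = 1
    MEDIUM = 2
    HARD = 3


# Illustrative identifiers only; use whatever names your provider expects.
FRONTIER = "gpt-5.5"
BUDGET = "deepseek-v4-pro"


def route(difficulty: Difficulty, threshold: Difficulty = Difficulty.HARD) -> str:
    """Send requests at or above the threshold to the frontier model,
    and everything else to the cheaper model."""
    return FRONTIER if difficulty >= threshold else BUDGET
```

Because the two models sit in clearly different tiers, a single threshold is enough. With two near-identical frontier models there would be no obvious rule to put in its place.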
2. Keep the model pool small
This follows from the previous point. You might think that adding more models to the pool is a good thing.
As an example, I could configure role-model to use GPT 5.5, Kimi 2.7, DeepSeek V4 Pro and DeepSeek V4 Flash, and even a couple of smaller GPT models.
But if the models do not have distinct characteristics, adding more of them just makes routing decisions harder. In practice, this pool is likely to end up routing between GPT 5.5, chosen for performance, and whichever other model is selected first and then kept for smaller tasks because its cache is warm.
So unless you have models with distinct differences, do not add more to the pool. Limit the pool size to 2 by default and only add more models when you can clearly define the role of each of them. Does adding one improve speed, quality or cost? If not, don't add it.
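One way to turn that question into a check is sketched below. It assumes each model can be summarized by three scores, and it admits a candidate only if it beats every model already in the pool by a clear margin on at least one axis. The `ModelProfile` type, the `adds_value` function and the 20% default margin are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    speed: float    # higher is better
    quality: float  # higher is better
    cost: float     # lower is better


def adds_value(candidate: ModelProfile, pool: list[ModelProfile],
               margin: float = 0.2) -> bool:
    """Return True if the candidate clearly beats every pool member
    on at least one of speed, quality or cost."""
    if not pool:
        return True
    faster = all(candidate.speed >= m.speed * (1 + margin) for m in pool)
    better = all(candidate.quality >= m.quality * (1 + margin) for m in pool)
    cheaper = all(candidate.cost <= m.cost * (1 - margin) for m in pool)
    return faster or better or cheaper
```

With made-up scores, a budget model that costs a fifth as much as the frontier model passes the check, while a third model that is only marginally different from the budget model does not.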
3. Use relative, real-world benchmarks
I sometimes see routers that simply route based on model metadata like cost, and other routers that add external benchmarks from sources like Artificial Analysis to the metadata. This is better than nothing but not ideal, because the benchmarks may not provide a granular enough performance profile, may not reflect your real workloads, may not be strictly relative and may lack data for certain models.