Arcee-ai/trinity-large-thinking is really worth checking out. I've tested it mostly for knowledge at this point, and on that metric, my first impression is that it's very, very good.
It's also very inexpensive to run over API, and very fast. On OpenRouter, it costs $0.22/M input tokens and $0.85/M output tokens (and there's currently a free API version available until April 22, 2026). The paid API version on OpenRouter gave me about 109 tokens per second (in January, using a cheap netbook to connect).
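For a rough sense of what those rates mean per request, here's a quick back-of-the-envelope calculation using the OpenRouter prices quoted above (the token counts are made-up example numbers, not measurements):

```python
# Cost sketch for trinity-large-thinking at the quoted OpenRouter rates:
# $0.22 per million input tokens, $0.85 per million output tokens.
INPUT_RATE = 0.22 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.85 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in dollars for one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 2,000-token prompt with a 1,000-token (thinking-heavy) response.
print(f"${estimate_cost(2_000, 1_000):.6f}")  # → $0.001290
```

At these prices, even long thinking traces stay well under a cent per request, which is what makes it attractive as an everyday API model.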
In a really cool turn, an IQ2_XXS quantization of this massive thing can also run directly on a single Strix Halo machine, so I imagine a very usable version at higher precision should run nicely on a cluster of two Strix Halo laptops. I'll report back as soon as I have some comparisons between Trinity, Minimax, and Qwen 3.5 397 running on that cluster.
However those local tests go, this model is definitely at the top of my list of go-to inexpensive models to run on OpenRouter.