A local model does the everyday work. Tīrtha runs every answer to make sure it's right, and escalates only the genuinely hard problems to a frontier model. Here's exactly how it was tested — numbers included.
Pre-beta — preliminary until the full run finishes.| Model | Got it right | Served free |
|---|---|---|
| Tīrtha | 95% | 43% |
| Claude (frontier) | 96% | n/a |
| Codex (frontier) | 95% | n/a |
| The local model, alone | 82% | 100% |
Check the work harder and it reaches frontier accuracy. Check lighter and the cost drops toward nothing. You pick the operating point per workload.
Every number here is measured, not guessed.
Escalates the hard problems to a top model. Highest accuracy — and still cheaper than using that model for everything.
Escalates to a smaller, faster frontier model instead. Gives up a few points for a large cost cut.
The local model alone. Near-free, for when speed and cost matter most.
Tīrtha caches the answers it has already verified, so work it has seen before comes straight back.
Something it has already worked out and tested comes back about 24× faster than asking a frontier model again.
For normal problems, the local model beats the frontier on speed too — by about 1.4×.
The more it runs, the more it remembers, so a growing share of answers return instantly.
One honest note: a brand-new hard problem takes a moment longer, because Tīrtha runs and checks it before trusting the answer.
with the system writing and checking its own tests, and no answer sheet to peek at.
of the work still handled by the local model under those real-world conditions.
the bit between us and a perfect score is just how good the checking is — and we keep improving it. We show it, we don't hide it.
The tests used to check the work are kept separate from the tests used to score it. It can't cheat by seeing the answers ahead of time.
We'll share the exact harness so you can verify every number instead of taking our word. Coming soon.
We caught one of our own numbers that was inflated, and said so out loud. Everything here survived that scrutiny.