Mesh Benchmarks — Opus 4.8 raw vs Opus 4.8 + Mesh

9 of every 10 tools never get sent

Mesh hands the model only the handful of tools a task actually needs — not all 122 — so every turn starts lighter.

−81% prompt size

122 tools available → only ~8 sent per turn

Your whole repo, pocket-sized

Instead of raw source, Mesh keeps a compact capsule of each file. Watch the codebase collapse into it.

20.4× smaller

756K raw tokens → 37K across 252 files

Answers without reading the whole file

A raw agent pours in whole files. Mesh trickles in the one snippet that answers the question.

−92% context per question

raw pours 124.6K tokens · Mesh trickles 9.9K
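The two headline figures so far follow directly from the token counts quoted on this page. A minimal sketch of that arithmetic, where the helper names `shrinkFactor` and `reductionPercent` are illustrative and not part of Mesh:

```typescript
// Token counts quoted on this page, in thousands of tokens.
const repoRaw = 756;       // raw source across 252 files
const repoCapsule = 37;    // Mesh capsules for the same files
const questionRaw = 124.6; // raw agent
const questionMesh = 9.9;  // Mesh

// "N× smaller": how many times the reduced size fits into the original.
function shrinkFactor(before: number, after: number): number {
  return before / after;
}

// "−N%": the share of the original that was removed.
function reductionPercent(before: number, after: number): number {
  return (1 - after / before) * 100;
}

const capsuleFactor = shrinkFactor(repoRaw, repoCapsule).toFixed(1);
const questionReduction = Math.round(reductionPercent(questionRaw, questionMesh));

console.log(`${capsuleFactor}× smaller`);      // 20.4× smaller
console.log(`−${questionReduction}% context`); // −92% context
```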

Finds the right code, first try

Across 50 questions, the exact right function is the very first result 94% of the time, and it is always within the top three.

94% right on the first hit (top-1 hit rate)
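"First result" and "within the top three" are top-1 and top-3 hit rates. A minimal sketch of how such a rate can be computed, assuming each question has exactly one expected function; the `RankedQuery` shape and the sample data are made up for illustration and are not the benchmark's questions:

```typescript
// One evaluated question: the function that should be found, and the
// ranked results a search returned for it (best first).
interface RankedQuery {
  expected: string;
  results: string[];
}

// Fraction of queries whose expected function appears in the first k results.
function hitRateAtK(queries: RankedQuery[], k: number): number {
  if (queries.length === 0) return 0;
  const hits = queries.filter((q) => q.results.slice(0, k).includes(q.expected));
  return hits.length / queries.length;
}

// Toy data with hypothetical function names.
const sample: RankedQuery[] = [
  { expected: "parseInvoice", results: ["parseInvoice", "parseRefund", "sendInvoice"] },
  { expected: "refreshSession", results: ["loadSession", "refreshSession", "endSession"] },
  { expected: "verifyWebhook", results: ["verifyWebhook", "parseWebhook", "retryWebhook"] },
  { expected: "issueRefund", results: ["issueRefund", "parseRefund", "voidRefund"] },
];

const top1 = hitRateAtK(sample, 1); // 3 of 4 queries hit at rank 1 → 0.75
const top3 = hitRateAtK(sample, 3); // all 4 hit within rank 3 → 1
```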

A needle in a haystack of code

A 3-D wall of code with one fact buried inside. Mesh pulls the exact needle out every time.

100% found · reads 1,609× less

50 facts buried in code · every one recovered

It won't make things up

When raw keyword search misses the supporting fact, the model guesses. Mesh surfaces the real source, so answers stay grounded.

~0% made-up answers · raw 38%

guessed: “…persists sessions to Redis” · 38%

grounded: “…stored in vectors.bin” · ~0%

Knows which file to fix

From a plain-English bug report, Mesh scans the tree and locks onto the file to change. Keyword search misses far more often.

94% right file · keyword 38%

src/
  payments/
    webhook.ts
    invoices.ts ← fix here
    refunds.ts
  auth/session.ts

Gets better the bigger your repo

A raw agent reads more as the codebase grows. Mesh fetches the same small snippet at any size — so the savings compound.

195× advantage on a large repo

Chart: context advantage vs repo size (small, medium, large, huge)

Long sessions stay light

Over a 20-turn session a raw agent's context piles up turn after turn. Mesh dedupes and trims — so it stays flat while raw climbs.

14× lighter by turn 20

raw: 234K, climbing · Mesh: 16K, flat
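Why does one line climb while the other stays flat? A minimal sketch of the deduplication half of the idea, assuming context is counted in tokens per snippet; the `Snippet` type, the snippet ids, and the token counts are made up for illustration, and Mesh's actual trimming is not described on this page:

```typescript
interface Snippet {
  id: string;     // hypothetical identifier for a piece of code
  tokens: number; // size of the snippet in tokens
}

// Raw accumulation: every read is appended again, even if seen before.
function rawContextSize(turns: Snippet[][]): number[] {
  let total = 0;
  return turns.map((reads) => {
    for (const s of reads) total += s.tokens;
    return total;
  });
}

// Deduplicated accumulation: a snippet counts once, however often it is read.
function dedupedContextSize(turns: Snippet[][]): number[] {
  const seen = new Set<string>();
  let total = 0;
  return turns.map((reads) => {
    for (const s of reads) {
      if (!seen.has(s.id)) {
        seen.add(s.id);
        total += s.tokens;
      }
    }
    return total;
  });
}

// Toy session: 20 turns that keep re-reading the same two snippets.
const snippetA: Snippet = { id: "config.ts#RATE_LIMIT", tokens: 40 };
const snippetB: Snippet = { id: "checkout.ts#check", tokens: 60 };
const session: Snippet[][] = Array.from({ length: 20 }, () => [snippetA, snippetB]);

const rawSizes = rawContextSize(session);         // grows by 100 every turn
const dedupedSizes = dedupedContextSize(session); // stays at 100 after turn 1
```

In this toy session the raw total reaches 2,000 tokens by turn 20 while the deduplicated total stays at 100. Real sessions read new material too, so a flat line also depends on trimming.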

Follows the trail across files

When the value you need lives in another file, Mesh fetches that definition. Watch it carry the value across.

100% vs keyword 33%

checkout.ts

  import { RATE_LIMIT }
  if (count > ?)
    throw RateError()

config.ts

  export const RATE_LIMIT = 600

Mesh carries the definition across files
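To make "carrying a definition across files" concrete, here is a minimal sketch that follows a named import to the file that exports it and reads the constant's value. It is a regex-based illustration over an in-memory project, not how Mesh is implemented. The `from "./config"` path, the `new RateError()` call, and all function names are assumptions added so the example runs:

```typescript
// Hypothetical in-memory project; file contents are illustrative only.
const files: Record<string, string> = {
  "checkout.ts": [
    'import { RATE_LIMIT } from "./config";',
    "if (count > RATE_LIMIT) throw new RateError();",
  ].join("\n"),
  "config.ts": "export const RATE_LIMIT = 600;",
};

// Find which module a named import comes from, e.g. "./config".
function findImportSource(source: string, name: string): string | null {
  const pattern = new RegExp(
    `import\\s*\\{[^}]*\\b${name}\\b[^}]*\\}\\s*from\\s*["']([^"']+)["']`
  );
  const match = pattern.exec(source);
  return match?.[1] ?? null;
}

// Read the literal assigned to an exported constant.
function findExportedConst(source: string, name: string): string | null {
  const pattern = new RegExp(`export\\s+const\\s+${name}\\s*=\\s*([^;\\n]+)`);
  const value = pattern.exec(source)?.[1];
  return value === undefined ? null : value.trim();
}

// Follow the import from one file to the definition in another.
function resolveAcrossFiles(fromFile: string, name: string): string | null {
  const source = files[fromFile];
  if (source === undefined) return null;
  const importSource = findImportSource(source, name);
  if (importSource === null) return null;
  // Assumes relative "./x" imports that map to "x.ts" in the same folder.
  const target = files[importSource.replace(/^\.\//, "") + ".ts"];
  return target === undefined ? null : findExportedConst(target, name);
}

const rateLimit = resolveAcrossFiles("checkout.ts", "RATE_LIMIT"); // "600"
```

A keyword search over checkout.ts alone finds the name `RATE_LIMIT` but not the value 600, which lives only in config.ts.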

Works even when you mistype

Typos, abbreviations, different words — Mesh matches on meaning and keeps finding it. Keyword search loses every misspelt term.

steady · keyword: 32% → 22%

build the file cache

✓ Mesh matched

Mesh 96%

keyword 28%

Search by what code does

Describe it in your own words — sharing not a single term with the code — and Mesh still threads straight to the function.

85% found · keyword 18%

“remove repeated entries, keep the order” → dedupeByOrder() · zero shared words
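For readers wondering what sits behind that description, here is a minimal sketch of a function that does what the query asks: it removes repeated entries and keeps first-seen order. The page names `dedupeByOrder()` but does not show its body, so this implementation is an assumption:

```typescript
// Hypothetical implementation: drop repeats, keep the first occurrence of
// each entry in its original position.
function dedupeByOrder<T>(items: readonly T[]): T[] {
  const seen = new Set<T>();
  const result: T[] = [];
  for (const item of items) {
    if (!seen.has(item)) {
      seen.add(item);
      result.push(item);
    }
  }
  return result;
}

const unique = dedupeByOrder(["b", "a", "b", "c", "a"]); // ["b", "a", "c"]
```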

One model. Twelve wins.

Same brain, a fraction of the context — and it finds code where a keyword scan can't. That's what Mesh changes.

Each test holds the model constant — Claude Opus 4.8 — and compares raw POSIX tools (read / grep / keyword search) against the same model with Mesh.