Building your contextual brain
Your AI needs to retrieve exactly the right knowledge from thousands of entries — every time, without flooding the context window. We benchmarked three approaches on a production workspace.
100.0%
Recall@10 — zero information loss
Why retrieval accuracy matters
When your AI misses a relevant skill, it silently produces non-compliant output — wrong API patterns, missing design tokens, ignored security protocols. The difference between 93% and 100% recall is the difference between a system that occasionally fails and one that reliably surfaces all relevant knowledge.
83
skills
Production knowledge entries across 15 semantic groups
36
queries
Curated benchmark queries across 10 difficulty categories
90%
reduction
Context tokens saved vs. dumping all skills into context
Three strategies, progressively smarter
We tested three search architectures — from basic vector similarity to our full hierarchical routing with task decomposition.
S3: Decomposed + Two-Layer
Decomposes a broad query into N focused sub-queries. Each sub-query gets its own group routing pass. Results are unioned by max-score per skill.
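The max-score union can be sketched in a few lines. This is a minimal illustration, not Kavela's implementation: the function name `max_score_union`, the skill names, and the scores are all hypothetical.

```python
from typing import Dict, List


def max_score_union(per_subquery: List[Dict[str, float]]) -> Dict[str, float]:
    """Merge per-sub-query score maps, keeping each skill's highest score."""
    merged: Dict[str, float] = {}
    for scores in per_subquery:
        for skill, score in scores.items():
            if skill not in merged or score > merged[skill]:
                merged[skill] = score
    return merged


# Hypothetical scores returned by two sub-queries of one broad query.
sub_results = [
    {"api-auth": 0.82, "rate-limits": 0.40},
    {"design-tokens": 0.77, "api-auth": 0.31},
]

merged = max_score_union(sub_results)
# Rank skills by their merged score, best first.
ranked = sorted(merged, key=merged.get, reverse=True)
```

Taking the maximum rather than the mean means a skill that matches one sub-query strongly is not penalized for being irrelevant to the others.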
94.9%
Recall@5
100.0%
Recall@10
89.9%
MRR
89.7%
Hit@1
Side-by-side comparison
Across every core metric, hierarchical routing with decomposition outperforms flat vector search — especially on complex queries.
[Comparison chart: Recall@5, Recall@10, MRR (Mean Reciprocal Rank), and Hit@1]
100% Recall@10
Every single expected knowledge entry was surfaced within the top 10 results — across all 36 benchmark queries. No skill was ever missed. No information was lost.
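For readers unfamiliar with the metrics, the sketch below shows the usual textbook definitions of Recall@k, MRR, and Hit@1, averaged per query. The page does not spell out its exact averaging, so treat that as an assumption; the skill names and rankings are toy data, not benchmark results.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], expected: Set[str], k: int) -> float:
    """Fraction of the expected skills that appear in the top-k results."""
    if not expected:
        return 1.0
    return len(set(ranked[:k]) & expected) / len(expected)


def reciprocal_rank(ranked: Sequence[str], expected: Set[str]) -> float:
    """1 / rank of the first expected skill, or 0.0 if none is returned."""
    for rank, skill in enumerate(ranked, start=1):
        if skill in expected:
            return 1.0 / rank
    return 0.0


def hit_at_1(ranked: Sequence[str], expected: Set[str]) -> float:
    """1.0 when the top-ranked result is an expected skill, else 0.0."""
    return 1.0 if ranked and ranked[0] in expected else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Toy benchmark: (ranked results, expected skills) for two queries.
runs: List[Tuple[List[str], Set[str]]] = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y", "z"], {"y"}),
]

recall_1 = mean([recall_at_k(r, e, 1) for r, e in runs])
recall_5 = mean([recall_at_k(r, e, 5) for r, e in runs])
mrr = mean([reciprocal_rank(r, e) for r, e in runs])
hit_1 = mean([hit_at_1(r, e) for r, e in runs])
```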
[Comparison chart: zero-recall queries, average first-hit rank, and complex-query Recall@5]
How it works
Our best-performing strategy combines query decomposition with hierarchical group routing for maximum recall with minimal noise.
How Decomposed Two-Layer Routing Works
1. Decompose the Query
A broad query is split into focused sub-tasks, each targeting a specific information need.
2. Route Through Groups (L0) → Skills (L1)
Each sub-task independently activates the most relevant semantic groups, then scores individual skills within those groups. Different sub-tasks can activate different clusters.
3. Max-Score Union
Results from all sub-tasks are merged. For each skill, we keep the highest score it received across all sub-tasks, which preserves a strong match from one sub-task instead of averaging it away. The final ranking sorts skills by these merged scores.
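The three steps above can be sketched end to end with toy vectors. Everything here is an assumption made for illustration: the `INDEX` layout, the `route` and `search` helpers, the skill names, and the 3-dimensional vectors. The decomposition step is not shown; the sub-queries are assumed to arrive already split and embedded.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical index: group name -> (group vector, {skill name: skill vector}).
INDEX: Dict[str, Tuple[Vector, Dict[str, Vector]]] = {
    "api": ((1.0, 0.0, 0.0), {"rest-conventions": (0.9, 0.1, 0.0),
                              "auth-flow": (0.8, 0.0, 0.2)}),
    "design": ((0.0, 1.0, 0.0), {"design-tokens": (0.1, 0.9, 0.0),
                                 "spacing-scale": (0.0, 0.8, 0.2)}),
    "security": ((0.0, 0.0, 1.0), {"secret-handling": (0.0, 0.1, 0.9)}),
}


def route(query: Vector, top_groups: int = 1) -> Dict[str, float]:
    """L0: pick the closest groups. L1: score only the skills inside them."""
    groups = sorted(INDEX, key=lambda g: cosine(query, INDEX[g][0]), reverse=True)
    scores: Dict[str, float] = {}
    for group in groups[:top_groups]:
        for skill, vec in INDEX[group][1].items():
            scores[skill] = cosine(query, vec)
    return scores


def search(sub_queries: List[Vector], top_k: int = 5) -> List[Tuple[str, float]]:
    """Route each sub-query independently, then merge by max score per skill."""
    merged: Dict[str, float] = {}
    for query in sub_queries:
        for skill, score in route(query).items():
            merged[skill] = max(score, merged.get(skill, float("-inf")))
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Two already-embedded sub-queries: one about APIs, one about design.
sub_queries = [(1.0, 0.0, 0.1), (0.0, 1.0, 0.1)]
results = search(sub_queries)
surfaced = {skill for skill, _ in results}
```

In this toy run each sub-query activates a different group, so skills from both "api" and "design" are surfaced while the unrelated "security" group is never scored.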
Performance by query category
Recall@5 across 10 categories — decomposition shines on cross-group and multi-topic queries where flat search struggles most.
Cross-Group queries: +44.5pp
Queries needing skills from multiple organizational groups went from 47.2% to 91.7% Recall@5 — nearly doubling retrieval accuracy.
Decomposition-advantage queries: 71.3% → 100%
Wide queries that span 4+ topics, exactly where a single query embedding dilutes each topic's signal, now achieve perfect recall through focused sub-queries.
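A toy calculation illustrates the dilution effect. It assumes, purely for illustration, that four topics are orthogonal unit vectors and that a wide query embeds as their average. Real embedding models are not this simple, so read it as intuition rather than a measurement.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Four hypothetical topics, modeled as orthogonal unit vectors.
topics: List[Tuple[float, ...]] = [
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
]

# A wide query modeled as the average of all four topic vectors.
wide_query = tuple(sum(column) / len(topics) for column in zip(*topics))

# Similarity of the single wide query to each topic.
diluted = [cosine(wide_query, topic) for topic in topics]

# Similarity when each topic gets its own focused sub-query.
focused = [cosine(topic, topic) for topic in topics]
```

Under these assumptions the wide query scores 0.5 against every topic, while a focused sub-query scores 1.0 against its own topic.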
100% recall, minimal context
We don't dump your entire knowledge base into context. Kavela surfaces 5–10 precisely relevant skills per query — a 90% reduction in context tokens versus loading everything, while maintaining perfect recall.
Dump everything
~41,500
tokens consumed per query
Wastes attention, increases cost
Kavela retrieval
~3,500
tokens consumed per query
100% recall, minimal noise
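The saving can be worked out from the two approximate per-query figures above. Both inputs are rounded, so the result is only indicative; the constant names are chosen here for illustration.

```python
# Approximate tokens per query, taken from the two figures above.
FULL_DUMP_TOKENS = 41_500
RETRIEVAL_TOKENS = 3_500

tokens_saved = FULL_DUMP_TOKENS - RETRIEVAL_TOKENS
reduction = tokens_saved / FULL_DUMP_TOKENS

print(f"Tokens saved per query: {tokens_saved}")
print(f"Reduction: {reduction:.1%}")
```

With these rounded inputs the reduction comes out at roughly nine tenths, in line with the headline figure.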
Benchmark methodology
Give your AI team a perfect memory
Store your team's knowledge once. Kavela surfaces exactly what's relevant — every time, with 100% recall.
Get early access