Why retrieval accuracy matters

When your AI misses a relevant skill, it silently produces non-compliant output — wrong API patterns, missing design tokens, ignored security protocols. The difference between 93% and 100% recall is the difference between a system that occasionally fails and one that reliably surfaces all relevant knowledge.

83 skills: production knowledge entries across 15 semantic groups

36 queries: curated benchmark queries across 10 difficulty categories

90% reduction: context tokens saved vs. dumping all skills into context

Three strategies, progressively smarter

We tested three search architectures — from basic vector similarity to our full hierarchical routing with task decomposition.

S3: Decomposed + Two-Layer

Decomposes a broad query into N focused sub-queries. Each sub-query gets its own group routing pass. Results are unioned by max-score per skill.

Recall@5: 94.9% | Recall@10: 100.0% | MRR: 89.9% | Hit@1: 89.7%

Side-by-side comparison

Across every core metric, hierarchical routing with decomposition outperforms flat vector search — especially on complex queries.

Metric                     | S1: Flat Semantic | S2: Two-Layer Routing | S3: Decomposed + Two-Layer
Composite Score            | 76.8%             | 79.4%                 | 85.0%
Recall@5                   | 86.7%             | 88.9%                 | 94.9%
Recall@10                  | 93.2%             | 95.7%                 | 100.0%
Hit@1                      | 71.8%             | 78.6%                 | 89.7%
MRR (Mean Reciprocal Rank) | 78.9%             | 84.1%                 | 89.9%
NDCG@10                    | 77.8%             | 80.2%                 | 84.0%
Precision@5                | 48.7%             | 48.2%                 | 55.7%
Avg First Hit Rank         | 2.0               | 1.8                   | 1.4
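The page does not spell out how each metric in the comparison is computed. As a minimal sketch, assuming the standard information-retrieval definitions, the per-query values could be calculated as below and then averaged over the benchmark queries. The function names and skill IDs are hypothetical, and the actual benchmark may define some metrics differently.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], expected: Set[str], k: int) -> float:
    """Fraction of the expected skills that appear in the top-k results."""
    if not expected:
        return 0.0
    return len(set(ranked[:k]) & expected) / len(expected)


def precision_at_k(ranked: Sequence[str], expected: Set[str], k: int) -> float:
    """Fraction of the top-k results that are expected skills."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for skill in top if skill in expected) / len(top)


def reciprocal_rank(ranked: Sequence[str], expected: Set[str]) -> float:
    """1 / rank of the first expected skill, or 0.0 if none was retrieved."""
    for rank, skill in enumerate(ranked, start=1):
        if skill in expected:
            return 1.0 / rank
    return 0.0


def hit_at_1(ranked: Sequence[str], expected: Set[str]) -> float:
    """1.0 if the top-ranked result is an expected skill, else 0.0."""
    return 1.0 if ranked and ranked[0] in expected else 0.0


# Hypothetical single query: two expected skills, four retrieved results.
ranked = ["b", "a", "x", "c"]
expected = {"a", "c"}

print(recall_at_k(ranked, expected, 2))   # 0.5
print(recall_at_k(ranked, expected, 5))   # 1.0
print(reciprocal_rank(ranked, expected))  # 0.5
print(hit_at_1(ranked, expected))         # 0.0
```

Under these definitions, MRR is the mean of `reciprocal_rank` over all queries, and Hit@1 is the share of queries whose top result is an expected skill.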

Zero Information Loss

100% Recall@10

Every single expected knowledge entry was surfaced within the top 10 results — across all 36 benchmark queries. No skill was ever missed. No information was lost.

Zero-recall queries: 1 → 0

Avg first hit rank: 2.0 → 1.4

Complex query R@5: 78.8% → 98.4%

How it works

Our best-performing strategy combines query decomposition with hierarchical group routing for maximum recall with minimal noise.

How Decomposed Two-Layer Routing Works

1. Decompose the Query

A broad query is split into focused sub-tasks, each targeting a specific information need.

"Remotion pipeline", "Design system tokens", "Motion graphics", "Production workflow"

2. Route Through Groups (L0) → Skills (L1)

Each sub-task independently activates the most relevant semantic groups, then scores individual skills within those groups. Different sub-tasks can activate different clusters.

3. Max-Score Union

Results from all sub-tasks are merged. For each skill, we keep the highest score across all sub-tasks, which preserves signal without dilution. The final ranking is ordered by the merged scores.
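The three steps above can be sketched in a few lines. This is an illustration under stated assumptions, not Kavela's implementation: the sub-queries are taken as given (the page does not say how decomposition is performed), `word_overlap` is a toy stand-in for whatever embedding similarity the real system uses, and the group descriptions, skill names, and the `top_groups` parameter are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]  # (query, text) -> similarity score


def word_overlap(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of lowercase words (stand-in for embeddings)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def two_layer_search(sub_query: str, groups: Dict[str, List[str]],
                     score: Scorer, top_groups: int = 1) -> Dict[str, float]:
    """L0: pick the best-matching groups. L1: score skills inside those groups only."""
    best = sorted(groups, key=lambda g: score(sub_query, g), reverse=True)[:top_groups]
    return {skill: score(sub_query, skill) for g in best for skill in groups[g]}


def max_score_union(results: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Merge per-sub-query results, keeping each skill's highest score."""
    merged: Dict[str, float] = {}
    for per_query in results:
        for skill, s in per_query.items():
            merged[skill] = max(s, merged.get(skill, float("-inf")))
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


def decomposed_search(sub_queries: List[str], groups: Dict[str, List[str]],
                      score: Scorer, top_groups: int = 1,
                      k: int = 5) -> List[Tuple[str, float]]:
    """Run two-layer routing once per sub-query, then take the max-score union."""
    per_query = [two_layer_search(q, groups, score, top_groups) for q in sub_queries]
    return max_score_union(per_query)[:k]


# Hypothetical knowledge base: group description -> skills in that group.
groups = {
    "remotion video rendering": ["remotion pipeline setup", "motion graphics timing"],
    "design system tokens": ["design tokens usage", "color palette rules"],
    "security auth": ["api auth protocol"],
}

top = decomposed_search(["remotion pipeline", "design tokens"], groups, word_overlap, k=2)
print([skill for skill, _ in top])
```

Taking the maximum rather than the sum or mean means a skill that matches one sub-query strongly is not penalized for being irrelevant to the others.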

Performance by query category

Recall@5 across 10 categories — decomposition shines on cross-group and multi-topic queries where flat search struggles most.

100% recall, minimal context

We don't dump your entire knowledge base into context. Kavela surfaces 5–10 precisely relevant skills per query, cutting context tokens by roughly 90% versus loading everything while maintaining 100% Recall@10 on the benchmark.

Dump everything: ~41,500 tokens consumed per query. Wastes attention, increases cost.

Kavela retrieval: ~3,500 tokens consumed per query. 100% recall, zero noise.
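The headline reduction follows from the two approximate token counts above. A quick check of the arithmetic, using only those two figures (the function name is hypothetical):

```python
def token_reduction(baseline_tokens: int, retrieved_tokens: int) -> float:
    """Fraction of context tokens saved relative to loading everything."""
    return 1.0 - retrieved_tokens / baseline_tokens


# Approximate per-query figures quoted on this page.
reduction = token_reduction(41_500, 3_500)
print(f"{reduction:.1%}")  # 91.6%
```

Since both inputs are rounded estimates, the result is consistent with the "90% reduction" figure rather than a more precise measurement.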

Benchmark methodology

Give your AI team a perfect memory

Store your team's knowledge once. Kavela surfaces exactly what's relevant — every time, with 100% recall.
