# Contextual retrieval

**One-line description.** Anthropic's preprocessing recipe for RAG: before chunks are embedded and indexed, an LLM (the *contextualizer*) reads the whole document together with each chunk and emits 50–100 tokens of situating context that gets prepended to the chunk. The contextualized chunk then feeds **both** a semantic index (embedding model → vector DB) and a lexical index (TF-IDF → BM25). At runtime the query hits both indices, results merge through rank fusion, and the top-K chunks go to the generative model. The diagram's job is to show the contextualizer as the distinctive new step and the dual-track preprocessing it feeds.
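The runtime merge can be sketched in Python. The description only says "rank fusion", so the sketch below assumes reciprocal rank fusion (RRF), a common choice, with the conventional constant `k=60`. The function name `rrf_merge` and the chunk IDs are hypothetical, and contextualization and indexing are left out.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_k=20):
    """Merge ranked lists of chunk IDs with reciprocal rank fusion.

    Assumption: RRF stands in for the unspecified "rank fusion" step.
    Each chunk scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by chunk ID for determinism.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_k]


semantic_hits = ["c3", "c1", "c7"]  # hypothetical vector DB results
lexical_hits = ["c1", "c9", "c3"]   # hypothetical BM25 results
merged = rrf_merge([semantic_hits, lexical_hits])
```

Chunks that rank well in both lists (`c1`, `c3`) rise above chunks found by only one index, which is the behavior the two arrows into rank fusion are meant to convey.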

## Default diagram type

**Flowchart (poster style) with two stacked phases.** The pattern has a name, a clear preprocessing / runtime split, a fan-out into parallel tracks, and a distinctive new step — that's poster flowchart territory. A flat linear chart would smear the two phases and hide what's new. Stack the phases vertically with eyebrow dividers; each phase reads left-to-right.

Alternate types:
- **Structural — subsystem containers side by side** when contrasting contextual retrieval against plain RAG (two siblings, each a mini pipeline). See `structural.md` → "Rich interior for subsystem containers".
- **Preprocessing-only flowchart** when the runtime story isn't needed — drop phase 2 and end at the two indices.

## Palette

Three accent ramps plus gray, under the poster-flowchart 4-ramp exception (ramps encode role *categories*, not sequence):

- **`c-gray`** — corpus, query, rank fusion, top-K chunks, response. Neutral data / IO.
- **`c-purple`** — Claude in both its roles: contextualizer and generative model. One ramp for both anchors the "same Claude, two prompts" story without adding a fourth accent ramp, which would exceed the 4-ramp limit.
- **`c-teal`** — semantic track (embedding model + vector DB).
- **`c-amber`** — lexical track (TF-IDF + BM25 index).

Do **not** color the contextualizer and generative model differently. Distinct colors imply two different models, and the whole point is that the same Claude does both jobs under different prompts.
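The palette rules can be written down as a small lint. This is a minimal sketch, assuming a plain node-to-ramp mapping; the names `RAMPS` and `palette_problems` are hypothetical and not part of any existing tooling.

```python
# Node -> ramp assignment, copied from the palette list above.
RAMPS = {
    "corpus": "c-gray",
    "user_query": "c-gray",
    "rank_fusion": "c-gray",
    "top_k_chunks": "c-gray",
    "response": "c-gray",
    "contextualizer": "c-purple",
    "generative_model": "c-purple",
    "embedding_model": "c-teal",
    "vector_db": "c-teal",
    "tfidf": "c-amber",
    "bm25_index": "c-amber",
}


def palette_problems(ramps, max_ramps=4):
    """Return human-readable violations of the two palette rules."""
    problems = []
    used = set(ramps.values())
    if len(used) > max_ramps:
        problems.append(f"{len(used)} ramps used, limit is {max_ramps}")
    if ramps["contextualizer"] != ramps["generative_model"]:
        problems.append("contextualizer and generative model must share a ramp")
    return problems


problems = palette_problems(RAMPS)  # empty list when both rules hold
```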

## Sub-pattern

`flowchart.md` → **Poster flowchart pattern** (eyebrow-divided phases, ≤4 ramps for role categories) + **Fan-out + aggregator (simple mode)** applied twice: the contextualizer splits into two tracks that never reconverge in phase 1, and query + both indices converge at rank fusion in phase 2.

## Mermaid reference

```mermaid
flowchart TB
    subgraph Preprocessing
        C[Corpus] -- chunks --> CTX[Contextualizer · Claude]
        CTX -- context + chunk --> EM[Embedding model]
        CTX -- context + chunk --> TF[TF-IDF]
        EM --> VDB[(Vector DB)]
        TF --> BM[(BM25 index)]
    end
    subgraph Runtime
        Q[User query] --> RF[Rank fusion]
        VDB --> RF
        BM --> RF
        RF --> TK[Top-K chunks] --> GM[Generative model · Claude] --> R[Response]
    end
```

Defining edges: `CTX --> EM` *and* `CTX --> TF` (the contextualized chunk goes to both tracks) plus `VDB --> RF` *and* `BM --> RF` (both indices feed fusion). Lose the lexical edge from either pair and the diagram reads as embedding-only retrieval; lose the contextualizer feeding both tracks and it reads as plain RAG.
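One way to guard the defining edges when a diagram is edited is a set-difference check over its edge list. The sketch below is illustrative only: it reuses the node IDs from the Mermaid reference, and `DEFINING_EDGES` and `missing_defining_edges` are hypothetical names.

```python
# Node IDs follow the Mermaid reference above.
DEFINING_EDGES = {
    ("CTX", "EM"), ("CTX", "TF"),  # contextualized chunk reaches both tracks
    ("VDB", "RF"), ("BM", "RF"),   # both indices feed rank fusion
}


def missing_defining_edges(edges):
    """Return the defining edges absent from an edge list, sorted."""
    return sorted(DEFINING_EDGES - set(edges))


diagram_edges = [
    ("C", "CTX"), ("CTX", "EM"), ("CTX", "TF"),
    ("EM", "VDB"), ("TF", "BM"),
    ("Q", "RF"), ("VDB", "RF"), ("BM", "RF"),
    ("RF", "TK"), ("TK", "GM"), ("GM", "R"),
]
missing = missing_defining_edges(diagram_edges)  # nothing missing here
```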

## Baoyu SVG plan

Two stacked phases with eyebrow labels and a thin horizontal divider between them.

- **viewBox**: `0 0 680 540`
- **Phase 1 eyebrow** — *Preprocessing · Runs once per corpus update* at `(40, 50)`, class `eyebrow`.

Phase 1 interior:
- **Corpus** — `c-gray`, `x=40 y=80 w=100 h=56`, two-line (*Corpus*, *Documents*).
- **Contextualizer** — `c-purple`, `x=180 y=72 w=260 h=72`, multi-line (*Contextualizer*, *Claude*, *50–100 tokens per chunk*). Visibly the largest box — it's the pattern's signature step.
- **Embedding model** — `c-teal`, `x=140 y=180 w=160 h=48`, single-line.
- **TF-IDF** — `c-amber`, `x=380 y=180 w=160 h=48`, single-line.
- **Vector DB** — `c-teal`, `x=140 y=260 w=160 h=56`, two-line (*Vector DB*, *Semantic index*).
- **BM25 index** — `c-amber`, `x=380 y=260 w=160 h=56`, two-line (*BM25 index*, *Lexical index*).

**Phase 1 arrows:**
- *Corpus → Contextualizer*: `(140, 108) → (180, 108)`, label *chunks* at `(160, 102)`.
- *Contextualizer → Embedding model*: L-bend `(260, 144) → (260, 160) → (220, 160) → (220, 180)`, label *context + chunk* at `(170, 164)` `text-anchor="end"`.
- *Contextualizer → TF-IDF*: L-bend `(360, 144) → (360, 160) → (460, 160) → (460, 180)`, label *context + chunk* at `(470, 164)` `text-anchor="start"`. (Both arrows labeled — the reader must see that *both* tracks receive the contextualized chunk.)
- *Embedding model → Vector DB*: straight vertical `(220, 228) → (220, 260)`.
- *TF-IDF → BM25 index*: straight vertical `(460, 228) → (460, 260)`.
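
The phase 1 coordinates can be sanity-checked mechanically. The sketch below copies the boxes and arrows listed above into Python and checks that each arrow starts and ends on a box outline and uses only horizontal or vertical segments. The helper names (`BOXES`, `ARROWS`, `on_border`, `check_arrows`) are hypothetical and not part of any existing tooling.

```python
BOXES = {  # name: (x, y, w, h), from the phase 1 interior list
    "corpus": (40, 80, 100, 56),
    "contextualizer": (180, 72, 260, 72),
    "embedding": (140, 180, 160, 48),
    "tfidf": (380, 180, 160, 48),
    "vector_db": (140, 260, 160, 56),
    "bm25": (380, 260, 160, 56),
}

ARROWS = [  # (source, target, polyline points), from the phase 1 arrow list
    ("corpus", "contextualizer", [(140, 108), (180, 108)]),
    ("contextualizer", "embedding", [(260, 144), (260, 160), (220, 160), (220, 180)]),
    ("contextualizer", "tfidf", [(360, 144), (360, 160), (460, 160), (460, 180)]),
    ("embedding", "vector_db", [(220, 228), (220, 260)]),
    ("tfidf", "bm25", [(460, 228), (460, 260)]),
]


def on_border(point, box):
    """True if the point lies on the rectangle's outline."""
    px, py = point
    x, y, w, h = box
    within = x <= px <= x + w and y <= py <= y + h
    on_edge = px in (x, x + w) or py in (y, y + h)
    return within and on_edge


def is_orthogonal(points):
    """True if every segment is purely horizontal or purely vertical."""
    return all(x1 == x2 or y1 == y2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def check_arrows(boxes, arrows):
    """Return (source, target) pairs whose arrows fail either check."""
    return [
        (src, dst)
        for src, dst, pts in arrows
        if not (on_border(pts[0], boxes[src])
                and on_border(pts[-1], boxes[dst])
                and is_orthogonal(pts))
    ]


bad_arrows = check_arrows(BOXES, ARROWS)  # empty when the plan is consistent
```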

- **Phase divider** — dashed line `x1=40 y1=340 x2=640 y2=340`, class `arr-alt`.
- **Phase 2 eyebrow** — *Runtime · Per user query* at `(40, 362)`, class `eyebrow`.

Phase 2 interior (single horizontal row at y=400–456):
- *User query* `c-gray` `x=40 w=100`, *Rank fusion* `c-gray` `x=160 w=100`, *Top-K chunks* `c-gray` `x=280 w=100` (two-line with subtitle *Top 20*), *Generative model* `c-purple` `x=400 w=140` (two-line with subtitle *Claude*), *Response* `c-gray` `x=560 w=80`. All `y=400 h=56`.

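**Phase 2 arrows** (straight horizontal at y=428, each spanning the 20px gap between adjacent boxes): query→fusion `(140, 428) → (160, 428)`, fusion→top-K `(260, 428) → (280, 428)`, top-K→generator `(380, 428) → (400, 428)`, generator→response `(540, 428) → (560, 428)`.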

**Cross-phase arrows** (indices into rank fusion):
- *Vector DB → Rank fusion*: vertical drop `(200, 316) → (200, 400)`, landing on rank fusion's top edge (x=160–260).
- *BM25 index → Rank fusion*: L-bend `(460, 316) → (460, 372) → (220, 372) → (220, 400)`. The 20px x-offset from the Vector DB arrow keeps the two inbound arrows from stacking.

Both cross-phase arrows are solid `.arr` — they're the main data flow, nothing alternate.
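
The landing geometry of the two cross-phase arrows can be checked the same way. This is a small sketch with hypothetical helper names (`lands_on_top_edge`, `landing_offset`); the coordinates are copied from the plan above.

```python
RANK_FUSION = (160, 400, 100, 56)  # x, y, w, h from the phase 2 row

VDB_ARROW = [(200, 316), (200, 400)]
BM25_ARROW = [(460, 316), (460, 372), (220, 372), (220, 400)]


def lands_on_top_edge(points, box):
    """True if the arrow's last point sits on the box's top edge."""
    x, y, w, _ = box
    end_x, end_y = points[-1]
    return end_y == y and x <= end_x <= x + w


def landing_offset(arrow_a, arrow_b):
    """Horizontal distance between the two arrowheads."""
    return abs(arrow_a[-1][0] - arrow_b[-1][0])


both_land = all(lands_on_top_edge(a, RANK_FUSION) for a in (VDB_ARROW, BM25_ARROW))
offset = landing_offset(VDB_ARROW, BM25_ARROW)
```

A nonzero `offset` is what keeps the two arrowheads from stacking on the same pixel column.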

**Legend** (bottom, required — 3 accent ramps encode category):

```
[■] Claude (contextualizer + generator)    [■] Semantic track    [■] Lexical track
```

Place at `y=510`, centered at `x=340`.

**Gotchas.**
- Both tracks must show they receive the *contextualized* chunk — label both outgoing arrows from the contextualizer. If only one is labeled, readers assume the other track still uses raw chunks.
- Do not draw the contextualizer as a self-loop on the Corpus. It's a distinct LLM step that runs once per chunk with whole doc + chunk as input, conceptually closer to an orchestrator than an inline transform.
- Keep rank fusion gray, not amber — it merges two tracks but it's a structural aggregator, not an accent role. Giving it amber visually absorbs it into the lexical track.

**Reranker variant.** The reranking extension adds a **reranker** box between *Rank fusion* and *Top-K chunks*. Place `Reranker` at `x=280 y=400 w=120 h=56`, shift Top-K, generator, and response right by 140 (the box width plus a 20px gap), and widen the viewBox to `0 0 820 540`. Annotate the reranker's input arrow with *top 150* and its output arrow with *top 20*; the winnowing ratio is the whole point.
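
The shift arithmetic for the reranker variant can be sketched as follows. The function names `insert_reranker` and `viewbox_width` are hypothetical, and the 40px right margin is an assumption inferred from the base layout (Response ends at x=640 in a 680-wide viewBox).

```python
PHASE2 = [  # (name, x, w), from the phase 2 interior row
    ("User query", 40, 100),
    ("Rank fusion", 160, 100),
    ("Top-K chunks", 280, 100),
    ("Generative model", 400, 140),
    ("Response", 560, 80),
]


def insert_reranker(row, x=280, w=120, gap=20):
    """Insert a Reranker box at x and push later boxes right by w + gap."""
    shift = w + gap  # 140 with the defaults
    left = [box for box in row if box[1] < x]
    right = [(name, bx + shift, bw) for name, bx, bw in row if bx >= x]
    return left + [("Reranker", x, w)] + right


def viewbox_width(row, margin=40):
    """Right edge of the last box plus the assumed right margin."""
    _, x, w = row[-1]
    return x + w + margin


variant = insert_reranker(PHASE2)
width = viewbox_width(variant)
```

With the defaults, every box keeps a 20px gap to its neighbor, so the straight horizontal arrows carry over unchanged apart from their x coordinates.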