Mar 19, 2026

RAG on a full book

A user indexed a full book and asked for a summary of every chapter. Outputs did not match the book while retrieval still looked fine.


A user indexed a full book in the knowledge base and asked the model to summarize each chapter. They reported that the outputs did not match the text. I traced it from there.

Nexal AI was running a standard chunk, embed, top-k, concat-to-prompt pipeline. That behaves well enough on short inputs. It is a poor fit for “summarize all chapters” because one retrieval pass does not give you a faithful slice of every chapter at once. You get a small set of globally similar spans that cluster wherever the query embedding lands, not one coherent span per chapter. The model still completes, so you read fluent text that looks like chapter summaries but is not anchored to the right passages for each chapter.

Without chapter boundaries in the index, the system also cannot reliably answer “what belongs to chapter N” as a retrieval operation. A query about chapter 5 is not the same as fetching all text for chapter 5, and scaling that to every chapter only makes the gap wider.

Pipeline: chunk → embed → top-k → concat to prompt.

This often works when the answer is localized in a short doc.

When the task is book-wide and chapter-scoped, high-similarity chunks are not a substitute for visiting each chapter with enough contiguous text. The model completes from partial or skewed context, and the errors read as confident but wrong detail.
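A toy version of the coverage problem, as a sketch only. The bag-of-words `embed` function and the six sample chunks are made up for illustration and stand in for a real embedding model and a real book. The chapter labels exist only so coverage can be measured; the flat index itself never uses them.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# (chapter, chunk text) pairs. Hypothetical sample data.
chunks = [
    (1, "this chapter gives a summary of the voyage"),
    (1, "every chapter of the log opens at the harbor"),
    (2, "storms scatter the fleet near the cape"),
    (2, "a summary of losses after the storm"),
    (3, "landfall on an unknown island"),
    (3, "trade with the islanders"),
]

def top_k(query, k):
    """Single global retrieval pass: rank every chunk against the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c[1])), reverse=True)
    return ranked[:k]

hits = top_k("summary of every chapter", k=3)
covered = {chapter for chapter, _ in hits}
missing = {1, 2, 3} - covered

print("covered:", sorted(covered), "missing:", sorted(missing))
```

With k equal to the number of chapters, the three winners still come from chapters 1 and 2. Chapter 3 never reaches the prompt, yet a model asked to summarize it would still produce something.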

Structure loss from chunking

The bottleneck was representation, not the embedding model choice.

Chunking strips hierarchy: headings, section breaks, and reading order stop influencing retrieval. You are scoring isolated spans, not navigating the document graph the author wrote.


At book length, naive chunk RAG does not know where chapters start or end. A single top-k round turns “give me something faithful for every chapter” into “here are a few high-scoring fragments from wherever in the book,” which is the wrong coverage pattern for per-chapter summaries.
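One way to keep the boundaries is to record them at ingest time. The sketch below assumes headings look like "Chapter N ..." on their own line, which will not hold for every book. The function names, the record fields, and the sample text are all hypothetical.

```python
import re

# Assumed heading format: a line starting with "Chapter <number>".
HEADING = re.compile(r"^Chapter (\d+)\b.*$", re.MULTILINE)

def split_by_chapter(book_text):
    """Return {chapter_number: body text}, cut at each heading line."""
    matches = list(HEADING.finditer(book_text))
    chapters = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(book_text)
        chapters[int(m.group(1))] = book_text[m.end():end].strip()
    return chapters

def chunk_with_metadata(chapters, max_words=50):
    """Chunk inside each chapter and tag every chunk with chapter and position."""
    records = []
    for chapter, text in chapters.items():
        words = text.split()
        for pos, start in enumerate(range(0, len(words), max_words)):
            records.append({
                "chapter": chapter,
                "position": pos,  # reading order within the chapter
                "text": " ".join(words[start:start + max_words]),
            })
    return records

book = (
    "Chapter 1 Departure\nThe ship leaves port.\n"
    "Chapter 2 Storm\nWaves rise. The mast breaks.\n"
)
chapters = split_by_chapter(book)
records = chunk_with_metadata(chapters, max_words=3)

# "What belongs to chapter 2" is now a filter, not a similarity query.
chapter_2 = [r["text"] for r in records if r["chapter"] == 2]
print(chapter_2)
```

Chunks never cross a chapter boundary here, and the `position` field preserves reading order, so a chapter can be reassembled in sequence later.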

Structured retrieval

Persist an explicit hierarchy (chapters → sections → paragraphs, or whatever matches the source). Route to one chapter (or section) at a time, fetch text inside that region so context is contiguous, then repeat for the next chapter instead of hoping one global top-k covers the whole outline.
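A minimal sketch of that shape, assuming a two-level chapter and section tree. The class names, the `summarize_book` helper, and the first-sentence "summarizer" are illustrative stand-ins. In practice `summarize` would be a model call.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    title: str
    paragraphs: list[str] = field(default_factory=list)

@dataclass
class Chapter:
    number: int
    title: str
    sections: list[Section] = field(default_factory=list)

    def contiguous_text(self):
        """All paragraphs of this chapter in reading order."""
        return "\n\n".join(p for s in self.sections for p in s.paragraphs)

def summarize_book(chapters, summarize):
    """Visit one chapter at a time; `summarize` is supplied by the caller."""
    return {ch.number: summarize(ch.contiguous_text()) for ch in chapters}

# Hypothetical sample book.
book = [
    Chapter(1, "Departure", [
        Section("Harbor", ["The ship leaves port. Crowds wave."]),
        Section("Open sea", ["Wind fills the sails."]),
    ]),
    Chapter(2, "Storm", [
        Section("Night", ["Waves rise. The mast breaks."]),
    ]),
]

# Stand-in summarizer: keep only the first sentence.
summaries = summarize_book(book, lambda text: text.split(".")[0] + ".")
print(summaries)
```

Each summary is built only from text inside its own chapter, so a bad summary can be traced back to one known region of the book.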


Retrieval in multiple steps (agent)

One vector query is one draw. An agent can iterate over the outline, pick chapter 1, gather context, summarize, pick chapter 2, gather context, summarize, and so on, or back up when context is thin. That is how you actually ground “every chapter” instead of one blended guess from a single retrieval pass.


You pay in latency and orchestration. You get a workflow that can visit each chapter with the right spans instead of stuffing one prompt with unrelated winners from across the book.
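The control loop can be sketched as below. This is not a full agent: the decision policy is hard-coded, whereas a real agent would let the model choose the next step. The `fetch(chapter_id, widen)` signature, the `widen` knob, and the word-count threshold for "thin" context are all assumptions made for the example.

```python
def walk_outline(outline, fetch, summarize, min_words=20, max_widen=2):
    """Visit chapters in outline order, widening retrieval when context is thin.

    fetch(chapter_id, widen) returns context text; a larger `widen`
    is assumed to pull more of the chapter.
    """
    results = {}
    trace = []  # (chapter_id, widen level used), useful for debugging
    for chapter_id in outline:
        widen = 0
        context = fetch(chapter_id, widen)
        # Back up and ask for more when the context is too short to trust.
        while len(context.split()) < min_words and widen < max_widen:
            widen += 1
            context = fetch(chapter_id, widen)
        trace.append((chapter_id, widen))
        results[chapter_id] = summarize(context)
    return results, trace

# Hypothetical store: each chapter is a list of spans in reading order.
store = {
    1: ["a b c", "d e f g"],
    2: ["one two three four five six"],
}
fetch = lambda chapter_id, widen: " ".join(store[chapter_id][: widen + 1])
summarize = lambda text: f"{len(text.split())} words"  # stand-in for a model call

results, trace = walk_outline([1, 2], fetch, summarize, min_words=5)
print(results, trace)
```

Chapter 1 needs a second fetch before it clears the threshold; chapter 2 does not. The extra round trip is the latency cost, and the trace is what you get in exchange.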

Failure modes (chunk-only RAG)

The sentence that completes a thought often sits just outside the retrieved window, so the model never sees it.
A definition lives in one section and its use in another, so the two never appear together in context.
The book supports a conclusion it never states in one line, and the retrieved snippets do not force the model to stay tied to evidence.
Larger chunks keep more related sentences in one window but blur similarity; smaller chunks sharpen retrieval but split ideas across separate hits.
The user’s wording diverges from the book’s, so nearest neighbors by embedding are not the spans you needed for that chapter.

On long structured corpora these are baseline risks. They line up with this report: a full set of chapter summaries that read fine but were not faithful.
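The chunk-size trade-off from the list above can be seen with a few lines of code. This sketch only demonstrates the splitting half of the trade-off, not the blurring of similarity with larger chunks. The fixed word-count chunker and the sample text are assumptions for illustration.

```python
def chunk_words(text, size):
    """Naive fixed-size chunking by word count."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def mid_sentence_cuts(chunks):
    """Count chunk boundaries that fall in the middle of a sentence."""
    return sum(1 for c in chunks[:-1] if not c.endswith((".", "!", "?")))

text = (
    "A ledger is a book of accounts. "
    "The captain kept a ledger. "
    "He hid it below deck."
)

small = chunk_words(text, 4)
large = chunk_words(text, 12)
small_cuts = mid_sentence_cuts(small)
large_cuts = mid_sentence_cuts(large)

print(small_cuts, large_cuts)
```

With 4-word chunks, the definition of "ledger" is cut in two and separated from the sentence that uses it. With 12-word chunks, both sentences land in one chunk, at the price of a broader span to score.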

Retrieval in two stages (explicit structure)

With a chapter → section → paragraph tree, each pass can follow the same shape.

First select the region that probably holds the answer for this step of the task.
Then pull paragraphs inside that region, plus a small neighborhood so sentences stay readable.

For a per-chapter summary job, that region is simply the current chapter before you move on. Prompts stay cleaner and you can log chapter id and paragraph ids when someone flags a bad answer.
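The two stages can be sketched as follows, under simplifying assumptions: the word-overlap scorer stands in for a real relevance model, the book is a plain dict of paragraphs per chapter, and the function and field names are hypothetical.

```python
def overlap(query, text):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def two_stage(query, book, k=1, window=1):
    """book: {chapter_id: [paragraph, ...]} in reading order."""
    # Stage 1: select the region most likely to hold the answer.
    chapter_id = max(book, key=lambda cid: overlap(query, " ".join(book[cid])))
    paragraphs = book[chapter_id]

    # Stage 2: rank paragraphs inside that region only.
    ranked = sorted(
        range(len(paragraphs)),
        key=lambda i: overlap(query, paragraphs[i]),
        reverse=True,
    )[:k]

    # Add a small neighborhood around each hit, kept in reading order.
    keep = sorted({
        j
        for i in ranked
        for j in range(max(0, i - window), min(len(paragraphs), i + window + 1))
    })
    return {
        "chapter_id": chapter_id,
        "paragraph_ids": keep,  # log these for later debugging
        "context": "\n".join(paragraphs[j] for j in keep),
    }

# Hypothetical sample data.
book = {
    1: ["The ship leaves port.", "Crowds wave from the pier."],
    2: [
        "Clouds gather at dusk.",
        "The storm breaks the mast.",
        "By dawn the crew counts losses.",
        "Repairs begin.",
    ],
}

result = two_stage("storm breaks", book)
print(result["chapter_id"], result["paragraph_ids"])
```

For the per-chapter summary job, stage 1 is replaced by iterating over the outline, and stage 2 can simply take the whole chapter if it fits. The returned ids are what you would write to the log.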

Summary

Flat chunking plus top-k vectors assumes one retrieval round can supply the right evidence. A user who wants every chapter summarized needs either repeated, chapter-scoped retrieval over an outline or an agent that walks the structure. Single-pass traditional RAG cannot pull grounded chunks from each chapter at once, so its summaries will keep drifting from the text.