Your AI Can Read Your Docs — But Not Your Tables

Tables are the blind spot in RAG pipelines. We rebuilt how Notion and Confluence tables reach your LLM — from Markdown to self-describing header-value pairs — and improved answer accuracy by 20.6%.

Connecting Notion or Confluence to an AI agent takes minutes. Your docs get chunked, embedded, indexed — standard RAG pipeline. Ask it a question about a paragraph and it nails it.

Then someone asks, "What's the SLA for Enterprise customers?" The answer is sitting right there in a table. Two rows, four columns. The AI hallucinates an answer anyway.

We're not alone. The RAG community has been circling this problem for a while. One Medium post nailed it: "Parsers strip table headers first... the pipeline isn't doing retrieval anymore — it's doing educated guessing over orphaned rows." Another put it bluntly: "RAG only becomes reliable when the system respects the form of the data instead of flattening everything into text."

We've been building knowledge base integrations at Runbear — Notion, Confluence, Google Drive, SharePoint — and we kept hitting this exact failure mode. The pipeline works great on prose. On tables, it quietly falls apart.

What tables look like to your LLM

Here's a typical Notion table after it comes through the API (simplified — in practice the rows arrive as child blocks you fetch separately):

{
  "type": "table",
  "table": { "table_width": 4, "has_column_header": true },
  "children": [
    { "type": "table_row", "table_row": { "cells": [
      [{ "plain_text": "Plan" }], [{ "plain_text": "Price" }],
      [{ "plain_text": "Agents" }], [{ "plain_text": "Credits" }]
    ] } },
    { "type": "table_row", "table_row": { "cells": [
      [{ "plain_text": "Team" }], [{ "plain_text": "$99" }],
      [{ "plain_text": "5" }], [{ "plain_text": "20,000" }]
    ] } }
  ]
}

Most preprocessing pipelines convert this to Markdown:

| Plan | Price | Agents | Credits |
| --- | --- | --- | --- |
| Team | $99 | 5 | 20,000 |
| Business | $399 | 20 | 100,000 |

Looks fine to you. But two things go wrong before the LLM ever sees it.

First, chunking splits the table. When you're cutting documents into 800-token chunks, a pricing page with a paragraph, a table, and another paragraph gets sliced right through the middle. The header row ends up in one chunk. The data rows end up in another. As Ragie documented: "A chunk may end in the middle of a column such that the subsequent chunk includes some of the table data, but without the table headers so contextual information is lost."

Second, Markdown tables are positional. To match a value to its header, you count pipe characters. Humans do this instinctively. LLMs don't. On wide tables with 6+ columns — pricing tiers, feature comparisons, security controls — they frequently grab the value from the wrong column.

These aren't retrieval failures. The right chunk gets retrieved. The LLM just can't parse what's inside it.

Three ways tables break RAG

We dug into our pipeline across real customer workspaces and found consistent patterns:

1. Header–data separation. A 10-row table chunked at 800 tokens: headers in chunk 4, data rows in chunk 5. The retriever finds chunk 5 because it contains the keyword. The LLM sees values without column names.

2. Column misalignment. "What security control is used during KB Search?" The LLM counts the wrong pipe, returns the component column instead of the security control column. Confidently wrong.

3. Context fragmentation. Five small chunks each contain a piece of a pricing page. Each is individually retrievable but none has the full picture. Ask "which plans include SOC 2?" and the LLM gets one fragment — maybe the row with SOC 2, maybe not.

What we changed

Two things: table format and chunk size.

Flatten tables into header-value pairs

Instead of Markdown:

| Plan | Price | Agents | Credits |
| --- | --- | --- | --- |
| Team | $99 | 5 | 20,000 |
| Business | $399 | 20 | 100,000 |

We produce:

Plan: Team, Price: $99, Agents: 5, Credits: 20,000
Plan: Business, Price: $399, Agents: 20, Credits: 100,000

Every row carries its own headers. No positional reasoning needed. Even if a row gets separated from the table, it's fully self-describing.
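The transformation itself is a few lines. Here's a minimal sketch — the function name is ours, and real-world tables need extra care (escaping commas inside cells, handling merged or empty cells):

```python
def flatten_table(headers, rows):
    """Render each table row as self-describing 'Header: value' pairs,
    so a row remains interpretable even if it gets separated from the table."""
    lines = []
    for row in rows:
        pairs = [f"{header}: {value}" for header, value in zip(headers, row)]
        lines.append(", ".join(pairs))
    return "\n".join(lines)

headers = ["Plan", "Price", "Agents", "Credits"]
rows = [
    ["Team", "$99", "5", "20,000"],
    ["Business", "$399", "20", "100,000"],
]
print(flatten_table(headers, rows))
# Plan: Team, Price: $99, Agents: 5, Credits: 20,000
# Plan: Business, Price: $399, Agents: 20, Credits: 100,000
```

The point is that the header travels with every value, so no downstream step — chunker, retriever, or model — has to reason about column positions.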

This is the same principle behind Anthropic's Contextual Retrieval — making each chunk self-contained. We're applying it at the row level.

Increase chunk size to 4,096 tokens

A pricing page with three tables and explanatory text fits in one chunk instead of five. The LLM gets the full document structure in a single retrieval hit.

The conventional wisdom says smaller chunks improve retrieval precision. That was true when models struggled with long contexts. Modern models like Claude Sonnet can pinpoint a specific cell value inside a 4,000-token chunk without breaking a sweat.

The numbers

We evaluated both pipelines end-to-end on 51 questions targeting table data across 21 real documents — pricing pages, architecture docs, SLA policies, customer proposals.

Pipeline: preprocess → chunk → embed → retrieve → generate answer (Claude Sonnet) → score against ground truth (Claude Sonnet).

Old pipeline: Markdown tables, 800-token chunks.

New pipeline: Header-value pairs, 4,096-token chunks.

| Metric | Old | New |
| --- | --- | --- |
| Accuracy | 0.771 | 0.929 (+20.6%) |

Head-to-head across the 51 questions: the new pipeline wins on 11, the old pipeline wins on 2, and 38 are ties.

The old pipeline scored zero on 9 questions — complete failures where the answer existed in the corpus but the retrieved chunk didn't contain it. The new pipeline had zero complete failures.
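The head-to-head tally is simple to reproduce. A sketch, assuming each question gets a judge score in [0, 1] — the function and field names here are illustrative, not our production code:

```python
def tally(old_scores, new_scores):
    """Compare per-question scores from two pipelines head to head."""
    assert len(old_scores) == len(new_scores)
    n = len(old_scores)
    summary = {
        "old_accuracy": sum(old_scores) / n,
        "new_accuracy": sum(new_scores) / n,
        "new_wins": sum(new > old for old, new in zip(old_scores, new_scores)),
        "old_wins": sum(old > new for old, new in zip(old_scores, new_scores)),
        # Complete failures: the judge gave the answer a flat zero.
        "old_zero_failures": sum(score == 0.0 for score in old_scores),
        "new_zero_failures": sum(score == 0.0 for score in new_scores),
    }
    summary["ties"] = n - summary["new_wins"] - summary["old_wins"]
    return summary
```

Tracking zero-score questions separately turned out to matter: average accuracy hides the difference between "slightly wrong" and "retrieved a chunk that didn't contain the answer at all."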

Where it mattered most

The biggest swings came from exactly the failure modes above:

Header separation — "What is the data retention policy for the Vault schema?" Old: 0.0. New: 1.0. The retention column was in a different chunk than the Vault row.

Context fragmentation — "What is the monthly credit usage in CDQ's enterprise plan?" Old: 0.0. New: 1.0. The old pipeline retrieved generic Enterprise plan info. The new pipeline retrieved the complete proposal table.

Split table — "What is the complexity of request RB-5383?" Old: 0.0. New: 1.0. The request details table was scattered across chunks.

Same corpus, same questions, same retrieval, same model. The only difference was how we preprocessed the documents.

What this taught us

Preprocessing is the highest-leverage fix in RAG. We spent weeks tuning hybrid search weights and reranking strategies. The biggest win came from reformatting tables before they enter the pipeline. If your input is broken, no retrieval trick fixes the output.

"Smaller chunks are better" is outdated. That advice was calibrated for older, weaker models. Claude Sonnet extracts specific facts from 4,000-token chunks accurately. Larger chunks preserve document structure — especially tables.

Optimize for the consumer, not the author. Markdown tables are human-readable. Header-value pairs are LLM-readable. Your chunks aren't read by humans. Format them for whoever actually reads them.

Measure the final answer, not intermediate metrics. Retrieval precision told us everything was fine. End-to-end scoring revealed silent failures on every table-heavy question. If you're not scoring final answer quality, you're flying blind.

Try it

These changes are live in Runbear's Notion and Confluence integrations today. If you've connected a knowledge base, your documents already use the new pipeline. Ask your agent a question that requires reading a table — you'll see the difference.

If you're building your own RAG pipeline, do one thing: print out your chunks and read them. If a chunk doesn't make sense on its own — if you can't answer the question from that chunk alone — your preprocessing needs work.
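A few lines are enough for this audit. A sketch (names are ours) that renders your chunks with visible separators so you can read them end to end:

```python
def format_chunks_for_review(chunks):
    """Render chunks with visible separators for a manual read-through.
    If a chunk doesn't stand on its own, your preprocessing needs work."""
    parts = []
    for i, chunk in enumerate(chunks):
        parts.append(f"--- chunk {i} ({len(chunk)} chars) ---\n{chunk}")
    return "\n\n".join(parts)

print(format_chunks_for_review(["Plan: Team, Price: $99", "Plan: Business, Price: $399"]))
```

Dump this to a file, read every chunk cold, and ask: could I answer a question from this chunk alone? It's the cheapest evaluation you'll ever run.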

The fanciest vector search in the world can't fix garbage in.

Runbear builds AI agents that work inside Slack, Teams, and Discord — powered by your team's actual knowledge base.