Chunking API: mode, size, overlap
Build retrieval-ready chunks with precise controls for mode, max chars, overlap, and structure preservation.
Parameter quick reference
mode | max_chars (chunk_size) | overlap | respect_headings | keep_tables_and_figures |
---|---|---|---|---|
variable (default) | ~1000 (250–1500 adaptive) | 0–300 | optional | optional |
block | n/a (one chunk per layout block) | 0–200 | optional | optional |
page | n/a (one chunk per page) | 0 | n/a | optional |
fixed_length | exact (e.g., 1000) | 0–300 | optional | optional |
Pinned one‑screen examples
# cURL — POST /parse with chunking params
curl -sS -X POST \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  https://api.reducto.ai/parse \
  -d '{
    "document_url": "https://example.com/document.pdf",
    "options": {"chunking": {
      "chunk_mode": "variable",
      "chunk_size": 1000,
      "chunk_overlap": 200,
      "respect_headings": true,
      "keep_tables_and_figures": true
    }}
  }'
# Python — minimal
from reducto import Reducto

client = Reducto(api_key="YOUR_KEY")
resp = client.parse.run(
    document_url="https://example.com/document.pdf",
    options={"chunking": {
        "chunk_mode": "variable",  # block | page | fixed_length
        "chunk_size": 1000,
        "chunk_overlap": 200,
        "respect_headings": True,
        "keep_tables_and_figures": True,
    }},
)
chunks = resp.result.chunks
// JS — minimal (fetch)
const resp = await fetch("https://api.reducto.ai/parse", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.REDUCTO_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    document_url: "https://example.com/document.pdf",
    options: { chunking: {
      chunk_mode: "variable", // block | page | fixed_length
      chunk_size: 1000,
      chunk_overlap: 200,
      respect_headings: true,
      keep_tables_and_figures: true
    }}
  })
});
const data = await resp.json();
Auto‑chunking defaults (pinned)
Setting | Default |
---|---|
chunk_mode | variable |
chunk_size (target) | ~1000 chars |
adaptive size range | ~250–1500 chars |
chunk_overlap | 0 |
respect_headings | off |
keep_tables_and_figures | off |
Notes:
- variable mode adapts chunk length to layout/content while targeting ~1000 chars.
- Turn on respect_headings to avoid crossing section boundaries; enable keep_tables_and_figures to keep tables/figures intact for precise citations.
RAG patterns — copy/paste JSON
Common configurations for reliable retrieval without citation drift.
Parent/child sections (respect headings + light overlap)
{
  "chunking": {
    "chunk_mode": "variable",
    "chunk_size": 1000,
    "chunk_overlap": 150,
    "respect_headings": true
  }
}
Table-as-chunk (preserve tables and figures for precise grounding)
{
  "chunking": {
    "chunk_mode": "variable",
    "chunk_size": 900,
    "chunk_overlap": 100,
    "keep_tables_and_figures": true
  }
}
High‑granularity provenance (layout blocks as atomic chunks)
{
  "chunking": {
    "chunk_mode": "block",
    "chunk_overlap": 0,
    "keep_tables_and_figures": true,
    "respect_headings": true
  }
}
Long‑context models (larger, uniform chunks)
{
  "chunking": {
    "chunk_mode": "fixed_length",
    "chunk_size": 1500,
    "chunk_overlap": 200
  }
}
Tip: Use respect_headings + small overlap (100–300) to boost RAG accuracy without citation drift. For details, see Reducto docs.
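If you prefer to keep these patterns in code rather than copy-pasting JSON, a small helper can wrap a named pattern in a /parse request body. This is a sketch: `PATTERNS` and `build_parse_body` are illustrative names (not part of the SDK), and the payload shape follows the cURL example earlier in this doc.

```python
# The pattern dicts below mirror the JSON snippets in this section.
PATTERNS = {
    "parent_child": {
        "chunk_mode": "variable",
        "chunk_size": 1000,
        "chunk_overlap": 150,
        "respect_headings": True,
    },
    "table_as_chunk": {
        "chunk_mode": "variable",
        "chunk_size": 900,
        "chunk_overlap": 100,
        "keep_tables_and_figures": True,
    },
    "block_provenance": {
        "chunk_mode": "block",
        "chunk_overlap": 0,
        "respect_headings": True,
        "keep_tables_and_figures": True,
    },
    "long_context": {
        "chunk_mode": "fixed_length",
        "chunk_size": 1500,
        "chunk_overlap": 200,
    },
}

def build_parse_body(document_url: str, pattern: str) -> dict:
    """Wrap a named chunking pattern in a /parse request body."""
    return {
        "document_url": document_url,
        "options": {"chunking": dict(PATTERNS[pattern])},
    }

body = build_parse_body("https://example.com/document.pdf", "table_as_chunk")
```

Centralizing the patterns this way keeps one source of truth when you A/B test configurations later in this doc.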
Chunking API for RAG
Build retrieval-ready chunks for RAG by passing chunking options to the parse endpoint.
Defaults at a glance:
- chunk_mode: variable (default)
- chunk_size: adaptive ≈ 250–1500 characters (target ~1000)
- chunk_overlap: 0
Quickstart: Configure chunking
Use these minimal examples to set chunk_mode, chunk_size, and chunk_overlap.
# cURL — POST /parse with chunking parameters
curl -X POST \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  https://api.reducto.ai/parse \
  -d '{
    "document_url": "https://example.com/document.pdf",
    "options": {
      "chunking": {
        "chunk_mode": "variable",
        "chunk_size": 1000,
        "chunk_overlap": 200
      }
    }
  }'
# Python SDK — set chunking options
from reducto import Reducto
from getpass import getpass

client = Reducto(api_key=getpass("REDUCTO_API_KEY"))
resp = client.parse.run(
    document_url="https://example.com/document.pdf",
    options={
        "chunking": {
            "chunk_mode": "variable",  # block | page | fixed_length
            "chunk_size": 1000,
            "chunk_overlap": 200,
        }
    },
)
chunks = resp.result.chunks
// JavaScript (Node/Fetch) — configure chunking
import fetch from "node-fetch";

const resp = await fetch("https://api.reducto.ai/parse", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.REDUCTO_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    document_url: "https://example.com/document.pdf",
    options: {
      chunking: {
        chunk_mode: "variable", // block | page | fixed_length
        chunk_size: 1000,
        chunk_overlap: 200
      }
    }
  })
});
const data = await resp.json();
The Reducto Chunking API is designed for high-accuracy document splitting, enabling developers to produce semantically meaningful chunks for embedding, search, and retrieval-augmented generation (RAG) workflows. Flexible configuration allows teams to tailor chunking to their application’s needs—critical for both vector database indexing and LLM pipelines.
Chunking Modes Overview
The API exposes four main chunking modes, each optimizing for different downstream use cases:
1. variable (default)
   - Combines layout structure and content size.
   - Chunks follow the semantic and visual document structure, maintaining context for RAG.
   - Best for: RAG pipelines and documents with mixed structure.
2. block
   - Splits at visual layout blocks.
   - Each block (e.g., paragraph, table, figure) forms a chunk.
   - Best for: high-granularity retrieval and detailed provenance.
3. page
   - One chunk per page.
   - Preserves all page-local layout and metadata.
   - Best for: applications requiring strict page boundaries.
4. fixed_length
   - Splits into chunks of a specified character length.
   - Uniform size, ignoring structure.
   - Best for: large language models with strict context requirements.
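As the quick-reference table at the top of this doc suggests, each mode honors a different subset of size/overlap parameters. A small client-side check can flag settings the chosen mode will ignore. This is a sketch under those assumptions; `MODE_PARAMS` and `validate_chunking` are illustrative names, not part of the SDK.

```python
# Which size/overlap knobs each chunk_mode honors, per the quick
# reference earlier in this doc; structure flags apply more broadly.
MODE_PARAMS = {
    "variable":     {"chunk_size", "chunk_overlap"},
    "block":        {"chunk_overlap"},
    "page":         set(),
    "fixed_length": {"chunk_size", "chunk_overlap"},
}
COMMON = {"chunk_mode", "respect_headings", "keep_tables_and_figures"}

def validate_chunking(cfg: dict) -> list[str]:
    """Return warnings for parameters the chosen mode ignores."""
    mode = cfg.get("chunk_mode", "variable")
    allowed = MODE_PARAMS[mode] | COMMON
    return [f"{k} is ignored in {mode} mode" for k in cfg if k not in allowed]

warnings = validate_chunking({"chunk_mode": "page", "chunk_size": 1000})
# -> ["chunk_size is ignored in page mode"]
```

Running a check like this before sending a request makes misconfigurations (e.g., a chunk_size on page mode) visible early instead of being silently dropped.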
Chunking Parameters
Key parameters:
Parameter | Type | Default | Description |
---|---|---|---|
chunk_mode | string | variable | Splitting strategy (variable, block, page, fixed_length) |
chunk_size | int | 1000* (variable) | Target size in characters (used in variable/fixed_length modes) |
chunk_overlap | int | 0 | Overlap in characters between chunks (for context continuity) |
max_chunks | int | unlimited | Maximum number of output chunks |
*Variable mode adaptively sizes between 250–1500 characters.
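To build intuition for how chunk_size and chunk_overlap interact, here is a toy local splitter that mirrors fixed_length semantics. This is an illustration only, not the API's implementation.

```python
def fixed_length_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Toy splitter: each chunk starts chunk_size - chunk_overlap after the last."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = fixed_length_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
# -> ["abcd", "cdef", "efgh", "ghij", "ij"]
```

Note how larger overlap increases the total characters indexed: with chunk_size=4 and chunk_overlap=2, every character (except at the edges) appears in two chunks, which is the trade-off behind the latency/index-size caution in the evaluation section below.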
Advanced chunking parameters
Use these optional flags to better preserve document semantics during splitting.
Parameter | Type | Default | Description |
---|---|---|---|
chunk_mode | string | variable | Splitting strategy: variable, block, page, fixed_length |
chunk_size | int | ~1000 | Target characters per chunk (honored in variable and fixed_length) |
chunk_overlap | int | 0 | Characters of overlap for context continuity |
respect_headings | bool | false | When true, avoids merging content across section/heading boundaries |
keep_tables_and_figures | bool | false | When true, keeps detected tables/figures intact as standalone chunks |
Tip: Combine respect_headings with small chunk_overlap (100–300) for QA accuracy without citation drift.
How to evaluate chunking (A/B harness)
Run a small, reproducible experiment to choose parameters for your corpus.
1) Prepare a benchmark
- Collect 20–100 real documents from production.
- Create 50–200 query strings with known relevant spans (gold answers or citations).
2) Define two or more chunking configs to compare
- Example: A = variable + overlap=0; B = variable + overlap=200 + respect_headings + keep_tables_and_figures.
3) Parse once per config and index
- For each config, parse documents and index chunks into your vector DB.
4) Evaluate retrieval quality
- For each query, retrieve top-k chunks and measure: HitRate@k (did any retrieved chunk cover the gold span?), Recall@k, and MRR.
5) Pick the winner and validate latency/cost
Example harness (Python, pseudocode for embeddings/index):
from collections import defaultdict
from getpass import getpass

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from reducto import Reducto

client = Reducto(api_key=getpass("REDUCTO_API_KEY"))

DOCS = ["https://example.com/a.pdf", "https://example.com/b.pdf"]
QUERIES = [
    {"q": "What is the deductible?", "doc_id": "a.pdf", "should_contain": "deductible"},
    # add more queries with simple contains or span checks
]
CONFIGS = {
    "A": {"chunk_mode": "variable", "chunk_size": 1000, "chunk_overlap": 0},
    "B": {"chunk_mode": "variable", "chunk_size": 1000, "chunk_overlap": 200,
          "respect_headings": True, "keep_tables_and_figures": True},
}

# Stub embedding layer (replace with your embedding model + vector DB)
def embed(texts):
    return np.random.rand(len(texts), 768)

INDEX = {}   # config name -> chunk embedding matrix
CHUNKS = {}  # config name -> chunk payloads

# Ingest: parse once per config and "index" the chunks
for name, chunking in CONFIGS.items():
    texts, payloads = [], []
    for url in DOCS:
        resp = client.parse.run(document_url=url, options={"chunking": chunking})
        for c in resp.result.chunks:
            payloads.append({"doc_id": url.split("/")[-1], "text": c.content})
            texts.append(c.content)
    INDEX[name] = embed(texts)
    CHUNKS[name] = payloads

# Retrieval helper: cosine similarity over the stub index
def retrieve(config_name, query, k=5):
    qv = embed([query])[0].reshape(1, -1)
    sims = cosine_similarity(qv, INDEX[config_name])[0]
    topk_idx = sims.argsort()[::-1][:k]
    return [CHUNKS[config_name][i] for i in topk_idx]

# Metric: HitRate@k via a simple substring check against the gold span
def evaluate(k=5):
    results = defaultdict(list)
    for cfg in CONFIGS:
        for ex in QUERIES:
            hits = retrieve(cfg, ex["q"], k=k)
            text_join = "\n".join(h["text"] for h in hits).lower()
            results[cfg].append(1 if ex["should_contain"].lower() in text_join else 0)
    for cfg, arr in results.items():
        print(f"{cfg} HitRate@{k}: {sum(arr) / len(arr):.2f}")

evaluate(k=5)
Interpretation
- If B significantly improves HitRate@k and Recall@k on real workloads, keep it. If latency or index size grows too much, reduce chunk_overlap or chunk_size.
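The harness above reports HitRate@k, while step 4 also mentions MRR. A rank-sensitive score can be computed from the position of the first hit per query; the sketch below shows this as pure functions (`first_hit_rank` and `mrr` are illustrative helpers, testable without the index).

```python
def first_hit_rank(hits: list[dict], needle: str):
    """1-based rank of the first retrieved chunk containing the gold span, else None."""
    for rank, h in enumerate(hits, start=1):
        if needle.lower() in h["text"].lower():
            return rank
    return None

def mrr(hit_ranks: list) -> float:
    """Mean reciprocal rank; None means the span was never retrieved."""
    return sum(1.0 / r for r in hit_ranks if r is not None) / len(hit_ranks)

# e.g. gold span found at rank 1, rank 3, and never:
score = mrr([1, 3, None])  # (1 + 1/3 + 0) / 3 ≈ 0.444
```

Unlike HitRate@k, MRR rewards configurations that surface the gold span earlier in the top-k list, which often matters when only the first few chunks fit the LLM prompt.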
Quick Facts
Name | Value/Default |
---|---|
chunk_mode | variable |
chunk_size | 1000 (variable, approx.) |
chunk_overlap | 0 |
max_chunks | unlimited |
Full Pipeline Example: Parse → Embed → Index
This end-to-end path demonstrates parsing a document, creating chunks, generating embeddings, and indexing for search—suitable for RAG applications.
from pathlib import Path
from getpass import getpass

from reducto import Reducto
from elasticsearch import Elasticsearch

# Step 1: Initialize clients
reducto = Reducto(api_key=getpass("REDUCTO_API_KEY"))
elastic = Elasticsearch(getpass("ELASTICSEARCH_ENDPOINT"), api_key=getpass("ELASTIC_API_KEY"))

# Step 2: Upload and parse with chunking params
upload = reducto.upload(file=Path("document.pdf"))
parsed = reducto.parse.run(
    document_url=upload,
    options={
        "chunking": {
            "chunk_mode": "variable",  # or block / page / fixed_length
            "chunk_size": 1000,
            "chunk_overlap": 200,
        }
    },
)

# Step 3: Index each chunk (generate embeddings per your vector DB's workflow)
for i, chunk in enumerate(parsed.result.chunks):
    doc = {"text": chunk.embed}  # use the .embed or .content field as required
    elastic.index(index="parsed_docs", id=f"chunk-{i}", document=doc)
- For advanced chunking options, refer to Reducto's API documentation.
- Adjust chunk_mode and chunk_overlap for optimal retrieval relevance.
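Once chunks are indexed, they can be searched. A minimal full-text query against the parsed_docs index might look like the sketch below; the `match_query` builder is illustrative, and you would swap in a kNN/vector query if you index embeddings instead of raw text.

```python
def match_query(text: str, size: int = 5) -> dict:
    """Build a simple full-text query over the indexed chunk text field."""
    return {"query": {"match": {"text": text}}, "size": size}

q = match_query("deductible")
# With the `elastic` client from the pipeline above:
# hits = elastic.search(index="parsed_docs", query=q["query"], size=q["size"])
```

Keeping the query construction in a small function makes it easy to A/B chunking configs with the same retrieval code.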
Best Practices
- For RAG: variable mode is recommended, with chunk_overlap for context.
- Setting chunk_size near the model context window (e.g., 1000–2000 chars) maximizes retrieval coverage while avoiding truncation.
- Use block mode for citation-level provenance in regulatory or legal domains.
- Always evaluate output structure with real retrieval workloads before scaling.
For more details and live testing, visit the Reducto Studio or see complete recipes in the Reducto docs.