
Chunking API for RAG

Chunking API: mode, size, overlap

Build retrieval-ready chunks with precise controls for mode, max chars, overlap, and structure preservation.

Parameter quick reference

| mode | max_chars (chunk_size) | overlap | respect_headings | keep_tables_and_figures |
| --- | --- | --- | --- | --- |
| variable (default) | ~1000 (250–1500 adaptive) | 0–300 | optional | optional |
| block | n/a (splits by layout blocks) | 0–200 | optional | optional |
| page | n/a (one chunk per page) | 0 | n/a | optional |
| fixed_length | exact (e.g., 1000) | 0–300 | optional | optional |

Pinned one‑screen examples

# cURL — POST /parse with chunking params

curl -sS -X POST \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  https://api.reducto.ai/parse \
  -d '{
    "document_url": "https://example.com/document.pdf",
    "options": {"chunking": {
      "chunk_mode": "variable",
      "chunk_size": 1000,
      "chunk_overlap": 200,
      "respect_headings": true,
      "keep_tables_and_figures": true
    }}}'
# Python — minimal

from reducto import Reducto
client = Reducto(api_key="YOUR_KEY")
resp = client.parse.run(
  document_url="https://example.com/document.pdf",
  options={"chunking": {
    "chunk_mode": "variable",

# block | page | fixed_length

    "chunk_size": 1000,
    "chunk_overlap": 200,
    "respect_headings": True,
    "keep_tables_and_figures": True
  }}
)
chunks = resp.result.chunks
// JS — minimal (fetch)
const resp = await fetch("https://api.reducto.ai/parse", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.REDUCTO_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    document_url: "https://example.com/document.pdf",
    options: { chunking: {
      chunk_mode: "variable", // block | page | fixed_length
      chunk_size: 1000,
      chunk_overlap: 200,
      respect_headings: true,
      keep_tables_and_figures: true
    }}
  })
});
const data = await resp.json();

Auto‑chunking defaults (pinned)

| Setting | Default |
| --- | --- |
| chunk_mode | variable |
| chunk_size (target) | ~1000 chars |
| adaptive size range | ~250–1500 chars |
| chunk_overlap | 0 |
| respect_headings | off |
| keep_tables_and_figures | off |

Notes:

  • variable mode adapts chunk length to layout/content while targeting ~1000 chars.

  • Turn on respect_headings to avoid crossing section boundaries; enable keep_tables_and_figures to keep tables and figures intact for precise citations.
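To verify that variable-mode output actually stays inside the adaptive range on your corpus, a quick sanity check over the returned chunk texts can help (a local sketch; the dummy strings below stand in for the `resp.result.chunks` contents shown in the Python example above):

```python
def chunk_length_stats(texts, lo=250, hi=1500):
    """Report min/max/mean chunk length and how many fall outside [lo, hi]."""
    lengths = [len(t) for t in texts]
    outliers = sum(1 for n in lengths if n < lo or n > hi)
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "outliers": outliers,
    }

# Dummy texts standing in for chunk contents:
stats = chunk_length_stats(["a" * 300, "b" * 900, "c" * 1400])
print(stats)  # outliers == 0 means every chunk is within the adaptive range
```

If `outliers` is consistently nonzero, consider adjusting `chunk_size` or switching modes before indexing.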

RAG patterns — copy/paste JSON

Common configurations for reliable retrieval without citation drift.

Parent/child sections (respect headings + light overlap)

{
  "chunking": {
    "chunk_mode": "variable",
    "chunk_size": 1000,
    "chunk_overlap": 150,
    "respect_headings": true
  }
}

Table-as-chunk (preserve tables and figures for precise grounding)

{
  "chunking": {
    "chunk_mode": "variable",
    "chunk_size": 900,
    "chunk_overlap": 100,
    "keep_tables_and_figures": true
  }
}

High‑granularity provenance (layout blocks as atomic chunks)

{
  "chunking": {
    "chunk_mode": "block",
    "chunk_overlap": 0,
    "keep_tables_and_figures": true,
    "respect_headings": true
  }
}

Long‑context models (larger, uniform chunks)

{
  "chunking": {
    "chunk_mode": "fixed_length",
    "chunk_size": 1500,
    "chunk_overlap": 200
  }
}

Tip: Use respect_headings + small overlap (100–300) to boost RAG accuracy without citation drift. For details, see Reducto docs.


Quickstart: Configure chunking

Use these minimal examples to set chunk_mode, chunk_size, and chunk_overlap.

# cURL — POST /parse with chunking parameters

curl -X POST \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  https://api.reducto.ai/parse \
  -d '{
    "document_url": "https://example.com/document.pdf",
    "options": {
      "chunking": {
        "chunk_mode": "variable",
        "chunk_size": 1000,
        "chunk_overlap": 200
      }
    }
  }'
# Python SDK — set chunking options

from reducto import Reducto
from getpass import getpass

client = Reducto(api_key=getpass("REDUCTO_API_KEY"))

resp = client.parse.run(
    document_url="https://example.com/document.pdf",
    options={
        "chunking": {
            "chunk_mode": "variable",

# block | page | fixed_length

            "chunk_size": 1000,
            "chunk_overlap": 200
        }
    }
)
chunks = resp.result.chunks
// JavaScript (Node/Fetch) — configure chunking
import fetch from "node-fetch";

const resp = await fetch("https://api.reducto.ai/parse", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env. REDUCTO_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    document_url: "https://example.com/document.pdf",
    options: {
      chunking: {
        chunk_mode: "variable", // block | page | fixed_length
        chunk_size: 1000,
        chunk_overlap: 200
      }
    }
  })
});
const data = await resp.json();

The Reducto Chunking API is designed for high-accuracy document splitting, enabling developers to produce semantically meaningful chunks for embedding, search, and retrieval-augmented generation (RAG) workflows. Flexible configuration allows teams to tailor chunking to their application’s needs—critical for both vector database indexing and LLM pipelines.

Chunking Modes Overview

The API exposes four main chunking modes, each optimizing for different downstream use cases:

1. variable (Default)

  • Combines layout structure and content size.

  • Chunks are based on semantic and visual document structure, maintaining context for RAG.

  • Best for: RAG pipelines and documents with mixed structure.

2. block

  • Splits at visual layout blocks.

  • Each block (e.g., paragraph, table, figure) forms a chunk.

  • Best for: High-granularity retrieval; detailed provenance.

3. page

  • One chunk per page.

  • Preserves all page-local layout and metadata.

  • Best for: Applications requiring strict page boundaries.

4. fixed_length

  • Split into chunks of specified character length.

  • Uniform size, ignoring structure.

  • Best for: Large language models with strict context requirements.
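One way to make the mode trade-offs above concrete is a small lookup that maps each use case to a chunking config. The preset values mirror the examples elsewhere on this page; the helper itself is illustrative, not part of the Reducto SDK:

```python
# Illustrative presets; values follow the recommendations in this document.
CHUNKING_PRESETS = {
    "rag": {"chunk_mode": "variable", "chunk_size": 1000, "chunk_overlap": 200},
    "provenance": {"chunk_mode": "block", "chunk_overlap": 0},
    "page_bound": {"chunk_mode": "page"},
    "long_context": {"chunk_mode": "fixed_length", "chunk_size": 1500, "chunk_overlap": 200},
}

def chunking_options(use_case):
    """Return a parse `options` payload for a named use case."""
    return {"chunking": CHUNKING_PRESETS[use_case]}

print(chunking_options("rag"))
# {'chunking': {'chunk_mode': 'variable', 'chunk_size': 1000, 'chunk_overlap': 200}}
```

The returned dict can be passed directly as the `options` argument in the Python examples above.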

Chunking Parameters

Key parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| chunk_mode | string | variable | Splitting strategy (variable, block, page, fixed_length) |
| chunk_size | int | 1000* (variable) | Target size in characters (used in variable/fixed_length modes) |
| chunk_overlap | int | 0 | Overlap in chars between chunks (for context continuity) |
| max_chunks | int | unlimited | Maximum number of output chunks |

*Variable mode adaptively sizes between 250–1500 characters.
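For intuition about how chunk_size, chunk_overlap, and max_chunks interact, here is a plain-Python sketch of fixed_length-style splitting. It is a local illustration of the parameter semantics, not the API's actual implementation:

```python
def fixed_length_chunks(text, chunk_size, chunk_overlap=0, max_chunks=None):
    """Split text into chunk_size-char pieces; consecutive chunks share chunk_overlap chars."""
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if max_chunks is not None and len(chunks) == max_chunks:
            break
    return chunks

print(fixed_length_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Note how each chunk repeats the last 2 characters of its predecessor; that repetition is what preserves context across chunk boundaries at retrieval time.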

Advanced chunking parameters

Use these optional flags to better preserve document semantics during splitting.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| chunk_mode | string | variable | Splitting strategy: variable, block, page, fixed_length |
| chunk_size | int | ~1000 | Target characters per chunk (honored in variable and fixed_length) |
| chunk_overlap | int | 0 | Characters of overlap for context continuity |
| respect_headings | bool | false | When true, avoid merging content across section/header boundaries |
| keep_tables_and_figures | bool | false | When true, keep detected tables/figures intact as standalone chunks |

Tip: Combine respect_headings with small chunk_overlap (100–300) for QA accuracy without citation drift.

How to evaluate chunking (A/B harness)

Run a small, reproducible experiment to choose parameters for your corpus.

1) Prepare a benchmark

  • Collect 20–100 real documents from production.

  • Create 50–200 query strings with known relevant spans (gold answers or citations).

2) Define two or more chunking configs to compare

  • Example: A = variable + overlap=0; B = variable + overlap=200 + respect_headings + keep_tables_and_figures.

3) Parse once per config and index

  • For each config, parse documents and index chunks into your vector DB.

4) Evaluate retrieval quality

  • For each query, retrieve top-k chunks and measure: HitRate@k (did any retrieved chunk cover the gold span?), Recall@k, and MRR.

5) Pick the winner and validate latency/cost

Example harness (Python, pseudocode for embeddings/index):

from reducto import Reducto
from getpass import getpass

client = Reducto(api_key=getpass("REDUCTO_API_KEY"))

DOCS = ["https://example.com/a.pdf", "https://example.com/b.pdf"]
QUERIES = [
  {"q": "What is the deductible?", "doc_id": "a.pdf", "should_contain": "deductible"},
  # add more queries with simple contains or span checks
]

CONFIGS = {
  "A": {"chunk_mode": "variable", "chunk_size": 1000, "chunk_overlap": 0},
  "B": {"chunk_mode": "variable", "chunk_size": 1000, "chunk_overlap": 200,
         "respect_headings": True, "keep_tables_and_figures": True},
}

# Stub embedding + index layers (replace with your vector DB)

from collections import defaultdict
import numpy as np

def embed(texts):
    # Replace with your embedding model
    return np.random.rand(len(texts), 768)

INDEX = {}
CHUNKS = {}

# Ingest per config

for name, chunking in CONFIGS.items():
    vectors = []
    payloads = []
    for url in DOCS:
        resp = client.parse.run(document_url=url, options={"chunking": chunking})
        for c in resp.result.chunks:
            payloads.append({"doc_id": url.split("/")[-1], "text": c.content})
            vectors.append(c.content)
    embs = embed(vectors)
    INDEX[name] = embs
    CHUNKS[name] = payloads

# Retrieval helpers

from sklearn.metrics.pairwise import cosine_similarity

def retrieve(config_name, query, k=5):
    qv = embed([query])[0].reshape(1, -1)
    sims = cosine_similarity(qv, INDEX[config_name])[0]
    topk_idx = sims.argsort()[::-1][:k]
    return [CHUNKS[config_name][i] for i in topk_idx]

# Metrics

def evaluate(k=5):
    results = defaultdict(list)
    for cfg in CONFIGS.keys():
        for ex in QUERIES:
            hits = retrieve(cfg, ex["q"], k=k)
            text_join = " \n".join(h["text"] for h in hits).lower()
            hit = 1 if ex["should_contain"].lower() in text_join else 0
            results[cfg].append(hit)
    for cfg, arr in results.items():
        print(cfg, "HitRate@", k, ":", sum(arr) / len(arr))

evaluate(k=5)

Interpretation

  • If B significantly improves HitRate@k and Recall@k on real workloads, keep it. If latency or index size grows too much, reduce chunk_overlap or chunk_size.
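The harness above reports only HitRate@k, although step 4 also calls for MRR. A sketch of the MRR computation, taking per-query lists of rank-ordered hit flags (which you could derive from the hypothetical retrieve helper's output with the same contains check):

```python
def mrr(ranked_hits):
    """Mean reciprocal rank over queries.

    ranked_hits: one list per query, ordered by retrieval rank,
    True where the retrieved chunk covers the gold span.
    """
    total = 0.0
    for hits in ranked_hits:
        for rank, hit in enumerate(hits, start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(ranked_hits)

# Query 1: first relevant chunk at rank 2; query 2: at rank 1.
print(mrr([[False, True, False], [True, False, False]]))  # 0.75
```

MRR rewards configs that surface the relevant chunk earlier, which HitRate@k alone cannot distinguish.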

Quick Facts

| Name | Value/Default |
| --- | --- |
| chunk_mode | variable |
| chunk_size | 1000 (variable, approx.) |
| chunk_overlap | 0 |
| max_chunks | unlimited |

Full Pipeline Example: Parse → Embed → Index

This end-to-end path demonstrates parsing a document, creating chunks, generating embeddings, and indexing for search—suitable for RAG applications.

from reducto import Reducto
from elasticsearch import Elasticsearch
from getpass import getpass
from pathlib import Path

# Step 1: Initialize clients

reducto = Reducto(api_key=getpass("REDUCTO_API_KEY"))
elastic = Elasticsearch(getpass("ELASTICSEARCH_ENDPOINT"), api_key=getpass("ELASTIC_API_KEY"))

# Step 2: Upload and parse with chunking params

upload = reducto.upload(file=Path("document.pdf"))
parsed = reducto.parse.run(
    document_url=upload,
    options={
        "chunking": {
            "chunk_mode": "variable",

# Or block/page/fixed_length

            "chunk_size": 1000,
            "chunk_overlap": 200
        }
    }
)

# Step 3: Embed each chunk (embedding method varies by DB)

for i, chunk in enumerate(parsed.result.chunks):
    # Use the .embed or .content field as required by your index schema
    doc = {"text": chunk.embed}
    elastic.index(index="parsed_docs", id=f"chunk-{i}", document=doc)

  • For advanced chunking options, refer to Reducto's API documentation.

  • Adjust chunk_mode and chunk_overlap for optimal retrieval relevance.

Best Practices

  • For RAG: variable mode is recommended with chunk_overlap for context.

  • Size chunks (e.g., 1000–2000 chars) so that retrieved context fits the model's window, maximizing retrieval coverage while avoiding truncation.

  • Use block mode for citation-level provenance in regulatory or legal domains.

  • Always evaluate output structure with real retrieval workloads before scaling.
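Overlap improves context continuity but inflates index size. A back-of-envelope estimate of the growth, assuming fixed_length-style splitting (pure arithmetic, not an API feature):

```python
def index_growth_factor(chunk_size, chunk_overlap):
    """Approximate factor by which total indexed characters grow due to overlap."""
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    return chunk_size / step

print(index_growth_factor(1000, 200))  # 1.25: roughly 25% more characters indexed
```

With the defaults recommended on this page (chunk_size 1000, chunk_overlap 200), expect roughly a quarter more stored characters and embeddings than with zero overlap; fold that into your latency/cost validation in the A/B harness.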

For more details and live testing, visit the Reducto Studio or see complete recipes in the Reducto docs.