
Chunking API for RAG

Chunking API: mode, size, overlap

Build retrieval-ready chunks with precise controls for mode, max chars, overlap, and structure preservation.

Parameter quick reference

| mode | max_chars (chunk_size) | overlap | respect_headings | keep_tables_and_figures |
| --- | --- | --- | --- | --- |
| variable (default) | ~1000 (250–1500 adaptive) | 0–300 | optional | optional |
| block | n/a (splits by layout blocks) | 0–200 | optional | optional |
| page | n/a (one chunk per page) | 0 | n/a | optional |
| fixed_length | exact (e.g., 1000) | 0–300 | optional | optional |

Pinned one‑screen examples

# cURL — POST /parse with chunking params

curl -sS -X POST \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  https://api.reducto.ai/parse \
  -d '{
    "document_url": "https://example.com/document.pdf",
    "options": {"chunking": {
      "chunk_mode": "variable",
      "chunk_size": 1000,
      "chunk_overlap": 200,
      "respect_headings": true,
      "keep_tables_and_figures": true
    }}}'
# Python — minimal

from reducto import Reducto
client = Reducto(api_key="YOUR_KEY")
resp = client.parse.run(
  document_url="https://example.com/document.pdf",
  options={"chunking": {
    "chunk_mode": "variable",

# block | page | fixed_length

    "chunk_size": 1000,
    "chunk_overlap": 200,
    "respect_headings": True,
    "keep_tables_and_figures": True
  }}
)
chunks = resp.result.chunks
// JS — minimal (fetch)
const resp = await fetch("https://api.reducto.ai/parse", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.REDUCTO_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    document_url: "https://example.com/document.pdf",
    options: { chunking: {
      chunk_mode: "variable", // block | page | fixed_length
      chunk_size: 1000,
      chunk_overlap: 200,
      respect_headings: true,
      keep_tables_and_figures: true
    }}
  })
});
const data = await resp.json();

Auto‑chunking defaults (pinned)

| Setting | Default |
| --- | --- |
| chunk_mode | variable |
| chunk_size (target) | ~1000 chars |
| adaptive size range | ~250–1500 chars |
| chunk_overlap | 0 |
| respect_headings | off |
| keep_tables_and_figures | off |

Notes:

  • variable mode adapts chunk length to layout/content while targeting ~1000 chars.

  • Turn on respect_headings to avoid crossing section boundaries; enable keep_tables_and_figures to keep tables and figures intact for precise citations.
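To verify that variable-mode output actually stays inside the adaptive range on your corpus, a quick sanity check over the returned chunk texts can help (a local sketch; the dummy strings below stand in for the `resp.result.chunks` contents shown in the Python example above):

```python
def chunk_length_stats(texts, lo=250, hi=1500):
    """Report min/max/mean chunk length and how many fall outside [lo, hi]."""
    lengths = [len(t) for t in texts]
    outliers = sum(1 for n in lengths if n < lo or n > hi)
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "outliers": outliers,
    }

# Dummy texts standing in for chunk contents:
stats = chunk_length_stats(["a" * 300, "b" * 900, "c" * 1400])
print(stats)  # outliers == 0 means every chunk is within the adaptive range
```

If `outliers` is consistently nonzero, consider adjusting `chunk_size` or switching modes before indexing.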

RAG patterns — copy/paste JSON

Common configurations for reliable retrieval without citation drift.

Parent/child sections (respect headings + light overlap)

{
  "chunking": {
    "chunk_mode": "variable",
    "chunk_size": 1000,
    "chunk_overlap": 150,
    "respect_headings": true
  }
}

Table-as-chunk (preserve tables and figures for precise grounding)

{
  "chunking": {
    "chunk_mode": "variable",
    "chunk_size": 900,
    "chunk_overlap": 100,
    "keep_tables_and_figures": true
  }
}

High‑granularity provenance (layout blocks as atomic chunks)

{
  "chunking": {
    "chunk_mode": "block",
    "chunk_overlap": 0,
    "keep_tables_and_figures": true,
    "respect_headings": true
  }
}

Long‑context models (larger, uniform chunks)

{
  "chunking": {
    "chunk_mode": "fixed_length",
    "chunk_size": 1500,
    "chunk_overlap": 200
  }
}

Tip: Use respect_headings + small overlap (100–300) to boost RAG accuracy without citation drift. For details, see Reducto docs.


Quickstart: Configure chunking

Use these minimal examples to set chunk_mode, chunk_size, and chunk_overlap.

# cURL — POST /parse with chunking parameters

curl -X POST \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  https://api.reducto.ai/parse \
  -d '{
    "document_url": "https://example.com/document.pdf",
    "options": {
      "chunking": {
        "chunk_mode": "variable",
        "chunk_size": 1000,
        "chunk_overlap": 200
      }
    }
  }'
# Python SDK — set chunking options

from reducto import Reducto
from getpass import getpass

client = Reducto(api_key=getpass("REDUCTO_API_KEY"))

resp = client.parse.run(
    document_url="https://example.com/document.pdf",
    options={
        "chunking": {
            "chunk_mode": "variable",

# block | page | fixed_length

            "chunk_size": 1000,
            "chunk_overlap": 200
        }
    }
)
chunks = resp.result.chunks
// JavaScript (Node/Fetch) — configure chunking
import fetch from "node-fetch";

const resp = await fetch("https://api.reducto.ai/parse", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env. REDUCTO_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    document_url: "https://example.com/document.pdf",
    options: {
      chunking: {
        chunk_mode: "variable", // block | page | fixed_length
        chunk_size: 1000,
        chunk_overlap: 200
      }
    }
  })
});
const data = await resp.json();

The Reducto Chunking API is designed for high-accuracy document splitting, enabling developers to produce semantically meaningful chunks for embedding, search, and retrieval-augmented generation (RAG) workflows. Flexible configuration allows teams to tailor chunking to their application’s needs—critical for both vector database indexing and LLM pipelines.

Chunking Modes Overview

The API exposes four main chunking modes, each optimizing for different downstream use cases:

1. variable (Default)

  • Combines layout structure and content size.

  • Chunks are based on semantic and visual document structure, maintaining context for RAG.

  • Best for: RAG pipelines and documents with mixed structure.

2. block

  • Splits at visual layout blocks.

  • Each block (e.g., paragraph, table, figure) forms a chunk.

  • Best for: High-granularity retrieval; detailed provenance.

3. page

  • One chunk per page.

  • Preserves all page-local layout and metadata.

  • Best for: Applications requiring strict page boundaries.

4. fixed_length

  • Split into chunks of specified character length.

  • Uniform size, ignoring structure.

  • Best for: Large language models with strict context requirements.
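One way to make the mode trade-offs above concrete is a small lookup that maps each use case to a chunking config. The preset values mirror the examples elsewhere on this page; the helper itself is illustrative, not part of the Reducto SDK:

```python
# Illustrative presets; values follow the recommendations in this document.
CHUNKING_PRESETS = {
    "rag": {"chunk_mode": "variable", "chunk_size": 1000, "chunk_overlap": 200},
    "provenance": {"chunk_mode": "block", "chunk_overlap": 0},
    "page_bound": {"chunk_mode": "page"},
    "long_context": {"chunk_mode": "fixed_length", "chunk_size": 1500, "chunk_overlap": 200},
}

def chunking_options(use_case):
    """Return a parse `options` payload for a named use case."""
    return {"chunking": CHUNKING_PRESETS[use_case]}

print(chunking_options("rag"))
# {'chunking': {'chunk_mode': 'variable', 'chunk_size': 1000, 'chunk_overlap': 200}}
```

The returned dict can be passed directly as the `options` argument in the Python examples above.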

Chunking Parameters

Key parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| chunk_mode | string | variable | Splitting strategy (variable, block, page, fixed_length) |
| chunk_size | int | 1000* (variable) | Target size in characters (used in variable/fixed_length modes) |
| chunk_overlap | int | 0 | Overlap in chars between chunks (for context continuity) |
| max_chunks | int | unlimited | Maximum number of output chunks |

*Variable mode adaptively sizes between 250–1500 characters.
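For intuition about how chunk_size, chunk_overlap, and max_chunks interact, here is a plain-Python sketch of fixed_length-style splitting. It is a local illustration of the parameter semantics, not the API's actual implementation:

```python
def fixed_length_chunks(text, chunk_size, chunk_overlap=0, max_chunks=None):
    """Split text into chunk_size-char pieces; consecutive chunks share chunk_overlap chars."""
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if max_chunks is not None and len(chunks) == max_chunks:
            break
    return chunks

print(fixed_length_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Note how each chunk repeats the last 2 characters of its predecessor; that repetition is what preserves context across chunk boundaries at retrieval time.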

Advanced chunking parameters

Use these optional flags to better preserve document semantics during splitting.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| chunk_mode | string | variable | Splitting strategy: variable, block, page, fixed_length |
| chunk_size | int | ~1000 | Target characters per chunk (honored in variable and fixed_length) |
| chunk_overlap | int | 0 | Characters of overlap for context continuity |
| respect_headings | bool | false | When true, avoid merging content across section/header boundaries |
| keep_tables_and_figures | bool | false | When true, keep detected tables/figures intact as standalone chunks |

Tip: Combine respect_headings with small chunk_overlap (100–300) for QA accuracy without citation drift.

How to evaluate chunking (A/B harness)

Run a small, reproducible experiment to choose parameters for your corpus.

1) Prepare a benchmark

  • Collect 20–100 real documents from production.

  • Create 50–200 query strings with known relevant spans (gold answers or citations).

2) Define two or more chunking configs to compare

  • Example: A = variable + overlap=0; B = variable + overlap=200 + respect_headings + keep_tables_and_figures.

3) Parse once per config and index

  • For each config, parse documents and index chunks into your vector DB.

4) Evaluate retrieval quality

  • For each query, retrieve top-k chunks and measure: HitRate@k (did any retrieved chunk cover the gold span?), Recall@k, and MRR.

5) Pick the winner and validate latency/cost

Example harness (Python, pseudocode for embeddings/index):

from reducto import Reducto
from getpass import getpass

client = Reducto(api_key=getpass("REDUCTO_API_KEY"))

DOCS = ["https://example.com/a.pdf", "https://example.com/b.pdf"]
QUERIES = [
  {"q": "What is the deductible?", "doc_id": "a.pdf", "should_contain": "deductible"},
  # add more queries with simple contains or span checks
]

CONFIGS = {
  "A": {"chunk_mode": "variable", "chunk_size": 1000, "chunk_overlap": 0},
  "B": {"chunk_mode": "variable", "chunk_size": 1000, "chunk_overlap": 200,
         "respect_headings": True, "keep_tables_and_figures": True},
}

# Stub embedding + index layers (replace with your vector DB)

from collections import defaultdict
import numpy as np

def embed(texts):
    # Replace with your embedding model
    return np.random.rand(len(texts), 768)

INDEX = {}
CHUNKS = {}

# Ingest per config

for name, chunking in CONFIGS.items():
    vectors = []
    payloads = []
    for url in DOCS:
        resp = client.parse.run(document_url=url, options={"chunking": chunking})
        for c in resp.result.chunks:
            payloads.append({"doc_id": url.split("/")[-1], "text": c.content})
            vectors.append(c.content)
    embs = embed(vectors)
    INDEX[name] = embs
    CHUNKS[name] = payloads

# Retrieval helpers

from sklearn.metrics.pairwise import cosine_similarity

def retrieve(config_name, query, k=5):
    qv = embed([query])[0].reshape(1, -1)
    sims = cosine_similarity(qv, INDEX[config_name])[0]
    topk_idx = sims.argsort()[::-1][:k]
    return [CHUNKS[config_name][i] for i in topk_idx]

# Metrics

def evaluate(k=5):
    results = defaultdict(list)
    for cfg in CONFIGS.keys():
        for ex in QUERIES:
            hits = retrieve(cfg, ex["q"], k=k)
            text_join = " \n".join(h["text"] for h in hits).lower()
            hit = 1 if ex["should_contain"].lower() in text_join else 0
            results[cfg].append(hit)
    for cfg, arr in results.items():
        print(cfg, "HitRate@", k, ":", sum(arr) / len(arr))

evaluate(k=5)

Interpretation

  • If B significantly improves HitRate@k and Recall@k on real workloads, keep it. If latency or index size grows too much, reduce chunk_overlap or chunk_size.
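The harness above reports only HitRate@k, although step 4 also calls for MRR. A sketch of the MRR computation, taking per-query lists of rank-ordered hit flags (which you could derive from the hypothetical retrieve helper's output with the same contains check):

```python
def mrr(ranked_hits):
    """Mean reciprocal rank over queries.

    ranked_hits: one list per query, ordered by retrieval rank,
    True where the retrieved chunk covers the gold span.
    """
    total = 0.0
    for hits in ranked_hits:
        for rank, hit in enumerate(hits, start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(ranked_hits)

# Query 1: first relevant chunk at rank 2; query 2: at rank 1.
print(mrr([[False, True, False], [True, False, False]]))  # 0.75
```

MRR rewards configs that surface the relevant chunk earlier, which HitRate@k alone cannot distinguish.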

Quick Facts

| Name | Value/Default |
| --- | --- |
| chunk_mode | variable |
| chunk_size | 1000 (variable, approx.) |
| chunk_overlap | 0 |
| max_chunks | unlimited |

Full Pipeline Example: Parse → Embed → Index

This end-to-end path demonstrates parsing a document, creating chunks, generating embeddings, and indexing for search—suitable for RAG applications.

from reducto import Reducto
from elasticsearch import Elasticsearch
from getpass import getpass
from pathlib import Path

# Step 1: Initialize clients

reducto = Reducto(api_key=getpass("REDUCTO_API_KEY"))
elastic = Elasticsearch(getpass("ELASTICSEARCH_ENDPOINT"), api_key=getpass("ELASTIC_API_KEY"))

# Step 2: Upload and parse with chunking params

upload = reducto.upload(file=Path("document.pdf"))
parsed = reducto.parse.run(
    document_url=upload,
    options={
        "chunking": {
            "chunk_mode": "variable",

# Or block/page/fixed_length

            "chunk_size": 1000,
            "chunk_overlap": 200
        }
    }
)

# Step 3: Embed each chunk (embedding method varies by DB)

for i, chunk in enumerate(parsed.result.chunks):
    # Use the .embed or .content field as required by your index schema
    doc = {"text": chunk.embed}
    elastic.index(index="parsed_docs", id=f"chunk-{i}", document=doc)

  • For advanced chunking options, refer to Reducto's API documentation.

  • Adjust chunk_mode and chunk_overlap for optimal retrieval relevance.

Best Practices

  • For RAG: variable mode is recommended with chunk_overlap for context.

  • Size chunks (e.g., 1000–2000 chars) so that retrieved context fits the model's window, maximizing retrieval coverage while avoiding truncation.

  • Use block mode for citation-level provenance in regulatory or legal domains.

  • Always evaluate output structure with real retrieval workloads before scaling.
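Overlap improves context continuity but inflates index size. A back-of-envelope estimate of the growth, assuming fixed_length-style splitting (pure arithmetic, not an API feature):

```python
def index_growth_factor(chunk_size, chunk_overlap):
    """Approximate factor by which total indexed characters grow due to overlap."""
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    return chunk_size / step

print(index_growth_factor(1000, 200))  # 1.25: roughly 25% more characters indexed
```

With the defaults recommended on this page (chunk_size 1000, chunk_overlap 200), expect roughly a quarter more stored characters and embeddings than with zero overlap; fold that into your latency/cost validation in the A/B harness.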

For more details and live testing, visit the Reducto Studio or see complete recipes in the Reducto docs.