
Reducto Agents API hub: Detect → Extract → Edit for agents

Introduction

Use this hub to wire Reducto into autonomous agents and RPAs. The core loop is Detect (layout/fields) → Extract (schema JSON + citations) → Edit (write back/forms). Reuse job_ids between steps to avoid re-parsing, enable bounding-box citations for traceability, and run everything asynchronously at scale.
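The core loop can be sketched as a small orchestrator. This is a minimal sketch, assuming each step is supplied as a plain callable: the hypothetical `parse_fn`, `extract_fn`, and `edit_fn` stand in for whatever wraps the Reducto SDK in your agent. It only illustrates the data flow: parse once, hand the jobid:// reference to Extract, then pass the extracted values to Edit.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class LoopResult:
    job_ref: str            # jobid:// reference produced by the Detect step
    fields: Dict[str, Any]  # structured values returned by the Extract step
    edited: Any             # whatever the Edit step returns (artifact/handle)


def run_agent_loop(
    document_url: str,
    parse_fn: Callable[[str], str],                 # hypothetical: returns a job_id
    extract_fn: Callable[[str], Dict[str, Any]],    # hypothetical: takes an input ref
    edit_fn: Callable[[str, Dict[str, Any]], Any],  # hypothetical: fills a form
    form_url: str,
) -> LoopResult:
    # Detect: parse the source document exactly once.
    job_id = parse_fn(document_url)
    job_ref = f"jobid://{job_id}"
    # Extract: reuse the parsed job instead of re-uploading the document.
    fields = extract_fn(job_ref)
    # Edit: write the extracted values back into the target form.
    edited = edit_fn(form_url, fields)
    return LoopResult(job_ref=job_ref, fields=fields, edited=edited)


if __name__ == "__main__":
    # Stub steps so the flow can be exercised without network access.
    result = run_agent_loop(
        "https://example.com/invoice.pdf",
        parse_fn=lambda url: "abc123",
        extract_fn=lambda ref: {"invoice_total": 42.0, "source": ref},
        edit_fn=lambda form, fields: f"edited {form} with {sorted(fields)}",
        form_url="https://example.com/form.pdf",
    )
    print(result.job_ref)  # jobid://abc123
```

Injecting the steps as callables keeps the loop testable offline and lets you swap the polling, webhook, or synchronous variants of each call without touching the orchestration.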

How agents orchestrate Reducto

  • Parse once, then pass the result to downstream calls as jobid://<job_id> so later steps skip re-parsing, which cuts latency and cost. See Chaining with job_id in the docs.

  • Turn on bounding-box citations so every extracted value is traceable back to page coordinates or spreadsheet cells.

  • Prefer async run_job() + webhooks for high-concurrency agent swarms; fall back to polling if needed.

  • Use variable-length chunking in Parse to create LLM-ready segments for retrieval-augmented agents.

  • For PDF forms, let Edit auto-detect fields (text, checkboxes, radios, dropdowns) and map natural-language instructions to fields.
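The parse-once rule can be enforced with a small cache keyed by document URL. This is a minimal in-process sketch: `ParseOnceCache` and the injected `parse_fn` (anything that parses a URL and returns its job_id) are hypothetical, not part of the SDK. A swarm spread across several processes would need a shared store instead of a dict.

```python
from typing import Callable, Dict


class ParseOnceCache:
    """Remember the parse job for each document so later steps reuse it."""

    def __init__(self, parse_fn: Callable[[str], str]) -> None:
        self._parse_fn = parse_fn           # hypothetical: returns a job_id
        self._job_ids: Dict[str, str] = {}  # document URL -> job_id

    def ref(self, document_url: str) -> str:
        # Only the first request for a URL triggers a parse.
        if document_url not in self._job_ids:
            self._job_ids[document_url] = self._parse_fn(document_url)
        return f"jobid://{self._job_ids[document_url]}"


if __name__ == "__main__":
    calls = []

    def fake_parse(url: str) -> str:
        calls.append(url)
        return f"job-{len(calls)}"

    cache = ParseOnceCache(fake_parse)
    print(cache.ref("a.pdf"), cache.ref("a.pdf"), len(calls))  # jobid://job-1 jobid://job-1 1
```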

Links: Parse API, Chaining with job_id, Citations, Extract overview, Edit overview (includes form field detection), Async invocation, Svix webhooks

Step 1 — Detect (document structure and fields)

Goal: identify pages, blocks, tables, figures, and coordinates agents can reference; for PDF form fields, see Edit's field detection.

Python, illustrative example (structure detection via Parse with citations):

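from reducto import Reducto

client = Reducto()

doc_url = "https://ci.reducto.ai/onepager.pdf"
parse = client.parse.run(
    input=doc_url,
    # Variable-length chunks sized for LLM retrieval.
    retrieval={"chunking": {"chunk_mode": "variable", "chunk_size": 1000}},
    generate_citations=True,
)
# The SDK returns typed objects, so use attribute access rather than dict keys.
# Large results may come back as a URL instead of inline chunks; check result.type.
for blk in parse.result.chunks[0].blocks[:10]:
    # Each block has a layout type and a bbox; the page number lives on the bbox.
    print(blk.type, blk.bbox, blk.bbox.page)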
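Agents usually need block coordinates grouped by page before they can cite or crop a region. This is a minimal sketch, assuming each block has been converted to a plain dict whose `bbox` holds `page`, `left`, `top`, `width`, and `height`, with the four geometry values normalized to the page size (0 to 1); check the Citations docs for the actual units before relying on it. `blocks_by_page` and `to_pixels` are hypothetical helpers, not SDK functions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def blocks_by_page(blocks: List[dict]) -> Dict[int, List[dict]]:
    """Group parsed blocks by the page number stored in their bbox."""
    pages: Dict[int, List[dict]] = defaultdict(list)
    for blk in blocks:
        pages[blk["bbox"]["page"]].append(blk)
    return dict(pages)


def to_pixels(bbox: dict, page_width: float, page_height: float) -> Tuple[float, float, float, float]:
    """Convert a bbox assumed to be normalized (0-1) into (x0, y0, x1, y1) pixels."""
    x0 = bbox["left"] * page_width
    y0 = bbox["top"] * page_height
    x1 = x0 + bbox["width"] * page_width
    y1 = y0 + bbox["height"] * page_height
    return (x0, y0, x1, y1)
```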

Docs: Parse API, Best practices (Parse), Citations, Edit overview — form field detection

Step 2 — Extract (schema JSON with citations)

Goal: convert detected content into structured, machine-validated JSON. Reuse the Parse job_id for low latency and turn on citations for auditability.

Python, illustrative example (chain Parse → Extract with job_id reuse and citations):

import time
from reducto import Reducto

client = Reducto()

# Start Parse asynchronously and poll until the job reaches a terminal state.
p = client.parse.run_job(input="https://ci.reducto.ai/invoice.pdf")
job = client.job.get(p.job_id)
while job.status not in ("Completed", "Failed"):
    time.sleep(1)
    job = client.job.get(p.job_id)
if job.status == "Failed":
    raise RuntimeError(f"Parse job {p.job_id} failed")

schema = {
    "type": "object",
    "properties": {
        "invoice_total": {"type": "number", "description": "Total due on the invoice"},
        "invoice_date": {"type": "string", "format": "date"},
    },
    "required": ["invoice_total"],
}
# Reuse the parsed job via jobid:// instead of re-uploading the PDF.
out = client.extract.run(
    input=f"jobid://{p.job_id}",
    schema=schema,
    generate_citations=True,
)
print(out.result)  # JSON values + per-field bbox citations
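Before an agent acts on extracted values, it helps to gate them against the same schema and route failures to a human reviewer (see Agent-in-the-loop extraction). This is a minimal sketch, assuming the extracted fields have already been pulled out of `out.result` into a plain dict; `schema_problems` is a hypothetical helper that checks only `required` and top-level `type`, not full JSON Schema (the third-party `jsonschema` package's `validate()` covers the full spec).

```python
from typing import Any, Dict, List

# Map the JSON Schema "type" keywords to Python types.
_TYPES = {
    "number": (int, float),
    "integer": (int,),
    "string": (str,),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}


def schema_problems(values: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the agent may proceed."""
    problems = []
    for name in schema.get("required", []):
        if values.get(name) is None:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        value = values.get(name)
        expected = _TYPES.get(spec.get("type"))
        if value is None or expected is None:
            continue
        # bool is a subclass of int, so reject it explicitly for non-boolean fields.
        if isinstance(value, bool) and spec["type"] != "boolean":
            problems.append(f"{name}: expected {spec['type']}, got bool")
        elif not isinstance(value, expected):
            problems.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    return problems
```

An agent would proceed to Edit only when the list is empty and otherwise hand the document and the problem list to a reviewer.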

Docs: Extract overview, Best practices (Extract), Chaining with job_id, Agent‑in‑the‑loop extraction

Step 3 — Edit (fill forms and write back to documents)

Goal: have agents complete PDF/DOCX forms and apply instruction-based edits. PDF mode uses vision-based field detection; DOCX supports content insertion and cell-level edits.

Python, illustrative example (Edit with natural-language instructions for a PDF form):

from reducto import Reducto
client = Reducto()

form_url = "https://ci.reducto.ai/patient-intake.pdf"
edit = client.edit.run(
    document_url=form_url,
    # Natural-language instructions; Edit maps them onto detected form fields.
    edit_instructions=(
        "Fill patient_name: Jane Doe; dob: 1990-01-20; "
        "agree_privacy: checked; insurance_id: 123456789"
    ),
    edit_options={"highlight_color": "#FFD54F"},
)
print(edit)  # response includes the edited document artifact/handle
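Agents often hold form values as a dict rather than as prose. This is a minimal sketch of rendering that dict into the same `name: value; ...` instruction style used in the example above; `build_edit_instructions` is hypothetical, and rendering booleans as "checked"/"unchecked" is an assumption that mirrors the example rather than a documented format.

```python
from typing import Dict, Union

FieldValue = Union[str, bool, int, float]


def build_edit_instructions(fields: Dict[str, FieldValue]) -> str:
    """Render field values as one 'Fill name: value; ...' instruction string."""
    if not fields:
        raise ValueError("no fields to fill")
    parts = []
    for name, value in fields.items():
        if isinstance(value, bool):
            # Checkboxes and radios read more clearly as checked/unchecked.
            rendered = "checked" if value else "unchecked"
        else:
            rendered = str(value)
        parts.append(f"{name}: {rendered}")
    return "Fill " + "; ".join(parts)
```

Building the string from a dict keeps field names stable across runs, which makes the agent's edits easier to diff and audit.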

Docs: Edit overview (PDF form filling + DOCX editing)

Endpoint quick map

| Step | Primary endpoint(s) | Returns | Common agent options | Key docs |
|---|---|---|---|---|
| Detect | Parse | Chunks (with blocks, bbox, OCR) | generate_citations, chunking | Parse, Citations |
| Extract | Extract | Schema-valid JSON + citations | schema, array_extract, jobid:// | Extract overview, Chaining |
| Edit | Edit | Edited PDF/DOCX artifact | edit_instructions, edit_options | Edit overview |

Production notes for agent loops