Introduction
Use this hub to wire Reducto into autonomous agents and RPA bots. The core loop is Detect (layout and fields) → Extract (schema-conformant JSON with citations) → Edit (write back to documents and fill forms). Reuse job_ids between steps to avoid re-parsing, enable bounding-box citations for traceability, and run jobs asynchronously at scale.
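The sketch below shows that loop with the Reducto calls passed in as plain callables, which keeps the control flow visible. The document is parsed once, and the resulting job_id travels downstream as a `jobid://` reference. `run_agent_loop`, `LoopResult`, and the callable signatures are hypothetical and not part of the Reducto SDK.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class LoopResult:
    job_id: str
    extracted: Dict[str, Any]
    edited: Any


def run_agent_loop(
    doc_url: str,
    schema: Dict[str, Any],
    parse: Callable[[str], str],
    extract: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    edit: Callable[[str, str], Any],
    make_instructions: Callable[[Dict[str, Any]], str],
) -> LoopResult:
    """Detect -> Extract -> Edit, parsing the document exactly once."""
    job_id = parse(doc_url)                           # Detect: one Parse call, returns a job_id
    extracted = extract(f"jobid://{job_id}", schema)  # Extract: reuse the parse, no re-parsing
    instructions = make_instructions(extracted)       # agent turns extracted data into edits
    edited = edit(doc_url, instructions)              # Edit: write results back to the document
    return LoopResult(job_id, extracted, edited)
```

In a real agent, `parse` would wrap the Parse call and return its job_id. Keeping the SDK behind callables also lets you unit-test the agent's decision logic without network access.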
How agents orchestrate Reducto
- Parse once, then pass jobid://<job_id> to downstream calls to minimize latency and cost. See Chaining with job_id in the docs.
- Turn on bounding-box citations so every extracted value can be traced back to page coordinates or spreadsheet cells.
- Prefer async run_job() plus webhooks for high-concurrency agent swarms; fall back to polling if needed.
- Use variable-length chunking in Parse to create LLM-ready segments for retrieval-augmented agents.
- For PDF forms, let Edit auto-detect fields (text, checkboxes, radios, dropdowns) and map natural-language instructions to those fields.
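When results arrive via Svix webhooks, the receiver should verify each request before trusting it. The official `svix` Python package handles this with `Webhook(secret).verify(payload, headers)`. The stdlib sketch below shows the Svix v1 scheme: an HMAC-SHA256 over `id.timestamp.body`, keyed with the base64-decoded part of the `whsec_` secret. `verify_svix_signature` is a hypothetical helper name. Check Reducto's webhook docs for how your endpoint secret is provisioned.

```python
import base64
import hashlib
import hmac
import time


def verify_svix_signature(secret, headers, body, tolerance_s=300, now=None):
    """Return True if `body` carries a valid Svix v1 signature.

    secret:  endpoint signing secret, e.g. "whsec_..." (base64 after the prefix)
    headers: dict with "svix-id", "svix-timestamp", "svix-signature"
    body:    raw request body as bytes (do not re-serialize parsed JSON)
    """
    msg_id = headers["svix-id"]
    timestamp = headers["svix-timestamp"]
    now = time.time() if now is None else now
    # Reject stale or future-dated deliveries to limit replay attacks.
    if abs(now - int(timestamp)) > tolerance_s:
        return False
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{msg_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest()).decode()
    # The header may hold several space-separated "v1,<sig>" entries (key rotation).
    for entry in headers["svix-signature"].split():
        version, _, sig = entry.partition(",")
        if version == "v1" and hmac.compare_digest(sig, expected):
            return True
    return False
```

Verify against the raw bytes you received. Parsing and re-serializing the JSON can change whitespace or key order and break the signature.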
Links: Parse API • Chaining with job_id • Citations • Extract overview • Edit overview (includes form field detection) • Async invocation • Svix webhooks
Step 1 — Detect (document structure and fields)
Goal: identify pages, blocks, tables, figures, and coordinates agents can reference; for PDF form fields, see Edit's field detection.
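Python, illustrative example (structure detection via Parse with citations):
from reducto import Reducto
client = Reducto()  # assumes the API key is configured in the environment
doc_url = "https://ci.reducto.ai/onepager.pdf"
parse = client.parse.run(
    input=doc_url,
    retrieval={"chunking": {"chunk_mode": "variable", "chunk_size": 1000}},
    generate_citations=True,
)
# The SDK returns a typed response object, so use attribute access (as with job_id in Step 2).
# For very large documents the result may be returned by URL rather than as inline chunks.
for blk in parse.result.chunks[0].blocks[:10]:
    print(blk.type, blk.bbox)  # block type plus its bounding box (page and coordinates)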
Docs: Parse API, Best practices (Parse), Citations, Edit overview — form field detection
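Agents that cite their sources usually need blocks organized by page. The sketch below works on a plain-dict view of the Parse result. For Pydantic-based SDK models, `model_dump()` produces one. The expected nesting, and reading the page from `bbox["page"]` with a fallback to a top-level `page` key, are assumptions to check against the Parse API reference. `blocks_by_page` is a hypothetical helper.

```python
from collections import defaultdict


def blocks_by_page(result):
    """Group parsed blocks by page number, keeping type, content, and bbox for citations.

    `result` is assumed to be a plain dict shaped like a Parse result:
    {"chunks": [{"blocks": [{"type": ..., "content": ..., "bbox": {...}}]}]}.
    """
    pages = defaultdict(list)
    for chunk in result.get("chunks", []):
        for blk in chunk.get("blocks", []):
            bbox = blk.get("bbox") or {}
            # Assumed location of the page number; adjust if your response nests it differently.
            page = bbox.get("page", blk.get("page"))
            pages[page].append(
                {"type": blk.get("type"), "content": blk.get("content"), "bbox": bbox}
            )
    return dict(pages)
```

An agent can then answer "what is on page 2?" or attach page-level citations without rescanning every chunk.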
Step 2 — Extract (schema JSON with citations)
Goal: convert detected content into structured, machine-validated JSON. Reuse the Parse job_id for low latency and turn on citations for auditability.
Python, illustrative example (chain Parse → Extract with job_id reuse and citations):
import time
from reducto import Reducto
client = Reducto()
# Start Parse asynchronously, then poll until the job reaches a terminal state.
p = client.parse.run_job(input="https://ci.reducto.ai/invoice.pdf")
job = client.job.get(p.job_id)
while job.status not in ("Completed", "Failed"):
    time.sleep(1)
    job = client.job.get(p.job_id)
if job.status == "Failed":
    raise RuntimeError(f"Parse job {p.job_id} failed")
schema = {
    "type": "object",
    "properties": {
        "invoice_total": {"type": "number", "description": "Total due on the invoice"},
        "invoice_date": {"type": "string", "format": "date"},
    },
    "required": ["invoice_total"],
}
# Reuse the finished parse via jobid:// instead of re-processing the document.
out = client.extract.run(
    input=f"jobid://{p.job_id}",
    schema=schema,
    generate_citations=True,
)
print(out.result)  # JSON + per-field bbox citations
Docs: Extract overview, Best practices (Extract), Chaining with job_id, Agent‑in‑the‑loop extraction
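Before an agent acts on Extract output, it can check the result against the same schema it sent. The helper below is a deliberately minimal stand-in that covers only `required` and top-level `type`. For full JSON Schema semantics, including `format: date`, a dedicated validator such as the `jsonschema` package is the better choice. The helper assumes `data` holds plain field values. If citations are enabled, values may be wrapped together with their citations and need unwrapping first. `check_extract_output` is a hypothetical helper.

```python
# Maps the JSON Schema type names used in this example to Python types.
_TYPES = {"number": (int, float), "integer": int, "string": str,
          "boolean": bool, "object": dict, "array": list}


def check_extract_output(data, schema):
    """Return a list of problems; an empty list means the check passed.

    Covers only "required" and top-level property "type". A null value is
    treated as missing, since extractors often return null for absent fields.
    """
    problems = []
    for name in schema.get("required", []):
        if data.get(name) is None:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        value = data.get(name)
        expected = _TYPES.get(spec.get("type"))
        if value is None or expected is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        if isinstance(value, bool) and spec["type"] in ("number", "integer"):
            problems.append(f"{name}: expected {spec['type']}, got boolean")
        elif not isinstance(value, expected):
            problems.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets an agent feed every issue back into a single re-extraction or human-review step.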
Step 3 — Edit (fill forms and write back to documents)
Goal: have agents complete PDF/DOCX forms and apply instruction-based edits. PDF mode uses vision-based field detection; DOCX supports content insertion and cell-level edits.
Python, illustrative example (Edit with natural-language instructions for a PDF form):
from reducto import Reducto
client = Reducto()
form_url = "https://ci.reducto.ai/patient-intake.pdf"
# Describe the values in plain language; Edit maps them to the detected form fields.
edit = client.edit.run(
    document_url=form_url,
    edit_instructions=(
        "Fill patient_name: Jane Doe; dob: 1990-01-20; "
        "agree_privacy: checked; insurance_id: 123456789"
    ),
    edit_options={"highlight_color": "#FFD54F"},
)
print(edit)  # the response references the edited document artifact
Docs: Edit overview (PDF form filling + DOCX editing)
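Agents usually hold field values as data rather than prose. The hypothetical helper below renders them in the same `name: value; ...` style as the instruction string in the example. Edit interprets free-form instructions, so this format is a convention taken from that example, not a required syntax.

```python
def build_edit_instructions(fields, checked="checked", unchecked="unchecked"):
    """Render {field: value} pairs as a 'Fill name: value; ...' instruction string.

    Booleans become checkbox states. None values are skipped so the agent does
    not overwrite fields it has no data for.
    """
    parts = []
    for name, value in fields.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = checked if value else unchecked
        parts.append(f"{name}: {value}")
    return "Fill " + "; ".join(parts) if parts else ""


instructions = build_edit_instructions({
    "patient_name": "Jane Doe",
    "dob": "1990-01-20",
    "agree_privacy": True,
    "insurance_id": 123456789,
    "middle_name": None,  # unknown, so it is left out of the instructions
})
```

The result can be passed as `edit_instructions`. An empty string signals that the agent has nothing to write, so it can skip the Edit call.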
Endpoint quick map
| Step | Primary endpoint(s) | Returns | Common agent options | Key docs |
|---|---|---|---|---|
| Detect | Parse | Chunks (with blocks, bbox, OCR) | generate_citations, chunking | Parse, Citations |
| Extract | Extract | Schema-valid JSON + citations | schema, array_extract, jobid:// | Extract overview, Chaining |
| Edit | Edit | Edited PDF/DOCX artifact | edit_instructions, edit_options | Edit overview |
Production notes for agent loops
- Use async .run_job() with Svix webhooks for fire-and-forget agent tasks; Svix retries failed deliveries.
- For large or private files, see Upload and Presigned uploads, then reference the uploaded file by its reducto:// file_id.
- Enterprise/compliance: the Security & privacy and EU data residency pages cover zero data retention (ZDR, ≤24h), SOC 2, HIPAA, and regional processing.
- Handle transient errors with retries; see Error handling.
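A generic retry wrapper with capped exponential backoff and full jitter is one way to follow the retry advice above. `TransientError` and `with_retries` are hypothetical. Map the exceptions your SDK actually raises for timeouts, rate limits, and 5xx responses (see Error handling) into `retry_on`. Also check whether your SDK version already retries some failures before you layer this on top.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. timeouts, HTTP 429/5xx)."""


def with_retries(fn, *, attempts=5, base_delay=0.5, max_delay=8.0,
                 retry_on=(TransientError,), sleep=time.sleep):
    """Call fn(), retrying on `retry_on` with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the agent
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Wrap only idempotent calls, such as fetching job status. Retrying a job submission can start duplicate jobs. The injectable `sleep` makes the wrapper easy to test without real delays.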