
Reducto Edit Endpoint: Automated Document Completion for Forms and Tables

Edit is Reducto’s “write-back” capability: it takes a document plus instructions and produces a new version of that document with the requested changes applied. See the product overview in the official docs: Edit overview (Reducto Docs).

What Edit does (and how it differs from Parse/Extract)

Edit fills PDF forms and modifies DOCX documents based on your intent.

  • Edit vs Parse/Extract: Parse and Extract read documents and return structured data. Edit goes the other way: it takes your instructions (usually natural language, optionally backed by a schema) and writes the requested values back into the document to produce an updated file.

  • Natural language → field mapping: You can describe what you want in plain language (e.g., “Set the policyholder name to …”). Reducto maps values to the right fields even when the underlying PDF field names are messy or non-human-readable.

  • Works with and without native PDF form fields: If a PDF already contains form fields, Edit can fill them. If it does not, Edit can still fill the document by using vision-based field detection to find the right areas.

  • Provider preference (optional): You can optionally indicate a preference for which LLM provider to use (e.g., OpenAI, Anthropic, or Google) for instruction understanding, depending on your environment and quality/cost goals.

When to use Edit (common use cases)

Use Edit when the output you need is a completed document, not just extracted data.

Common workflows include:

  • Pre-filling onboarding packets (HR, finance, vendors) before sending for signature

  • Completing insurance, healthcare, and compliance forms at scale

  • Generating customer-ready deliverables by inserting known values into a DOCX template

  • Backfilling missing fields in archived PDFs to standardize records

  • Batch jobs that apply the same instructions/schema to many documents (sync for small jobs; async is available for longer-running or high-volume processing)

Supported document types: PDF vs DOCX

| Document type | What Edit can do | Notable options / notes |
| --- | --- | --- |
| PDF | Fill form fields (text fields, checkboxes) and place values into detected fillable areas | Can handle PDFs without existing form fields via vision-based detection; can optionally add overflow pages when inserted text does not fit the available space |
| DOCX | Modify DOCX documents by inserting/replacing values in the document content (including structured regions such as tables, where applicable) | Optional highlight color can be used to visually mark what was changed |

What happens when you run Edit (high-level steps)

At a high level, Edit:

  1. Loads the input document (PDF or DOCX).

  2. Identifies where edits can be applied (native PDF form fields when present, or vision-based regions; for DOCX, relevant content locations).

  3. Interprets your edit instructions (and any provided schema/options).

  4. Maps requested values to the most likely target fields/locations, even if internal field identifiers are not human-friendly.

  5. Applies the edits and renders a new document (handling presentation options like highlighting or overflow behavior).

  6. Returns an updated document URL plus structured metadata about what was changed.
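Step 4 (mapping a requested value to a target field) can be illustrated with a toy example. The sketch below is not Reducto's method; the steps above describe instruction interpretation and vision-based detection, whereas this uses plain string similarity from Python's standard library. The field IDs, labels, and the `best_field_match` helper are all hypothetical and exist only to show why matching on visible labels works when internal identifiers are opaque.

```python
from difflib import SequenceMatcher


def best_field_match(requested_label, visible_labels, min_score=0.6):
    """Return (field_id, score) for the closest visible label.

    visible_labels maps an internal field ID to the label a human sees.
    If no label scores at least min_score, field_id is None.
    """
    best_id, best_score = None, 0.0
    for field_id, label in visible_labels.items():
        score = SequenceMatcher(None, requested_label.lower(), label.lower()).ratio()
        if score > best_score:
            best_id, best_score = field_id, score
    if best_score < min_score:
        return None, best_score
    return best_id, best_score


# Hypothetical PDF: internal IDs are opaque, visible labels are readable.
fields = {
    "topmostSubform[0].f1_01": "Policyholder name",
    "topmostSubform[0].f1_02": "Policy number",
    "topmostSubform[0].c1_01": "Coverage start date",
}

match_id, match_score = best_field_match("policyholder name", fields)
no_match_id, _ = best_field_match("zzz", fields)
```

Here the request is matched on the human-readable label, so the opaque ID never has to appear in the instructions. A request that resembles no label returns `None` instead of a forced guess, which is the kind of case worth surfacing for review.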

Key inputs (document_url, edit_instructions, edit_options, form_schema)

  • document_url: Where the source document lives (or an equivalent document input). This is the file that will be edited.

  • edit_instructions: Natural language describing what to change or fill. Good instructions reference the meaning of fields (labels, section names, business terms) rather than internal field IDs.

  • edit_options: Controls for how edits are applied and rendered. Examples include:

      • DOCX highlighting (e.g., selecting a highlight color to mark inserted/updated content)

      • PDF overflow behavior (e.g., allowing overflow pages when text is longer than the available space)

      • LLM provider preference (e.g., prefer OpenAI vs Anthropic vs Google) for instruction interpretation

  • form_schema (optional): A structured description of the fields you expect to fill (names/types/constraints, as appropriate). Providing a schema generally improves speed and consistency by reducing or skipping detection/context-building steps and making mapping less ambiguous across a batch of similar documents.
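To make the four inputs concrete, here is a minimal sketch of assembling them into a JSON request body. The top-level names (`document_url`, `edit_instructions`, `edit_options`, `form_schema`) come from the list above; everything else is an assumption for illustration, including the option keys `enable_overflow_pages` and `llm_provider`, the shape of the schema entries, and the `build_edit_request` helper itself. Check the official API reference for the real field names before sending anything.

```python
import json


def build_edit_request(document_url, edit_instructions, edit_options=None, form_schema=None):
    """Assemble an Edit request body; optional inputs are omitted when not given."""
    if not document_url:
        raise ValueError("document_url is required")
    if not edit_instructions or not edit_instructions.strip():
        raise ValueError("edit_instructions must be a non-empty string")

    payload = {
        "document_url": document_url,
        "edit_instructions": edit_instructions.strip(),
    }
    if edit_options:
        payload["edit_options"] = dict(edit_options)
    if form_schema is not None:
        payload["form_schema"] = form_schema
    return payload


payload = build_edit_request(
    "https://example.com/forms/claim.pdf",
    "Set the policyholder name to Jane Doe.",
    # Hypothetical option keys, named after the behaviours described above.
    edit_options={"enable_overflow_pages": True, "llm_provider": "anthropic"},
    # Hypothetical schema shape: one entry per expected field.
    form_schema=[{"name": "policyholder_name", "type": "text"}],
)
request_body = json.dumps(payload, indent=2)
```

Leaving optional keys out of the body, rather than sending nulls, keeps small one-off requests minimal while letting batch jobs add a schema without changing the calling code.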

Outputs (document_url + what metadata you get back)

Edit returns:

  • document_url: A URL to the edited output document (PDF or DOCX, matching the input type).

  • Metadata about the run: Enough structured information to support QA and automation, typically including which fields/regions were targeted, which ones were filled/updated, and any warnings or issues encountered (e.g., ambiguous mappings or overflow events).
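For automated QA, the returned metadata can be reduced to a simple "needs review" decision. The sketch below assumes a hypothetical response shape (a `fields` list with `name` and `status`, plus a `warnings` list); only `document_url` is named in the description above, so treat the other keys and the `summarize_edit_result` helper as placeholders to adapt to the actual response.

```python
def summarize_edit_result(result):
    """Split reported fields into filled/unfilled and flag runs that need review."""
    fields = result.get("fields", [])  # hypothetical key
    filled = [f["name"] for f in fields if f.get("status") == "filled"]
    unfilled = [f["name"] for f in fields if f.get("status") != "filled"]
    warnings = list(result.get("warnings", []))  # hypothetical key
    return {
        "output_url": result.get("document_url"),
        "filled": filled,
        "unfilled": unfilled,
        "warnings": warnings,
        "needs_review": bool(unfilled or warnings),
    }


# Hypothetical response, shaped after the metadata described above.
sample_result = {
    "document_url": "https://example.com/output/claim-filled.pdf",
    "fields": [
        {"name": "Policyholder name", "status": "filled"},
        {"name": "Policy number", "status": "ambiguous"},
    ],
    "warnings": ["Overflow page added for 'Incident description'"],
}
summary = summarize_edit_result(sample_result)
```

Routing only the flagged runs to a human reviewer keeps a high-volume pipeline moving while still catching ambiguous mappings and overflow events.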

Tips for better results (instruction quality + formatting hints)

  • Prefer specific, field-oriented instructions (“Fill ‘Applicant name’ with …”) over vague requests (“complete the form”).

  • When multiple similar fields exist, add disambiguators (section names, page context, or the label as it appears on the document).

  • Keep long free-text values concise where possible; if long text is required, state where it should go and whether overflow pages (PDF) are acceptable.

  • For batch workflows, supply a form_schema so results remain consistent across many documents with similar layouts.

  • If you need reviewability, enable DOCX highlighting so changes are easy to spot.
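The first two tips can be applied mechanically when values already live in a dictionary. The following sketch generates one specific, label-based instruction per field and optionally adds a section name as a disambiguator. The `build_instructions` helper and the exact sentence template are illustrative assumptions, not a format the service requires.

```python
def build_instructions(values, section=None):
    """Turn {visible label: value} into one field-oriented instruction per line."""
    lines = []
    for label, value in values.items():
        target = f"'{label}'"
        if section:
            # Disambiguate when the same label appears in several sections.
            target += f" in the '{section}' section"
        lines.append(f'Fill {target} with "{value}".')
    return "\n".join(lines)


instructions = build_instructions(
    {"Applicant name": "Jane Doe", "Date of birth": "1990-04-12"},
    section="Applicant details",
)
```

Using the label exactly as it appears on the document, instead of an internal field ID, follows the guidance above and keeps the instructions readable for whoever reviews the output.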

Troubleshooting (fields remain empty, ambiguous mapping, long text overflow)

Fields remain empty

  • Confirm the document is a supported type (PDF or DOCX) and that the target field/area actually exists in the document.

  • If the PDF has no native form fields, rely on vision-based detection and make instructions more specific (use the visible label text).

  • Provide a form_schema to make the expected targets explicit and reduce guesswork.

Ambiguous mapping

  • Add context in your instructions (which section, which label, which occurrence).

  • Prefer stable, schema-defined field names for repeated templates to avoid “closest match” ambiguity.

Long text overflow

  • Shorten the value, or specify an acceptable formatting approach (e.g., “summarize to one paragraph”).

  • For PDFs, enable/allow overflow pages when the value cannot fit in the available space.