Production Readiness Guide: Running Reducto in Production

A complete guide to operating Reducto at scale. Covers async processing, webhooks, retries, confidence thresholds, review patterns, and deployment options. Built from the patterns used by Reducto customers processing millions of pages in regulated environments.

Proven at scale: 3.5M+ pages/year (Benchmark) | 16x faster audits (Elysian) | 99.9%+ uptime | 1 billion+ pages processed

Async Processing

For production workloads, use Reducto's async job pattern instead of synchronous calls.

How it works:

  1. Submit documents with run_job() on /parse, /extract, or /split — returns a job_id immediately

  2. Track status via client.job.get(job_id) or receive a webhook on completion

  3. Chain operations by passing jobid:// references to downstream calls (parse once, extract multiple times)

Key behaviors:

  • Unlimited concurrent job submissions — the platform autoscales (Async invocation)

  • Large outputs may be returned as presigned URLs instead of inline (Handling large chunks)

  • All endpoints support async: Parse, Extract, Split, and Edit

Status | Meaning | Next Step
Pending | Job queued or processing | Continue polling with backoff or await webhook
Completed | Results available | Read result, proceed to downstream steps
Failed | Processing did not complete | Inspect metadata, apply retry policy
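The polling path in the status table can be sketched as a small helper. This is a minimal sketch, not Reducto's SDK: `wait_for_job` and its parameters are hypothetical, `get_status` is a caller-supplied function (for example, a wrapper around client.job.get(job_id) that returns the status string), and the exact status strings are assumed to match the table above.

```python
import random
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str],
                 base_delay: float = 1.0,
                 max_delay: float = 30.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job leaves Pending, backing off exponentially.

    get_status is hypothetical: it should return "Pending", "Completed",
    or "Failed" (assumed spellings, taken from the status table).
    """
    deadline = time.monotonic() + timeout
    delay = base_delay
    while True:
        status = get_status()
        if status in ("Completed", "Failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job still Pending after timeout")
        # Full jitter: sleep a random amount up to the current backoff cap
        # so many workers do not poll in lockstep.
        sleep(random.uniform(0, delay))
        delay = min(delay * 2, max_delay)
```

The caller decides what to do with the terminal status: read the result on Completed, or inspect metadata and apply the retry policy on Failed. For high volumes, webhooks avoid the polling loop entirely.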

Reference: Async invocation, Batch parsing

Webhooks (Svix)

For production-scale processing, use webhooks instead of polling. Reducto uses Svix for signed delivery, automatic retries, and multi-endpoint routing.

Setup:

  1. Enable async jobs in Reducto

  2. Configure webhook integration in Reducto Studio (Settings > Webhooks)

  3. Create an Application in the Svix dashboard, add endpoints (prod and dev)

  4. Copy the signing secret (prefixed with whsec_)

  5. Verify signatures on every incoming request before acting

Signed request headers:

Header | Purpose
svix-id | Unique event identifier; use as idempotency key
svix-timestamp | Send time (Unix epoch); reject stale requests
svix-signature | Signature over the raw body; verify with Svix SDK
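To make the three header checks concrete, the sketch below follows Svix's documented manual verification scheme using only the Python standard library. In production the Svix SDK is the recommended route; this version exists to show what is being checked. The function name, the 5-minute tolerance, and the assumption that header names arrive lowercased are illustrative choices, not Reducto requirements.

```python
import base64
import hashlib
import hmac
import time
from typing import Mapping, Optional

TOLERANCE_SECONDS = 5 * 60  # assumption: reject requests more than 5 minutes off


def verify_svix_signature(secret: str, headers: Mapping[str, str], body: bytes,
                          now: Optional[float] = None) -> bool:
    """Return True only if the raw body matches a v1 signature and is fresh."""
    msg_id = headers.get("svix-id")
    timestamp = headers.get("svix-timestamp")
    signature_header = headers.get("svix-signature")
    if not (msg_id and timestamp and signature_header):
        return False

    # Reject stale (or far-future) requests to limit replay attacks.
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > TOLERANCE_SECONDS:
        return False

    # The signing key is the base64 payload after the whsec_ prefix.
    encoded_key = secret[len("whsec_"):] if secret.startswith("whsec_") else secret
    key = base64.b64decode(encoded_key)

    # Signed content is "<id>.<timestamp>.<raw body>", HMAC-SHA256, base64.
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(hmac.new(key, signed_content, hashlib.sha256).digest())

    # The header may carry several space-separated "v1,<signature>" entries.
    for candidate in signature_header.split():
        version, _, signature = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(signature.encode(), expected):
            return True
    return False
```

Pass the raw request bytes, not a re-serialized JSON object: any change in whitespace or key order changes the signature.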

Retry behavior: Svix retries failed deliveries with exponential backoff. Monitor delivery status in the Svix dashboard and correlate with job_id via Reducto's job status APIs.
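Because retries can deliver the same event more than once, the handler should be idempotent. A minimal sketch, assuming the svix-id header is used as the deduplication key: `WebhookDeduplicator` is a hypothetical helper, and the in-memory set stands in for a durable store (a database table or cache) that a real deployment would need.

```python
from typing import Any, Callable, Set


class WebhookDeduplicator:
    """Process each svix-id at most once (in-memory stand-in for a durable store)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def handle(self, svix_id: str, payload: Any,
               process: Callable[[Any], None]) -> bool:
        """Run process(payload) unless this svix_id was already handled.

        Returns True if the payload was processed, False if it was a duplicate.
        """
        if svix_id in self._seen:
            return False
        # Record the id only after processing succeeds, so that an exception
        # here leaves the event eligible for the next retry.
        process(payload)
        self._seen.add(svix_id)
        return True
```

Returning a 2xx response for duplicates tells the sender the event is settled and stops further retries for it.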

Reference: Svix webhooks guide, Async invocation

Rate Limits and Throughput

Rate limits are plan-scoped. Enterprise plans support custom limits.

Plan | Default QPS | Estimated Pages/Min
Standard | 1 call/sec | ~60 pages/min
Growth | 10 calls/sec | ~600 pages/min
Enterprise | 100+ calls/sec | 6,000+ pages/min

Throughput scales with average pages per request; the estimates above correspond to one page per call. Current plan details and limits are listed at Reducto Pricing.
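The arithmetic behind the table can be written out directly. This is a rough capacity estimate only: the function names are illustrative, and the result is an upper bound that assumes every call is admitted at the plan's rate limit and ignores per-document processing time.

```python
def pages_per_minute(calls_per_second: float,
                     avg_pages_per_request: float = 1.0) -> float:
    """Upper-bound throughput: calls/sec x 60 sec x pages per call."""
    return calls_per_second * 60 * avg_pages_per_request


def minutes_to_process(total_pages: float, calls_per_second: float,
                       avg_pages_per_request: float = 1.0) -> float:
    """Estimated minutes needed to submit total_pages at the given rate."""
    return total_pages / pages_per_minute(calls_per_second, avg_pages_per_request)
```

For example, at 10 calls/sec with an average of 8 pages per document, the same rate limit yields up to 4,800 pages/min rather than 600.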

Confidence Thresholds and Review Routing

Reducto's Extract API returns field-level confidence scores and bounding-box citations for every extracted value. Use these to build review workflows:

Recommended pattern:

  • High confidence (above threshold): Accept automatically, log for audit

  • Medium confidence: Route to a review queue for human verification

  • Low confidence or missing fields: Flag as an exception, escalate
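The three-tier pattern above can be sketched as a routing function. Everything here is illustrative: `ExtractedField`, the route names, and the default thresholds (0.9 and 0.6) are assumptions, as is the numeric 0.0 to 1.0 confidence scale. Check the actual shape of the Extract response and tune thresholds to your workflow's risk tolerance.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ExtractedField:
    """Hypothetical container for one extracted value."""
    name: str
    value: Optional[str]
    confidence: Optional[float]  # assumed to be on a 0.0-1.0 scale


def route_field(field: ExtractedField,
                accept_at: float = 0.9,
                review_at: float = 0.6) -> str:
    """Return "accept", "review", or "exception" for a single field."""
    if field.value is None or field.confidence is None:
        return "exception"  # missing fields are escalated, never auto-accepted
    if field.confidence >= accept_at:
        return "accept"
    if field.confidence >= review_at:
        return "review"
    return "exception"


_SEVERITY = {"accept": 0, "review": 1, "exception": 2}


def route_document(fields: Iterable[ExtractedField], **thresholds: float) -> str:
    """Route a document by its worst field; an empty document is an exception."""
    routes = [route_field(f, **thresholds) for f in fields]
    if not routes:
        return "exception"
    return max(routes, key=_SEVERITY.__getitem__)
```

Routing by the worst field is a conservative choice; workflows with optional fields may prefer to route per field instead.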

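What Reducto provides:

  • Field-level confidence scores for every extracted value

  • Bounding-box citations for every extracted value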

What you build:

  • Threshold configuration for your workflow's risk tolerance

  • Review UI that displays the extracted value alongside the source citation

  • Exception queue for documents that fail validation or fall below confidence thresholds

  • Acceptance/rejection logic and audit logging

Customer proof: Anterior uses Reducto's sentence-level bounding-box citations to enable traceable clinical decision support, with fewer than 0.1% of reviews showing flaws attributable to document ingestion.

Retry and Error Handling

Recommended retry pattern:

  • Retry transient failures (5xx, timeouts) with exponential backoff

  • Do not retry client errors (4xx other than 429); fix the request instead

  • Use job_id for idempotent retries on async jobs

  • For webhook delivery, Svix handles retries automatically

Error categories:

Error | Cause | Response
429 Too Many Requests | Rate limit exceeded | Back off, retry after the indicated interval
5xx Server Error | Transient infrastructure issue | Retry with exponential backoff
Job Failed | Document processing error | Inspect error metadata, check document format
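The retry rules and error categories above can be combined into one wrapper. This is a sketch under stated assumptions: `ApiError` is a hypothetical exception (not the Reducto SDK's error type) carrying an HTTP status code and an optional retry interval in seconds, and `send` is any caller-supplied function that performs the request.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status_code: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
        self.retry_after = retry_after  # seconds, if the server indicated one


def is_retryable(status_code: int) -> bool:
    """429 and 5xx are transient; other 4xx errors need a fixed request."""
    return status_code == 429 or 500 <= status_code <= 599


def call_with_retries(send: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        retry_after = None
        try:
            return send()
        except TimeoutError:
            if attempt == max_attempts:
                raise
        except ApiError as err:
            if attempt == max_attempts or not is_retryable(err.status_code):
                raise
            retry_after = err.retry_after
        # Prefer the server-indicated interval; otherwise back off with jitter.
        backoff = min(base_delay * 2 ** (attempt - 1), max_delay)
        sleep(retry_after if retry_after is not None else random.uniform(0, backoff))
```

A failed job is a different case from a failed HTTP call: the request succeeded, so the response is to inspect the job's error metadata rather than to resend blindly.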

Deployment Options

Model | Data Residency | Best For
Multi-tenant Cloud | Reducto cloud, zero-retention default | Fastest start, SOC 2 Type II, HIPAA-eligible
Customer VPC | Your VPC, customer-controlled | No external storage, SSO/SAML
On-prem | Your data center | Behind your firewall, full control
Air-gapped | Fully isolated, no egress | Fortune-scale evaluations, all logs under customer control

All deployment modes support SOC 2 Type II controls and HIPAA compliance with BAAs available. Details: On-Prem Deployment Guide, Trust Center.

What Reducto Handles vs. What You Build

Reducto handles | You build
Document parsing, OCR, layout analysis | Application workflow and business logic
Schema-based extraction with citations | Review UI and exception queues
Async processing, webhooks, retries | Confidence threshold configuration
Deployment (cloud, VPC, on-prem) | Monitoring, alerting, and audit logging
SOC 2, HIPAA, zero data retention | Integration with your auth and RBAC

Further Reading