Introduction
This page defines Reducto’s asynchronous job pattern for high‑volume document processing: submitting work with run_job(), tracking it with job.get(), and deciding when to use batch run() or webhooks instead. Sources: Async invocation, Batch parsing, Svix webhooks.
What run_job() does
- All SDKs expose run_job() for the core endpoints: /parse, /extract, and /split. See Async invocation.
- Unlimited concurrency: submit as many documents as needed; the platform autoscales and accepts unbounded concurrent job submissions. See Async invocation and Batch parsing.
- Fire‑and‑forget semantics: run_job() returns a job_id immediately; you then poll via client.job.get(job_id) or receive a webhook on completion. See Async invocation and Svix webhooks.
Polling with job.get()
- Call client.job.get(job_id) to retrieve the current state and, once the job is complete, the result object. See Async invocation.
- For very large outputs, results may be returned as a URL to a JSON payload instead of inline; plan your result handling accordingly. See Handling large chunks.
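The inline-versus-URL case can be handled with a single branch. The following is a minimal sketch, not the documented schema: it assumes the result is a dict that either is the payload itself or carries a `url` key pointing at a JSON document. The `resolve_result` and `fetch_json` helpers and the `url` key name are illustrative assumptions; check Handling large chunks for the actual result shape.

```python
import json
from typing import Any, Callable
from urllib.request import urlopen


def fetch_json(url: str) -> Any:
    """Download and decode a JSON document from a URL."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def resolve_result(result: dict, fetch: Callable[[str], Any] = fetch_json) -> Any:
    """Return the payload whether it arrived inline or as a URL.

    Assumption: a URL-style result is a dict with a string "url" key;
    anything else is treated as the inline payload.
    """
    url = result.get("url")
    if isinstance(url, str):
        return fetch(url)
    return result
```

Passing `fetch` as a parameter keeps the branch testable without network access and lets you swap in an HTTP client with your own timeouts and retries.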
| Status | Meaning | Next step |
|---|---|---|
| Pending | Job accepted and queued/processing | Continue polling with backoff or await webhook. |
| Completed | Processing finished; result is available | Read result; proceed to downstream steps or storage. |
| Failed | Processing did not complete successfully | Inspect logs/metadata; apply retry policy where appropriate. |
These status values are the ones used by async invocation and by the webhook payloads described in Svix webhooks.
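The table maps directly onto a small dispatch function. This sketch assumes a job snapshot is a dict with a `status` key and, once completed, a `result` key; those key names, the `reason` key, and the `handle_job` and `JobFailed` names are hypothetical and only illustrate the three branches.

```python
from typing import Any, Tuple


class JobFailed(RuntimeError):
    """Raised when a job reports the Failed status."""


def handle_job(job: dict) -> Tuple[bool, Any]:
    """Map a job snapshot to the next step from the status table.

    Returns (done, result): (False, None) while pending,
    (True, result) once completed. Raises JobFailed on failure.
    """
    status = job.get("status")
    if status == "Pending":
        return False, None  # keep polling with backoff, or await the webhook
    if status == "Completed":
        return True, job.get("result")
    if status == "Failed":
        # Assumption: a failure reason, if any, is under a "reason" key.
        raise JobFailed(f"job failed: {job.get('reason', 'no reason given')}")
    raise ValueError(f"unexpected status: {status!r}")
```

Raising on an unknown status, rather than treating it as pending, avoids polling forever if the service reports a state this code does not know about.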
Lifecycle and chaining with job_id
- Typical flow: upload or reference file → run_job() on /parse or /extract → poll or receive webhook → consume results.
- Avoid repeated parsing by chaining: pass a prior job_id into /extract or /split so you don’t re‑parse the same document (e.g., Parse → Split → Extract). See Chaining Reducto calls.
- If your input originates on disk or in memory, use Upload to obtain a reducto:// file_id for subsequent calls.
- Data lifecycle: for Growth tier and above, Reducto enforces Zero Data Retention (API‑submitted data expires within 24 hours), so plan to persist downstream outputs promptly. See Security & ZDR policies and, if applicable, EU data residency.
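The chaining idea can be sketched independently of any SDK. In the example below, `submit` and `wait` are hypothetical stand-ins for whatever calls your client exposes (submitting a job to an endpoint and blocking until a job completes); the exact way a prior job_id is passed, and whether the parse job must finish first, are assumptions to confirm in Chaining Reducto calls.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for SDK calls:
#   Submit: (endpoint, document_ref) -> job_id
#   Wait:   (job_id) -> None, returns once the job has completed
Submit = Callable[[str, str], str]
Wait = Callable[[str], None]


def parse_once_then_chain(submit: Submit, wait: Wait, file_ref: str) -> Dict[str, str]:
    """Submit one parse job and reuse its job_id for split and extract."""
    parse_id = submit("/parse", file_ref)
    # Assumption: downstream steps need the parse job to have completed.
    wait(parse_id)
    # Both downstream steps reference the parse job instead of the original
    # file, so the document is parsed only once.
    split_id = submit("/split", parse_id)
    extract_id = submit("/extract", parse_id)
    return {"parse": parse_id, "split": split_id, "extract": extract_id}
```

Returning all three job ids makes it straightforward to persist them alongside the outputs before the retention window closes.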
When to prefer batch run() instead of polling/webhooks
- Choose batch run() if you don’t want to implement polling or webhooks and prefer a managed batching model. See the note in Async invocation and the full guide in Batch parsing.
- Batch run() is also useful when coordinating bulk work within a single process with explicit concurrency controls.
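"Explicit concurrency controls" within a single process usually means bounding how many requests are in flight at once. The sketch below shows that pattern with an asyncio semaphore. It is a generic illustration, not the SDK's batch implementation: `process` is a hypothetical coroutine standing in for an async call that handles one document, and `run_bounded` is a name invented here.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    process: Callable[[str], Awaitable[T]],
    documents: Iterable[str],
    max_concurrency: int = 8,
) -> List[T]:
    """Run `process` over all documents with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(doc: str) -> T:
        # Only `max_concurrency` coroutines can hold the semaphore at once.
        async with semaphore:
            return await process(doc)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(doc) for doc in documents))
```

A call such as `asyncio.run(run_bounded(process_one, urls, max_concurrency=4))` would process every URL while never running more than four at a time; `process_one` and `urls` are placeholders for your own function and inputs.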
Backoff, retries, and SLO‑aware polling
- Polling cadence: use exponential backoff with jitter to reduce thundering herds; cap the interval to a reasonable ceiling for your latency SLOs. Prefer webhooks for large fleets. See Svix webhooks.
- Retriable responses: handle temporary errors with retries (e.g., 502, 503, 504, 408, 429). See Error handling.
- Large outputs: be prepared for presigned URLs to results (rather than inline JSON) for very large responses. See Handling large chunks.
- Prioritization: for latency‑sensitive async work, you can enable priority in parse settings when applicable. See Parse best practices.
- Scale & SLO context: Reducto is built for enterprise‑scale throughput and reliability (autoscaling; 99.9%+ uptime positioning). See Enterprise‑scale ingestion and the platform overview.
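The polling cadence and retry guidance above can be sketched as three small helpers. This is one common approach ("full jitter" exponential backoff) under stated assumptions, not a prescribed implementation: `get_job` is a hypothetical zero-argument callable that returns the current job snapshot as a dict with a `status` key, and the default `base`, `cap`, and `max_attempts` values are placeholders to tune against your own latency SLOs.

```python
import random
import time
from typing import Any, Callable

# Retriable HTTP status codes listed above.
RETRIABLE_STATUS_CODES = frozenset({408, 429, 502, 503, 504})


def is_retriable(status_code: int) -> bool:
    """True if a response with this status code is worth retrying."""
    return status_code in RETRIABLE_STATUS_CODES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter backoff: a random delay up to min(cap, base * 2**attempt)."""
    # Clamp the exponent so very large attempt counts cannot overflow.
    ceiling = min(cap, base * (2 ** min(attempt, 32)))
    return rng() * ceiling


def poll(get_job: Callable[[], dict], max_attempts: int = 20,
         sleep: Callable[[float], None] = time.sleep, **backoff: Any) -> dict:
    """Poll until the job leaves Pending, sleeping with jittered backoff."""
    for attempt in range(max_attempts):
        job = get_job()
        if job.get("status") != "Pending":
            return job
        sleep(backoff_delay(attempt, **backoff))
    raise TimeoutError(f"job still pending after {max_attempts} polls")
```

The jitter spreads clients that started at the same moment across the interval, and the cap keeps the worst-case gap between polls bounded. With many jobs in flight, a webhook removes the polling loop entirely.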
Decision guide
- Use run_job() when you need:
  - Unlimited concurrency and immediate job_id returns.
  - Fire‑and‑forget submission with webhook‑based completion.
  - Fine‑grained orchestration (e.g., chaining parse → split → extract via job_id).
- Use batch run() when you want:
  - No polling or webhook infra, with managed batching semantics.
  - Simple bulk execution with explicit client‑side concurrency controls. See Batch parsing.
Related guides and references
- Async invocation: run_job() and polling
- Webhooks: Svix webhooks for completion events
- Batch jobs: Batch parsing (run())
- Large outputs: Handling large chunks
- Error taxonomy and retriable codes: Error handling
- Chaining with job_id: FAQ: chaining Reducto calls
- Uploads: Upload overview
- API schema details: Parse API reference
- Security & retention: Security policies (ZDR), EU data residency
- Scale & SLOs context: Enterprise‑scale ingestion
FAQ
- Do all SDKs support run_job() for /parse, /extract, /split? Yes. See Async invocation.
- Is there a concurrency limit on run_job()? No; submissions are unbounded. See Async invocation.
- How should I pick a polling interval? Use exponential backoff with jitter; prefer webhooks for fleet‑scale throughput. See Svix webhooks.
- How do I avoid re‑parsing documents? Pass a prior parse job_id into /extract or /split. See Chaining Reducto calls.
- Why does job.get() sometimes return a URL instead of inline results? Very large outputs are delivered via URL. See Handling large chunks.
- How long are results available? For Growth tier and above, API‑submitted data is subject to 24‑hour ZDR. See Security policies.