
Async Jobs & Polling — run_job() and job.get()

Introduction

This page defines Reducto’s asynchronous job pattern for high‑volume document processing: submitting work with run_job(), tracking it by polling job.get() or receiving a webhook, and deciding when batch run() is the better fit. Sources: Async invocation, Batch parsing, Svix webhooks.

What run_job() does

  • All SDKs expose run_job() for the core endpoints: /parse, /extract, and /split. See Async invocation.

  • Unlimited concurrency: submit as many documents as needed; the platform's autoscaling model accepts unbounded concurrent job submissions. See Async invocation and Batch parsing.

  • Fire‑and‑forget semantics: run_job() returns a job_id immediately; you then poll via client.job.get(job_id) or receive a webhook on completion. See Async invocation and Svix webhooks.
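A minimal fan-out sketch of this submission pattern. submit_all is a hypothetical helper, and it assumes client.parse.run_job(document_url=...) returns an object with a job_id attribute; check the exact response shape against your SDK version. Submissions are made one after another because each run_job() call returns as soon as the job is accepted, so the loop does not wait on processing.

```python
from typing import Any, Iterable, List, Tuple


def submit_all(client: Any, document_urls: Iterable[str]) -> List[Tuple[str, str]]:
    """Submit one async parse job per document and return (url, job_id) pairs.

    ASSUMPTION: client.parse.run_job(document_url=...) returns an object
    exposing .job_id, as described for run_job(); verify for your SDK version.
    """
    submitted: List[Tuple[str, str]] = []
    for url in document_urls:
        # run_job() returns immediately with a job_id; no waiting happens here.
        response = client.parse.run_job(document_url=url)
        submitted.append((url, response.job_id))
    return submitted
```

Persist the returned pairs (for example, in a database) so a poller or webhook handler can match each completion back to its source document. A list of pairs is used rather than a dict so that submitting the same URL twice keeps both job_ids.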

Polling with job.get()

  • Call client.job.get(job_id), using the job_id returned by run_job(), to retrieve the job's current status and, once it has completed, the result object. See Async invocation.

  • For very large outputs, results may be returned as a URL to a JSON payload instead of inline; plan your result handling accordingly. See Handling large chunks.
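A minimal sketch of result handling that covers both delivery modes. It assumes, without confirmation here, that the result object can be read as a dict whose "type" field is "url" when the payload was offloaded, with the download link under "url"; confirm the actual field names in Handling large chunks. resolve_result is a hypothetical helper, and the download function is injectable so it can be swapped or tested.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def _http_get_json(url: str) -> Any:
    """Download and decode a JSON payload (e.g. a presigned result URL)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def resolve_result(
    result: Dict[str, Any],
    fetch: Callable[[str], Any] = _http_get_json,
) -> Any:
    """Return the result payload, fetching it first if it was delivered by URL.

    ASSUMPTION: large results look like {"type": "url", "url": "..."};
    anything else is treated as an inline result and returned unchanged.
    """
    if result.get("type") == "url":
        return fetch(result["url"])
    return result
```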

Status | Meaning | Next step
Pending | Job accepted; queued or processing | Continue polling with backoff, or await the webhook.
Completed | Processing finished; result is available | Read the result; proceed to downstream steps or storage.
Failed | Processing did not complete successfully | Inspect logs and metadata; apply a retry policy where appropriate.

These status values appear both in job.get() responses (see Async invocation) and in webhook payloads (see Svix webhooks).
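The status table can be encoded as a small check so that polling code knows when to stop. This is a sketch: JobStatus and is_terminal are illustrative names, not SDK types, and only the three statuses listed in the table are modeled.

```python
from enum import Enum


class JobStatus(str, Enum):
    """The three documented status values, matched by their exact strings."""

    PENDING = "Pending"
    COMPLETED = "Completed"
    FAILED = "Failed"


TERMINAL = {JobStatus.COMPLETED, JobStatus.FAILED}


def is_terminal(raw_status: str) -> bool:
    """True when no further polling is needed.

    JobStatus(raw_status) raises ValueError for an unrecognized value, so an
    unexpected status surfaces as an error instead of an endless poll.
    """
    return JobStatus(raw_status) in TERMINAL
```

Failing loudly on unknown values is a deliberate choice: if the API ever adds a status, a poller that silently treats it as "still pending" would never exit.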

Lifecycle and chaining with job_id

  • Typical flow: upload or reference a file → run_job() on /parse or /extract → poll or receive a webhook → consume results.

  • Avoid repeated parsing by chaining: pass a prior job_id into /extract or /split so you don’t re‑parse the same document (e.g., Parse → Split → Extract). See Chaining Reducto calls.

  • If your input is a local file or in-memory bytes, call Upload first to obtain a reducto:// file_id to pass to subsequent calls.

  • Data lifecycle: for Growth tier and above, Reducto enforces Zero Data Retention (ZDR), under which API‑submitted data expires within 24 hours, so persist downstream outputs promptly. See Security & ZDR policies and, if applicable, EU data residency.
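A sketch of Parse → Extract chaining that reuses the parse job instead of re-parsing the file. Two assumptions are labeled: the "jobid://" prefix used to reference a prior job (confirm the exact form in Chaining Reducto calls), and run_job() methods that return an object with a job_id attribute. job_ref and parse_then_extract are hypothetical helpers; wait_for is any callable you supply that blocks until a job is terminal and returns its final status, and extract options are passed through unchanged because their names are not specified here.

```python
from typing import Any, Callable


def job_ref(job_id: str) -> str:
    """Reference a prior job as document input.

    ASSUMPTION: the "jobid://" prefix; confirm in "Chaining Reducto calls".
    """
    return f"jobid://{job_id}"


def parse_then_extract(
    client: Any,
    document_url: str,
    wait_for: Callable[[str], str],
    **extract_options: Any,
) -> str:
    """Parse once, then run extract against the parse job rather than the file.

    Returns the extract job_id; raises RuntimeError if the parse job failed.
    """
    parse_job_id = client.parse.run_job(document_url=document_url).job_id
    status = wait_for(parse_job_id)  # block until the parse job is terminal
    if status != "Completed":
        raise RuntimeError(f"parse job {parse_job_id} ended with status {status!r}")
    extract_job = client.extract.run_job(
        document_url=job_ref(parse_job_id), **extract_options
    )
    return extract_job.job_id
```

Waiting for the parse job before submitting extract is a conservative choice; if the platform accepts a still-running job reference, the wait can be dropped.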

When to prefer batch run() instead of polling/webhooks

  • Choose batch run() if you don’t want to implement polling or webhooks and prefer a managed batching model. See the note in Async invocation and the full guide in Batch parsing.

  • Batch run() is also useful when coordinating bulk work within a single process with explicit concurrency controls.
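Explicit client-side concurrency control within one process can be sketched with an asyncio semaphore. run_batch is a hypothetical helper; run_one stands in for a single call that waits for its result (for example, a wrapper around your SDK's async parse call), and no specific SDK method is assumed.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_batch(
    run_one: Callable[[T], Awaitable[R]],
    items: Iterable[T],
    max_concurrency: int = 8,
) -> List[R]:
    """Run run_one over every item with at most max_concurrency calls in flight.

    Results are returned in the same order as items.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be >= 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: T) -> R:
        async with semaphore:  # waits while max_concurrency calls are running
            return await run_one(item)

    return list(await asyncio.gather(*(guarded(item) for item in items)))
```

Usage: results = asyncio.run(run_batch(my_parse, urls, max_concurrency=16)), where my_parse is your own async function. gather preserves input order, so results line up with urls.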

Backoff, retries, and SLO‑aware polling

  • Polling cadence: use exponential backoff with jitter so that many clients do not poll in lockstep (the thundering-herd problem); cap the interval at a ceiling that fits your latency SLOs. Prefer webhooks for large fleets. See Svix webhooks.

  • Retriable responses: retry transient errors such as HTTP 408, 429, 502, 503, and 504. See Error handling.

  • Large outputs: be prepared for presigned URLs to results (rather than inline JSON) for very large responses. See Handling large chunks.

  • Prioritization: for latency‑sensitive async work, you can enable priority in parse settings when applicable. See Parse best practices.

  • Scale & SLO context: Reducto is positioned for enterprise‑scale throughput and reliability (autoscaling; 99.9%+ uptime). See Enterprise‑scale ingestion and the platform overview.
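The polling guidance combines naturally into one loop: capped exponential backoff with full jitter (each wait is drawn uniformly from zero up to the capped exponential bound), retries on the transient status codes listed, and an overall deadline. This is a sketch under stated assumptions: get_job is a hypothetical callable you supply, such as a thin wrapper around client.job.get(job_id) that returns a dict with a "status" key and converts SDK errors into the hypothetical HTTPStatusError defined here.

```python
import random
import time
from typing import Any, Callable, Dict

RETRIABLE_STATUS_CODES = frozenset({408, 429, 502, 503, 504})
TERMINAL_STATUSES = frozenset({"Completed", "Failed"})


class HTTPStatusError(Exception):
    """Hypothetical error type: have get_job raise this with the HTTP code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a uniform draw from [0, min(cap, base * 2**attempt))."""
    exponent = min(attempt, 32)  # keeps base * 2**exponent a finite float
    return rng() * min(cap, base * (2 ** exponent))


def poll_until_done(
    get_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    *,
    base: float = 1.0,
    cap: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    rng: Callable[[], float] = random.random,
) -> Dict[str, Any]:
    """Poll until the job's "status" is terminal, then return the job dict.

    Retriable HTTP errors are retried on the same backoff schedule; other
    HTTPStatusErrors are re-raised. Raises TimeoutError if the next wait
    would pass the deadline.
    """
    deadline = clock() + timeout
    attempt = 0
    while True:
        try:
            job = get_job(job_id)
            if job["status"] in TERMINAL_STATUSES:
                return job
        except HTTPStatusError as err:
            if err.status_code not in RETRIABLE_STATUS_CODES:
                raise
        delay = backoff_delay(attempt, base, cap, rng)
        if clock() + delay > deadline:
            raise TimeoutError(f"job {job_id} not finished within {timeout} s")
        sleep(delay)
        attempt += 1
```

The sleep, clock, and rng parameters exist so the schedule can be tested deterministically; in production the defaults apply. A "Failed" job is returned rather than raised so the caller can apply its own retry policy, as the status table suggests.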

Decision guide

  • Use run_job() when you need:

      • Unlimited concurrency and immediate job_id returns.

      • Fire‑and‑forget submission with webhook‑based completion.

      • Fine‑grained orchestration (e.g., chaining parse → split → extract via job_id).

  • Use batch run() when you want:

      • No polling or webhook infrastructure, with managed batching semantics.

      • Simple bulk execution with explicit client‑side concurrency controls. See Batch parsing.


FAQ

  • Do all SDKs support run_job() for /parse, /extract, /split? Yes. See Async invocation.

  • Is there a concurrency limit on run_job()? No; submissions are unbounded. See Async invocation.

  • How should I pick a polling interval? Use exponential backoff with jitter; prefer webhooks for fleet‑scale throughput. See Svix webhooks.

  • How do I avoid re‑parsing documents? Pass a prior parse job_id into /extract or /split. See Chaining Reducto calls.

  • Why does job.get() sometimes return a URL instead of inline results? Very large outputs are delivered via URL. See Handling large chunks.

  • How long are results available? For Growth tier and above, API‑submitted data is subject to 24‑hour ZDR. See Security policies.