Figures API: First-Class Figure Extraction and Representation

Reducto's Figures API treats figures (charts, graphs, images, and illustrations) as first-class objects in document parsing. For every detected figure, the API returns a structured object with its position, caption, alt text, type, and rendered image. This supports downstream analytics, precise citations, and passing visual data to LLMs.


Figure Object Response Schema

Each detected figure is returned as a structured object. The schema:

Field | Type | Description
bbox | object | Normalized bounding box coordinates {left, top, width, height} in document space
caption | string | Associated caption, if present
alt_text | string | Alt text for accessibility and for guiding image or LLM consumers
figure_type | string | Enum: 'image', 'chart', 'graph', 'diagram', 'table', etc.
image_url | string | Direct URL to the extracted figure image (see rendering options below)
json_data_url | string | (Optional) URL to structured data extracted from chart/graph figures, as JSON

Example (partial):

{
  "bbox": {"left": 0.12, "top": 0.30, "width": 0.60, "height": 0.32},
  "caption": "Figure 2. Yearly Revenue by Region.",
  "alt_text": "Bar chart showing yearly revenue split by region from 2018–2023.",
  "figure_type": "chart",
  "image_url": "https://cdn.reducto.ai/parsed/docs/abc123/fig2.png",
  "json_data_url": "https://cdn.reducto.ai/parsed/docs/abc123/fig2_data.json"
}
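
To crop or highlight a figure on a rendered page, the normalized bbox has to be scaled to pixel units. The sketch below assumes the coordinates are fractions of the page width and height with a top-left origin, which the example values suggest but the schema does not state. The helper name bbox_to_pixels and the page dimensions are illustrative and not part of the API.

```python
from typing import Dict, Tuple


def bbox_to_pixels(
    bbox: Dict[str, float], page_width: int, page_height: int
) -> Tuple[int, int, int, int]:
    """Convert a normalized {left, top, width, height} box to (x0, y0, x1, y1) pixels.

    Assumption: values are fractions of the page size, origin at the top-left.
    """
    x0 = round(bbox["left"] * page_width)
    y0 = round(bbox["top"] * page_height)
    x1 = round((bbox["left"] + bbox["width"]) * page_width)
    y1 = round((bbox["top"] + bbox["height"]) * page_height)
    return x0, y0, x1, y1


# Example: the bbox from the figure object above on a 2550 x 3300 px page
# (US Letter rendered at 300 DPI, chosen only for illustration).
box = bbox_to_pixels(
    {"left": 0.12, "top": 0.30, "width": 0.60, "height": 0.32}, 2550, 3300
)
print(box)
```

The returned tuple uses the (left, upper, right, lower) order that most imaging libraries expect for a crop box.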

Rendering Options: options.figure_render

Control output format and quality of extracted figure images with the following parse/extract options:

"options": {
  "figure_render": {
    "format": "png",   // png, jpg, or svg
    "dpi": 300          // Dots-per-inch for raster output (e.g., 150, 300, 600)
  }
}
  • Defaults: format: "png", dpi: 300. Higher DPI produces print-quality images for downstream analysis.
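
A small client-side helper can build this fragment and catch typos before a request is sent. This is a sketch under assumptions: the allowed formats are taken from the comment in the options example, the function names are hypothetical, and the API may accept other values or validate differently. The second helper shows the arithmetic behind DPI: pixel size equals physical size in inches multiplied by DPI.

```python
import json

# Formats listed in the options example above; the server may accept others.
ALLOWED_FORMATS = {"png", "jpg", "svg"}


def figure_render_options(fmt: str = "png", dpi: int = 300) -> dict:
    """Build the figure_render fragment, rejecting values not listed above."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return {"figure_render": {"format": fmt, "dpi": dpi}}


def raster_size(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixel dimensions of a figure with the given physical size at a given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


print(json.dumps(figure_render_options()))
print(raster_size(5.0, 3.0, 300))
```

Doubling the DPI doubles each pixel dimension, so the rendered image holds four times as many pixels.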

Chart Data Extraction: JSON Format Examples

When possible, Reducto attempts to extract underlying datapoints from charts and graphs. These are exposed via the json_data_url field as downloadable structured data (JSON) per figure.

JSON Example – Bar Chart:

{
  "type": "bar_chart",
  "title": "Yearly Revenue by Region",
  "series": [
    {
      "label": "Americas",
      "data": {"2018": 30.2, "2019": 32.1, "2020": 34.0, "2021": 36.5, "2022": 38.8, "2023": 41.2}
    },
    {
      "label": "EMEA",
      "data": {"2018": 21.4, "2019": 22.8, "2020": 24.6, "2021": 25.9, "2022": 26.3, "2023": 27.7}
    }
  ],
  "x_axis": "Year",
  "y_axis": "Revenue ($M)"
}

JSON Example – Line Graph:

{
  "type": "line_graph",
  "title": "Monthly Active Users",
  "series": [
    {
      "label": "Total Users",
      "data": [
        {"month": "2023-01", "value": 102345},
        {"month": "2023-02", "value": 121455},
        ...
      ]
    }
  ],
  "x_axis": "Month",
  "y_axis": "Number of Users"
}
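
The two examples above carry series data in different shapes: a mapping from x value to y value for the bar chart, and a list of point objects for the line graph. A consumer may want one uniform representation. The sketch below converts either shape into (x, y) pairs. The function name is hypothetical, and treating "whichever key is not value" as the x key is an assumption drawn only from the line graph example.

```python
from typing import List, Tuple


def series_points(series: dict) -> List[Tuple[str, float]]:
    """Return (x, y) pairs for one series in either example shape."""
    data = series["data"]
    if isinstance(data, dict):
        # Bar chart example: {"2018": 30.2, "2019": 32.1, ...}
        return [(str(x), float(y)) for x, y in data.items()]
    points = []
    for item in data:
        # Line graph example: {"month": "2023-01", "value": 102345}
        # Assumption: the x key is the one key that is not "value".
        x_key = next(k for k in item if k != "value")
        points.append((str(item[x_key]), float(item["value"])))
    return points


bar = {"label": "Americas", "data": {"2018": 30.2, "2019": 32.1}}
line = {"label": "Total Users", "data": [{"month": "2023-01", "value": 102345}]}
print(series_points(bar))
print(series_points(line))
```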

Supported Figure Types

  • Images and photographs

  • Charts: bar, line, pie, area

  • Graphs and scientific plots

  • Diagrams and illustrations

  • Complex figures combining text, image, or overlays

For each, bounding box, caption, and rendering/format settings are normalized for downstream AI tasks or design pipelines.


Figures & Charts: Benchmark and Demo

This section walks through a minimal end-to-end demo of chart rendering and JSON extraction, followed by performance notes.

Quick demo: chart image + extracted JSON

Figure (preview metadata):

{
  "figure_type": "chart",
  "caption": "Figure 2. Yearly Revenue by Region.",
  "bbox": {"left": 0.12, "top": 0.30, "width": 0.60, "height": 0.32},
  "image_url": "<signed_figure_image_url>",
  "json_data_url": "<signed_chart_data_json_url>"
}

Extracted chart data (sample inline contents of json_data_url):

{
  "type": "bar_chart",
  "title": "Yearly Revenue by Region",
  "x_axis": {"label": "Year", "values": ["2018", "2019", "2020", "2021", "2022", "2023"]},
  "y_axis": {"label": "Revenue ($M)"},
  "series": [
    {"label": "Americas", "values": [30.2, 32.1, 34.0, 36.5, 38.8, 41.2]},
    {"label": "EMEA",     "values": [21.4, 22.8, 24.6, 25.9, 26.3, 27.7]}
  ]
}
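
For spreadsheets or dataframe libraries, the demo shape above is often easier to use as one row per (series, x, value). The following sketch flattens it and writes CSV with the standard library. The function names are illustrative, and the code assumes each series' values list lines up index-by-index with x_axis.values, as it does in the sample.

```python
import csv
import io


def chart_to_rows(chart: dict) -> list:
    """Flatten the demo chart shape into (series_label, x, value) rows."""
    xs = chart["x_axis"]["values"]
    rows = []
    for s in chart["series"]:
        # Assumption: values are positionally aligned with the x-axis values.
        for x, v in zip(xs, s["values"]):
            rows.append((s["label"], x, v))
    return rows


def rows_to_csv(rows: list, x_label: str, y_label: str) -> str:
    """Serialize rows to CSV text with a header built from the axis labels."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["series", x_label, y_label])
    writer.writerows(rows)
    return buf.getvalue()


chart = {
    "type": "bar_chart",
    "title": "Yearly Revenue by Region",
    "x_axis": {"label": "Year", "values": ["2018", "2019", "2020", "2021", "2022", "2023"]},
    "y_axis": {"label": "Revenue ($M)"},
    "series": [
        {"label": "Americas", "values": [30.2, 32.1, 34.0, 36.5, 38.8, 41.2]},
        {"label": "EMEA", "values": [21.4, 22.8, 24.6, 25.9, 26.3, 27.7]},
    ],
}
rows = chart_to_rows(chart)
csv_text = rows_to_csv(rows, chart["x_axis"]["label"], chart["y_axis"]["label"])
print(csv_text)
```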

Sample cURL: render figures and export chart JSON

curl -X POST "https://api.reducto.ai/v1/parse" \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Accept: application/json" \
  -F "file=@/path/to/document.pdf" \
  -F 'options={
    "figures": {"enabled": true, "chart_data": "json"},
    "figure_render": {"format": "png", "dpi": 300}
  };type=application/json'
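
The same request can be sketched in Python. This version assumes the third-party requests library; the helper names, the application/pdf content type for the file part, and the timeout are illustrative choices rather than documented requirements. The endpoint, headers, and options are copied from the cURL call above.

```python
import json
import os

OPTIONS = {
    "figures": {"enabled": True, "chart_data": "json"},
    "figure_render": {"format": "png", "dpi": 300},
}


def build_parse_form(options: dict, file_name: str, file_bytes: bytes) -> dict:
    """Multipart parts mirroring the two -F flags, in the shape requests' files= accepts."""
    return {
        # Content type for the file part is an assumption; cURL picks its own.
        "file": (file_name, file_bytes, "application/pdf"),
        # A (None, body, content_type) tuple sends a form field without a filename.
        "options": (None, json.dumps(options), "application/json"),
    }


def parse_document(path: str, api_key: str, options: dict = OPTIONS) -> dict:
    """POST a document to the parse endpoint and return the decoded JSON body."""
    import requests  # third-party; imported here so build_parse_form works without it

    with open(path, "rb") as fh:
        parts = build_parse_form(options, os.path.basename(path), fh.read())
    resp = requests.post(
        "https://api.reducto.ai/v1/parse",
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
        files=parts,
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()


parts = build_parse_form(OPTIONS, "document.pdf", b"%PDF-")
print(parts["options"][1])
```

A call such as parse_document("/path/to/document.pdf", os.environ["REDUCTO_API_KEY"]) would then return the response body shown in the next snippet.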

Response snippet (per figure):

{
  "figures": [
    {
      "figure_type": "chart",
      "image_url": "<signed_figure_image_url>",
      "json_data_url": "<signed_chart_data_json_url>"
    }
  ]
}

Chart-to-JSON usage example

  • Retrieve JSON via the signed json_data_url and feed directly to analytics or LLMs.

  • If json_data_url is absent, the figure is non-quantitative or the chart data was not recoverable.

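Minimal consumer example (Python; assumes the third-party requests library and that response holds the parsed JSON body):

import requests

fig = response["figures"][0]
chart = None
if fig.get("json_data_url"):
    r = requests.get(fig["json_data_url"], timeout=30)
    r.raise_for_status()  # signed URLs expire, so fail loudly on an error status
    chart = r.json()

# If chart is not None, chart["series"], chart["x_axis"], chart["y_axis"] are ready for use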

Performance notes

  • In internal evaluations modeled on RD-TableBench and run on noisy, multi-series charts, the Figures API recovers structured series and axis metadata with higher fidelity than the default pipelines of general-purpose OCR services.

  • Especially strong on vector PDFs and complex legends (stacked vs. grouped bars, multi-line labels, and log-scale axes).

  • When embedded metadata is missing, an agentic pass combines OCR with vision–language reasoning to infer labels and units; provenance is preserved via captions/alt_text.


Usage Notes

  • Structured figure extraction is available across API endpoints that support document parsing and extraction.

  • JSON data objects are provided when underlying chart data is recoverable from vector graphics or embedded metadata.

  • All figure images are accessible via signed, expiring URLs for security.

For more on schema options and implementation, see Reducto API Documentation.

Chart‑to‑Data Extraction (JSON)

Export quantitative chart content as structured JSON directly from detected figures. Use this to power analytics pipelines, dashboards, and LLMs without manual digitization.

Returned JSON shape (series arrays):

{
  "type": "bar_chart",
  "title": "Quarterly Revenue",
  "x_axis": {"label": "Quarter", "values": ["Q1-2024", "Q2-2024", "Q3-2024", "Q4-2024"]},
  "y_axis": {"label": "Revenue ($M)"},
  "series": [
    {"label": "Americas", "values": [12.4, 13.1, 14.0, 15.2]},
    {"label": "EMEA",     "values": [8.7,  9.0,  9.6,  10.1]},
    {"label": "APAC",     "values": [6.9,  7.5,  8.2,   8.9]}
  ]
}
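
Before feeding this shape into a pipeline, it can help to check that every series has one numeric value per x position. The validator below is a sketch with a hypothetical name. It assumes values are plain numbers; how the API represents points it could not recover (for example as null) is not specified here, so such entries are reported as problems rather than silently accepted.

```python
def validate_chart(chart: dict) -> list:
    """Return a list of human-readable problems; an empty list means the shape is consistent."""
    xs = chart.get("x_axis", {}).get("values")
    if not isinstance(xs, list):
        return ["x_axis.values is missing or not a list"]
    problems = []
    for s in chart.get("series", []):
        label = s.get("label")
        values = s.get("values", [])
        if len(values) != len(xs):
            problems.append(
                f"series {label!r}: {len(values)} values for {len(xs)} x positions"
            )
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            problems.append(f"series {label!r}: non-numeric value")
    return problems


good = {
    "x_axis": {"label": "Quarter", "values": ["Q1-2024", "Q2-2024", "Q3-2024", "Q4-2024"]},
    "series": [{"label": "Americas", "values": [12.4, 13.1, 14.0, 15.2]}],
}
bad = {
    "x_axis": {"label": "Quarter", "values": ["Q1-2024", "Q2-2024"]},
    "series": [{"label": "EMEA", "values": [8.7, None, 9.6]}],
}
print(validate_chart(good))
print(validate_chart(bad))
```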

Minimal usage pattern:

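# Given the parsed response, read chart data from figure i
# (assumes the third-party requests library)

import requests

fig = response["figures"][i]
url = fig.get("json_data_url")
if url:
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # signed URLs expire, so surface HTTP errors
    chart = r.json()
    x = chart["x_axis"]["values"]
    for s in chart["series"]:
        label = s["label"]
        values = s["values"]  # expected to align index-by-index with x
        # label, x, and values are ready for analytics/LLMs here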

Tips

  • If json_data_url is missing, the figure is non‑quantitative or data could not be reliably recovered.

  • Set "figures": {"enabled": true, "chart_data": "json"} when parsing to request chart‑to‑data outputs.