Figures API: Structured Extraction of Figures from Documents

Reducto's Figures API makes figures—including charts, graphs, images, and illustrations—a first-class object in document parsing. The API provides reliable, structured outputs for every detected figure alongside precise visual and contextual details. This enables downstream analytics, high-fidelity citations, and seamless LLM integration with visual data.

Figure Object Response Schema

Each detected figure is returned as a structured object. The schema:

Field	Type	Description
bbox	object	Normalized bounding box coordinates {left, top, width, height} in document space
caption	string	Associated caption, if present
alt_text	string	Alt-text for visual accessibility or image/LLM guidance
figure_type	string	Enum: 'image', 'chart', 'graph', 'diagram', 'table', etc.
image_url	string	Direct URL to extracted figure image (see rendering options below)
json_data_url	string	(Optional) URL to structured data extracted from chart/graph figures as JSON

Example (partial):

{
  "bbox": {"left": 0.12, "top": 0.30, "width": 0.60, "height": 0.32},
  "caption": "Figure 2. Yearly Revenue by Region.",
  "alt_text": "Bar chart showing yearly revenue split by region from 2018–2023.",
  "figure_type": "chart",
  "image_url": "https://cdn.reducto.ai/parsed/docs/abc123/fig2.png",
  "json_data_url": "https://cdn.reducto.ai/parsed/docs/abc123/fig2_data.json"
}
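
To crop or highlight a figure on a rendered page, the normalized bbox has to be mapped to pixels. The sketch below is one way to do that. It assumes each bbox value is a fraction of the page size with the origin at the top-left corner, which the schema above does not state explicitly. The helper name bbox_to_pixels and the page dimensions are illustrative, not part of the API.

```python
def bbox_to_pixels(bbox, page_width_px, page_height_px):
    """Map a normalized {left, top, width, height} box to integer pixel bounds.

    Assumption: values are fractions of the page size and the origin is the
    top-left corner. Check this against the API reference before relying on it.
    """
    x0 = round(bbox["left"] * page_width_px)
    y0 = round(bbox["top"] * page_height_px)
    x1 = round((bbox["left"] + bbox["width"]) * page_width_px)
    y1 = round((bbox["top"] + bbox["height"]) * page_height_px)
    return x0, y0, x1, y1


figure = {"bbox": {"left": 0.12, "top": 0.30, "width": 0.60, "height": 0.32}}
# A US Letter page (8.5 x 11 in) rendered at 100 DPI is 850 x 1100 pixels.
box = bbox_to_pixels(figure["bbox"], 850, 1100)
print(box)
```

The returned tuple is in (left, upper, right, lower) order, which is the order most imaging libraries expect for a crop rectangle.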

Rendering Options: `options.figure_render`

Control output format and quality of extracted figure images with the following parse/extract options:

"options": {
  "figure_render": {
    "format": "png",   // png, jpg, or svg
    "dpi": 300          // Dots-per-inch for raster output (e.g., 150, 300, 600)
  }
}

Defaults: format: "png", dpi: 300. The dpi setting applies to raster formats (png, jpg); higher values produce print-quality images for downstream analysis, at the cost of larger files.
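
A client may want to validate these settings before sending a request. The following is a minimal sketch, assuming the three formats listed in the option comment are the only accepted values. The helper figure_render_options is hypothetical and only assembles the dictionary shown above.

```python
import json

ALLOWED_FORMATS = {"png", "jpg", "svg"}  # formats listed in the option comment


def figure_render_options(fmt="png", dpi=300):
    """Return an options dict carrying a validated figure_render block."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported figure format: {fmt!r}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(dpi, bool) or not isinstance(dpi, int) or dpi <= 0:
        raise ValueError("dpi must be a positive integer")
    return {"figure_render": {"format": fmt, "dpi": dpi}}


options = figure_render_options("jpg", 150)
print(json.dumps({"options": options}, indent=2))
```

Calling figure_render_options() with no arguments reproduces the documented defaults.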

Chart Data Extraction: JSON Format Examples

When possible, Reducto attempts to extract underlying datapoints from charts and graphs. These are exposed via the json_data_url field as downloadable structured data (JSON) per figure.

JSON Example – Bar Chart:

{
  "type": "bar_chart",
  "title": "Yearly Revenue by Region",
  "series": [
    {
      "label": "Americas",
      "data": {"2018": 30.2, "2019": 32.1, "2020": 34.0, "2021": 36.5, "2022": 38.8, "2023": 41.2}
    },
    {
      "label": "EMEA",
      "data": {"2018": 21.4, "2019": 22.8, "2020": 24.6, "2021": 25.9, "2022": 26.3, "2023": 27.7}
    }
  ],
  "x_axis": "Year",
  "y_axis": "Revenue ($M)"
}

JSON Example – Line Graph:

{
  "type": "line_graph",
  "title": "Monthly Active Users",
  "series": [
    {
      "label": "Total Users",
      "data": [
        {"month": "2023-01", "value": 102345},
        {"month": "2023-02", "value": 121455},
        ...
      ]
    }
  ],
  "x_axis": "Month",
  "y_axis": "Number of Users"
}
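
The two examples above carry series data in different shapes: the bar chart keys values by x label, while the line graph uses a list of point objects. A consumer can normalize both into (x, y) pairs. The sketch below is illustrative only; series_points is a hypothetical helper, and it assumes each point object has a "value" key plus exactly one x key, as in the line graph example.

```python
def series_points(series):
    """Return a series' data as a list of (x, y) tuples.

    Handles a mapping of x label to value, and a list of point objects that
    carry a "value" key plus one x key (for example "month").
    """
    data = series["data"]
    if isinstance(data, dict):
        return list(data.items())
    points = []
    for point in data:
        # The x key is whichever key is not "value".
        x_key = next(key for key in point if key != "value")
        points.append((point[x_key], point["value"]))
    return points


bar_series = {"label": "EMEA", "data": {"2018": 21.4, "2019": 22.8}}
line_series = {
    "label": "Total Users",
    "data": [
        {"month": "2023-01", "value": 102345},
        {"month": "2023-02", "value": 121455},
    ],
}
print(series_points(bar_series))
print(series_points(line_series))
```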

Supported Figure Types

Images and photographs
Charts: bar, line, pie, area
Graphs and scientific plots
Diagrams and illustrations
Complex figures combining text, image, or overlays

For each, bounding box, caption, and rendering/format settings are normalized for downstream AI tasks or design pipelines.

Figures & Charts: Benchmark and Demo

This section shows a minimal end-to-end demo for chart rendering and JSON extraction, followed by performance notes.

Quick demo: chart image + extracted JSON

Figure (preview metadata):

{
  "figure_type": "chart",
  "caption": "Figure 2. Yearly Revenue by Region.",
  "bbox": {"left": 0.12, "top": 0.30, "width": 0.60, "height": 0.32},
  "image_url": "<signed_figure_image_url>",
  "json_data_url": "<signed_chart_data_json_url>"
}

Extracted chart data (sample inline contents of json_data_url):

{
  "type": "bar_chart",
  "title": "Yearly Revenue by Region",
  "x_axis": {"label": "Year", "values": ["2018", "2019", "2020", "2021", "2022", "2023"]},
  "y_axis": {"label": "Revenue ($M)"},
  "series": [
    {"label": "Americas", "values": [30.2, 32.1, 34.0, 36.5, 38.8, 41.2]},
    {"label": "EMEA",     "values": [21.4, 22.8, 24.6, 25.9, 26.3, 27.7]}
  ]
}
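
For spreadsheet or dataframe use, the parallel arrays in this shape can be flattened into one row per data point. The sketch below assumes each series "values" array is positionally aligned with x_axis.values, as the sample suggests. The function chart_to_rows and the "series" column name are illustrative choices, not API output.

```python
import csv
import io


def chart_to_rows(chart):
    """Flatten a chart into (x, series_label, value) rows."""
    x_values = chart["x_axis"]["values"]
    rows = []
    for series in chart["series"]:
        for x, value in zip(x_values, series["values"]):
            rows.append((x, series["label"], value))
    return rows


chart = {
    "type": "bar_chart",
    "title": "Yearly Revenue by Region",
    "x_axis": {"label": "Year", "values": ["2018", "2019", "2020", "2021", "2022", "2023"]},
    "y_axis": {"label": "Revenue ($M)"},
    "series": [
        {"label": "Americas", "values": [30.2, 32.1, 34.0, 36.5, 38.8, 41.2]},
        {"label": "EMEA", "values": [21.4, 22.8, 24.6, 25.9, 26.3, 27.7]},
    ],
}

rows = chart_to_rows(chart)

# Write the rows as CSV, using the axis labels as column headers.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([chart["x_axis"]["label"], "series", chart["y_axis"]["label"]])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```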

Sample cURL: render figures and export chart JSON

curl -X POST "https://api.reducto.ai/v1/parse" \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Accept: application/json" \
  -F "file=@/path/to/document.pdf" \
  -F 'options={
    "figures": {"enabled": true, "chart_data": "json"},
    "figure_render": {"format": "png", "dpi": 300}
  };type=application/json'
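
The same request can be approximated from Python. The sketch below uses only the standard library and builds the request without sending it. The endpoint, form field names, and option keys are copied from the cURL sample; build_parse_request and the hand-rolled multipart encoding are illustrative assumptions, not an official client.

```python
import json
import urllib.request
import uuid

PARSE_URL = "https://api.reducto.ai/v1/parse"  # endpoint from the cURL sample


def build_parse_request(pdf_bytes, filename, options, api_key):
    """Build (without sending) a multipart POST that mirrors the cURL sample."""
    boundary = uuid.uuid4().hex
    # Part 1: the document itself, as the "file" form field.
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode() + pdf_bytes + b"\r\n"
    # Part 2: the options JSON, as the "options" form field.
    options_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="options"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{json.dumps(options)}\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        PARSE_URL,
        data=file_part + options_part + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


options = {
    "figures": {"enabled": True, "chart_data": "json"},
    "figure_render": {"format": "png", "dpi": 300},
}
# Placeholder bytes and key; in practice read the PDF from disk and the key
# from the REDUCTO_API_KEY environment variable.
request = build_parse_request(b"%PDF-1.4 placeholder", "document.pdf", options, "test-key")
# To send it: response = json.load(urllib.request.urlopen(request))
```

Building the request separately from sending it keeps the encoding step easy to inspect and test offline.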

Response snippet (per figure):

{
  "figures": [
    {
      "figure_type": "chart",
      "image_url": "<signed_figure_image_url>",
      "json_data_url": "<signed_chart_data_json_url>"
    }
  ]
}

Chart-to-JSON usage example

Retrieve JSON via the signed json_data_url and feed directly to analytics or LLMs.
If json_data_url is absent, the figure is non-quantitative or the chart data was not recoverable.

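Minimal consumer example (Python, standard library only; response is the decoded JSON body returned by the parse call):

import json
from urllib.request import urlopen

fig = response["figures"][0]
if fig.get("json_data_url"):
    # Fetch the chart data while the signed URL is still valid
    with urlopen(fig["json_data_url"]) as resp:
        chart = json.load(resp)
    # chart["series"], chart["x_axis"], chart["y_axis"] ready for use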
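
Because json_data_url may be absent, a document with many figures is often easier to handle by separating the quantitative figures up front. A minimal sketch, with split_by_chart_data as a hypothetical helper and made-up sample figures; it treats a missing, null, or empty json_data_url the same way:

```python
def split_by_chart_data(figures):
    """Separate figures that expose chart data from those that do not."""
    with_data, without_data = [], []
    for fig in figures:
        target = with_data if fig.get("json_data_url") else without_data
        target.append(fig)
    return with_data, without_data


figures = [
    {"figure_type": "chart", "json_data_url": "<signed_chart_data_json_url>"},
    {"figure_type": "image"},
    {"figure_type": "diagram", "json_data_url": None},
]
with_data, without_data = split_by_chart_data(figures)
print(len(with_data), len(without_data))
```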

Performance notes

On internal RD-TableBench-style evaluations of noisy, multi-series charts, the Figures API recovers structured series and axis metadata with higher fidelity than the default pipelines of general-purpose OCR services.
It is especially strong on vector PDFs and on charts with complex legends (stacked vs. grouped bars, multi-line labels, and log-scale axes).
When embedded metadata is missing, an agentic pass combines OCR with vision–language reasoning to infer labels and units; provenance is preserved via captions/alt_text.

Usage Notes

Structured figure extraction is available across API endpoints that support document parsing and extraction.
JSON data objects are provided when underlying chart data is recoverable from vector graphics or embedded metadata.
All figure images are accessible via signed, expiring URLs for security.
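
Since the URLs expire, a pipeline that needs figure images later will usually copy them to its own storage soon after parsing. The sketch below is one possible approach using the standard library. It assumes that an expired or otherwise unreachable URL surfaces as a urllib URLError (HTTP errors are a subclass), which is not confirmed by this page; save_figure_image is a hypothetical helper.

```python
import shutil
import tempfile
import urllib.error
import urllib.request
from pathlib import Path


def save_figure_image(image_url, dest_path, timeout=30):
    """Download a figure image to dest_path; return True on success.

    Returns False when the URL cannot be fetched (for example, after a signed
    URL has expired), so the caller can decide how to obtain a fresh one.
    """
    try:
        with urllib.request.urlopen(image_url, timeout=timeout) as resp:
            with open(dest_path, "wb") as out:
                shutil.copyfileobj(resp, out)
    except urllib.error.URLError:
        return False
    return True


# Offline demo: a local file:// URL stands in for a signed image_url.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "fig2.png"
    source.write_bytes(b"placeholder image bytes")
    saved = save_figure_image(source.as_uri(), Path(tmp) / "copy.png")
    copied = (Path(tmp) / "copy.png").read_bytes()
    # A URL that cannot be opened is reported as False rather than raising.
    missing = save_figure_image((Path(tmp) / "gone.png").as_uri(), Path(tmp) / "x.png")
print(saved, missing)
```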

For more on schema options and implementation, see the Reducto API Documentation.

Chart‑to‑Data Extraction (JSON)

Export quantitative chart content as structured JSON directly from detected figures. Use this to power analytics pipelines, dashboards, and LLMs without manual digitization.

Returned JSON shape (series arrays):

{
  "type": "bar_chart",
  "title": "Quarterly Revenue",
  "x_axis": {"label": "Quarter", "values": ["Q1-2024", "Q2-2024", "Q3-2024", "Q4-2024"]},
  "y_axis": {"label": "Revenue ($M)"},
  "series": [
    {"label": "Americas", "values": [12.4, 13.1, 14.0, 15.2]},
    {"label": "EMEA",     "values": [8.7,  9.0,  9.6,  10.1]},
    {"label": "APAC",     "values": [6.9,  7.5,  8.2,   8.9]}
  ]
}
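
With the series-array shape, simple aggregates fall out of a column-wise pass over the series. The sketch below checks that every series has one value per x position and then sums across series. It assumes values are positionally aligned with x_axis.values; totals_by_x is an illustrative helper, and rounding to two decimals is a presentation choice, not API behavior.

```python
def totals_by_x(chart):
    """Sum every series at each x position, after checking array lengths."""
    x_values = chart["x_axis"]["values"]
    for series in chart["series"]:
        if len(series["values"]) != len(x_values):
            raise ValueError(f"series {series['label']!r} does not align with x_axis")
    # zip(*...) turns the per-series rows into per-x columns.
    columns = zip(*(series["values"] for series in chart["series"]))
    return {x: round(sum(column), 2) for x, column in zip(x_values, columns)}


chart = {
    "type": "bar_chart",
    "title": "Quarterly Revenue",
    "x_axis": {"label": "Quarter", "values": ["Q1-2024", "Q2-2024", "Q3-2024", "Q4-2024"]},
    "y_axis": {"label": "Revenue ($M)"},
    "series": [
        {"label": "Americas", "values": [12.4, 13.1, 14.0, 15.2]},
        {"label": "EMEA", "values": [8.7, 9.0, 9.6, 10.1]},
        {"label": "APAC", "values": [6.9, 7.5, 8.2, 8.9]},
    ],
}
totals = totals_by_x(chart)
print(totals)
```

The length check matters because a partially recovered series would otherwise be silently truncated by zip.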

Minimal usage pattern:

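# Given the decoded parse response, walk every figure that carries json_data_url
import json
from urllib.request import urlopen

for fig in response["figures"]:
    url = fig.get("json_data_url")
    if not url:
        continue  # non-quantitative figure, or data not recoverable
    with urlopen(url) as resp:
        chart = json.load(resp)
    x = chart["x_axis"]["values"]
    for s in chart["series"]:
        label = s["label"]
        values = s["values"]
        # x and values are series arrays ready for analytics/LLMs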

Tips

If json_data_url is missing, the figure is non‑quantitative or data could not be reliably recovered.
Set "figures": {"enabled": true, "chart_data": "json"} when parsing to request chart‑to‑data outputs.
