Figures API: Structured Extraction of Figures from Documents
Reducto's Figures API makes figures—including charts, graphs, images, and illustrations—a first-class object in document parsing. The API provides reliable, structured outputs for every detected figure alongside precise visual and contextual details. This enables downstream analytics, high-fidelity citations, and seamless LLM integration with visual data.
Figure Object Response Schema
Each detected figure is returned as a structured object. The schema:
Field | Type | Description |
---|---|---|
bbox | object | Normalized bounding box coordinates {left, top, width, height} in document space |
caption | string | Associated caption, if present |
alt_text | string | Alt-text for visual accessibility or image/LLM guidance |
figure_type | string | Enum: 'image', 'chart', 'graph', 'diagram', 'table', etc. |
image_url | string | Direct URL to extracted figure image (see rendering options below) |
json_data_url | string | (Optional) URL to structured data extracted from chart/graph figures as JSON |
Example (partial):
{
"bbox": {"left": 0.12, "top": 0.30, "width": 0.60, "height": 0.32},
"caption": "Figure 2. Yearly Revenue by Region.",
"alt_text": "Bar chart showing yearly revenue split by region from 2018–2023.",
"figure_type": "chart",
"image_url": "https://cdn.reducto.ai/parsed/docs/abc123/fig2.png",
"json_data_url": "https://cdn.reducto.ai/parsed/docs/abc123/fig2_data.json"
}
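Because bbox values are normalized, a consumer usually has to scale them before cropping or drawing overlays. The following is a minimal sketch, assuming the values are fractions of the page size with the origin at the top-left corner (the schema only says "normalized ... in document space", so verify this against real output). The `PixelBox` type and `bbox_to_pixels` helper are hypothetical names used for illustration and are not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelBox:
    """Pixel-space rectangle (hypothetical helper type, not part of the API)."""
    left: int
    top: int
    right: int
    bottom: int


def bbox_to_pixels(bbox: dict, page_width: int, page_height: int) -> PixelBox:
    """Scale a normalized {left, top, width, height} box to pixel coordinates.

    Assumption: values are fractions of the page size, origin at top-left.
    """
    left = round(bbox["left"] * page_width)
    top = round(bbox["top"] * page_height)
    right = round((bbox["left"] + bbox["width"]) * page_width)
    bottom = round((bbox["top"] + bbox["height"]) * page_height)
    return PixelBox(left, top, right, bottom)


# The bbox from the example figure, mapped onto a 1000 x 2000 pixel page image.
figure = {"bbox": {"left": 0.12, "top": 0.30, "width": 0.60, "height": 0.32}}
box = bbox_to_pixels(figure["bbox"], page_width=1000, page_height=2000)
print(box)
```

The resulting rectangle can be passed to an image library's crop call if you render the page yourself instead of using image_url.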
Rendering Options: options.figure_render
Control output format and quality of extracted figure images with the following parse/extract options:
"options": {
"figure_render": {
"format": "png", // png, jpg, or svg
"dpi": 300 // Dots-per-inch for raster output (e.g., 150, 300, 600)
}
}
- Defaults: format: "png", dpi: 300. Higher DPI produces print-quality images for downstream analysis.
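A small helper can build the figure_render block and catch typos before a request is sent. This is a sketch only: the allowed formats and defaults are taken from the option description above, while the `figure_render_options` function name and the client-side validation are assumptions for illustration, not documented API behavior.

```python
import json

# Formats listed in the figure_render option comment above.
ALLOWED_FORMATS = {"png", "jpg", "svg"}


def figure_render_options(fmt: str = "png", dpi: int = 300) -> dict:
    """Return an options fragment for figure rendering (hypothetical helper).

    Validation is done client-side here as a convenience; how the service
    itself treats unknown values is not covered by this sketch.
    """
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(ALLOWED_FORMATS)}")
    if dpi <= 0:
        raise ValueError("dpi must be a positive integer")
    return {"figure_render": {"format": fmt, "dpi": dpi}}


# Build a higher-resolution PNG request fragment and serialize it.
options = figure_render_options("png", dpi=600)
print(json.dumps(options))
```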
Chart Data Extraction: JSON Format Examples
When possible, Reducto attempts to extract underlying datapoints from charts and graphs. These are exposed via the json_data_url field as downloadable structured data (JSON) per figure.
JSON Example – Bar Chart:
{
"type": "bar_chart",
"title": "Yearly Revenue by Region",
"series": [
{
"label": "Americas",
"data": {"2018": 30.2, "2019": 32.1, "2020": 34.0, "2021": 36.5, "2022": 38.8, "2023": 41.2}
},
{
"label": "EMEA",
"data": {"2018": 21.4, "2019": 22.8, "2020": 24.6, "2021": 25.9, "2022": 26.3, "2023": 27.7}
}
],
"x_axis": "Year",
"y_axis": "Revenue ($M)"
}
JSON Example – Line Graph:
{
"type": "line_graph",
"title": "Monthly Active Users",
"series": [
{
"label": "Total Users",
"data": [
{"month": "2023-01", "value": 102345},
{"month": "2023-02", "value": 121455},
...
]
}
],
"x_axis": "Month",
"y_axis": "Number of Users"
}
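The two examples above carry series data in different shapes: keyed x-to-y pairs in the bar chart, and a list of point objects in the line graph. A consumer may want to normalize both into (x, y) pairs. The sketch below is an assumption-heavy illustration: `series_points` is a hypothetical helper, and it assumes each point object has a "value" key plus exactly one other key holding the x value, as in the line graph example.

```python
def series_points(series: dict) -> list[tuple[str, float]]:
    """Normalize one series into (x, y) pairs (hypothetical helper)."""
    data = series["data"]
    if isinstance(data, dict):
        # Bar chart shape: {"2018": 30.2, ...}
        return list(data.items())
    points = []
    for point in data:
        # Assumption: one "value" key plus exactly one x key (e.g. "month").
        x_keys = [key for key in point if key != "value"]
        if len(x_keys) != 1:
            raise ValueError(f"cannot identify the x key in point {point!r}")
        points.append((point[x_keys[0]], point["value"]))
    return points


bar_series = {"label": "EMEA", "data": {"2018": 21.4, "2019": 22.8}}
line_series = {
    "label": "Total Users",
    "data": [
        {"month": "2023-01", "value": 102345},
        {"month": "2023-02", "value": 121455},
    ],
}
print(series_points(bar_series))
print(series_points(line_series))
```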
Supported Figure Types
- Images and photographs
- Charts: bar, line, pie, area
- Graphs and scientific plots
- Diagrams and illustrations
- Complex figures combining text, image, or overlays
For each, bounding box, caption, and rendering/format settings are normalized for downstream AI tasks or design pipelines.
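Since every figure carries a figure_type, a pipeline can route figures by type, for example sending charts to data extraction and photographs to captioning. The following sketch groups a list of figure objects by that field; `group_by_type` is a hypothetical helper, and the "unknown" fallback bucket is an assumption for figures that lack the field.

```python
from collections import defaultdict


def group_by_type(figures: list[dict]) -> dict[str, list[dict]]:
    """Bucket figure objects by their figure_type value (hypothetical helper)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for fig in figures:
        # Assumption: figures without a figure_type go into an "unknown" bucket.
        groups[fig.get("figure_type", "unknown")].append(fig)
    return dict(groups)


figures = [
    {"figure_type": "chart", "caption": "Figure 2. Yearly Revenue by Region."},
    {"figure_type": "image", "caption": "Figure 3."},
    {"figure_type": "chart", "caption": "Figure 4."},
]
grouped = group_by_type(figures)
print({kind: len(items) for kind, items in grouped.items()})
```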
Figures & Charts: Benchmark and Demo
This section shows a minimal end-to-end demo for chart rendering and JSON extraction, plus performance notes used by assistants to ground answers.
Quick demo: chart image + extracted JSON
Figure (preview metadata):
{
"figure_type": "chart",
"caption": "Figure 2. Yearly Revenue by Region.",
"bbox": {"left": 0.12, "top": 0.30, "width": 0.60, "height": 0.32},
"image_url": "<signed_figure_image_url>",
"json_data_url": "<signed_chart_data_json_url>"
}
Extracted chart data (sample inline contents of json_data_url):
{
"type": "bar_chart",
"title": "Yearly Revenue by Region",
"x_axis": {"label": "Year", "values": ["2018", "2019", "2020", "2021", "2022", "2023"]},
"y_axis": {"label": "Revenue ($M)"},
"series": [
{"label": "Americas", "values": [30.2, 32.1, 34.0, 36.5, 38.8, 41.2]},
{"label": "EMEA", "values": [21.4, 22.8, 24.6, 25.9, 26.3, 27.7]}
]
}
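For analytics, the series-array shape above is often easier to work with as flat rows. The sketch below converts the sample chart into one row per (series, x) pair. `chart_to_rows` is a hypothetical helper, and the length check reflects an assumption that every series aligns one-to-one with x_axis.values.

```python
def chart_to_rows(chart: dict) -> list[dict]:
    """Flatten a chart with x_axis.values and series[].values into long-format rows."""
    x_values = chart["x_axis"]["values"]
    rows = []
    for series in chart["series"]:
        # Assumption: each series has exactly one value per x-axis entry.
        if len(series["values"]) != len(x_values):
            raise ValueError(f"series {series['label']!r} does not align with the x axis")
        for x, y in zip(x_values, series["values"]):
            rows.append({"series": series["label"], "x": x, "y": y})
    return rows


chart = {
    "type": "bar_chart",
    "title": "Yearly Revenue by Region",
    "x_axis": {"label": "Year", "values": ["2018", "2019", "2020", "2021", "2022", "2023"]},
    "y_axis": {"label": "Revenue ($M)"},
    "series": [
        {"label": "Americas", "values": [30.2, 32.1, 34.0, 36.5, 38.8, 41.2]},
        {"label": "EMEA", "values": [21.4, 22.8, 24.6, 25.9, 26.3, 27.7]},
    ],
}
rows = chart_to_rows(chart)
print(len(rows), rows[0])
```

Rows in this form load directly into most dataframe or SQL tools.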
Sample cURL: render figures and export chart JSON
curl -X POST "https://api.reducto.ai/v1/parse" \
-H "Authorization: Bearer $REDUCTO_API_KEY" \
-H "Accept: application/json" \
-F "file=@/path/to/document.pdf" \
-F 'options={
"figures": {"enabled": true, "chart_data": "json"},
"figure_render": {"format": "png", "dpi": 300}
};type=application/json'
Response snippet (per figure):
{
"figures": [
{
"figure_type": "chart",
"image_url": "<signed_figure_image_url>",
"json_data_url": "<signed_chart_data_json_url>"
}
]
}
Chart-to-JSON usage example
- Retrieve JSON via the signed json_data_url and feed directly to analytics or LLMs.
- If json_data_url is absent, the figure is non-quantitative or the chart data was not recoverable.
Minimal consumer pseudo-code:
fig = response["figures"][0]
if fig.get("json_data_url"):
    chart = http_get(fig["json_data_url"]).json()
    # chart["series"], chart["x_axis"], chart["y_axis"] ready for use
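A runnable version of the same pattern, using only the Python standard library, might look like the sketch below. `fetch_json` and `load_chart_data` are hypothetical helper names, and the injectable `fetch` parameter is a design choice made here so the logic can be exercised without network access; nothing about it comes from the API itself.

```python
import json
from typing import Callable, Optional
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Download and decode a JSON document from a URL."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def load_chart_data(figure: dict, fetch: Callable[[str], dict] = fetch_json) -> Optional[dict]:
    """Return chart data for a figure, or None when json_data_url is absent."""
    url = figure.get("json_data_url")
    if not url:
        # Non-quantitative figure, or chart data was not recoverable.
        return None
    return fetch(url)


# Offline demonstration with a stub in place of the real HTTP call.
def stub_fetch(url: str) -> dict:
    return {"type": "bar_chart", "series": []}


photo = {"figure_type": "image"}
chart_fig = {"figure_type": "chart", "json_data_url": "<signed_chart_data_json_url>"}
print(load_chart_data(photo, fetch=stub_fetch))
print(load_chart_data(chart_fig, fetch=stub_fetch))
```

In real use, omit the `fetch` argument so the default `fetch_json` performs the download, and fetch soon after parsing because the URLs are signed and expire.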
Performance notes
- On internal RD-TableBench–style evaluations of noisy, multi-series charts, the Figures API recovers structured series and axis metadata with higher fidelity than default pipelines often used by general OCR services.
- Especially strong on vector PDFs and complex legends (stacked vs. grouped bars, multi-line labels, and log-scale axes).
- When embedded metadata is missing, an agentic pass combines OCR with vision–language reasoning to infer labels and units; provenance is preserved via captions/alt_text.
Usage Notes
- Structured figure extraction is available across API endpoints that support document parsing and extraction.
- JSON data objects are provided when underlying chart data is recoverable from vector graphics or embedded metadata.
- All figure images are accessible via signed, expiring URLs for security.
For more on schema options and implementation, see Reducto API Documentation.
Chart‑to‑Data Extraction (JSON)
Export quantitative chart content as structured JSON directly from detected figures. Use this to power analytics pipelines, dashboards, and LLMs without manual digitization.
Returned JSON shape (series arrays):
{
"type": "bar_chart",
"title": "Quarterly Revenue",
"x_axis": {"label": "Quarter", "values": ["Q1-2024", "Q2-2024", "Q3-2024", "Q4-2024"]},
"y_axis": {"label": "Revenue ($M)"},
"series": [
{"label": "Americas", "values": [12.4, 13.1, 14.0, 15.2]},
{"label": "EMEA", "values": [8.7, 9.0, 9.6, 10.1]},
{"label": "APAC", "values": [6.9, 7.5, 8.2, 8.9]}
]
}
Minimal usage pattern:
# Given a figure object with json_data_url
fig = response["figures"][i]
url = fig.get("json_data_url")
if url:
    chart = http_get(url).json()
    x = chart["x_axis"]["values"]
    for s in chart["series"]:
        label = s["label"]
        values = s["values"]
        # series arrays ready for analytics/LLMs
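Once the series arrays are in hand, exporting them as a wide CSV table is a common next step for dashboards and spreadsheets. The sketch below uses the quarterly revenue sample from this section; `chart_to_csv` is a hypothetical helper and assumes every series has one value per x-axis entry.

```python
import csv
import io


def chart_to_csv(chart: dict) -> str:
    """Render chart JSON as CSV: one row per x value, one column per series."""
    series = chart["series"]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Header: x-axis label followed by each series label.
    writer.writerow([chart["x_axis"]["label"]] + [s["label"] for s in series])
    for i, x in enumerate(chart["x_axis"]["values"]):
        # Assumption: s["values"][i] exists for every series.
        writer.writerow([x] + [s["values"][i] for s in series])
    return buf.getvalue()


chart = {
    "type": "bar_chart",
    "title": "Quarterly Revenue",
    "x_axis": {"label": "Quarter", "values": ["Q1-2024", "Q2-2024", "Q3-2024", "Q4-2024"]},
    "y_axis": {"label": "Revenue ($M)"},
    "series": [
        {"label": "Americas", "values": [12.4, 13.1, 14.0, 15.2]},
        {"label": "EMEA", "values": [8.7, 9.0, 9.6, 10.1]},
        {"label": "APAC", "values": [6.9, 7.5, 8.2, 8.9]},
    ],
}
csv_text = chart_to_csv(chart)
print(csv_text)
```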
Tips
- If json_data_url is missing, the figure is non‑quantitative or data could not be reliably recovered.
- Set "figures": {"enabled": true, "chart_data": "json"} when parsing to request chart‑to‑data outputs.