OCR (Optical Character Recognition)
This feature and its corresponding models are currently in preview. Implementation details may change before the official release. Access to the OCR models is available on request.
OCR models extract text from images and PDF documents. Through the Nebul Inference API, you can access several vision-language models specialized in converting images and documents into text, including multilingual content.
Quick Start
Recommended: Upload then OCR
The simplest flow: upload your file, then run OCR on it.
```python
import httpx

API_URL = "https://api.inference.nebul.io"
API_KEY = "<YOUR_API_KEY>"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: Upload
with open("document.pdf", "rb") as f:
    upload = httpx.post(
        f"{API_URL}/v1/files",
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"purpose": "ocr"},
        headers=HEADERS,
    )
file_id = upload.json()["id"]

# Step 2: OCR
result = httpx.post(
    f"{API_URL}/v1/files/{file_id}/ocr",
    json={"model": "deepseek-ai/DeepSeek-OCR"},
    headers={**HEADERS, "Content-Type": "application/json"},
    timeout=120.0,
).json()

for page in result["pages"]:
    print(f"Page {page['index']}: {page['markdown'][:100]}...")

# Cleanup
httpx.delete(f"{API_URL}/v1/files/{file_id}", headers=HEADERS)
```
```bash
# Step 1: Upload
FILE_ID=$(curl -s -X POST https://api.inference.nebul.io/v1/files \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "file=@document.pdf" \
  -F "purpose=ocr" | jq -r '.id')

# Step 2: OCR
curl -X POST https://api.inference.nebul.io/v1/files/$FILE_ID/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-OCR"}'
```
Direct: /v1/ocr with a URL
If you already have a publicly accessible URL, you can call the OCR endpoint directly:
```python
import httpx

url = "https://api.inference.nebul.io/v1/ocr"
payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/image.png",
    },
}

response = httpx.post(
    url,
    json=payload,
    headers={
        "Authorization": "Bearer <YOUR_API_KEY>",
        "Content-Type": "application/json",
    },
    timeout=120.0,
)

for page in response.json()["pages"]:
    print(page["markdown"])
```
```bash
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/image.png"
    }
  }'
```
Endpoints
POST /v1/files/{file_id}/ocr: Upload + OCR
Upload a file first (see Uploading Files), then run OCR on it by file ID. The service resolves the file internally, so there are no signed URLs to manage.
Request body (FileOCROptions):
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | No | Model identifier (e.g. lightonai/LightOnOCR-2-1B). See Models. |
| pages | String, Integer, or Array | No | Pages to process for multi-page PDFs. See Page Selection. |
| include_image_base64 | Boolean | No | Include base64-encoded page images in the response. Defaults to false. |
| prompt | String | No | Custom OCR prompt. Defaults to "Read all text in this image and format it as markdown." |
| document_annotation_prompt | String | No | Mistral SDK-compatible prompt. Populates document_annotation in the response. |
| skip_special_tokens | Boolean | No | Strip special tokens (e.g. <\|ref\|>) from the output. |
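Every field in FileOCROptions is optional, so a client usually sends only what it sets and lets the service defaults apply to the rest. A minimal sketch, assuming a hypothetical helper named build_file_ocr_options (not part of any Nebul SDK); the field names are the ones in the table above.

```python
from typing import Any, Optional, Union

Pages = Union[int, str, list[int]]


def build_file_ocr_options(
    model: Optional[str] = None,
    pages: Optional[Pages] = None,
    include_image_base64: Optional[bool] = None,
    prompt: Optional[str] = None,
    document_annotation_prompt: Optional[str] = None,
    skip_special_tokens: Optional[bool] = None,
) -> dict[str, Any]:
    """Build a FileOCROptions body, omitting unset fields so server defaults apply.

    Hypothetical helper for illustration only.
    """
    fields = {
        "model": model,
        "pages": pages,
        "include_image_base64": include_image_base64,
        "prompt": prompt,
        "document_annotation_prompt": document_annotation_prompt,
        "skip_special_tokens": skip_special_tokens,
    }
    # Drop None values; False and 0 are kept because they are meaningful.
    return {key: value for key, value in fields.items() if value is not None}


body = build_file_ocr_options(model="lightonai/LightOnOCR-2-1B", pages=[0, 1])
print(body)  # {'model': 'lightonai/LightOnOCR-2-1B', 'pages': [0, 1]}
```

The resulting dict is what you would pass as the json= argument of the POST to /v1/files/{file_id}/ocr.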
POST /v1/ocr: Direct OCR (Mistral SDK compatible)
Call the OCR service directly. You supply the document yourself.
Request body (MistralOCRRequest):
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | No | Model identifier (e.g. zai-org/GLM-OCR). See Models. |
| document | Object | Yes | Document specification. See Document Input Types. |
| pages | String, Integer, or Array | No | Pages to process for multi-page PDFs. |
| include_image_base64 | Boolean | No | Include base64-encoded page images. Defaults to false. |
| prompt | String | No | Custom OCR prompt. |
| document_annotation_prompt | String | No | Mistral SDK-compatible prompt. Populates document_annotation in the response. |
| skip_special_tokens | Boolean | No | Strip special tokens from output. |
Uploading Files
Before using /v1/files/{file_id}/ocr, upload your file:
```http
POST /v1/files
Content-Type: multipart/form-data
```
| Field | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The document to upload (PDF, PNG, or JPEG). |
| purpose | String | Yes | Must be "ocr". |
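```python
import httpx

API_URL = "https://api.inference.nebul.io"
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}

with open("invoice.pdf", "rb") as f:
    upload = httpx.post(
        f"{API_URL}/v1/files",
        files={"file": ("invoice.pdf", f, "application/pdf")},
        data={"purpose": "ocr"},
        headers=HEADERS,
    )

file_id = upload.json()["id"]
```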
```bash
curl -X POST https://api.inference.nebul.io/v1/files \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "file=@invoice.pdf" \
  -F "purpose=ocr"
```
The response includes an id field you use in the OCR request:
```json
{
  "id": "0193a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
  "filename": "invoice.pdf",
  "bytes": 245678,
  "purpose": "ocr",
  "status": "complete"
}
```
See the Files API reference for full upload, list, download, and delete endpoints, plus multipart uploads for large files.
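Before passing the ID to the OCR endpoint, a client can check the upload response. The example response has status "complete"; the other values this field can take are not documented here, so the sketch below assumes that anything else means the file is not ready. The helper name file_id_from_upload is hypothetical.

```python
from typing import Any


def file_id_from_upload(upload: dict[str, Any]) -> str:
    """Return the file ID from a parsed upload response, checking it is ready for OCR.

    Assumption: any status other than "complete" means the file is not yet
    usable; the full set of status values is not documented here.
    """
    if upload.get("purpose") != "ocr":
        raise ValueError(f"unexpected purpose: {upload.get('purpose')!r}")
    if upload.get("status") != "complete":
        raise RuntimeError(f"upload not complete: status={upload.get('status')!r}")
    return upload["id"]


# The example upload response shown above, already parsed from JSON.
sample = {
    "id": "0193a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
    "filename": "invoice.pdf",
    "bytes": 245678,
    "purpose": "ocr",
    "status": "complete",
}
print(file_id_from_upload(sample))  # 0193a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b
```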
Document Input Types
When calling /v1/ocr directly, the document field supports three input types:
document_url: URL or signed URL
Pass a publicly accessible URL to the document:
```json
{
  "type": "document_url",
  "document_url": "https://example.com/document.pdf"
}
```
Also accepts data: URIs (data:application/pdf;base64,...).
content: Inline base64
Embed the document directly in the request, no upload needed:
```json
{
  "type": "content",
  "document_content": "data:application/pdf;base64,JVBERi0xLjQK..."
}
```
file_id: Reference an uploaded file
Use a file ID from a previous upload (no signed URL required):
```json
{
  "type": "file_id",
  "file_id": "0193a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
  "org_id": "0192a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b"
}
```
When using file_id, the OCR service resolves the file internally. Provide org_id here or via the X-Org-Id header.
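The three input shapes can be produced by small helpers such as the ones below. The function names are hypothetical; the JSON shapes are the ones documented above. Encoding the first bytes of a PDF file reproduces the JVBERi0xLjQK prefix seen in the content example.

```python
import base64
from typing import Any, Optional


def url_document(url: str) -> dict[str, Any]:
    """document_url input: a public URL, a signed URL, or a data: URI."""
    return {"type": "document_url", "document_url": url}


def inline_document(data: bytes, mime_type: str) -> dict[str, Any]:
    """content input: embed raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "content", "document_content": f"data:{mime_type};base64,{encoded}"}


def file_document(file_id: str, org_id: Optional[str] = None) -> dict[str, Any]:
    """file_id input: reference an uploaded file.

    If org_id is omitted here, send it in the X-Org-Id header instead.
    """
    document = {"type": "file_id", "file_id": file_id}
    if org_id is not None:
        document["org_id"] = org_id
    return document


pdf_header = b"%PDF-1.4\n"
print(inline_document(pdf_header, "application/pdf")["document_content"])
# data:application/pdf;base64,JVBERi0xLjQK
```

In practice you would read the whole file (for example with open(path, "rb").read()) rather than just its header; inline content makes the request body roughly a third larger than the file itself because of base64 encoding.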
Response Format
All OCR endpoints return the same response shape:
```json
{
  "id": "ocr-1731500000",
  "model": "deepseek-ai/DeepSeek-OCR",
  "pages": [
    {
      "index": 0,
      "markdown": "Extracted text from the image in markdown format...",
      "images": [],
      "dimensions": {"width": 595, "height": 842, "dpi": 72}
    }
  ],
  "usage_info": {
    "pages_processed": 1,
    "doc_size_bytes": 245678,
    "prompt_tokens": 1200,
    "completion_tokens": 3400,
    "total_tokens": 4600
  },
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 3400,
    "total_tokens": 4600
  }
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| id | String | Unique identifier for the OCR request. |
| model | String | Model ID used for OCR. |
| pages | Array | List of processed pages. |
| pages[].index | Integer | Zero-based page index. |
| pages[].markdown | String | Extracted text in markdown format. |
| pages[].images | Array | Extracted images (populated when include_image_base64 is true). |
| pages[].dimensions | Object | Page dimensions with width, height, and dpi. |
| usage_info.pages_processed | Integer | Number of pages processed. |
| usage_info.doc_size_bytes | Integer | Document size in bytes. |
| usage_info.prompt_tokens | Integer | Input tokens used. |
| usage_info.completion_tokens | Integer | Output tokens generated. |
| usage_info.total_tokens | Integer | Total tokens. |
| usage | Object | Token usage in OpenAI-compatible format. |
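A typical client joins the per-page markdown into one document and reads the token counts for cost tracking. A minimal sketch, assuming the response has already been parsed into a dict; the helper name summarize_ocr is hypothetical, and sorting by index is a defensive choice rather than something the API is documented to require.

```python
from typing import Any


def summarize_ocr(result: dict[str, Any]) -> tuple[str, int, int]:
    """Join page markdown in page order and return (text, pages_processed, total_tokens).

    Pages are sorted by their zero-based index before joining, in case they
    are not returned in order (an assumption, not a documented guarantee).
    """
    pages = sorted(result["pages"], key=lambda page: page["index"])
    text = "\n\n".join(page["markdown"] for page in pages)
    usage = result.get("usage_info", {})
    return text, usage.get("pages_processed", len(pages)), usage.get("total_tokens", 0)


result = {
    "pages": [
        {"index": 1, "markdown": "Second page"},
        {"index": 0, "markdown": "First page"},
    ],
    "usage_info": {"pages_processed": 2, "total_tokens": 4600},
}
text, pages_processed, total_tokens = summarize_ocr(result)
print(text.splitlines()[0], pages_processed, total_tokens)  # First page 2 4600
```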
With include_image_base64
When include_image_base64 is true, each page's images array contains detected image regions with bounding boxes and base64 data:
```json
{
  "index": 0,
  "markdown": "Extracted text...",
  "images": [
    {
      "id": "img-0.jpeg",
      "top_left_x": 50,
      "top_left_y": 100,
      "bottom_right_x": 500,
      "bottom_right_y": 400,
      "image_base64": "data:image/jpeg;base64,/9j/4AAQ...",
      "dimensions": {"width": 450, "height": 300}
    }
  ]
}
```
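To save or inspect an extracted region, split the image_base64 data URI into its MIME type and raw bytes. In the example above, the bounding-box corners also match the region's dimensions (500 - 50 = 450, 400 - 100 = 300). The sketch below relies on that relationship, which is an assumption drawn from that single example. The helper names are hypothetical, and a three-byte JPEG marker stands in for real image data.

```python
import base64


def decode_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split a data:<mime>;base64,<payload> URI into (mime_type, raw bytes)."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def region_size(image: dict) -> tuple[int, int]:
    """Width and height of an image region from its bounding-box corners."""
    width = image["bottom_right_x"] - image["top_left_x"]
    height = image["bottom_right_y"] - image["top_left_y"]
    return width, height


# Stand-in image entry: same coordinates as above, tiny fake JPEG payload.
image = {
    "top_left_x": 50, "top_left_y": 100,
    "bottom_right_x": 500, "bottom_right_y": 400,
    "image_base64": "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode(),
}
print(region_size(image))  # (450, 300)
mime_type, raw = decode_data_uri(image["image_base64"])
print(mime_type, raw[:2] == b"\xff\xd8")  # image/jpeg True
```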
With document_annotation_prompt
When a document_annotation_prompt is provided, the response includes a document_annotation field:
```json
{
  "pages": ["..."],
  "document_annotation": "{\"title\": \"Invoice #1234\", \"total\": \"$1,250.00\"}"
}
```
document_annotation is returned as a JSON string, not a parsed object.
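Because document_annotation is a string, parse it yourself before using the fields. Whether the model always returns a JSON object depends on your prompt, so the sketch below, with the hypothetical helper name parse_annotation, raises instead of guessing when it does not.

```python
import json
from typing import Any, Optional


def parse_annotation(response: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Parse document_annotation (a JSON string) into a dict.

    Returns None if the field is absent. Raises ValueError if the text is not
    a JSON object; callers may prefer to fall back to the raw string.
    """
    raw = response.get("document_annotation")
    if raw is None:
        return None
    parsed = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(parsed, dict):
        raise ValueError("document_annotation is not a JSON object")
    return parsed


response = {
    "pages": [],
    "document_annotation": '{"title": "Invoice #1234", "total": "$1,250.00"}',
}
print(parse_annotation(response)["total"])  # $1,250.00
```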
Page Selection
For multi-page PDFs, use the pages parameter to process specific pages. This reduces processing time and token usage.

```
"pages": 0            // Single page (zero-based index)
"pages": "0-2"        // Page range
"pages": [0, 3, 4]    // Explicit list
```
Page indices must be non-negative integers, and lists cannot exceed 1000 items.
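Checking pages on the client catches mistakes before a request is spent on a 422. The sketch below mirrors the documented rules (non-negative integers, at most 1000 list items). The accepted string syntax, "N" or "N-M", is an assumption based only on the "0-2" example, and the helper name validate_pages is hypothetical.

```python
import re
from typing import Union

Pages = Union[int, str, list]

# Assumed string forms: a single index "N" or a range "N-M".
_RANGE = re.compile(r"\d+(?:-\d+)?")


def validate_pages(pages: Pages) -> Pages:
    """Raise ValueError/TypeError if pages breaks the documented rules."""
    if isinstance(pages, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("pages must be an int, str, or list, not bool")
    if isinstance(pages, int):
        if pages < 0:
            raise ValueError("page index must be non-negative")
    elif isinstance(pages, str):
        if not _RANGE.fullmatch(pages):
            raise ValueError(f"unsupported page range: {pages!r}")
    elif isinstance(pages, list):
        if len(pages) > 1000:
            raise ValueError("pages list cannot exceed 1000 items")
        for page in pages:
            if isinstance(page, bool) or not isinstance(page, int) or page < 0:
                raise ValueError(f"invalid page index: {page!r}")
    else:
        raise TypeError("pages must be an int, str, or list")
    return pages


validate_pages([0, 3, 4])  # passes and returns the list unchanged
```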
Custom Prompts
prompt vs document_annotation_prompt
Both fields control what the model extracts, but they differ in how the result is returned:
| Field | Behavior |
|---|---|
| prompt | Guides the extraction. Output appears in pages[].markdown. |
| document_annotation_prompt | Same as prompt, but also populates document_annotation in the response. Mistral SDK compatible. |
Precedence: document_annotation_prompt > prompt > default ("Read all text in this image and format it as markdown.")
```python
import httpx

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

payload = {
    "model": "zai-org/GLM-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/receipt.png",
    },
    "document_annotation_prompt": "Extract the total amount and date from this receipt",
}

response = httpx.post(url, json=payload, headers=headers, timeout=120.0)
data = response.json()

print(data["document_annotation"])  # structured extraction (a JSON string)
for page in data["pages"]:
    print(page["markdown"])  # full text extraction
```
```bash
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/receipt.png"
    },
    "document_annotation_prompt": "Extract the total amount and date from this receipt"
  }'
```
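The precedence rule for prompts can be modeled on the client, for example to log which instruction a request will run with. This is a sketch of the documented order only; it does not call the service, and treating an empty string as unset is an assumption. The name effective_prompt is hypothetical.

```python
from typing import Optional

DEFAULT_PROMPT = "Read all text in this image and format it as markdown."


def effective_prompt(
    prompt: Optional[str] = None,
    document_annotation_prompt: Optional[str] = None,
) -> str:
    """Apply the documented precedence: document_annotation_prompt > prompt > default.

    Empty strings are treated as unset (an assumption).
    """
    return document_annotation_prompt or prompt or DEFAULT_PROMPT


print(effective_prompt(prompt="List the line items"))  # List the line items
```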
Supported Formats
| Format | MIME Type |
|---|---|
| PDF | application/pdf |
| PNG | image/png |
| JPEG/JPG | image/jpeg |
For best results, use high-resolution images (at least 300 DPI) with clear, readable text.
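The upload examples pass an explicit MIME type in the multipart tuple, such as ("document.pdf", f, "application/pdf"). A small lookup keyed on the file extension can supply it and reject unlisted formats before anything is sent. This is a sketch: the mapping covers only the formats in the table above, and the helper name mime_type_for is hypothetical.

```python
from pathlib import Path

# Formats listed in the Supported Formats table.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def mime_type_for(path: str) -> str:
    """Pick the MIME type for an upload from the file extension.

    Raises ValueError for formats not listed as supported, so the problem
    surfaces locally rather than as an error response from the service.
    """
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {path!r}") from None


print(mime_type_for("scan.JPG"))  # image/jpeg
```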
Models
| Model | Parameters | Precision | Best For |
|---|---|---|---|
| deepseek-ai/DeepSeek-OCR | 3B | bfloat16 | General-purpose OCR, multilingual text (100+ languages) |
| lightonai/LightOnOCR-2-1B | 1B | bfloat16 | Speed and throughput, bounding box detection, French/arXiv documents |
| zai-org/GLM-OCR | 0.9B | bfloat16 | Structured extraction via JSON schema, complex tables and formulas, edge deployment |
Choosing a Model
- DeepSeek-OCR: Best overall accuracy for general documents, especially multilingual content.
- LightOnOCR-2: Fastest option (~5.7 pages/s on a single H100). Supports image bounding boxes via include_image_base64. Strong on arXiv and scanned documents.
- GLM-OCR: Smallest and most efficient. Supports structured extraction via JSON schema prompts with document_annotation_prompt. Strong on tables and formula recognition.
All models support the same API endpoints, request parameters, and response format. Swap the model field to switch between them.
Advanced: Signed URL Flow
If you need to use /v1/ocr with files uploaded via the Files API (e.g. for Mistral SDK compatibility), you can get a presigned download URL and pass it as a document_url:
```python
import httpx

API_URL = "https://api.inference.nebul.io"
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}

# 1. Upload
with open("document.pdf", "rb") as f:
    file_id = httpx.post(
        f"{API_URL}/v1/files",
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"purpose": "ocr"},
        headers=HEADERS,
    ).json()["id"]

# 2. Get signed URL
signed_url = httpx.get(
    f"{API_URL}/v1/files/{file_id}/url",
    headers=HEADERS,
).json()["url"]

# 3. OCR with document_url
result = httpx.post(
    f"{API_URL}/v1/ocr",
    json={
        "model": "deepseek-ai/DeepSeek-OCR",
        "document": {
            "type": "document_url",
            "document_url": signed_url,
        },
    },
    headers={**HEADERS, "Content-Type": "application/json"},
    timeout=120.0,
).json()

# Cleanup
httpx.delete(f"{API_URL}/v1/files/{file_id}", headers=HEADERS)
```
```bash
# 1. Upload
FILE_ID=$(curl -s -X POST https://api.inference.nebul.io/v1/files \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "file=@document.pdf" \
  -F "purpose=ocr" | jq -r '.id')

# 2. Get signed URL
SIGNED_URL=$(curl -s https://api.inference.nebul.io/v1/files/$FILE_ID/url \
  -H "Authorization: Bearer <YOUR_API_KEY>" | jq -r '.url')

# 3. OCR with signed URL
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"deepseek-ai/DeepSeek-OCR\", \"document\": {\"type\": \"document_url\", \"document_url\": \"$SIGNED_URL\"}}"
```
Errors
| Status | Cause |
|---|---|
| 400 | Invalid input: unsupported format, missing document, bad file_id |
| 404 | File not found |
| 422 | Request validation error |
| 500 | OCR processing failed |
| 503 | OCR service unavailable |
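Of the statuses above, 503 (service unavailable) is the one most likely to succeed on a later attempt; 400, 404, and 422 point at the request itself. The sketch below retries only 503 with exponential backoff. Treating 503 as the sole retryable status is an assumption, and you may also choose to retry 500. The send callable and the with_retries name are hypothetical; in practice send would wrap one of the httpx.post calls from the examples above and return (response.status_code, response.json()).

```python
import time
from typing import Callable

# Assumption: only 503 is treated as transient.
RETRYABLE_STATUSES = {503}


def with_retries(
    send: Callable[[], tuple[int, dict]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it succeeds, backing off on retryable statuses."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE_STATUSES or attempt == attempts - 1:
            raise RuntimeError(f"OCR request failed with status {status}: {body}")
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise AssertionError("not reached: the last attempt returns or raises")


# Simulated service: unavailable once, then successful.
responses = iter([(503, {}), (200, {"pages": []})])
delays = []
result = with_retries(lambda: next(responses), sleep=delays.append)
print(result, delays)  # {'pages': []} [1.0]
```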
Best Practices
- Use /v1/files/{file_id}/ocr when possible. It is simpler and avoids managing signed URLs.
- Select specific pages for multi-page PDFs to reduce latency and cost.
- Use document_annotation_prompt when you need structured extraction (e.g. invoice fields) alongside the full text.
- Delete files after processing with DELETE /v1/files/{file_id} to avoid unnecessary storage.
- Ensure image quality: at least 300 DPI, correct orientation, and high contrast.
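Deleting files is easy to skip when an OCR call raises partway through. A context manager with try/finally makes cleanup unconditional. The upload and delete callables and the uploaded_file name are hypothetical; in practice they would wrap POST /v1/files and DELETE /v1/files/{file_id}.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def uploaded_file(
    upload: Callable[[], str],
    delete: Callable[[str], None],
) -> Iterator[str]:
    """Upload a file, yield its ID, and always delete it afterwards."""
    file_id = upload()
    try:
        yield file_id
    finally:
        delete(file_id)  # runs even if the OCR call inside the block raises


# Simulate an OCR failure and confirm the file is still deleted.
deleted = []
try:
    with uploaded_file(lambda: "file-123", deleted.append) as file_id:
        raise RuntimeError("OCR failed")
except RuntimeError:
    pass
print(deleted)  # ['file-123']
```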