
OCR (Optical Character Recognition)

PREVIEW MODE

This feature and its corresponding model are currently in preview mode. Implementation details may change before the official release. Access to the OCR model is available upon request.

OCR models extract text from images and PDF documents. Through the Nebul Inference API, you can access several vision-language models that specialize in converting images and documents into text, including in multilingual contexts.

Quick Start

The simplest flow: upload your file, then run OCR on it.

python
import httpx

API_URL = "https://api.inference.nebul.io"
API_KEY = "<YOUR_API_KEY>"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: Upload
with open("document.pdf", "rb") as f:
    upload = httpx.post(
        f"{API_URL}/v1/files",
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"purpose": "ocr"},
        headers=HEADERS,
    )
file_id = upload.json()["id"]

# Step 2: OCR
result = httpx.post(
    f"{API_URL}/v1/files/{file_id}/ocr",
    json={"model": "deepseek-ai/DeepSeek-OCR"},
    headers={**HEADERS, "Content-Type": "application/json"},
    timeout=120.0,
).json()

for page in result["pages"]:
    print(f"Page {page['index']}: {page['markdown'][:100]}...")

# Cleanup
httpx.delete(f"{API_URL}/v1/files/{file_id}", headers=HEADERS)

Direct: /v1/ocr with a URL

If you already have a publicly accessible URL, you can call the OCR endpoint directly:

python
import httpx

url = "https://api.inference.nebul.io/v1/ocr"

payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/image.png"
    }
}

response = httpx.post(url, json=payload, headers={
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}, timeout=120.0)

for page in response.json()["pages"]:
    print(page["markdown"])

Endpoints

POST /v1/files/{file_id}/ocr: Upload + OCR

Upload a file first (see Uploading Files), then OCR it by file ID. The service resolves the file internally, so there are no signed URLs to manage.

Request body (FileOCROptions):

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | String | No | Model identifier (e.g. lightonai/LightOnOCR-2-1B). See Models. |
| pages | String, Integer, or Array | No | Pages to process for multi-page PDFs. See Page Selection. |
| include_image_base64 | Boolean | No | Include base64-encoded page images in response. Defaults to false. |
| prompt | String | No | Custom OCR prompt. Defaults to "Read all text in this image and format it as markdown.". |
| document_annotation_prompt | String | No | Mistral SDK-compatible prompt. Populates document_annotation in the response. |
| skip_special_tokens | Boolean | No | Strip special tokens (e.g. <\|ref\|>) from output. |

POST /v1/ocr: Direct OCR (Mistral SDK compatible)

Call the OCR service directly. You supply the document yourself.

Request body (MistralOCRRequest):

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | String | No | Model identifier (e.g. zai-org/GLM-OCR). See Models. |
| document | Object | Yes | Document specification. See Document Input Types. |
| pages | String, Integer, or Array | No | Pages to process for multi-page PDFs. |
| include_image_base64 | Boolean | No | Include base64-encoded page images. Defaults to false. |
| prompt | String | No | Custom OCR prompt. |
| document_annotation_prompt | String | No | Mistral SDK-compatible prompt. Populates document_annotation in the response. |
| skip_special_tokens | Boolean | No | Strip special tokens from output. |

Uploading Files

Before using /v1/files/{file_id}/ocr, upload your file:

POST /v1/files
Content-Type: multipart/form-data

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | The document to upload (PDF, PNG, or JPEG). |
| purpose | String | Yes | Must be "ocr". |

python
import httpx

API_URL = "https://api.inference.nebul.io"
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}

with open("invoice.pdf", "rb") as f:
    upload = httpx.post(
        f"{API_URL}/v1/files",
        files={"file": ("invoice.pdf", f, "application/pdf")},
        data={"purpose": "ocr"},
        headers=HEADERS,
    )
file_id = upload.json()["id"]

The response includes an id field you use in the OCR request:

json
{
  "id": "0193a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
  "filename": "invoice.pdf",
  "bytes": 245678,
  "purpose": "ocr",
  "status": "complete"
}

tip

See the Files API reference for full upload, list, download, and delete endpoints, plus multipart uploads for large files.

Document Input Types

When calling /v1/ocr directly, the document field supports three input types:

document_url: URL or signed URL

Pass a publicly accessible URL to the document:

json
{
  "type": "document_url",
  "document_url": "https://example.com/document.pdf"
}

This input type also accepts data: URIs (data:application/pdf;base64,...).

content: Inline base64

Embed the document directly in the request; no upload is needed:

json
{
  "type": "content",
  "document_content": "data:application/pdf;base64,JVBERi0xLjQK..."
}
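A local file can be converted into the data: URI shown above using only the standard library. The following is a minimal sketch: `to_data_uri` and `content_document` are hypothetical helper names, not part of the API, and the payload shape is copied from the example above.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URI (hypothetical helper, not part of the API)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def content_document(data: bytes, mime_type: str) -> dict:
    """Build the `document` object for the inline `content` input type."""
    return {"type": "content", "document_content": to_data_uri(data, mime_type)}


# The first bytes of a PDF header encode to the prefix used in the example above.
doc = content_document(b"%PDF-1.4\n", "application/pdf")
print(doc["document_content"])  # data:application/pdf;base64,JVBERi0xLjQK
```

In practice you would read the bytes with `open(path, "rb").read()` and pass the resulting dictionary as the `document` field of a /v1/ocr request.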

file_id: Reference an uploaded file

Use a file ID from a previous upload (no signed URL required):

json
{
  "type": "file_id",
  "file_id": "0193a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
  "org_id": "0192a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b"
}

When using file_id, the OCR service resolves the file internally. Provide org_id here or via the X-Org-Id header.
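The two ways of supplying the organization ID can be sketched as a small helper. `file_id_request` is a hypothetical name used only for illustration; the field and header names come from the text above, and the helper only builds the request parts without sending anything.

```python
def file_id_request(file_id: str, org_id: str, org_in_header: bool = False) -> tuple[dict, dict]:
    """Return (document, extra_headers) for the file_id input type.

    Hypothetical helper: places org_id either in the document object
    or in the X-Org-Id header, as described above.
    """
    document = {"type": "file_id", "file_id": file_id}
    extra_headers = {}
    if org_in_header:
        extra_headers["X-Org-Id"] = org_id
    else:
        document["org_id"] = org_id
    return document, extra_headers


document, extra_headers = file_id_request(
    "0193a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
    "0192a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
    org_in_header=True,
)
print(document)       # no org_id key in the body
print(extra_headers)  # {'X-Org-Id': '0192a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b'}
```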

Response Format

All OCR endpoints return the same response shape:

json
{
  "id": "ocr-1731500000",
  "model": "deepseek-ai/DeepSeek-OCR",
  "pages": [
    {
      "index": 0,
      "markdown": "Extracted text from the image in markdown format...",
      "images": [],
      "dimensions": {
        "width": 595,
        "height": 842,
        "dpi": 72
      }
    }
  ],
  "usage_info": {
    "pages_processed": 1,
    "doc_size_bytes": 245678,
    "prompt_tokens": 1200,
    "completion_tokens": 3400,
    "total_tokens": 4600
  },
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 3400,
    "total_tokens": 4600
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | String | Unique identifier for the OCR request. |
| model | String | Model ID used for OCR. |
| pages | Array | List of processed pages. |
| pages[].index | Integer | Zero-based page index. |
| pages[].markdown | String | Extracted text in markdown format. |
| pages[].images | Array | Extracted images (populated when include_image_base64 is true). |
| pages[].dimensions | Object | Page dimensions with width, height, and dpi. |
| usage_info.pages_processed | Integer | Number of pages processed. |
| usage_info.doc_size_bytes | Integer | Document size in bytes. |
| usage_info.prompt_tokens | Integer | Input tokens used. |
| usage_info.completion_tokens | Integer | Output tokens generated. |
| usage_info.total_tokens | Integer | Total tokens. |
| usage | Object | Token usage in OpenAI-compatible format. |
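A common next step is to join the per-page markdown into a single document and report usage. The sketch below works on an already-parsed response dictionary with the fields listed above; `combine_markdown` and `summarize_usage` are hypothetical helper names, and sorting by index is a precaution, since the page order of the response is not stated here.

```python
def combine_markdown(result: dict) -> str:
    """Join all pages' markdown in page order, separated by blank lines."""
    pages = sorted(result["pages"], key=lambda page: page["index"])
    return "\n\n".join(page["markdown"] for page in pages)


def summarize_usage(result: dict) -> str:
    """Format a one-line usage summary from the usage_info object."""
    info = result["usage_info"]
    return f"{info['pages_processed']} page(s), {info['total_tokens']} tokens"


# Trimmed sample response using only the fields documented above.
sample = {
    "pages": [
        {"index": 1, "markdown": "Second page"},
        {"index": 0, "markdown": "First page"},
    ],
    "usage_info": {"pages_processed": 2, "total_tokens": 4600},
}
print(combine_markdown(sample))  # "First page", a blank line, then "Second page"
print(summarize_usage(sample))   # 2 page(s), 4600 tokens
```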

With include_image_base64

When include_image_base64 is true, each page's images array contains detected image regions with bounding boxes and base64 data:

json
{
  "index": 0,
  "markdown": "Extracted text...",
  "images": [
    {
      "id": "img-0.jpeg",
      "top_left_x": 50,
      "top_left_y": 100,
      "bottom_right_x": 500,
      "bottom_right_y": 400,
      "image_base64": "data:image/jpeg;base64,/9j/4AAQ...",
      "dimensions": {
        "width": 450,
        "height": 300
      }
    }
  ]
}
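To save an extracted region to disk, the image_base64 value has to be decoded back into bytes. This is a minimal sketch, assuming the value is either a data: URI as in the example above or bare base64; `decode_image` is a hypothetical helper name.

```python
import base64


def decode_image(image: dict) -> tuple[str, bytes]:
    """Return (image id, raw bytes) for one entry of a page's images array."""
    data = image["image_base64"]
    # Assumption: the value may carry a "data:<mime>;base64," prefix; strip it if present.
    if data.startswith("data:"):
        _, _, data = data.partition(",")
    return image["id"], base64.b64decode(data)


# "/9j/" is the base64 encoding of the three bytes FF D8 FF that start a JPEG file.
image_id, raw = decode_image(
    {"id": "img-0.jpeg", "image_base64": "data:image/jpeg;base64,/9j/"}
)
print(image_id, raw)  # img-0.jpeg b'\xff\xd8\xff'
```

The returned bytes can then be written with `open(image_id, "wb").write(raw)`.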

With document_annotation_prompt

When a document_annotation_prompt is provided, the response includes a document_annotation field:

json
{
  "pages": ["..."],
  "document_annotation": "{\"title\": \"Invoice #1234\", \"total\": \"$1,250.00\"}"
}

document_annotation is returned as a JSON string, not a parsed object.
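Because the field is a string, it needs one extra parsing step before you can read individual values. The sketch below uses the sample value from above; `parse_annotation` is a hypothetical helper name. Whether the string is valid JSON presumably depends on the prompt and the model output, so real code should be prepared for `json.JSONDecodeError`.

```python
import json


def parse_annotation(response: dict):
    """Parse document_annotation into a Python object, or return None if absent."""
    raw = response.get("document_annotation")
    if raw is None:
        return None
    return json.loads(raw)  # raises json.JSONDecodeError if the string is not valid JSON


sample = {
    "pages": [],
    "document_annotation": "{\"title\": \"Invoice #1234\", \"total\": \"$1,250.00\"}",
}
annotation = parse_annotation(sample)
print(annotation["title"])  # Invoice #1234
print(annotation["total"])  # $1,250.00
```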

Page Selection

For multi-page PDFs, use the pages parameter to process specific pages. This reduces processing time and token usage.

json
"pages": 0          // Single page (zero-based index)
"pages": "0-2"      // Page range
"pages": [0, 3, 4]  // Explicit list

Page indices must be non-negative integers, and a list cannot contain more than 1000 items.
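These rules can be checked on the client before sending a request. The sketch below is an assumption-laden illustration, not the service's own validation: `validate_pages` is a hypothetical helper, it accepts only the three forms shown above, and it additionally assumes that a range is written start-end with start not greater than end.

```python
import re

MAX_PAGE_LIST_ITEMS = 1000  # limit stated above
_RANGE = re.compile(r"([0-9]+)-([0-9]+)")


def _is_page_index(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def validate_pages(pages):
    """Return `pages` unchanged if it matches one of the three documented forms."""
    if _is_page_index(pages):
        return pages
    if isinstance(pages, str):
        match = _RANGE.fullmatch(pages)
        # Assumption: a range is "start-end" with start <= end.
        if match and int(match.group(1)) <= int(match.group(2)):
            return pages
        raise ValueError(f"Invalid page range: {pages!r}")
    if isinstance(pages, list):
        if len(pages) > MAX_PAGE_LIST_ITEMS:
            raise ValueError(f"Page list has more than {MAX_PAGE_LIST_ITEMS} items")
        if all(_is_page_index(p) for p in pages):
            return pages
        raise ValueError("Page list must contain only non-negative integers")
    raise ValueError(f"Unsupported pages value: {pages!r}")


print(validate_pages(0))          # 0
print(validate_pages("0-2"))      # 0-2
print(validate_pages([0, 3, 4]))  # [0, 3, 4]
```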

Custom Prompts

prompt vs document_annotation_prompt

Both fields control what the model extracts, but they differ in how the result is returned:

| Field | Behavior |
| --- | --- |
| prompt | Guides the extraction. Output appears in pages[].markdown. |
| document_annotation_prompt | Same as prompt, but also populates document_annotation in the response. Mistral SDK compatible. |

Precedence: document_annotation_prompt > prompt > default ("Read all text in this image and format it as markdown.")

python
import httpx

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

payload = {
    "model": "zai-org/GLM-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/receipt.png"
    },
    "document_annotation_prompt": "Extract the total amount and date from this receipt"
}

response = httpx.post(url, json=payload, headers=headers, timeout=120.0)
data = response.json()

print(data["document_annotation"])  # structured extraction (a JSON string)
for page in data["pages"]:
    print(page["markdown"])  # full text extraction
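The precedence rule stated in this section can be modelled in a few lines. This is only a client-side illustration of the documented order, not the service's implementation; `effective_prompt` is a hypothetical name, and treating only a missing (None) value as "not set" is an assumption.

```python
DEFAULT_PROMPT = "Read all text in this image and format it as markdown."


def effective_prompt(prompt=None, document_annotation_prompt=None) -> str:
    """Pick the prompt following: document_annotation_prompt > prompt > default."""
    if document_annotation_prompt is not None:
        return document_annotation_prompt
    if prompt is not None:
        return prompt
    return DEFAULT_PROMPT


print(effective_prompt())                                # the default prompt
print(effective_prompt(prompt="List all dates"))         # List all dates
print(effective_prompt("List all dates", "Extract the total"))  # Extract the total
```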

Supported Formats

| Format | MIME Type |
| --- | --- |
| PDF | application/pdf |
| PNG | image/png |
| JPEG/JPG | image/jpeg |

tip

For best results, use high-resolution images (at least 300 DPI) with clear, readable text.
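When uploading files of mixed types, the MIME type passed in the upload can be derived from the file extension. The mapping below mirrors the Supported Formats table; `mime_type_for` is a hypothetical helper, and rejecting other extensions on the client is a design choice of this sketch rather than documented behavior.

```python
from pathlib import Path

# Extension-to-MIME mapping taken from the Supported Formats table.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported file type for OCR: {path!r}")
    return SUPPORTED_MIME_TYPES[suffix]


print(mime_type_for("invoice.pdf"))  # application/pdf
print(mime_type_for("scan.JPG"))     # image/jpeg
```

The result can be used as the third element of the `files` tuple in the upload examples, in place of the hard-coded "application/pdf".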

Models

| Model | Parameters | Precision | Best For |
| --- | --- | --- | --- |
| deepseek-ai/DeepSeek-OCR | 3B | bfloat16 | General-purpose OCR, multilingual text (100+ languages) |
| lightonai/LightOnOCR-2-1B | 1B | bfloat16 | Speed and throughput, bounding box detection, French/arXiv documents |
| zai-org/GLM-OCR | 0.9B | bfloat16 | Structured extraction via JSON schema, complex tables and formulas, edge deployment |

Choosing a Model

  • DeepSeek-OCR: Best overall accuracy for general documents, especially multilingual content.
  • LightOnOCR-2: Fastest option (~5.7 pages/s on a single H100). Supports image bounding boxes via include_image_base64. Strong on arXiv and scanned documents.
  • GLM-OCR: Smallest and most efficient. Supports structured extraction via JSON schema prompts with document_annotation_prompt. Strong on tables and formula recognition.
info

All models support the same API endpoints, request parameters, and response format. Swap the model field to switch between them.

Advanced: Signed URL Flow

If you need to use /v1/ocr with files uploaded via the Files API (e.g. for Mistral SDK compatibility), you can get a presigned download URL and pass it as a document_url:

python
import httpx

API_URL = "https://api.inference.nebul.io"
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}

# 1. Upload
with open("document.pdf", "rb") as f:
    file_id = httpx.post(
        f"{API_URL}/v1/files",
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"purpose": "ocr"},
        headers=HEADERS,
    ).json()["id"]

# 2. Get signed URL
signed_url = httpx.get(
    f"{API_URL}/v1/files/{file_id}/url",
    headers=HEADERS,
).json()["url"]

# 3. OCR with document_url
result = httpx.post(
    f"{API_URL}/v1/ocr",
    json={
        "model": "deepseek-ai/DeepSeek-OCR",
        "document": {
            "type": "document_url",
            "document_url": signed_url,
        },
    },
    headers={**HEADERS, "Content-Type": "application/json"},
    timeout=120.0,
).json()

# Cleanup
httpx.delete(f"{API_URL}/v1/files/{file_id}", headers=HEADERS)

Errors

| Status | Cause |
| --- | --- |
| 400 | Invalid input: unsupported format, missing document, bad file_id |
| 404 | File not found |
| 422 | Request validation error |
| 500 | OCR processing failed |
| 503 | OCR service unavailable |
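One way to act on these status codes is to retry only when the service is temporarily unavailable and surface everything else immediately. The sketch below is a generic pattern, not documented client behavior: `call_with_retry` is a hypothetical helper, retrying only on 503 is an assumption, and `send` is any zero-argument callable returning an object with a `status_code` attribute (an httpx response qualifies).

```python
import time

RETRYABLE_STATUSES = {503}  # assumption: only "OCR service unavailable" is retried


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until the status is not retryable or attempts run out.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Returns the last response; the caller decides how to handle its status.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in RETRYABLE_STATUSES or attempt == max_attempts:
            return response
        sleep(base_delay * 2 ** (attempt - 1))
```

With httpx this could be used as `call_with_retry(lambda: httpx.post(url, json=payload, headers=headers, timeout=120.0))`, followed by `response.raise_for_status()` so that 400, 404, 422, and 500 responses raise instead of being parsed as results.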

Best Practices

  1. Use /v1/files/{file_id}/ocr when possible. It's simpler and avoids managing signed URLs.
  2. Select specific pages for multi-page PDFs to reduce latency and cost.
  3. Use document_annotation_prompt when you need structured extraction (e.g. invoice fields) alongside the full text.
  4. Delete files after processing with DELETE /v1/files/{file_id} to avoid unnecessary storage.
  5. Ensure image quality: at least 300 DPI with proper orientation and high contrast.
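Practice 4 is easy to forget when OCR fails partway through. A context manager that deletes the file on exit is one way to make cleanup reliable. This is a sketch under assumptions: `uploaded_file` is a hypothetical helper, and `client` is expected to behave like an `httpx.Client` created with `base_url` set to the API URL and the Authorization header preconfigured.

```python
import os
from contextlib import contextmanager


@contextmanager
def uploaded_file(client, path, mime_type="application/pdf"):
    """Upload a file for OCR, yield its ID, and always delete it afterwards."""
    with open(path, "rb") as f:
        response = client.post(
            "/v1/files",
            files={"file": (os.path.basename(path), f, mime_type)},
            data={"purpose": "ocr"},
        )
    response.raise_for_status()
    file_id = response.json()["id"]
    try:
        yield file_id
    finally:
        # Runs even if the OCR call inside the with-block raises.
        client.delete(f"/v1/files/{file_id}")
```

Inside the block you would call `client.post(f"/v1/files/{file_id}/ocr", json={"model": "deepseek-ai/DeepSeek-OCR"}, timeout=120.0)`. The file is removed whether or not that call succeeds.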