OCR (Optical Character Recognition)

PREVIEW MODE

This feature and its corresponding model are currently in preview mode. Implementation details may change before the official release. Access to the OCR model is available upon request.

OCR models extract text from images and PDF documents. Through the Nebul Inference API, you can access several vision-language models that specialize in extracting text from images and documents, including multilingual content.

Quick Start

Recommended: Upload then OCR

The simplest flow: upload your file, then run OCR on it.

Python
cURL

python
import httpx

API_URL = "https://api.inference.nebul.io"
API_KEY = "<YOUR_API_KEY>"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: Upload
with open("document.pdf", "rb") as f:
    upload = httpx.post(
        f"{API_URL}/v1/files",
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"purpose": "ocr"},
        headers=HEADERS,
    )
file_id = upload.json()["id"]

# Step 2: OCR
result = httpx.post(
    f"{API_URL}/v1/files/{file_id}/ocr",
    json={"model": "deepseek-ai/DeepSeek-OCR"},
    headers={**HEADERS, "Content-Type": "application/json"},
    timeout=120.0,
).json()

for page in result["pages"]:
    print(f"Page {page['index']}: {page['markdown'][:100]}...")

# Cleanup
httpx.delete(f"{API_URL}/v1/files/{file_id}", headers=HEADERS)

bash
# Step 1: Upload
FILE_ID=$(curl -s -X POST https://api.inference.nebul.io/v1/files \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "file=@document.pdf" \
  -F "purpose=ocr" | jq -r '.id')

# Step 2: OCR
curl -X POST https://api.inference.nebul.io/v1/files/$FILE_ID/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-OCR"}'

Direct: `/v1/ocr` with a URL

If you already have a publicly accessible URL, you can call the OCR endpoint directly:

Python
cURL

python
import httpx

url = "https://api.inference.nebul.io/v1/ocr"
payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/image.png"
    }
}

response = httpx.post(url, json=payload, headers={
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}, timeout=120.0)

for page in response.json()["pages"]:
    print(page["markdown"])

bash
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/image.png"
    }
  }'

Endpoints

`POST /v1/files/{file_id}/ocr`: Upload + OCR

Upload a file first (see Uploading Files), then run OCR on it by file ID. The service resolves the file internally, so there are no signed URLs to manage.

Request body (FileOCROptions):

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | String | No | Model identifier (e.g. `lightonai/LightOnOCR-2-1B`). See Models. |
| `pages` | String \| Integer \| Array | No | Pages to process for multi-page PDFs. See Page Selection. |
| `include_image_base64` | Boolean | No | Include base64-encoded page images in response. Defaults to `false`. |
| `prompt` | String | No | Custom OCR prompt. Defaults to `"Read all text in this image and format it as markdown."`. |
| `document_annotation_prompt` | String | No | Mistral SDK-compatible prompt. Populates `document_annotation` in the response. |
| `skip_special_tokens` | Boolean | No | Strip special tokens (e.g. `<\|ref\|>`) from output. |

`POST /v1/ocr`: Direct OCR (Mistral SDK compatible)

Call the OCR service directly. You supply the document yourself.

Request body (MistralOCRRequest):

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | String | No | Model identifier (e.g. `zai-org/GLM-OCR`). See Models. |
| `document` | Object | Yes | Document specification. See Document Input Types. |
| `pages` | String \| Integer \| Array | No | Pages to process for multi-page PDFs. See Page Selection. |
| `include_image_base64` | Boolean | No | Include base64-encoded page images. Defaults to `false`. |
| `prompt` | String | No | Custom OCR prompt. |
| `document_annotation_prompt` | String | No | Mistral SDK-compatible prompt. Populates `document_annotation` in the response. |
| `skip_special_tokens` | Boolean | No | Strip special tokens from output. |
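
Because only `document` is required, a request body can stay small. The following is a minimal sketch of a helper that assembles a `/v1/ocr` body from the parameters in the table and leaves out anything that was not set. The function name `build_ocr_request` is hypothetical and not part of any Nebul SDK.

```python
from typing import Any, Optional, Union

Pages = Union[int, str, list]


def build_ocr_request(
    document: dict[str, Any],
    model: Optional[str] = None,
    pages: Optional[Pages] = None,
    include_image_base64: bool = False,
    prompt: Optional[str] = None,
    document_annotation_prompt: Optional[str] = None,
    skip_special_tokens: Optional[bool] = None,
) -> dict[str, Any]:
    """Assemble a /v1/ocr body, omitting optional fields that were not set."""
    body: dict[str, Any] = {"document": document}
    optional = {
        "model": model,
        "pages": pages,
        "prompt": prompt,
        "document_annotation_prompt": document_annotation_prompt,
        "skip_special_tokens": skip_special_tokens,
    }
    # Compare against None so that a valid value such as pages=0 is kept.
    body.update({key: value for key, value in optional.items() if value is not None})
    if include_image_base64:
        body["include_image_base64"] = True
    return body
```

The returned dictionary can be passed as `json=` to `httpx.post`, as in the Quick Start examples.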

Uploading Files

Before using /v1/files/{file_id}/ocr, upload your file:

POST /v1/files
Content-Type: multipart/form-data

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `file` | File | Yes | The document to upload (PDF, PNG, or JPEG). |
| `purpose` | String | Yes | Must be `"ocr"`. |

Python
cURL

python
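import httpx

API_URL = "https://api.inference.nebul.io"
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}

# Upload the document; the "purpose" field must be "ocr"
with open("invoice.pdf", "rb") as f:
    upload = httpx.post(
        f"{API_URL}/v1/files",
        files={"file": ("invoice.pdf", f, "application/pdf")},
        data={"purpose": "ocr"},
        headers=HEADERS,
    )
upload.raise_for_status()  # stop early if the upload was rejected
file_id = upload.json()["id"]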

bash
curl -X POST https://api.inference.nebul.io/v1/files \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "file=@invoice.pdf" \
  -F "purpose=ocr"

The response includes an id field you use in the OCR request:

json
{
  "id": "0193a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
  "filename": "invoice.pdf",
  "bytes": 245678,
  "purpose": "ocr",
  "status": "complete"
}

tip

See the Files API reference for full upload, list, download, and delete endpoints, plus multipart uploads for large files.

Document Input Types

When calling /v1/ocr directly, the document field supports three input types:

`document_url`: URL or signed URL

Pass a publicly accessible URL to the document:

json
{
  "type": "document_url",
  "document_url": "https://example.com/document.pdf"
}

This input type also accepts data: URIs (for example, data:application/pdf;base64,...).

`content`: Inline base64

Embed the document directly in the request, with no upload needed:

json
{
  "type": "content",
  "document_content": "data:application/pdf;base64,JVBERi0xLjQK..."
}
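
To produce the `document_content` value from raw file bytes, the data has to be base64-encoded and prefixed with its MIME type. The sketch below shows one way to do this, assuming the data URI form used in the example above. The helper name `content_document` is hypothetical.

```python
import base64

# MIME types listed under Supported Formats.
SUPPORTED_MIME_TYPES = {"application/pdf", "image/png", "image/jpeg"}


def content_document(data: bytes, mime_type: str) -> dict[str, str]:
    """Wrap raw file bytes as a `content` document with a base64 data URI."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "content",
        "document_content": f"data:{mime_type};base64,{encoded}",
    }
```

For a file on disk, something like `content_document(Path("invoice.pdf").read_bytes(), "application/pdf")` would work. Keep in mind that base64 encoding makes the payload roughly a third larger than the file itself.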

`file_id`: Reference an uploaded file

Use a file ID from a previous upload (no signed URL required):

json
{
  "type": "file_id",
  "file_id": "0193a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b",
  "org_id": "0192a1b2-c3d4-7e5f-8a9b-0c1d2e3f4a5b"
}

When using file_id, the OCR service resolves the file internally. Provide org_id here or via the X-Org-Id header.
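
The two ways of supplying the organization ID can be sketched as follows. This is an illustration only: the helper name `file_id_request` is hypothetical, and it simply places `org_id` either in the document object or in the `X-Org-Id` header as described above.

```python
from typing import Any, Optional


def file_id_request(
    file_id: str,
    org_id: str,
    model: Optional[str] = None,
    org_in_header: bool = False,
) -> tuple[dict[str, Any], dict[str, str]]:
    """Return (body, extra_headers) for a /v1/ocr call that uses a file ID."""
    document: dict[str, Any] = {"type": "file_id", "file_id": file_id}
    extra_headers: dict[str, str] = {}
    if org_in_header:
        extra_headers["X-Org-Id"] = org_id  # organization passed as a header
    else:
        document["org_id"] = org_id  # organization passed in the body
    body: dict[str, Any] = {"document": document}
    if model is not None:
        body["model"] = model
    return body, extra_headers
```

Merge `extra_headers` with the `Authorization` header before sending the request.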

Response Format

All OCR endpoints return the same response shape:

json
{
  "id": "ocr-1731500000",
  "model": "deepseek-ai/DeepSeek-OCR",
  "pages": [
    {
      "index": 0,
      "markdown": "Extracted text from the image in markdown format...",
      "images": [],
      "dimensions": {
        "width": 595,
        "height": 842,
        "dpi": 72
      }
    }
  ],
  "usage_info": {
    "pages_processed": 1,
    "doc_size_bytes": 245678,
    "prompt_tokens": 1200,
    "completion_tokens": 3400,
    "total_tokens": 4600
  },
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 3400,
    "total_tokens": 4600
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| `id` | String | Unique identifier for the OCR request. |
| `model` | String | Model ID used for OCR. |
| `pages` | Array | List of processed pages. |
| `pages[].index` | Integer | Zero-based page index. |
| `pages[].markdown` | String | Extracted text in markdown format. |
| `pages[].images` | Array | Extracted images (populated when `include_image_base64` is `true`). |
| `pages[].dimensions` | Object | Page dimensions with `width`, `height`, and `dpi`. |
| `usage_info.pages_processed` | Integer | Number of pages processed. |
| `usage_info.doc_size_bytes` | Integer | Document size in bytes. |
| `usage_info.prompt_tokens` | Integer | Input tokens used. |
| `usage_info.completion_tokens` | Integer | Output tokens generated. |
| `usage_info.total_tokens` | Integer | Total tokens. |
| `usage` | Object | Token usage in OpenAI-compatible format. |
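
As an illustration of how these fields fit together, the sketch below joins the per-page markdown into one string and picks out a few usage numbers. It assumes only the response shape documented above. The function name `summarize_ocr_response` is hypothetical, and sorting by `index` is a defensive choice rather than a documented requirement.

```python
from typing import Any


def summarize_ocr_response(response: dict[str, Any]) -> dict[str, Any]:
    """Combine page text and usage figures from an OCR response."""
    # Sort defensively so the text follows page order.
    pages = sorted(response["pages"], key=lambda page: page["index"])
    usage = response.get("usage_info", {})
    return {
        "text": "\n\n".join(page["markdown"] for page in pages),
        "page_count": len(pages),
        "total_tokens": usage.get("total_tokens"),
    }
```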

With `include_image_base64`

When include_image_base64 is true, each page's images array contains detected image regions with bounding boxes and base64 data:

json
{
  "index": 0,
  "markdown": "Extracted text...",
  "images": [
    {
      "id": "img-0.jpeg",
      "top_left_x": 50,
      "top_left_y": 100,
      "bottom_right_x": 500,
      "bottom_right_y": 400,
      "image_base64": "data:image/jpeg;base64,/9j/4AAQ...",
      "dimensions": {
        "width": 450,
        "height": 300
      }
    }
  ]
}
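
To use a returned image region, the base64 payload has to be decoded back into bytes. The sketch below does that for one entry of `pages[].images` and also derives the box size from the corner coordinates. It assumes `image_base64` is a data URI as in the example above (a bare base64 string is tolerated as well). The helper name `decode_region` is hypothetical.

```python
import base64
from typing import Any


def decode_region(image: dict[str, Any]) -> dict[str, Any]:
    """Decode one entry of pages[].images into raw bytes plus its box size."""
    encoded = image["image_base64"]
    # Assumption: the value is a data URI; strip the "data:...;base64," prefix.
    if encoded.startswith("data:"):
        _, _, encoded = encoded.partition(",")
    return {
        "id": image["id"],
        "data": base64.b64decode(encoded),
        "box_width": image["bottom_right_x"] - image["top_left_x"],
        "box_height": image["bottom_right_y"] - image["top_left_y"],
    }
```

The decoded `data` can then be written to disk, for example under the region's `id` as the file name.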

With `document_annotation_prompt`

When a document_annotation_prompt is provided, the response includes a document_annotation field:

json
{
  "pages": ["..."],
  "document_annotation": "{\"title\": \"Invoice #1234\", \"total\": \"$1,250.00\"}"
}

document_annotation is returned as a JSON string, not a parsed object.
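
Since the field arrives as a string, it needs one more parsing step before its keys can be read. The following is a small sketch. It assumes that the model may occasionally return text that is not valid JSON, depending on the prompt, and falls back to the raw string in that case. The helper name `parse_document_annotation` is hypothetical.

```python
import json
from typing import Any


def parse_document_annotation(response: dict[str, Any]) -> Any:
    """Return the parsed annotation, the raw string if it is not JSON, or None."""
    raw = response.get("document_annotation")
    if raw is None:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: free-form answers are possible; hand them back unchanged.
        return raw
```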

Page Selection

For multi-page PDFs, use the pages parameter to process specific pages. This reduces processing time and token usage.

json
"pages": 0           // Single page (zero-based index)
"pages": "0-2"       // Page range
"pages": [0, 3, 4]   // Explicit list

Page indices must be non-negative integers, and a list cannot contain more than 1000 items.
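
These rules can be checked on the client before a request is sent. The sketch below validates the three forms shown above. It is not the service's own validation: the helper name `validate_pages` is hypothetical, and it assumes that a range string is always written as `start-end` with `start` no greater than `end`.

```python
import re
from typing import Union

MAX_PAGE_LIST_ITEMS = 1000  # limit stated in this section
_PAGE_RANGE = re.compile(r"(\d+)-(\d+)")

Pages = Union[int, str, list]


def validate_pages(pages: Pages) -> Pages:
    """Return `pages` unchanged if it looks valid, otherwise raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(pages, bool):
        raise TypeError("pages must be an int, a range string, or a list of ints")
    if isinstance(pages, int):
        if pages < 0:
            raise ValueError("page index must be non-negative")
        return pages
    if isinstance(pages, str):
        match = _PAGE_RANGE.fullmatch(pages)
        # Assumption: ranges look like "start-end" with start <= end.
        if match is None or int(match.group(1)) > int(match.group(2)):
            raise ValueError(f"invalid page range: {pages!r}")
        return pages
    if isinstance(pages, list):
        if len(pages) > MAX_PAGE_LIST_ITEMS:
            raise ValueError("page list cannot exceed 1000 items")
        for page in pages:
            if isinstance(page, bool) or not isinstance(page, int) or page < 0:
                raise ValueError(f"invalid page index: {page!r}")
        return pages
    raise TypeError("pages must be an int, a range string, or a list of ints")
```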

Custom Prompts

`prompt` vs `document_annotation_prompt`

Both fields control what the model extracts, but they differ in how the result is returned:

| Field | Behavior |
| --- | --- |
| `prompt` | Guides the extraction. Output appears in `pages[].markdown`. |
| `document_annotation_prompt` | Same as `prompt`, but also populates `document_annotation` in the response. Mistral SDK compatible. |

Precedence: document_annotation_prompt > prompt > default ("Read all text in this image and format it as markdown.")
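
The precedence rule can be written out as a short function. This sketch only mirrors the rule stated above for illustration; the service applies it on its side. The name `effective_prompt` is hypothetical, and treating an empty string as "not set" is an assumption.

```python
from typing import Optional

DEFAULT_PROMPT = "Read all text in this image and format it as markdown."


def effective_prompt(
    prompt: Optional[str] = None,
    document_annotation_prompt: Optional[str] = None,
) -> str:
    """Pick the prompt that takes effect: annotation prompt, then prompt, then default."""
    # Assumption: empty strings count as unset.
    if document_annotation_prompt:
        return document_annotation_prompt
    if prompt:
        return prompt
    return DEFAULT_PROMPT
```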

Python
cURL

python
import httpx

url = "https://api.inference.nebul.io/v1/ocr"
headers = {"Authorization": "Bearer <YOUR_API_KEY>"}

payload = {
    "model": "zai-org/GLM-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/receipt.png"
    },
    "document_annotation_prompt": "Extract the total amount and date from this receipt"
}

response = httpx.post(url, json=payload, headers=headers, timeout=120.0)
data = response.json()

print(data["document_annotation"])  # structured extraction
for page in data["pages"]:
    print(page["markdown"])         # full text extraction

bash
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-OCR",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/receipt.png"
    },
    "document_annotation_prompt": "Extract the total amount and date from this receipt"
  }'

Supported Formats

| Format | MIME Type |
| --- | --- |
| PDF | `application/pdf` |
| PNG | `image/png` |
| JPEG/JPG | `image/jpeg` |
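
A file can be checked against this list before uploading by looking at its first bytes, which is more reliable than trusting the file extension. The sketch below uses the well-known leading signatures of the three formats. The helper name `sniff_mime` is hypothetical, and this is a client-side convenience, not a description of how the service detects formats.

```python
from typing import Optional

# Leading byte signatures of the supported formats.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff_mime(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unsupported."""
    for signature, mime_type in SIGNATURES.items():
        if data.startswith(signature):
            return mime_type
    return None
```

Reading only the first 16 bytes of a file is enough input for this check.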

tip

For best results, use high-resolution images (at least 300 DPI) with clear, readable text.

Models

| Model | Parameters | Precision | Best For |
| --- | --- | --- | --- |
| `deepseek-ai/DeepSeek-OCR` | 3B | bfloat16 | General-purpose OCR, multilingual text (100+ languages) |
| `lightonai/LightOnOCR-2-1B` | 1B | bfloat16 | Speed and throughput, bounding box detection, French/arXiv documents |
| `zai-org/GLM-OCR` | 0.9B | bfloat16 | Structured extraction via JSON schema, complex tables and formulas, edge deployment |

Choosing a Model

DeepSeek-OCR: Best overall accuracy for general documents, especially multilingual content.
LightOnOCR-2: Fastest option (~5.7 pages/s on a single H100). Supports image bounding boxes via include_image_base64. Strong on arXiv and scanned documents.
GLM-OCR: Smallest and most efficient. Supports structured extraction via JSON schema prompts with document_annotation_prompt. Strong on tables and formula recognition.

info

All models support the same API endpoints, request parameters, and response format. Swap the model field to switch between them.

Advanced: Signed URL Flow

If you need to use /v1/ocr with files uploaded via the Files API (e.g. for Mistral SDK compatibility), you can get a presigned download URL and pass it as a document_url:

Python
cURL

python
import httpx

API_URL = "https://api.inference.nebul.io"
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}

# 1. Upload
with open("document.pdf", "rb") as f:
    file_id = httpx.post(
        f"{API_URL}/v1/files",
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"purpose": "ocr"},
        headers=HEADERS,
    ).json()["id"]

# 2. Get signed URL
signed_url = httpx.get(
    f"{API_URL}/v1/files/{file_id}/url",
    headers=HEADERS,
).json()["url"]

# 3. OCR with document_url
result = httpx.post(
    f"{API_URL}/v1/ocr",
    json={
        "model": "deepseek-ai/DeepSeek-OCR",
        "document": {
            "type": "document_url",
            "document_url": signed_url,
        },
    },
    headers={**HEADERS, "Content-Type": "application/json"},
    timeout=120.0,
).json()

# Cleanup
httpx.delete(f"{API_URL}/v1/files/{file_id}", headers=HEADERS)

bash
# 1. Upload
FILE_ID=$(curl -s -X POST https://api.inference.nebul.io/v1/files \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -F "file=@document.pdf" \
  -F "purpose=ocr" | jq -r '.id')

# 2. Get signed URL
SIGNED_URL=$(curl -s https://api.inference.nebul.io/v1/files/$FILE_ID/url \
  -H "Authorization: Bearer <YOUR_API_KEY>" | jq -r '.url')

# 3. OCR with signed URL
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"deepseek-ai/DeepSeek-OCR\",
    \"document\": {
      \"type\": \"document_url\",
      \"document_url\": \"$SIGNED_URL\"
    }
  }"

Errors

| Status | Cause |
| --- | --- |
| 400 | Invalid input: unsupported format, missing document, bad `file_id` |
| 404 | File not found |
| 422 | Request validation error |
| 500 | OCR processing failed |
| 503 | OCR service unavailable |
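
Client code usually needs to decide which of these errors are worth retrying. The sketch below maps the status codes to the causes listed above and computes exponential backoff delays. Treating only 503 as retryable is an assumption made for this example, since this page does not say which errors are transient. The names `describe_error`, `is_retryable`, and `backoff_delays` are hypothetical.

```python
ERROR_CAUSES = {
    400: "Invalid input: unsupported format, missing document, bad file_id",
    404: "File not found",
    422: "Request validation error",
    500: "OCR processing failed",
    503: "OCR service unavailable",
}

# Assumption: only "service unavailable" is treated as transient here.
RETRYABLE_STATUSES = {503}


def describe_error(status: int) -> str:
    """Return the documented cause for a status code."""
    return ERROR_CAUSES.get(status, "Unexpected status")


def is_retryable(status: int) -> bool:
    """Tell whether a request with this status is worth sending again."""
    return status in RETRYABLE_STATUSES


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential delays in seconds (base, 2*base, 4*base, ...), limited to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(attempts)]
```

A retry loop would sleep for each delay in turn and stop as soon as a response is successful or `is_retryable` returns `False`.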

Best Practices

Use /v1/files/{file_id}/ocr when possible. It's simpler and avoids managing signed URLs.
Select specific pages for multi-page PDFs to reduce latency and cost.
Use document_annotation_prompt when you need structured extraction (e.g. invoice fields) alongside the full text.
Delete files after processing with DELETE /v1/files/{file_id} to avoid unnecessary storage.
Ensure image quality: at least 300 DPI with proper orientation and high contrast.
