OCR (Optical Character Recognition)

PREVIEW MODE

This feature and its corresponding model are currently in preview mode. Implementation details may change before the official release. Access to the OCR model is available upon request.

OCR models enable developers to extract text from images and PDF documents. This capability is powered by DeepSeek OCR, a state-of-the-art vision-language model that supports multilingual text extraction from various image formats and PDF documents.

Overview

The OCR API allows you to extract text from images (JPG, PNG, BMP, GIF) and PDF documents. It supports multiple input methods including image URLs, base64-encoded images, and PDF URLs. The API uses the MistralAI OCR format and can process single images or multi-page PDF documents.

Use Cases

OCR enables a wide range of workflows:

  • Document digitization: legal documents, medical records, academic papers, business documents
  • Data extraction: forms, receipts, invoices, ID cards
  • Content processing: social media images, screenshots, handwritten notes, signs and labels
  • Multilingual support: international documents, multilingual content, translation workflows

Quick Start

Endpoint

POST https://api.inference.nebul.io/v1/ocr

The endpoint supports the default MistralAI OCR format.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | String | Yes | The model ID to use (e.g., `deepseek-ai/DeepSeek-OCR`). |
| `document` | Object | Yes | Document specification. The `type` field must be `"document_url"`, `"content"`, or `"file_id"`. For `document_url`, provide the URL in a `document_url` field; for `content`, provide base64-encoded data in a `document_content` field. |
| `prompt` | String | No | Optional prompt to guide OCR extraction. Defaults to `"Read all text in this image and format it as markdown."` if not provided. |
| `pages` | Array | No | Specific pages to process. If omitted, all pages are processed. |
| `include_image_base64` | Boolean | No | Whether to include base64-encoded images in the response. Defaults to `false`. |
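As a sketch, the documented parameters can be assembled into a request body like this. The helper name `build_ocr_payload` is our own, not part of any SDK:

```python
def build_ocr_payload(document, prompt=None, pages=None, include_image_base64=False):
    """Assemble a request body for POST /v1/ocr (MistralAI OCR format).

    Only the two required fields are always present; optional fields are
    added when supplied, so the server's documented defaults apply otherwise.
    """
    payload = {
        "model": "deepseek-ai/DeepSeek-OCR",
        "document": document,
    }
    if prompt is not None:
        payload["prompt"] = prompt
    if pages is not None:
        payload["pages"] = pages
    if include_image_base64:
        payload["include_image_base64"] = True
    return payload

payload = build_ocr_payload(
    {"type": "document_url", "document_url": "https://example.com/image.png"},
    prompt="Read all text in this image and format it as markdown.",
)
```

Send the resulting dict as the JSON body, as shown in the examples below.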

Code Examples

Using an image URL:

```python
import requests

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Using an image URL
payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/image.png"
    },
    "prompt": "Read all text in this image and format it as markdown."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print("Extracted text:")
for page in data["pages"]:
    print(f"Page {page['index']}:")
    print(page["markdown"])
```

Using a base64-encoded image:

```python
import requests
import base64

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Read the image and encode it to base64
with open("image.png", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "content",
        "document_content": image_data
    },
    "prompt": "Read all text in this image and format it as markdown."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print("Extracted text:")
for page in data["pages"]:
    print(f"Page {page['index']}:")
    print(page["markdown"])
```

Processing a PDF document:

```python
import requests

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Process a PDF from a URL
payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/document.pdf"
    },
    "prompt": "Read all text in this document and format it as markdown."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print("Extracted text from PDF:")
for page in data["pages"]:
    print(f"\n[Page {page['index'] + 1}]")
    print(page["markdown"])
```
> **Image Formats:** Supported image formats include JPG, PNG, BMP, and GIF. For best results, use high-resolution images (at least 300 DPI) with clear, readable text.

> **Base64 Encoding:** When using base64-encoded images with the `content` type, the `document_content` field should contain only the raw base64 string, without the `data:image/png;base64,` prefix. Ensure the data is encoded without corruption.

> **PDF Processing:** For multi-page PDFs, the API processes all pages by default. Use the `pages` parameter to process specific pages, which can reduce processing time and cost for large documents.

Response Format

The API returns a JSON object in MistralAI OCR format:

```json
{
  "id": "ocr-1731500000",
  "model": "deepseek-ai/DeepSeek-OCR",
  "pages": [
    {
      "index": 0,
      "markdown": "Extracted text from the image or document in markdown format...",
      "images": [],
      "dimensions": {
        "width": 1920,
        "height": 1080,
        "dpi": 200
      }
    }
  ],
  "usage_info": {
    "pages_processed": 1,
    "doc_size_bytes": 245760
  }
}
```

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| `id` | String | Unique identifier for the OCR request. |
| `model` | String | The model ID used for OCR. |
| `pages` | Array | List of processed pages. |
| `pages[].index` | Integer | Zero-based index of the page. |
| `pages[].markdown` | String | Extracted text from the page in markdown format. |
| `pages[].images` | Array | Images extracted from the page, if any. |
| `pages[].dimensions` | Object | Page dimensions with `width`, `height`, and `dpi` fields. |
| `usage_info` | Object | Usage information. |
| `usage_info.pages_processed` | Integer | Number of pages processed. |
| `usage_info.doc_size_bytes` | Integer | Size of the document in bytes. |
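A common follow-up step is flattening the per-page results into a single document. This sketch walks the response fields described above; `join_pages` is a hypothetical helper, not part of the API:

```python
def join_pages(data):
    """Concatenate per-page markdown from an OCR response,
    labelling each page with its zero-based index."""
    parts = []
    for page in data.get("pages", []):
        parts.append(f"<!-- page {page['index']} -->\n{page['markdown']}")
    return "\n\n".join(parts)

# A response shaped like the example above (abridged).
sample = {
    "id": "ocr-1731500000",
    "model": "deepseek-ai/DeepSeek-OCR",
    "pages": [
        {"index": 0, "markdown": "# Invoice", "images": [], "dimensions": None},
    ],
    "usage_info": {"pages_processed": 1, "doc_size_bytes": 245760},
}

print(join_pages(sample))
```

The HTML-comment page markers keep the output valid markdown while preserving page boundaries; swap in whatever separator suits your pipeline.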

Advanced Usage

Custom OCR Prompts

You can provide custom prompts to guide the OCR extraction:

```python
import requests

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/receipt.png"
    },
    "prompt": "Extract only the numbers and dates from this image. Format them as a list."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

for page in data["pages"]:
    print(page["markdown"])
```

PDF Processing with Multiple Pages

When processing PDFs, the API automatically processes each page separately:

```python
import requests

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/multi-page-document.pdf"
    }
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print(f"Processed {len(data['pages'])} pages")
for page in data["pages"]:
    print(f"\n=== Page {page['index'] + 1} ===")
    print(page["markdown"])
    if page.get("dimensions"):
        dims = page["dimensions"]
        print(f"Dimensions: {dims['width']}x{dims['height']} @ {dims['dpi']} DPI")
```

Supported Formats

Image Formats

  • JPG/JPEG: Standard JPEG images
  • PNG: Portable Network Graphics
  • BMP: Bitmap images
  • GIF: Graphics Interchange Format

Document Formats

  • PDF: Portable Document Format (multi-page support)

Model Specifications

The following OCR models are available:

  • deepseek-ai/DeepSeek-OCR - 3B parameters, 8K context, bfloat16 precision, supports Text, Image, Multilingual (100+ languages) (Preview)

Best Practices

  1. Image Quality: Higher resolution images generally produce better OCR results. Aim for at least 300 DPI for scanned documents.

  2. Image Preprocessing:

    • Ensure images are properly oriented (not rotated)
    • Use high contrast between text and background
    • Remove noise and artifacts when possible
  3. PDF Processing:

    • PDFs are automatically split into pages
    • Each page is processed separately and returned in the pages array
    • Page dimensions and metadata are included for each page
  4. Prompt Engineering:

    • Use specific prompts to guide extraction (e.g., "Extract only the table data")
    • The default prompt formats output as markdown, which is useful for structured text
    • Use prompts to specify output format or structure requirements
  5. Base64 Encoding:

    • When using base64, set type to "content" and provide the data in the document_content field
    • Ensure images are properly encoded without corruption
    • No data URI prefix is needed (just the base64 string)
    • Alternatively, you can use type: "document_url" with a data URL like data:image/png;base64,<data> in the document_url field
  6. Error Handling:

    • Handle cases where no text is extracted (empty pages array)
    • Validate image URLs before sending requests
    • Check PDF accessibility and password protection
  7. Rate Limiting:

    • Be mindful of API rate limits
    • Batch process multiple images when possible
    • Consider async processing for large document sets
  8. Cost Optimization:

    • Process only necessary pages from multi-page PDFs
    • Use appropriate image resolutions (not unnecessarily high)
    • Cache results for repeated processing of the same images
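The error-handling advice above can be sketched as a small validation helper: check the HTTP status, then guard against an empty `pages` array before using the text. `OcrError` and `extract_or_raise` are our own names, not part of any SDK:

```python
class OcrError(Exception):
    """Raised when an OCR response cannot be used."""


def extract_or_raise(status_code, data):
    """Validate an OCR response; return the extracted text or raise OcrError.

    `status_code` is the HTTP status of the response, `data` its parsed
    JSON body in the MistralAI OCR format documented above.
    """
    if status_code != 200:
        raise OcrError(f"OCR request failed with HTTP {status_code}")
    pages = data.get("pages") or []
    if not pages:
        raise OcrError("No text extracted: response contained no pages")
    return "\n\n".join(page["markdown"] for page in pages)
```

In practice you would call it as `extract_or_raise(response.status_code, response.json())` after `requests.post`, and catch `OcrError` alongside `requests.exceptions.RequestException` in your retry logic.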