OCR (Optical Character Recognition)
This feature and its corresponding model are currently in preview mode. Implementation details may change before the official release. Access to the OCR model is available upon request.
OCR models enable developers to extract text from images and PDF documents. This capability is powered by DeepSeek OCR, a state-of-the-art vision-language model that supports multilingual text extraction from various image formats and PDF documents.
Overview
The OCR API allows you to extract text from images (JPG, PNG, BMP, GIF) and PDF documents. It supports multiple input methods including image URLs, base64-encoded images, and PDF URLs. The API uses the MistralAI OCR format and can process single images or multi-page PDF documents.
Use Cases
OCR enables a wide range of workflows:

- Document digitization: legal documents, medical records, academic papers, business documents
- Data extraction: forms, receipts, invoices, ID cards
- Content processing: social media, screenshots, handwritten notes, signs and labels
- Multilingual support: international documents, multilingual content, translation workflows
Quick Start
Endpoint
POST https://api.inference.nebul.io/v1/ocr
The endpoint supports the default MistralAI OCR format.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | The model ID to use (e.g., `deepseek-ai/DeepSeek-OCR`). |
| document | Object | Yes | Document specification. Must have a `type` field set to `"document_url"`, `"content"`, or `"file_id"`. For the `document_url` type, provide a `document_url` field. For the `content` type, provide a `document_content` field with base64-encoded data. |
| prompt | String | No | Optional prompt to guide OCR extraction. Defaults to "Read all text in this image and format it as markdown." if not provided. |
| pages | Array | No | Specific pages to process. |
| include_image_base64 | Boolean | No | Whether to include base64-encoded images in the response. Defaults to `false`. |
Code Examples
Using an image URL (Python):
```python
import requests

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Using an image URL
payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/image.png"
    },
    "prompt": "Read all text in this image and format it as markdown."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print("Extracted text:")
for page in data["pages"]:
    print(f"Page {page['index']}:")
    print(page["markdown"])
```
Using a base64-encoded image (Python):
```python
import requests
import base64

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Read the image and encode it to base64
with open("image.png", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "content",
        "document_content": image_data
    },
    "prompt": "Read all text in this image and format it as markdown."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print("Extracted text:")
for page in data["pages"]:
    print(f"Page {page['index']}:")
    print(page["markdown"])
```
Processing a PDF document (Python):
```python
import requests

url = "https://api.inference.nebul.io/v1/ocr"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Process a PDF from a URL
payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/document.pdf"
    },
    "prompt": "Read all text in this document and format it as markdown."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print("Extracted text from PDF:")
for page in data["pages"]:
    print(f"\n[Page {page['index'] + 1}]")
    print(page["markdown"])
```
Using an image URL (cURL):
```bash
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/image.png"
    },
    "prompt": "Read all text in this image and format it as markdown."
  }'
```
Using a base64-encoded image (cURL):
```bash
# First, encode the image to base64
IMAGE_BASE64=$(base64 -i image.png)

curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"deepseek-ai/DeepSeek-OCR\",
    \"document\": {
      \"type\": \"content\",
      \"document_content\": \"${IMAGE_BASE64}\"
    },
    \"prompt\": \"Read all text in this image and format it as markdown.\"
  }"
```
Processing a PDF document (cURL):
```bash
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/document.pdf"
    },
    "prompt": "Read all text in this document and format it as markdown."
  }'
```
Image Formats: Supported image formats include JPG, PNG, BMP, and GIF. For best results, use high-resolution images (at least 300 DPI) with clear, readable text.
Base64 Encoding: When using base64-encoded images, ensure the data is properly encoded. The `document_content` field should contain only the raw base64 string, without the `data:image/png;base64,` prefix.
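As an illustration, a small client-side helper (hypothetical, not part of the API) can normalize input so that `document_content` always receives a bare base64 string, whether you start from raw bytes or from a data URL:

```python
import base64


def to_document_content(data) -> str:
    """Return a bare base64 string suitable for the document_content field.

    Accepts raw bytes, an already-encoded base64 string, or a data URL
    (e.g. "data:image/png;base64,...") and strips the prefix if present.
    """
    if isinstance(data, bytes):
        return base64.b64encode(data).decode("utf-8")
    if data.startswith("data:"):
        # Drop the "data:image/png;base64," prefix, keep only the payload
        return data.split(",", 1)[1]
    return data


print(to_document_content(b"hello"))                      # aGVsbG8=
print(to_document_content("data:image/png;base64,AAAA"))  # AAAA
```

This keeps request-building code uniform regardless of where the image data came from.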
PDF Processing: For multi-page PDFs, the API processes all pages by default. Use the pages parameter to process specific pages, which can reduce processing time and costs for large documents.
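For example, assuming `pages` accepts a list of zero-based page indices (matching the zero-based `index` field in the response), a request for only the first and third pages of a PDF might look like this:

```python
import json

# Hypothetical sketch: assumes `pages` takes zero-based page indices,
# consistent with the zero-based `index` field documented in the response.
payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/document.pdf"
    },
    "pages": [0, 2],  # first and third pages only
}

print(json.dumps(payload, indent=2))
# Send with requests.post(url, headers=headers, json=payload),
# exactly as in the Quick Start examples.
```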
Response Format
The API returns a JSON object in MistralAI OCR format:
```json
{
  "id": "ocr-1731500000",
  "model": "deepseek-ai/DeepSeek-OCR",
  "pages": [
    {
      "index": 0,
      "markdown": "Extracted text from the image or document in markdown format...",
      "images": [],
      "dimensions": {
        "width": 1920,
        "height": 1080,
        "dpi": 200
      }
    }
  ],
  "usage_info": {
    "pages_processed": 1,
    "doc_size_bytes": 245760
  }
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| id | String | Unique identifier for the OCR request. |
| model | String | The model ID used for OCR. |
| pages | Array | List of processed pages. |
| pages[].index | Integer | Zero-based index of the page. |
| pages[].markdown | String | Extracted text from the page in markdown format. |
| pages[].images | Array | List of extracted images from the page (if any). |
| pages[].dimensions | Object | Page dimensions with `width`, `height`, and `dpi` fields. |
| usage_info | Object | Usage information. |
| usage_info.pages_processed | Integer | Number of pages processed. |
| usage_info.doc_size_bytes | Integer | Size of the document in bytes. |
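Putting these fields together, a multi-page response can be flattened into a single markdown document. The sample response below is illustrative, in the shape documented above:

```python
# Illustrative response in the documented shape
data = {
    "id": "ocr-1731500000",
    "model": "deepseek-ai/DeepSeek-OCR",
    "pages": [
        {"index": 0, "markdown": "# Page one", "images": [],
         "dimensions": {"width": 1920, "height": 1080, "dpi": 200}},
        {"index": 1, "markdown": "# Page two", "images": [],
         "dimensions": {"width": 1920, "height": 1080, "dpi": 200}},
    ],
    "usage_info": {"pages_processed": 2, "doc_size_bytes": 245760},
}

# Sort by the zero-based index and join each page's markdown
full_text = "\n\n".join(
    p["markdown"] for p in sorted(data["pages"], key=lambda p: p["index"])
)
print(full_text)
print(f"Pages processed: {data['usage_info']['pages_processed']}")
```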
Advanced Usage
Custom OCR Prompts
You can provide custom prompts to guide the OCR extraction:
Python:

```python
payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/receipt.png"
    },
    "prompt": "Extract only the numbers and dates from this image. Format them as a list."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

for page in data["pages"]:
    print(page["markdown"])
```
cURL:

```bash
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/receipt.png"
    },
    "prompt": "Extract only the numbers and dates from this image. Format them as a list."
  }'
```
PDF Processing with Multiple Pages
When processing PDFs, the API automatically processes each page separately:
Python:

```python
payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
        "type": "document_url",
        "document_url": "https://example.com/multi-page-document.pdf"
    }
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print(f"Processed {len(data['pages'])} pages")
for page in data["pages"]:
    print(f"\n=== Page {page['index'] + 1} ===")
    print(page["markdown"])
    if page.get("dimensions"):
        dims = page["dimensions"]
        print(f"Dimensions: {dims['width']}x{dims['height']} @ {dims['dpi']} DPI")
```
cURL:

```bash
curl -X POST https://api.inference.nebul.io/v1/ocr \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-OCR",
    "document": {
      "type": "document_url",
      "document_url": "https://example.com/multi-page-document.pdf"
    }
  }'
```
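Because each page comes back as its own markdown string, per-page output can be written straight to individual files. The following sketch uses an illustrative response rather than a live API call:

```python
from pathlib import Path

# Illustrative per-page OCR output in the documented response shape
pages = [
    {"index": 0, "markdown": "# Cover"},
    {"index": 1, "markdown": "# Introduction"},
]

out_dir = Path("ocr_output")
out_dir.mkdir(exist_ok=True)

# One markdown file per page, named by 1-based page number
for page in pages:
    path = out_dir / f"page_{page['index'] + 1:03d}.md"
    path.write_text(page["markdown"], encoding="utf-8")
    print(f"Wrote {path}")
```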
Supported Formats
Image Formats
- JPG/JPEG: Standard JPEG images
- PNG: Portable Network Graphics
- BMP: Bitmap images
- GIF: Graphics Interchange Format
Document Formats
- PDF: Portable Document Format (multi-page support)
Model Specifications
The following OCR models are available:
- `deepseek-ai/DeepSeek-OCR` (Preview): 3B parameters, 8K context, bfloat16 precision; supports text, images, and multilingual extraction (100+ languages)
Best Practices
- **Image Quality**: Higher resolution images generally produce better OCR results. Aim for at least 300 DPI for scanned documents.
- **Image Preprocessing**:
  - Ensure images are properly oriented (not rotated)
  - Use high contrast between text and background
  - Remove noise and artifacts when possible
- **PDF Processing**:
  - PDFs are automatically split into pages
  - Each page is processed separately and returned in the `pages` array
  - Page dimensions and metadata are included for each page
- **Prompt Engineering**:
  - Use specific prompts to guide extraction (e.g., "Extract only the table data")
  - The default prompt formats output as markdown, which is useful for structured text
  - Use prompts to specify output format or structure requirements
- **Base64 Encoding**:
  - When using base64, set `type` to `"content"` and provide the data in the `document_content` field
  - Ensure images are properly encoded without corruption
  - No data URI prefix is needed (just the base64 string)
  - Alternatively, you can use `type: "document_url"` with a data URL like `data:image/png;base64,<data>` in the `document_url` field
- **Error Handling**:
  - Handle cases where no text is extracted (empty `pages` array)
  - Validate image URLs before sending requests
  - Check PDF accessibility and password protection
- **Rate Limiting**:
  - Be mindful of API rate limits
  - Batch process multiple images when possible
  - Consider async processing for large document sets
- **Cost Optimization**:
  - Process only necessary pages from multi-page PDFs
  - Use appropriate image resolutions (not unnecessarily high)
  - Cache results for repeated processing of the same images
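Several of these practices (handling empty results, caching repeated documents) can be combined in a small client-side wrapper. The helper below is a hypothetical sketch, not part of the API; `run_ocr` stands in for whatever function actually POSTs to `/v1/ocr`:

```python
import hashlib

# Simple in-memory cache keyed by a hash of the document bytes
_cache = {}


def extract_pages(document_bytes, run_ocr):
    """Run OCR via run_ocr, with caching and a check for empty results.

    run_ocr is any callable that takes bytes and returns a response dict
    in the documented format ({"pages": [...], ...}).
    """
    key = hashlib.sha256(document_bytes).hexdigest()
    if key in _cache:
        return _cache[key]  # Repeated documents skip the API entirely

    data = run_ocr(document_bytes)
    pages = data.get("pages", [])
    if not pages:
        # Surface "no text extracted" explicitly instead of failing later
        raise ValueError("OCR returned no pages for this document")

    _cache[key] = pages
    return pages


# Stub OCR backend for demonstration; real code would POST to /v1/ocr
calls = []
def fake_ocr(doc):
    calls.append(doc)
    return {"pages": [{"index": 0, "markdown": "hello"}]}

extract_pages(b"doc", fake_ocr)
extract_pages(b"doc", fake_ocr)  # Served from cache; no second call
print(len(calls))                # 1
```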