Introduction

The Nebul Inference API provides secure, low-latency access to open-weight AI models through an OpenAI-compatible REST API. Run inference on models like Mistral, Qwen3, and more, without managing GPU infrastructure.

What is the Inference API?

A managed inference service that hosts state-of-the-art open-weight models on dedicated GPU clusters. You get:

  • OpenAI SDK compatibility — Drop-in replacement for OpenAI client libraries
  • Open-weight models — Access to Llama, Qwen, DeepSeek, Mistral, and more
  • Multi-modal support — Text, vision, embeddings, speech, and image generation
  • Privacy-first — Your data never trains models; requests are processed in isolated environments
  • Low latency — Optimized inference on multi-GPU nodes across EU data centers

API Endpoint

https://api.inference.nebul.io/v1

Authenticate with your API key via the Authorization: Bearer <key> header.
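As a minimal sketch of the endpoint and header above, the request can be built with only the Python standard library. The model id `mistral-small` and the placeholder key are illustrative assumptions, not identifiers confirmed by this documentation; the OpenAI client libraries can be pointed at the same base URL instead.

```python
import json
import urllib.request

BASE_URL = "https://api.inference.nebul.io/v1"
API_KEY = "YOUR_API_KEY"  # placeholder; use your real key

# Request body follows the OpenAI chat-completions schema.
payload = {
    "model": "mistral-small",  # hypothetical model id for illustration
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Build the POST request with the Bearer token in the Authorization header.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; the JSON response
# follows the OpenAI chat-completions response schema.
```

Because the API is OpenAI-compatible, the same call works through any OpenAI SDK by setting its base URL to `https://api.inference.nebul.io/v1` and supplying your key.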

Documentation Guide

  • Quick Start — Make your first LLM request
  • Examples — Code samples for embeddings, vision, function calling, and reasoning
  • Models — Explore available models by capability
  • Advanced Topics — Function calling, structured output, rate limits
  • API Reference — Full OpenAPI specification