# Introduction
The Nebul Inference API provides secure, low-latency access to open-weight AI models through an OpenAI-compatible REST API. Run inference on models such as Mistral, Qwen3, and more, without managing GPU infrastructure.
## What is the Inference API?
A managed inference service that hosts state-of-the-art open-weight models on dedicated GPU clusters. You get:
- OpenAI SDK compatibility — Drop-in replacement for OpenAI client libraries
- Open-weight models — Access to Llama, Qwen, DeepSeek, Mistral, and more
- Multi-modal support — Text, vision, embeddings, speech, and image generation
- Privacy-first — Your data never trains models; requests are processed in isolated environments
- Low latency — Optimized inference on multi-GPU nodes across EU data centers
## API Endpoint

```
https://api.inference.nebul.io/v1
```

Authenticate with your API key via the `Authorization: Bearer <key>` header.
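The raw HTTP shape can be sketched with the Python standard library alone. This builds (but does not send) an authenticated chat-completions request against the endpoint above; the key and model id are placeholders:

```python
import json
import urllib.request

API_BASE = "https://api.inference.nebul.io/v1"
API_KEY = "YOUR_NEBUL_API_KEY"  # placeholder; get a real key from Nebul AI Studio

# OpenAI-style chat-completions payload; "mistral-small" is a placeholder id.
payload = {
    "model": "mistral-small",
    "messages": [{"role": "user", "content": "Hello!"}],
}

request = urllib.request.Request(
    url=f"{API_BASE}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending it would be: urllib.request.urlopen(request) -- omitted here,
# since it needs a valid key and network access.
```

Any HTTP client works the same way: POST JSON to the route, with the bearer token in the `Authorization` header.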
## Documentation Guide
| Section | What you'll find |
|---|---|
| Quick Start | Make your first LLM request |
| Examples | Code samples for embeddings, vision, function calling, and reasoning |
| Models | Explore available models by capability |
| Advanced Topics | Function calling, structured output, rate limits |
| API Reference | Full OpenAPI specification |
## Quick Links
- Get an API key → Nebul AI Studio
- Browse models → Models Overview
- Start coding → Quick Start