Introduction

The Nebul Inference API provides secure, low-latency access to open-weight AI models through an OpenAI-compatible REST API. Run inference on models like Mistral, Qwen3, and more, without managing GPU infrastructure.

What is the Inference API?

A managed inference service that hosts state-of-the-art open-weight models on dedicated GPU clusters. You get:

  • OpenAI SDK compatibility — Drop-in replacement for OpenAI client libraries
  • Open-weight models — Access to Llama, Qwen, DeepSeek, Mistral, and more
  • Multi-modal support — Text, vision, embeddings, speech, and image generation
  • Privacy-first — Your data never trains models; requests are processed in isolated environments
  • Low latency — Optimized inference on multi-GPU nodes across EU data centers

API Endpoint

https://api.inference.nebul.io/v1

Authenticate with your API key via the Authorization: Bearer <key> header.
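As a minimal sketch of the endpoint and header above, the request can be built with only the Python standard library. The model id `mistral-small` and the placeholder key are illustrative assumptions, not identifiers confirmed by this documentation; the OpenAI client libraries can be pointed at the same base URL instead.

```python
import json
import urllib.request

BASE_URL = "https://api.inference.nebul.io/v1"
API_KEY = "YOUR_API_KEY"  # placeholder; use your real key

# Request body follows the OpenAI chat-completions schema.
payload = {
    "model": "mistral-small",  # hypothetical model id for illustration
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Build the POST request with the Bearer token in the Authorization header.
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it; the JSON response
# follows the OpenAI chat-completions response schema.
```

Because the API is OpenAI-compatible, the same call works through any OpenAI SDK by setting its base URL to `https://api.inference.nebul.io/v1` and supplying your key.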

Documentation Guide

  • Quick Start — Make your first LLM request
  • Examples — Code samples for embeddings, vision, function calling, and reasoning
  • Models — Explore available models by capability
  • Advanced Topics — Function calling, structured output, rate limits
  • API Reference — Full OpenAPI specification