# 📘 Getting Started with the Private Inference API
## Overview

The Private Inference API is a secure inference service running on Nebul's private NeoCloud, ensuring compliance and data protection. It offers open-source and fine-tuned AI models, is ideal for industries handling sensitive information, and provides seamless integration and transparent pricing.
## ✅ Prerequisites

- An API key for authentication.
- Familiarity with OpenAI-compatible APIs (e.g., GPT-4, ChatCompletion, Completion endpoints).
- `curl`, Postman, or any HTTP client (e.g., Python `requests`, the OpenAI SDK).
## 🔑 Authentication

Authenticate requests using a Bearer token:

```
Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY
```

Note: API keys begin with `sk-` (e.g., `sk-dummy`).
## 🔌 Base URL

```
https://api.chat.nebul.io/v1
```

This base URL mimics the OpenAI format. All endpoints align with OpenAI's structure to ensure minimal friction for integration.
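Because the surface is OpenAI-compatible, any HTTP client works unchanged. As a quick sanity check, here is a minimal sketch with Python's `requests` that lists models using the Bearer header from the previous section (it assumes your key is exported in an environment variable; the variable name is illustrative):

```python
import os

import requests

BASE_URL = "https://api.chat.nebul.io/v1"
API_KEY = os.environ["YOUR_PRIVATE_INFERENCE_API_KEY"]  # illustrative env var name

# GET /v1/models with the Bearer token header
resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json()["data"]:  # OpenAI-style list response
    print(model["id"])
```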
## 🚀 Quickstart Example (Chat Completion)

### Request

```bash
curl https://api.chat.nebul.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "demo/gemma-3-27b-it",
    "messages": [{"role": "user", "content": "What is the capital of the Netherlands?"}],
    "temperature": 0.7
  }'
```
### Response

```json
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1685580297,
  "model": "demo/gemma-3-27b-it",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of the Netherlands is Amsterdam. However, The Hague is the seat of government and home to the Dutch parliament."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 25,
    "total_tokens": 37
  }
}
```
## 📦 Supported Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Chat-based model responses |
| `/v1/completions` | POST | Classic completion model endpoint |
| `/v1/embeddings` | POST | Generate vector embeddings |
| `/v1/models` | GET | List available models |
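Chat completions, embeddings, and model listing are all demonstrated in the sections below; `/v1/completions` is not, so here is a minimal hedged sketch via the OpenAI SDK. Whether a given model accepts raw prompts depends on the deployment, so confirm against `/v1/models` and your team's allowed models first:

```python
import os

from openai import OpenAI

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

# POST /v1/completions: raw prompt in, text continuation out
response = client.completions.create(
    model="demo/gemma-3-27b-it",  # assumption: substitute a completion-capable model
    prompt="The capital of the Netherlands is",
    max_tokens=32,
)
print(response.choices[0].text)
```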
## 📘 Using Python (via OpenAI SDK)

### Example: Listing Available Models

```python
import os

from openai import OpenAI

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

models = client.models.list()
for model in models.data:
    print(model.id)
```
Equivalent bash:

```bash
curl -X GET "https://api.chat.nebul.io/v1/models" \
  -H "Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY"
```
### Example: Sending an Image for Analysis

```python
import base64
import os

from openai import OpenAI


def encode_image_to_base64(image_path):
    """Read an image file and return a base64-encoded string, or None if the file is missing."""
    try:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")
    except FileNotFoundError:
        print(f"Image file not found: {image_path}")
        return None


API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")  # Use an environment variable for the API key
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

image_path = "your_path_to_image.jpg"
base64_image = encode_image_to_base64(image_path)

if base64_image:
    stream = client.chat.completions.create(
        model="demo/gemma-3-27b-it",
        messages=[
            {"role": "user", "content": [
                {"type": "text", "text": "Describe this image"},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
            ]}
        ],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()
```
Equivalent bash:

```bash
curl https://api.chat.nebul.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "demo/gemma-3-27b-it",
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,PASTE_YOUR_BASE64_IMAGE_HERE"}}
      ]}
    ],
    "stream": true
  }'
```
Notes:

- Model selection: Make sure the model you use supports vision/image input. Use `/v1/models` to list available models.
- Image format: This example assumes JPEG. For PNG, change the MIME type to `image/png` (or derive it from the file extension, as in the sketch below).
- Base64 encoding: The image is encoded and sent as a data URL in the message content.
- File path: Replace `"your_path_to_image.jpg"` with the actual path to your image file on your local system.
- Error handling: The script handles missing image files; wrap the API call in try/except to catch API errors (see Error Handling below).
## 🔄 Streaming & Non-Streaming Usage

### Minimal Python Example (Non-Streaming)

```python
import os

from openai import OpenAI

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")  # Use an environment variable for the API key
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

response = client.chat.completions.create(
    model="demo/gemma-3-27b-it",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=False,
)
print(response.choices[0].message.content.strip())
```
Equivalent bash:

```bash
curl https://api.chat.nebul.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "demo/gemma-3-27b-it",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": false
  }'
```
### Minimal Python Example (Streaming)

```python
import os

from openai import OpenAI

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")  # Use an environment variable for the API key
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

stream = client.chat.completions.create(
    model="demo/gemma-3-27b-it",
    messages=[{"role": "user", "content": "Write a short sentence."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```
Equivalent bash:

```bash
curl https://api.chat.nebul.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "demo/gemma-3-27b-it",
    "messages": [{"role": "user", "content": "Write a short sentence."}],
    "stream": true
  }'
```
Notes:

- Replace the API key with your own.
- The streaming curl response will be in Server-Sent Events (SSE) format; a sketch for consuming it without the SDK follows below.
- Only use models listed in `/v1/models` for best compatibility.
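Because the raw curl stream arrives as Server-Sent Events, you can also consume it without the SDK. A hedged sketch with Python's `requests`, assuming the standard OpenAI-style `data:` framing with a final `data: [DONE]` sentinel:

```python
import json
import os

import requests

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")

with requests.post(
    "https://api.chat.nebul.io/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "demo/gemma-3-27b-it",
        "messages": [{"role": "user", "content": "Write a short sentence."}],
        "stream": True,
    },
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content") or "", end="", flush=True)
print()
```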
## Error Response Format

API errors follow the LiteLLM format:

```json
{
  "error": {
    "message": "Authentication Error, LiteLLM Virtual Key expected. Received=INVALID_API_KEY, expected to start with 'sk-'.",
    "type": "auth_error",
    "param": "None",
    "code": "401"
  }
}
```
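When calling through the OpenAI SDK, these error payloads surface as typed exceptions rather than raw JSON. A minimal sketch, assuming `openai>=1.0` (whose exception classes map HTTP status codes; the exact message text comes from the LiteLLM payload above):

```python
import os

import openai
from openai import OpenAI

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

try:
    client.chat.completions.create(
        model="demo/gemma-3-27b-it",
        messages=[{"role": "user", "content": "ping"}],
    )
except openai.AuthenticationError as e:
    # Raised for 401 responses such as the auth_error payload above
    print(f"Authentication failed: {e}")
except openai.APIStatusError as e:
    # Catch-all for other non-2xx responses; inspect status_code and message
    print(f"API error {e.status_code}: {e.message}")
```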
## 🛠️ Qwen-3 Tool Calling

Qwen-3 models on Nebul already run with automatic function calling. All you do is add a `"tools"` array to your normal request; the endpoint takes care of the rest.
### 🔑 Same Auth & URL as the other endpoints

| Item | Value |
|---|---|
| Base URL | `https://api.chat.nebul.io/v1` |
| Header | `Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY` |
### 🚀 Step-by-Step (curl)

1. Send your prompt + tool spec:

```bash
curl https://api.chat.nebul.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "demo/qwen-3-14b",
    "messages": [
      { "role": "user",
        "content": "Remind me to submit the report at 17:00." }
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "set_reminder",
        "description": "Create a reminder",
        "parameters": {
          "type": "object",
          "properties": {
            "text": { "type": "string" },
            "time": { "type": "string" }
          },
          "required": ["text", "time"]
        }
      }
    }]
  }'
```
2. Parse the first reply:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_0",
        "function": {
          "name": "set_reminder",
          "arguments": "{ \"text\": \"Submit the report\", \"time\": \"17:00\" }"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
```
3. Execute the function in your app and return the result:

```bash
curl https://api.chat.nebul.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "demo/qwen-3-14b",
    "messages": [
      { "role": "user", "content": "Remind me to submit the report at 17:00." },
      { "role": "assistant", "content": null,
        "tool_calls": [{
          "id": "call_0",
          "function": {
            "name": "set_reminder",
            "arguments": "{ \"text\": \"Submit the report\", \"time\": \"17:00\" }"
          }
        }]
      },
      { "role": "tool",
        "tool_call_id": "call_0",
        "name": "set_reminder",
        "content": "{ \"status\": \"Reminder set for 17:00.\" }"
      }
    ]
  }'
```
4. Get the final, human-readable answer:

```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Got it! I’ve scheduled the reminder for 17:00. Anything else?"
    }
  }]
}
```
### 🐍 End-to-End in Python

```python
import json
import os

from openai import OpenAI

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")  # Use an environment variable for the API key
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "set_reminder",
        "description": "Create a reminder",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "time": {"type": "string"}
            },
            "required": ["text", "time"]
        }
    }
}]

msgs = [{"role": "user",
         "content": "Remind me to call Alice at 15:00."}]

# 1. Ask the model
r1 = client.chat.completions.create(
    model="demo/qwen-3-14b",
    messages=msgs,
    tools=tools
)
call = r1.choices[0].message.tool_calls[0]
msgs.append(r1.choices[0].message.model_dump())

# 2. Run the function yourself
args = json.loads(call.function.arguments)
result = {"status": f"Reminder set for {args['time']}."}

# 3. Send back the result, get the final reply
msgs.append({
    "role": "tool",
    "tool_call_id": call.id,
    "name": call.function.name,
    "content": json.dumps(result)
})
r2 = client.chat.completions.create(
    model="demo/qwen-3-14b",
    messages=msgs
)
print(r2.choices[0].message.content)
```
### ⚙️ Quick Troubleshooting

| Symptom | Check |
|---|---|
| No `tool_calls` returned | Ensure you passed the `"tools"` array. |
| Arguments not JSON | Validate your JSON Schema (`type`, `required`, spelling). |
| Multiple calls | Send one `role:"tool"` message per call before the final request (see the sketch below). |
| Need streaming | Add `"stream": true` to either request. |
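For the "multiple calls" row, here is a short sketch of that loop, reusing `client`, `msgs`, and `r1` from the Python example above (`run_tool` is a hypothetical dispatcher you would implement):

```python
import json

# The assistant message that requested the tools must go back first
assistant_msg = r1.choices[0].message
msgs.append(assistant_msg.model_dump())

# One role:"tool" message per returned call, ids matched to the request
for call in assistant_msg.tool_calls or []:
    args = json.loads(call.function.arguments)
    result = run_tool(call.function.name, args)  # hypothetical: your own dispatch
    msgs.append({
        "role": "tool",
        "tool_call_id": call.id,
        "name": call.function.name,
        "content": json.dumps(result),
    })

# Final request returns the human-readable answer
r2 = client.chat.completions.create(model="demo/qwen-3-14b", messages=msgs)
print(r2.choices[0].message.content)
```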
## 🔗 Embeddings

The embeddings endpoint converts text into vector representations for similarity search, clustering, and other machine learning tasks.

### Available Models

Use the `/v1/models` endpoint to find embedding models. Currently available:

- `demo/multilingual-e5-large-instruct` - multilingual embedding model supporting many languages
### Example: Generate Embeddings

Python:

```python
import os

from openai import OpenAI

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

response = client.embeddings.create(
    model="demo/multilingual-e5-large-instruct",
    input="Machine learning is transforming the world"
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
print(f"Tokens used: {response.usage.total_tokens}")
```
Equivalent bash:

```bash
curl https://api.chat.nebul.io/v1/embeddings \
  -H "Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "demo/multilingual-e5-large-instruct",
    "input": "Machine learning is transforming the world"
  }'
```
### Response Format

```json
{
  "model": "intfloat/multilingual-e5-large-instruct",
  "data": [
    {
      "embedding": [0.0057, 0.0143, -0.0203, ...],
      "index": 0,
      "object": "embedding"
    }
  ],
  "object": "list",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 42,
    "total_tokens": 42
  }
}
```
### Multiple Inputs

You can process multiple texts in a single request:

Python:

```python
import os

from openai import OpenAI

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

texts = [
    "Happy customer review",
    "Error: Connection failed",
    "Neutral product description"
]

response = client.embeddings.create(
    model="demo/multilingual-e5-large-instruct",
    input=texts
)

for i, data in enumerate(response.data):
    print(f"Text {i+1}: {len(data.embedding)} dimensions")
```
Equivalent bash:

```bash
curl https://api.chat.nebul.io/v1/embeddings \
  -H "Authorization: Bearer YOUR_PRIVATE_INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "demo/multilingual-e5-large-instruct",
    "input": ["Happy customer review", "Error: Connection failed", "Neutral product description"]
  }'
```
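To put these vectors to work for the similarity search mentioned at the top of this section, compare them with cosine similarity. A small pure-Python sketch over the `response` from the multiple-inputs example above:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


vectors = [d.embedding for d in response.data]
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        print(f"similarity(text {i + 1}, text {j + 1}) = "
              f"{cosine_similarity(vectors[i], vectors[j]):.3f}")
```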
### Error Handling

Invalid Model Error:

```json
{
  "error": {
    "message": "Team not allowed to access model. Team=f697846b-bc9b-4fc0-8e0e-3c2ee0162289, Model=invalid-model. Allowed team models = ['demo/gemma-3-27b-it', 'demo/multilingual-e5-large-instruct']",
    "type": "team_model_access_denied",
    "param": "model",
    "code": "401"
  }
}
```

Authentication Error:

```json
{
  "error": {
    "message": "Authentication Error, Invalid proxy server token passed. valid_token=None.",
    "type": "auth_error",
    "param": "None",
    "code": "401"
  }
}
```
Python Error Handling Example:

```python
import os

from openai import OpenAI

API_KEY = os.environ.get("YOUR_PRIVATE_INFERENCE_API_KEY")
client = OpenAI(api_key=API_KEY, base_url="https://api.chat.nebul.io/v1")

try:
    response = client.embeddings.create(
        model="demo/multilingual-e5-large-instruct",
        input="Your text here"
    )
    embedding = response.data[0].embedding
    print(f"Success: {len(embedding)} dimensions")
except Exception as e:
    print(f"Error: {e}")
```
---
## 📩 Support
For support, reach out to your enterprise contact or email: `engineering@nebul.com`.
---