Reasoning Models

Enable chain-of-thought reasoning to improve accuracy on complex tasks like math, logic, and multi-step problems.

Availability

Reasoning is currently available on the following models:

  • Qwen/Qwen3-VL-235B-A22B-Thinking
  • openai/gpt-oss-120b

Both models have reasoning enabled by default in the Chat Completions API.

Check the Models page for the latest reasoning-capable models.

What is Reasoning?

Reasoning models "think out loud" before providing a final answer. This produces two distinct outputs:

  1. reasoning_content: The model's step-by-step thought process
  2. content: The final answer

This approach significantly improves accuracy on tasks that benefit from deliberate analysis, such as mathematical calculations, logic puzzles, and complex planning.

note

Adjusting reasoning effort (e.g., via reasoning_effort) is currently part of the Responses API spec only and is not supported in our Chat Completions implementation. Reasoning is always on for these models.

Streaming Reasoning Tokens

When streaming responses, reasoning tokens arrive first in delta.reasoning_content, followed by the final answer in delta.content. You need to handle both fields to display the full "thinking" process to users.

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="https://api.inference.nebul.io/v1",
)

stream = client.chat.completions.create(
    model="Qwen/Qwen3-VL-235B-A22B-Thinking",
    messages=[{"role": "user", "content": "What is 15% of 80?"}],
    stream=True,
)

answer_started = False
print("Thinking process:")
for chunk in stream:
    delta = chunk.choices[0].delta

    # 1. Collect and print reasoning tokens
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(delta.reasoning_content, end="", flush=True)

    # 2. Collect and print answer tokens
    if delta.content:
        # Check if this is the first token of the final answer
        if not answer_started:
            print("\n\n--- Final Answer ---\n")
            answer_started = True
        print(delta.content, end="", flush=True)

Response Format (Non-Streaming)

If you are not streaming, the complete reasoning trace and final answer are available in the response object:

json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "Let me work through this step by step...",
        "content": "The answer is 4."
      },
      "finish_reason": "stop"
    }
  ]
}
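A minimal sketch of extracting both fields from this structure. The dict below simply mirrors the JSON shown above; with the OpenAI SDK, the same fields are reachable as attributes on the message object (e.g., message.content, and message.reasoning_content where present):

```python
# Hypothetical response dict mirroring the JSON structure above.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "Let me work through this step by step...",
                "content": "The answer is 4.",
            },
            "finish_reason": "stop",
        }
    ]
}

message = response["choices"][0]["message"]

# Use .get() for the reasoning trace: non-reasoning models omit this field.
reasoning = message.get("reasoning_content", "")
answer = message["content"]

print("Reasoning:", reasoning)
print("Answer:", answer)
```

Using .get() for reasoning_content keeps the same parsing code working for models that do not emit a reasoning trace.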

Best Practices

  1. Always Parse Reasoning: Even if you show users only the final answer, log the reasoning_content — it is valuable for debugging the model's logic.
  2. Expect Latency: Reasoning models take longer to produce the first answer token because they "think" first.
  3. Prompting: Standard prompts work well; you usually don't need to ask the model explicitly to "think step-by-step," as it does so inherently.
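To illustrate the first practice, a minimal sketch (the message dict and helper function are hypothetical, not part of the API) that logs the reasoning trace at DEBUG level while surfacing only the final answer:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("reasoning")


def extract_answer(message: dict) -> str:
    """Return the final answer; log the reasoning trace for debugging."""
    reasoning = message.get("reasoning_content")
    if reasoning:
        # The trace stays out of the user-facing output but lands in the logs.
        logger.debug("reasoning trace: %s", reasoning)
    return message["content"]


# Hypothetical message mirroring the non-streaming response format above.
message = {
    "reasoning_content": "15% of 80 = 0.15 * 80 = 12",
    "content": "12",
}
print(extract_answer(message))  # → 12
```

In production you would typically lower the log level so traces are captured only when diagnosing unexpected answers.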