Reasoning Models

Enable chain-of-thought reasoning to improve accuracy on complex tasks like math, logic, and multi-step problems.

Availability

Reasoning is currently available on the following models:

  • Qwen/Qwen3-VL-235B-A22B-Thinking
  • openai/gpt-oss-120b

Both models have reasoning enabled by default in the Chat Completions API.

Check the Models page for the latest reasoning-capable models.

What is Reasoning?

Reasoning models "think out loud" before providing a final answer. This produces two distinct outputs:

  1. reasoning_content: The model's step-by-step thought process
  2. content: The final answer

This approach significantly improves accuracy on tasks that benefit from deliberate analysis, such as mathematical calculations, logic puzzles, and complex planning.

note

Adjusting reasoning effort (e.g., via reasoning_effort) is currently part of the Responses API spec only and is not supported in our Chat Completions implementation. Reasoning is always on for these models.

Streaming Reasoning Tokens

When streaming responses, reasoning tokens arrive first in delta.reasoning_content, followed by the final answer in delta.content. You need to handle both fields to display the full "thinking" process to users.

python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="https://api.inference.nebul.io/v1",
)

stream = client.chat.completions.create(
    model="Qwen/Qwen3-VL-235B-A22B-Thinking",
    messages=[{"role": "user", "content": "What is 15% of 80?"}],
    stream=True,
)

answer_started = False
print("Thinking process:")
for chunk in stream:
    delta = chunk.choices[0].delta

    # 1. Collect and print reasoning tokens
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(delta.reasoning_content, end="", flush=True)

    # 2. Collect and print answer tokens
    if delta.content:
        # Check if this is the first token of the final answer
        if not answer_started:
            print("\n\n--- Final Answer ---\n")
            answer_started = True
        print(delta.content, end="", flush=True)

Response Format (Non-Streaming)

If you are not streaming, the complete reasoning trace and final answer are available in the response object:

json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "Let me work through this step by step...",
        "content": "The answer is 4."
      },
      "finish_reason": "stop"
    }
  ]
}
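A minimal sketch of extracting both fields from this structure. The dict below simply mirrors the JSON shown above; with the OpenAI SDK, the same fields are reachable as attributes on the message object (e.g., message.content, and message.reasoning_content where present):

```python
# Hypothetical response dict mirroring the JSON structure above.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "Let me work through this step by step...",
                "content": "The answer is 4.",
            },
            "finish_reason": "stop",
        }
    ]
}

message = response["choices"][0]["message"]

# Use .get() for the reasoning trace: non-reasoning models omit this field.
reasoning = message.get("reasoning_content", "")
answer = message["content"]

print("Reasoning:", reasoning)
print("Answer:", answer)
```

Using .get() for reasoning_content keeps the same parsing code working for models that do not emit a reasoning trace.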

Best Practices

  1. Always Parse Reasoning: Even if you show users only the final answer, log the reasoning_content — it is valuable for debugging the model's logic.
  2. Expect Latency: Reasoning models take longer to produce the first answer token because they "think" first.
  3. Prompting: Standard prompts work well; you usually don't need to ask the model explicitly to "think step-by-step," as it does so inherently.
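To illustrate the first practice, a minimal sketch (the message dict and helper function are hypothetical, not part of the API) that logs the reasoning trace at DEBUG level while surfacing only the final answer:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("reasoning")


def extract_answer(message: dict) -> str:
    """Return the final answer; log the reasoning trace for debugging."""
    reasoning = message.get("reasoning_content")
    if reasoning:
        # The trace stays out of the user-facing output but lands in the logs.
        logger.debug("reasoning trace: %s", reasoning)
    return message["content"]


# Hypothetical message mirroring the non-streaming response format above.
message = {
    "reasoning_content": "15% of 80 = 0.15 * 80 = 12",
    "content": "12",
}
print(extract_answer(message))  # → 12
```

In production you would typically lower the log level so traces are captured only when diagnosing unexpected answers.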