Reasoning Models
Enable chain-of-thought reasoning to improve accuracy on complex tasks like math, logic, and multi-step problems.
Availability
Reasoning is currently available on the following models:
- Qwen/Qwen3-VL-235B-A22B-Thinking
- openai/gpt-oss-120b
Both models have reasoning enabled by default in the Chat Completions API.
Check the Models page for the latest reasoning-capable models.
What is Reasoning?
Reasoning models "think out loud" before providing a final answer. This produces two distinct outputs:
- reasoning_content: The model's step-by-step thought process
- content: The final answer
This approach significantly improves accuracy on tasks that benefit from deliberate analysis, such as mathematical calculations, logic puzzles, and complex planning.
Adjusting reasoning effort (e.g., via a reasoning_effort parameter) is part of the Responses API spec and is not supported in our current implementation. Reasoning is always on for these models.
Streaming Reasoning Tokens
When streaming responses, reasoning tokens arrive first in delta.reasoning_content, followed by the final answer in delta.content. You need to handle both fields to display the full "thinking" process to users.
- Python
- cURL
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="https://api.inference.nebul.io/v1",
)

stream = client.chat.completions.create(
    model="Qwen/Qwen3-VL-235B-A22B-Thinking",
    messages=[{"role": "user", "content": "What is 15% of 80?"}],
    stream=True,
)

answer_started = False
print("Thinking process:")

for chunk in stream:
    delta = chunk.choices[0].delta

    # 1. Collect and print reasoning tokens
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(delta.reasoning_content, end="", flush=True)

    # 2. Collect and print answer tokens
    if delta.content:
        # Check if this is the first token of the final answer
        if not answer_started:
            print("\n\n--- Final Answer ---\n")
            answer_started = True
        print(delta.content, end="", flush=True)
```
```bash
curl https://api.inference.nebul.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key-here" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Solve: If x + 3 = 7, what is x?"}],
    "stream": true
  }'
```
Response Format (Non-Streaming)
If you are not streaming, the complete reasoning trace and final answer are available in the response object:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "Let me work through this step by step...",
        "content": "The answer is 4."
      },
      "finish_reason": "stop"
    }
  ]
}
```
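If you work with the raw JSON body directly (e.g., via an HTTP client rather than the SDK), a minimal sketch for extracting both fields looks like this. It uses `.get()` for `reasoning_content`, since that field is absent in responses from non-reasoning models:

```python
import json

# A raw JSON body from a non-streaming completion, as in the example above
raw = """
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "Let me work through this step by step...",
        "content": "The answer is 4."
      },
      "finish_reason": "stop"
    }
  ]
}
"""

body = json.loads(raw)
message = body["choices"][0]["message"]

# reasoning_content may be missing on non-reasoning models, so read it defensively
reasoning = message.get("reasoning_content", "")
answer = message["content"]

print("Reasoning:", reasoning)
print("Answer:", answer)
```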
Best Practices
- Always Parse Reasoning: Even if you only show the final answer, logging the reasoning_content is valuable for debugging model logic.
- Expect Latency: Reasoning models take longer to produce their first answer token because they are "thinking" first.
- Prompting: Standard prompts work well; you usually don't need to explicitly ask the model to "think step-by-step" as it does so inherently.
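The first practice above — log the reasoning trace, show only the answer — can be sketched with a small helper. The `handle_message` function and its input shape are illustrative, not part of any SDK; it takes the "message" object from a completion response as a dict:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("reasoning")

def handle_message(message: dict) -> str:
    """Log the reasoning trace for debugging, but return only the final answer."""
    reasoning = message.get("reasoning_content")
    if reasoning:
        # Kept out of the user-facing output; useful when auditing model logic
        logger.debug("model reasoning: %s", reasoning)
    return message["content"]

answer = handle_message({
    "role": "assistant",
    "reasoning_content": "15% of 80 = 0.15 * 80 = 12.",
    "content": "12",
})
print(answer)  # only the final answer reaches the user
```

Because `reasoning_content` is read with `.get()`, the same helper works unchanged for non-reasoning models that omit the field.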