Model ID | Type | Reasoning Capability | Context Window | Max Input | Max Output |
deepseek-v4-flash | General Conversation Model | Configurable (Enabled by default) | 1M | 1M | 384K |
deepseek-v4-pro | General Conversation Model | Configurable (Enabled by default) | 1M | 1M | 384K |
deepseek-v3.2 | General Conversation Model | Configurable (Enabled by default) | 128K | 96K | 32K |
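The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service's SDK: `MODEL_LIMITS` and `check_budget` are hypothetical names, the conversion of "K" and "M" to 1,000 and 1,000,000 tokens is an assumption, and so is the rule that input plus output must fit inside the context window.

```python
# Limits transcribed from the model table. Treating "K" as 1,000 and "M" as
# 1,000,000 tokens is an assumption; the service may count differently.
MODEL_LIMITS = {
    "deepseek-v4-flash": {"context": 1_000_000, "max_input": 1_000_000, "max_output": 384_000},
    "deepseek-v4-pro": {"context": 1_000_000, "max_input": 1_000_000, "max_output": 384_000},
    "deepseek-v3.2": {"context": 128_000, "max_input": 96_000, "max_output": 32_000},
}


def check_budget(model: str, input_tokens: int, max_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    limits = MODEL_LIMITS[model]
    problems = []
    if input_tokens > limits["max_input"]:
        problems.append(f"input {input_tokens} exceeds max input {limits['max_input']}")
    if max_tokens > limits["max_output"]:
        problems.append(f"max_tokens {max_tokens} exceeds max output {limits['max_output']}")
    # Assumption: prompt and completion share the context window.
    if input_tokens + max_tokens > limits["context"]:
        problems.append(f"input + max_tokens exceeds context window {limits['context']}")
    return problems


print(check_budget("deepseek-v3.2", 90_000, 32_000))
print(check_budget("deepseek-v4-flash", 900_000, 200_000))
```

The token counts passed in would have to come from a tokenizer or from the `usage` field of an earlier response; this sketch does not estimate them.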
thinking parameter.
Dimension | DeepSeek V4-Flash / V4-Pro / V3.2 | OpenAI / Claude / GLM, etc. |
Reasoning Capability Switch | Explicitly controlled via the thinking.type parameter | Typically controlled by switching the model or a separate reasoning parameter. |
Reasoning Process Field | Independently returned in the response as reasoning_content | Most models do not expose the reasoning process. |
Access Reasoning Fields via OpenAI SDK | Use hasattr / getattr | - |
temperature | Adjustable from 0 to 2; default 1 | Typically adjustable from 0 to 2. |
Recommended value for max_tokens | 1024 to 4096 for general tasks; recommended ≥ 2048 for thinking mode. | Typically 1024 to 4096 is sufficient. |
Context Window | Up to 1M tokens | Typically 128K tokens |
Maximum Output | Up to 384K tokens | Typically 16K tokens |
Writing back messages in multi-turn conversations | Pass back only the `content` field; do not pass back `reasoning_content`. | Typically, only `content` needs to be written back. |
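The write-back rule in the last row can be wrapped in a small helper so that `reasoning_content` never leaks into the conversation history. This is a sketch under stated assumptions: `assistant_history_entry` is a hypothetical name, and the `SimpleNamespace` object only stands in for an SDK message object.

```python
from types import SimpleNamespace


def assistant_history_entry(message) -> dict:
    """Build the assistant turn to append to `messages`, keeping only `content`."""
    if isinstance(message, dict):
        content = message.get("content")
    else:
        content = getattr(message, "content", None)
    # reasoning_content is deliberately dropped, whatever the input carried.
    return {"role": "assistant", "content": content or ""}


# Stand-in for response.choices[0].message (illustrative values only).
reply = SimpleNamespace(
    role="assistant",
    content="x = 2 or x = 3",
    reasoning_content="Factor the quadratic: (x-2)(x-3) = 0",
)
history = [{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"}]
history.append(assistant_history_entry(reply))
print(history[-1])
```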
Replace YOUR_API_KEY with the API Key you created.

curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself"}],
    "max_tokens": 1024
  }'
Operating System: Ubuntu 24.04.3 LTS / x86_64
Runtime Version: GNU bash, version 5.2.21(1)-release (x86_64-pc-linux-gnu)
# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello, please introduce yourself"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Hello, please introduce yourself" }],
  max_tokens: 1024,
});
console.log(response.choices[0].message.content);
// To use OkHttp, add the dependency: implementation("com.squareup.okhttp3:okhttp:4.12.0")
import okhttp3.*;
import org.json.*;

OkHttpClient httpClient = new OkHttpClient();

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 1024);

JSONArray messages = new JSONArray();
JSONObject userMsg = new JSONObject();
userMsg.put("role", "user");
userMsg.put("content", "Hello, please introduce yourself");
messages.put(userMsg);
body.put("messages", messages);

Request request = new Request.Builder()
    .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
    .addHeader("Authorization", "Bearer YOUR_API_KEY")
    .addHeader("Content-Type", "application/json")
    .post(RequestBody.create(body.toString(), MediaType.get("application/json")))
    .build();

try (Response response = httpClient.newCall(request).execute()) {
    JSONObject result = new JSONObject(response.body().string());
    System.out.println(result.getJSONArray("choices")
        .getJSONObject(0)
        .getJSONObject("message")
        .getString("content"));
}
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    body := map[string]interface{}{
        "model": "deepseek-v4-flash",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello, please introduce yourself"},
        },
        "max_tokens": 1024,
    }
    data, err := json.Marshal(body)
    if err != nil {
        log.Fatal(err)
    }

    req, err := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(data))
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    // Check the error before touching resp; resp is nil when the request fails.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    respBody, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    var result map[string]interface{}
    if err := json.Unmarshal(respBody, &result); err != nil {
        log.Fatal(err)
    }
    choices, ok := result["choices"].([]interface{})
    if !ok || len(choices) == 0 {
        log.Fatalf("unexpected response: %s", respBody)
    }
    msg := choices[0].(map[string]interface{})["message"].(map[string]interface{})
    fmt.Println(msg["content"])
}
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Introduce large language models"}],
    "max_tokens": 1024,
    "thinking": {"type": "disabled"}
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Introduce large language models"}],
    max_tokens=1024,
    extra_body={"thinking": {"type": "disabled"}},  # Disable thinking mode to reduce token consumption
)
print(response.choices[0].message.content)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Introduce large language models" }],
  max_tokens: 1024,
  // @ts-ignore - thinking is a DeepSeek extension field
  thinking: { type: "disabled" },
});
console.log(response.choices[0].message.content);
import okhttp3.*;
import org.json.*;

OkHttpClient httpClient = new OkHttpClient();

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 1024);

JSONArray messages = new JSONArray();
JSONObject userMsg = new JSONObject();
userMsg.put("role", "user");
userMsg.put("content", "Introduce large language models");
messages.put(userMsg);
body.put("messages", messages);

JSONObject thinking = new JSONObject();
thinking.put("type", "disabled");
body.put("thinking", thinking);

Request request = new Request.Builder()
    .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
    .addHeader("Authorization", "Bearer YOUR_API_KEY")
    .addHeader("Content-Type", "application/json")
    .post(RequestBody.create(body.toString(), MediaType.get("application/json")))
    .build();

try (Response response = httpClient.newCall(request).execute()) {
    JSONObject result = new JSONObject(response.body().string());
    System.out.println(result.getJSONArray("choices")
        .getJSONObject(0)
        .getJSONObject("message")
        .getString("content"));
}
body := map[string]interface{}{
    "model": "deepseek-v4-flash",
    "messages": []map[string]string{
        {"role": "user", "content": "Introduce large language models"},
    },
    "max_tokens": 1024,
    "thinking":   map[string]string{"type": "disabled"},
}
// ... The rest of the request code is the same as in the quick start example.
Set stream to true to enable SSE streaming output, which suits long-text generation, helps prevent timeouts, and improves the user experience.

curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Write a short poem about spring"}],
    "max_tokens": 512,
    "stream": true,
    "thinking": {"type": "disabled"}
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a short poem about spring"}],
    max_tokens=512,
    stream=True,
    extra_body={"thinking": {"type": "disabled"}},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a short poem about spring" }],
  max_tokens: 512,
  stream: true,
  // @ts-ignore
  thinking: { type: "disabled" },
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
// For streaming output, use OkHttp EventSource.
import okhttp3.*;
import okhttp3.sse.*;
import org.json.*;

OkHttpClient httpClient = new OkHttpClient();

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 512);
body.put("stream", true);

JSONArray messages = new JSONArray();
JSONObject msg = new JSONObject();
msg.put("role", "user");
msg.put("content", "Write a short poem about spring");
messages.put(msg);
body.put("messages", messages);
body.put("thinking", new JSONObject().put("type", "disabled"));

Request request = new Request.Builder()
    .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
    .addHeader("Authorization", "Bearer YOUR_API_KEY")
    .addHeader("Content-Type", "application/json")
    .post(RequestBody.create(body.toString(), MediaType.get("application/json")))
    .build();

EventSources.createFactory(httpClient).newEventSource(request, new EventSourceListener() {
    @Override
    public void onEvent(EventSource source, String id, String type, String data) {
        if ("[DONE]".equals(data)) return;
        try {
            JSONObject json = new JSONObject(data);
            String content = json.getJSONArray("choices")
                .getJSONObject(0)
                .getJSONObject("delta")
                .optString("content", "");
            if (!content.isEmpty()) System.out.print(content);
        } catch (JSONException ignored) {}
    }
});
import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)

body := map[string]interface{}{
    "model":      "deepseek-v4-flash",
    "messages":   []map[string]string{{"role": "user", "content": "Write a short poem about spring"}},
    "max_tokens": 512,
    "stream":     true,
    "thinking":   map[string]string{"type": "disabled"},
}
data, _ := json.Marshal(body)
req, _ := http.NewRequest("POST",
    "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
    bytes.NewBuffer(data))
req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
req.Header.Set("Content-Type", "application/json")

resp, err := http.DefaultClient.Do(req)
if err != nil {
    panic(err) // resp is nil on failure, so stop before using it
}
defer resp.Body.Close()

scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
    line := scanner.Text()
    if !strings.HasPrefix(line, "data: ") || line == "data: [DONE]" {
        continue
    }
    var chunk map[string]interface{}
    if err := json.Unmarshal([]byte(strings.TrimPrefix(line, "data: ")), &chunk); err != nil {
        continue
    }
    // Some chunks may carry no choices; skip them instead of panicking.
    choices, ok := chunk["choices"].([]interface{})
    if !ok || len(choices) == 0 {
        continue
    }
    delta, _ := choices[0].(map[string]interface{})["delta"].(map[string]interface{})
    if content, ok := delta["content"].(string); ok {
        fmt.Print(content)
    }
}
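When no SDK is available, the SSE stream can also be parsed by hand, as the Go example does. The sketch below shows the same line-level logic in Python on a list of raw lines; `iter_sse_content` is a hypothetical helper, and the sample lines are made-up stand-ins for a real response body, shaped like the chunks the examples above read.

```python
import json


def iter_sse_content(lines):
    """Yield content fragments from raw SSE lines of a streaming response."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


# Illustrative input only; a real stream would be read from the HTTP response.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Spring "}}]}',
    'data: {"choices": [{"delta": {"content": "rain"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```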
Use system role messages to set the model's behavioral instructions and background information.

curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses."},
      {"role": "user", "content": "How to read a CSV file"}
    ],
    "max_tokens": 512,
    "thinking": {"type": "disabled"}
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "system",
            "content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses.",
        },
        {"role": "user", "content": "How to read a CSV file"},
    ],
    max_tokens=512,
    extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    {
      role: "system",
      content: "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses.",
    },
    { role: "user", content: "How to read a CSV file" },
  ],
  max_tokens: 512,
  // @ts-ignore
  thinking: { type: "disabled" },
});
console.log(response.choices[0].message.content);
JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 512);
body.put("thinking", new JSONObject().put("type", "disabled"));

JSONArray messages = new JSONArray();
messages.put(new JSONObject()
    .put("role", "system")
    .put("content", "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses."));
messages.put(new JSONObject()
    .put("role", "user")
    .put("content", "How to read a CSV file"));
body.put("messages", messages);
// ... The request sending code is the same as above.
body := map[string]interface{}{
    "model": "deepseek-v4-flash",
    "messages": []map[string]string{
        {"role": "system", "content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses."},
        {"role": "user", "content": "How to read a CSV file"},
    },
    "max_tokens": 512,
    "thinking":   map[string]string{"type": "disabled"},
}
// ... The request sending code is the same as in the quick start.
Pass the conversation history in the messages array to enable multi-turn dialogue with context memory.

curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "My name is Xiaoming, and I like playing basketball."},
      {"role": "assistant", "content": "Hello, Xiaoming! Playing basketball is a great sport."},
      {"role": "user", "content": "Do you remember my name and hobbies?"}
    ],
    "max_tokens": 256,
    "thinking": {"type": "disabled"}
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

# Maintain conversation history
conversation = [
    {"role": "system", "content": "You are a friendly AI assistant."},
]


def chat(user_input):
    conversation.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=conversation,
        max_tokens=1024,
        extra_body={"thinking": {"type": "disabled"}},
    )
    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})
    return reply


print(chat("My name is Xiaoming, and I like playing basketball."))
print(chat("Do you remember my name and hobbies?"))
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const conversation = [
  { role: "system", content: "You are a friendly AI assistant." },
];

async function chat(userInput) {
  conversation.push({ role: "user", content: userInput });
  const response = await client.chat.completions.create({
    model: "deepseek-v4-flash",
    messages: conversation,
    max_tokens: 1024,
    // @ts-ignore
    thinking: { type: "disabled" },
  });
  const reply = response.choices[0].message.content;
  conversation.push({ role: "assistant", content: reply });
  return reply;
}

console.log(await chat("My name is Xiaoming, and I like playing basketball."));
console.log(await chat("Do you remember my name and hobbies?"));
// Core of multi-turn dialogue: Pass the messages array cumulatively.
JSONArray messages = new JSONArray();
messages.put(new JSONObject().put("role", "system").put("content", "You are a friendly AI assistant."));
messages.put(new JSONObject().put("role", "user").put("content", "My name is Xiaoming, and I like playing basketball."));
messages.put(new JSONObject().put("role", "assistant").put("content", "Hello, Xiaoming! Playing basketball is a great sport."));
messages.put(new JSONObject().put("role", "user").put("content", "Do you remember my name and hobbies?"));

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("messages", messages);
body.put("max_tokens", 1024);
body.put("thinking", new JSONObject().put("type", "disabled"));
// ... The request sending code is the same as above.
body := map[string]interface{}{
    "model": "deepseek-v4-flash",
    "messages": []map[string]string{
        {"role": "system", "content": "You are a friendly AI assistant."},
        {"role": "user", "content": "My name is Xiaoming, and I like playing basketball."},
        {"role": "assistant", "content": "Hello, Xiaoming! Playing basketball is a great sport."},
        {"role": "user", "content": "Do you remember my name and hobbies?"},
    },
    "max_tokens": 1024,
    "thinking":   map[string]string{"type": "disabled"},
}
// ... The request sending code is the same as in the quick start.
tool_calls (which contain the function name and parameters).
role: tool message.

# Round 1: Send the question and the tool definitions
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "What is the weather like in Beijing today?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Obtain weather information for a specified city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string", "description": "City name, such as Beijing"}},
          "required": ["city"]
        }
      }
    }],
    "thinking": {"type": "disabled"}
  }'

# Round 2: Return the tool execution result (replace tool_call_id with the actual returned id)
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "What is the weather like in Beijing today?"},
      {"role": "assistant", "tool_calls": [{"id": "call_xxx", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"}}]},
      {"role": "tool", "tool_call_id": "call_xxx", "content": "Sunny, temperature 28°C, humidity 50%"}
    ],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Obtain weather information for a specified city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}],
    "thinking": {"type": "disabled"}
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Obtain weather information for a specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, such as Beijing"}
                },
                "required": ["city"],
            },
        },
    }
]

# Round 1: Send the question
messages = [{"role": "user", "content": "What is the weather like in Beijing today?"}]
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    tools=tools,
    extra_body={"thinking": {"type": "disabled"}},
)
assistant_message = response.choices[0].message

# The model initiates a tool call
if response.choices[0].finish_reason == "tool_calls":
    tool_call = assistant_message.tool_calls[0]
    print(f"Model calls tool: {tool_call.function.name}, parameters: {tool_call.function.arguments}")

    # Execute the tool (this is a simulated return)
    tool_result = "Sunny, temperature 28°C, humidity 50%"

    # Round 2: Return the tool result to the model
    messages.append(assistant_message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": tool_result,
    })
    final_response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        tools=tools,
        extra_body={"thinking": {"type": "disabled"}},
    )
    print(final_response.choices[0].message.content)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Obtain weather information for a specified city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name, such as Beijing" },
        },
        required: ["city"],
      },
    },
  },
];

// Round 1
const messages = [{ role: "user", content: "What is the weather like in Beijing today?" }];
const response1 = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages,
  tools,
  // @ts-ignore
  thinking: { type: "disabled" },
});
const assistantMsg = response1.choices[0].message;

if (response1.choices[0].finish_reason === "tool_calls") {
  const toolCall = assistantMsg.tool_calls[0];
  console.log(`Tool call: ${toolCall.function.name}, parameters: ${toolCall.function.arguments}`);

  // Execute the tool (simulated return)
  const toolResult = "Sunny, temperature 28°C, humidity 50%";

  // Round 2
  messages.push(assistantMsg);
  messages.push({ role: "tool", tool_call_id: toolCall.id, content: toolResult });
  const response2 = await client.chat.completions.create({
    model: "deepseek-v4-flash",
    messages,
    tools,
    // @ts-ignore
    thinking: { type: "disabled" },
  });
  console.log(response2.choices[0].message.content);
}
JSONObject toolFunc = new JSONObject()
    .put("name", "get_weather")
    .put("description", "Obtain weather information for a specified city")
    .put("parameters", new JSONObject()
        .put("type", "object")
        .put("properties", new JSONObject()
            .put("city", new JSONObject()
                .put("type", "string")
                .put("description", "City name")))
        .put("required", new JSONArray().put("city")));

JSONArray tools = new JSONArray()
    .put(new JSONObject().put("type", "function").put("function", toolFunc));

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("messages", new JSONArray()
    .put(new JSONObject()
        .put("role", "user")
        .put("content", "What is the weather like in Beijing today?")));
body.put("tools", tools);
body.put("thinking", new JSONObject().put("type", "disabled"));
// ... Send the request, parse tool_calls, execute the tool, and construct the second-round request
body := map[string]interface{}{
    "model":    "deepseek-v4-flash",
    "messages": []map[string]string{{"role": "user", "content": "What is the weather like in Beijing today?"}},
    "tools": []map[string]interface{}{{
        "type": "function",
        "function": map[string]interface{}{
            "name":        "get_weather",
            "description": "Obtain weather information for a specified city",
            "parameters": map[string]interface{}{
                "type": "object",
                "properties": map[string]interface{}{
                    "city": map[string]string{"type": "string", "description": "City name"},
                },
                "required": []string{"city"},
            },
        },
    }},
    "thinking": map[string]string{"type": "disabled"},
}
// ... Send the request, parse tool_calls, and construct the second-round request
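Between the two rounds, the application has to parse the `arguments` string, run the matching function, and wrap the result in a `role: tool` message. The sketch below shows one way to do that with a plain dictionary of callables. `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, the weather function is simulated as in the examples above, and the tool call is taken in the dictionary form shown in the curl payload.

```python
import json


def get_weather(city: str) -> str:
    # Simulated tool result, mirroring the examples above.
    return f"Sunny in {city}, temperature 28°C, humidity 50%"


# Maps tool names from the `tools` definition to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and return the matching `role: tool` message."""
    name = tool_call["function"]["name"]
    try:
        # `arguments` arrives as a JSON string, not as an object.
        args = json.loads(tool_call["function"]["arguments"] or "{}")
        result = TOOL_REGISTRY[name](**args)
    except KeyError:
        result = f"Error: unknown tool {name!r}"
    except (json.JSONDecodeError, TypeError) as exc:
        result = f"Error: invalid arguments for {name}: {exc}"
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}


call = {
    "id": "call_xxx",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"},
}
print(run_tool_call(call))
```

Returning errors as tool content, rather than raising, lets the model see the failure in the second round; whether that is desirable depends on the application.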
Reasoning is controlled via the thinking parameter, eliminating the need to switch model IDs. When thinking mode is enabled, the model reasons internally before giving the final answer, which suits complex tasks that require precise reasoning.

Field | Type | Default Value | Value Range | Description |
type | string | "enabled" | "enabled" / "disabled" | Controls the thinking mode switch. |
reasoning_effort | string | "high" | "high" / "max" | Reasoning depth. max is suitable for complex Agent scenarios; low and medium map to high, and xhigh maps to max. |
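The value mapping in the table can be mirrored on the client so that the request always carries one of the two documented values. This is a sketch only: `EFFORT_ALIASES` and `build_thinking` are hypothetical names, and the table describes the mapping without saying where it is applied, so normalizing before sending is an assumption made here for readability of logs and requests.

```python
# Alias table taken from the description of reasoning_effort above.
EFFORT_ALIASES = {
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}


def build_thinking(enabled: bool = True, effort: str = "high") -> dict:
    """Return the value for the `thinking` request field."""
    if not enabled:
        return {"type": "disabled"}
    if effort not in EFFORT_ALIASES:
        raise ValueError(f"unsupported reasoning_effort: {effort!r}")
    return {"type": "enabled", "reasoning_effort": EFFORT_ALIASES[effort]}


print(build_thinking(True, "medium"))
print(build_thinking(False))
```

With the OpenAI Python SDK, the result would be passed as `extra_body={"thinking": build_thinking(...)}`, as in the examples in this document.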
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"}],
    "max_tokens": 2048,
    "thinking": {"type": "enabled", "reasoning_effort": "high"}
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"}],
    max_tokens=2048,
    extra_body={"thinking": {"type": "enabled", "reasoning_effort": "high"}},
)
msg = response.choices[0].message

# Obtain the reasoning process (a field exclusive to thinking mode)
reasoning = getattr(msg, "reasoning_content", None)
if reasoning:
    print("=== Reasoning Process ===")
    print(reasoning)

print("=== Final Answer ===")
print(msg.content)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Solve the equation x^2 - 5x + 6 = 0" }],
  max_tokens: 2048,
  // @ts-ignore
  thinking: { type: "enabled", reasoning_effort: "high" },
});
const msg = response.choices[0].message;

const reasoning = (msg as any).reasoning_content;
if (reasoning) {
  console.log("=== Reasoning Process ===");
  console.log(reasoning);
}
console.log("=== Final Answer ===");
console.log(msg.content);
JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 2048);
body.put("messages", new JSONArray()
    .put(new JSONObject()
        .put("role", "user")
        .put("content", "Solve the equation x^2 - 5x + 6 = 0")));
body.put("thinking", new JSONObject()
    .put("type", "enabled")
    .put("reasoning_effort", "high"));

// ... Send the request
try (Response response = httpClient.newCall(request).execute()) {
    JSONObject result = new JSONObject(response.body().string());
    JSONObject message = result.getJSONArray("choices")
        .getJSONObject(0)
        .getJSONObject("message");
    String reasoning = message.optString("reasoning_content", "");
    String content = message.getString("content");
    System.out.println("Reasoning Process: " + reasoning);
    System.out.println("Final Answer: " + content);
}
body := map[string]interface{}{
    "model":      "deepseek-v4-flash",
    "max_tokens": 2048,
    "messages": []map[string]string{
        {"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"},
    },
    "thinking": map[string]string{"type": "enabled", "reasoning_effort": "high"},
}
// ... Send the request and parse the reasoning_content and content fields from the response.
The reasoning_content field is included in the response's message:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "I need to solve the quadratic equation x^2 - 5x + 6 = 0.\nFactorization: (x-2)(x-3) = 0\nTherefore, x = 2 or x = 3.",
        "content": "The solution to the equation x² - 5x + 6 = 0 is: **x = 2** or **x = 3**"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "completion_tokens": 120,
    "completion_tokens_details": {
      "reasoning_tokens": 80
    }
  }
}
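The `usage` block in the response above can be used to see how much of the output budget went to reasoning. The sketch below assumes that `reasoning_tokens` is counted inside `completion_tokens`, which matches the statement elsewhere in this document that reasoning content and the answer share the token quota; `split_completion_tokens` is a hypothetical helper name.

```python
def split_completion_tokens(usage: dict) -> tuple[int, int]:
    """Return (reasoning_tokens, answer_tokens) from a response's usage block.

    Assumes reasoning tokens are a subset of completion_tokens.
    """
    total = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    return reasoning, max(total - reasoning, 0)


# Values copied from the sample response above.
usage = {"completion_tokens": 120, "completion_tokens_details": {"reasoning_tokens": 80}}
reasoning_tokens, answer_tokens = split_completion_tokens(usage)
print(reasoning_tokens, answer_tokens)
```

If the answer share is regularly squeezed close to zero, raising max_tokens or disabling thinking mode are the two levers this document describes.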
reasoning_content and content are both returned in incremental delta format and must be processed separately:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Analyze the advantages and challenges of quantum computing."}],
    max_tokens=2048,
    stream=True,
    extra_body={"thinking": {"type": "enabled"}},
)

print("=== Reasoning Process (Real-time) ===")
answer_started = False
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    reasoning_delta = getattr(delta, "reasoning_content", None)
    if reasoning_delta:
        print(reasoning_delta, end="", flush=True)
    if delta.content:
        if not answer_started:
            print("\n\n=== Final Answer (Real-time) ===")
            answer_started = True
        print(delta.content, end="", flush=True)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Analyze the advantages and challenges of quantum computing." }],
  max_tokens: 2048,
  stream: true,
  // @ts-ignore
  thinking: { type: "enabled" },
});

let answerStarted = false;
process.stdout.write("=== Reasoning Process (Real-time) ===\n");
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (!delta) continue;
  const reasoning = (delta as any).reasoning_content;
  if (reasoning) process.stdout.write(reasoning);
  if (delta.content) {
    if (!answerStarted) {
      process.stdout.write("\n\n=== Final Answer (Real-time) ===\n");
      answerStarted = true;
    }
    process.stdout.write(delta.content);
  }
}
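When the streamed text is needed afterwards (for logging, or for writing the answer back into the conversation), the two delta streams can be accumulated into separate buffers instead of being printed. The following is a sketch with made-up chunks built from `SimpleNamespace` in place of real SDK objects; `collect_stream` and `_chunk` are hypothetical names.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate streamed deltas into (reasoning_text, answer_text)."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # reasoning_content is an extension field, so read it defensively.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def _chunk(**delta):
    # Builds a stand-in for one streamed chunk (illustrative only).
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


fake_stream = [
    SimpleNamespace(choices=[]),
    _chunk(reasoning_content="Factor: ", content=None),
    _chunk(reasoning_content="(x-2)(x-3)"),
    _chunk(content="x = 2 "),
    _chunk(content="or x = 3"),
]
reasoning_text, answer_text = collect_stream(fake_stream)
print(reasoning_text)
print(answer_text)
```

Only `answer_text` would be appended to the message history on the next turn.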
Do not pass the reasoning_content from the previous round back to the model; only pass the content field.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

messages = [{"role": "user", "content": "What is the 10th term of the Fibonacci sequence?"}]
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    max_tokens=1024,
    extra_body={"thinking": {"type": "enabled"}},
)
assistant_msg = response.choices[0].message
print("First Round Answer:", assistant_msg.content)

# Multi-turn conversations: pass back only the content, not the reasoning_content.
messages.append({"role": "assistant", "content": assistant_msg.content})
messages.append({"role": "user", "content": "What about the 20th term?"})

response2 = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    max_tokens=1024,
    extra_body={"thinking": {"type": "enabled"}},
)
print("Second Round Answer:", response2.choices[0].message.content)
When writing back the assistant message in a multi-turn conversation, pass only the content field; do not pass the reasoning_content field.

Setting response_format to json_object ensures that the model outputs valid JSON strings, which is suitable for scenarios requiring structured data.

Instruct the model to output JSON in the system or user message; otherwise, the model may continuously output empty content.

curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "Return the result in JSON format."},
      {"role": "user", "content": "Return information for three Chinese cities, each containing the name, province, and population fields."}
    ],
    "max_tokens": 512,
    "response_format": {"type": "json_object"},
    "thinking": {"type": "disabled"}
  }'
import json

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Return the result in JSON format."},
        {
            "role": "user",
            "content": "Return information for three Chinese cities, each containing the name, province, and population fields.",
        },
    ],
    max_tokens=512,
    response_format={"type": "json_object"},
    extra_body={"thinking": {"type": "disabled"}},
)

result = json.loads(response.choices[0].message.content)
print(json.dumps(result, ensure_ascii=False, indent=2))
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "system", content: "Return the result in JSON format." },
    {
      role: "user",
      content: "Return information for three Chinese cities, each containing the name, province, and population fields.",
    },
  ],
  max_tokens: 512,
  response_format: { type: "json_object" },
  // @ts-ignore
  thinking: { type: "disabled" },
});

const result = JSON.parse(response.choices[0].message.content);
console.log(JSON.stringify(result, null, 2));
JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 512);
body.put("response_format", new JSONObject().put("type", "json_object"));
body.put("thinking", new JSONObject().put("type", "disabled"));
body.put("messages", new JSONArray()
    .put(new JSONObject()
        .put("role", "system")
        .put("content", "Return the result in JSON format."))
    .put(new JSONObject()
        .put("role", "user")
        .put("content", "Return information for three Chinese cities, each containing the name, province, and population fields.")));
// ... Send the request and parse the returned JSON string.
body := map[string]interface{}{
    "model":           "deepseek-v4-flash",
    "max_tokens":      512,
    "response_format": map[string]string{"type": "json_object"},
    "thinking":        map[string]string{"type": "disabled"},
    "messages": []map[string]string{
        {"role": "system", "content": "Return the result in JSON format."},
        {"role": "user", "content": "Return information for three Chinese cities, each containing the name, province, and population fields."},
    },
}
// ... Send the request
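The examples above call the JSON parser directly on the reply, which raises if the content is empty or was cut off by max_tokens. A more defensive variant might look like the sketch below. `parse_json_reply` is a hypothetical helper, and treating `finish_reason == "length"` as truncation follows the usual OpenAI-compatible convention, which is assumed here rather than confirmed by this document.

```python
import json


def parse_json_reply(content, finish_reason="stop"):
    """Return (data, error); error is None when parsing succeeded."""
    if not content or not content.strip():
        # The document warns that empty content can occur in JSON mode.
        return None, "empty content"
    if finish_reason == "length":
        # Assumed convention: the output hit max_tokens and is likely incomplete.
        return None, "output truncated by max_tokens"
    try:
        return json.loads(content), None
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"


print(parse_json_reply('{"cities": [{"name": "Beijing"}]}'))
print(parse_json_reply("   "))
print(parse_json_reply('{"cities": [', finish_reason="length"))
```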
Parameter / Practice | Recommendation | Description |
max_tokens | 1024 to 4096 for general tasks; recommended ≥ 2048 for thinking mode | Reasoning content and the answer share the token quota. |
thinking | Use disabled for simple Q&A; use enabled for logical reasoning and math problems. | Proper use can reduce costs. |
stream | Enable it for long text generation. | Avoid request timeouts and improve the response experience. |
temperature | Generally, no modification is required. Use the default value of 1. | For creative writing, increase to 1.3-1.5; for code generation, decrease to 0.2-0.5. |
Multi-turn Conversation | Only return content; do not return reasoning_content. | Reduce token consumption. |
Access Reasoning Fields via SDK | Use getattr(msg, "reasoning_content", None) for Python; use (msg as any).reasoning_content for Node.js. | This field is not defined in the OpenAI SDK type definitions. |
Model Selection | Use deepseek-v4-flash for daily tasks; use deepseek-v4-pro for high-precision tasks. | Flash has a higher concurrency limit (2500 vs 500). |
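The recommendations in the table above can be collected into per-task presets so that call sites do not repeat the numbers. This is only a sketch: the task names, `PRESETS`, and `request_options` are hypothetical, and the specific values chosen inside each recommended range (1.4 for creative writing, 0.3 for code) are illustrative picks, not values stated by the service.

```python
import copy

# Presets distilled from the table above. Values inside a recommended range
# (for example 1.4 or 0.3) are illustrative picks.
PRESETS = {
    "simple_qa": {"max_tokens": 1024, "thinking": {"type": "disabled"}},
    "reasoning": {"max_tokens": 2048, "thinking": {"type": "enabled"}, "stream": True},
    "creative": {"max_tokens": 4096, "temperature": 1.4, "stream": True},
    "code": {"max_tokens": 4096, "temperature": 0.3},
}


def request_options(task: str, model: str = "deepseek-v4-flash") -> dict:
    """Return request body fields for a task type, without the messages."""
    if task not in PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PRESETS)}")
    options = copy.deepcopy(PRESETS[task])  # callers may mutate the result safely
    options["model"] = model
    return options


print(request_options("simple_qa"))
print(request_options("reasoning", model="deepseek-v4-pro"))
```

The returned fields follow the raw request body used in the curl examples; with the OpenAI Python SDK, the `thinking` entry would be moved into `extra_body`.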
Restriction Item | Description |
Thinking Mode and JSON Mode | It is not recommended to simultaneously enable thinking.type=enabled and response_format.type=json_object. |
frequency_penalty / presence_penalty | Deprecated. Passing them has no effect. |
Timeout risk | When thinking mode is enabled, the response time is longer. Use it with stream=true to avoid timeouts. |
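The restrictions in the table above lend themselves to a small pre-flight check on the request body. The sketch below is one possible shape, with `lint_request` as a hypothetical name. It treats a missing `thinking` field as enabled, following the default given in the parameter table, and it only warns; it does not change the thinking or streaming settings.

```python
def lint_request(body: dict) -> tuple[dict, list[str]]:
    """Return a cleaned copy of the request body plus a list of warnings."""
    cleaned = dict(body)
    warnings = []

    # Deprecated parameters have no effect, so drop them from the copy.
    for key in ("frequency_penalty", "presence_penalty"):
        if key in cleaned:
            del cleaned[key]
            warnings.append(f"{key} is deprecated and has no effect; removed")

    # thinking.type defaults to "enabled" when the field is absent.
    thinking_on = (cleaned.get("thinking") or {}).get("type", "enabled") == "enabled"
    json_mode = (cleaned.get("response_format") or {}).get("type") == "json_object"

    if thinking_on and json_mode:
        warnings.append("thinking mode together with json_object is not recommended")
    if thinking_on and not cleaned.get("stream"):
        warnings.append("thinking mode without stream=true risks timeouts")
    return cleaned, warnings


body = {
    "model": "deepseek-v4-flash",
    "frequency_penalty": 0.5,
    "response_format": {"type": "json_object"},
}
cleaned, warnings = lint_request(body)
for warning in warnings:
    print("warning:", warning)
```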