MiniMax API Guide

Last updated: 2026-06-11 18:02:54
Overview
The MiniMax model series is integrated into TokenHub, the large model service platform, and supports the OpenAI Chat Completions protocol, so developers can integrate it without changing their SDK. This document covers general invocation examples and the core capabilities specific to MiniMax, such as thinking mode and Function Calling.
Supported Models
TokenHub currently supports the following MiniMax models (for specifics, refer to the model list):
| Model ID | Type | Reasoning Capability | Context Window | Max Input | Max Output |
| --- | --- | --- | --- | --- | --- |
| minimax-m3 | General Conversation Model | Supported | 1M | 1M | - |
| minimax-m2.7 | General Conversation Model | Supported | 200K | 200K | 128K |
| minimax-m2.5 | General Conversation Model | Supported | 200K | 200K | 128K |
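The limits in the table can also be checked on the client before a request is sent. The following is a minimal sketch, not part of the TokenHub API: `MODEL_LIMITS` and `check_max_tokens` are hypothetical names, and the sketch assumes that "K" means 1,000 tokens and "M" means 1,000,000 tokens, which the table does not state. The service itself remains the authority on what it accepts.

```python
# Hypothetical client-side copy of the limits listed in the table above.
# Assumption: "200K" = 200_000 tokens and "1M" = 1_000_000 tokens.
MODEL_LIMITS = {
    "minimax-m3": {"context": 1_000_000, "max_input": 1_000_000, "max_output": None},
    "minimax-m2.7": {"context": 200_000, "max_input": 200_000, "max_output": 128_000},
    "minimax-m2.5": {"context": 200_000, "max_input": 200_000, "max_output": 128_000},
}


def check_max_tokens(model: str, max_tokens: int) -> None:
    """Raise ValueError if max_tokens exceeds the documented output limit."""
    if model not in MODEL_LIMITS:
        raise ValueError(f"Unknown model ID: {model}")
    limit = MODEL_LIMITS[model]["max_output"]
    # None means the table lists no output limit ("-"), so nothing is checked.
    if limit is not None and max_tokens > limit:
        raise ValueError(f"max_tokens={max_tokens} exceeds {limit} for {model}")


check_max_tokens("minimax-m2.7", 1024)  # within the limit, returns None
```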
Prerequisites
You have registered a Tencent Cloud account and activated the TokenHub service.
You have obtained an API Key in the TokenHub console.
You have installed the SDK for your programming language, or you can make HTTP requests directly.
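Rather than pasting the API Key into source code, it can be read from the environment at startup. The sketch below is one possible approach; the variable name `TOKENHUB_API_KEY` and the helper `load_api_key` are assumptions made for illustration and are not defined by TokenHub.

```python
import os


def load_api_key(env_var: str = "TOKENHUB_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing."""
    # env_var is a hypothetical name; use whatever your deployment defines.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With the OpenAI Python SDK, the result can be passed as `api_key=load_api_key()` in place of the `"YOUR_API_KEY"` literal used in the examples in this document.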
Quick Start
The following example shows the simplest single-turn conversation call. Replace YOUR_API_KEY with the API Key you created.
cURL
Python
Node.js
Java
Go
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "minimax-m2.7",
    "messages": [
      {"role": "user", "content": "Hello, please introduce yourself"}
    ],
    "max_tokens": 1024
  }'
# pip install openai
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
response = client.chat.completions.create(
    model="minimax-m2.7",
    messages=[
        {"role": "user", "content": "Hello, please introduce yourself"}
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
// npm install openai
import OpenAI from "openai";
﻿
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});
﻿
const response = await client.chat.completions.create({
  model: "minimax-m2.7",
  messages: [
    { role: "user", content: "Hello, please introduce yourself" }
  ],
  max_tokens: 1024,
});
console.log(response.choices[0].message.content);
// To use OkHttp, add the dependency: implementation("com.squareup.okhttp3:okhttp:4.12.0")
import okhttp3.*;
import org.json.*;
﻿
OkHttpClient httpClient = new OkHttpClient();
﻿
JSONObject body = new JSONObject();
body.put("model", "minimax-m2.7");
body.put("max_tokens", 1024);
JSONArray messages = new JSONArray();
JSONObject userMsg = new JSONObject();
userMsg.put("role", "user");
userMsg.put("content", "Hello, please introduce yourself");
messages.put(userMsg);
body.put("messages", messages);
﻿
Request request = new Request.Builder()
    .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
    .addHeader("Authorization", "Bearer YOUR_API_KEY")
    .addHeader("Content-Type", "application/json")
    .post(RequestBody.create(body.toString(), MediaType.get("application/json")))
    .build();
﻿
try (Response response = httpClient.newCall(request).execute()) {
    JSONObject result = new JSONObject(response.body().string());
    System.out.println(result.getJSONArray("choices")
        .getJSONObject(0).getJSONObject("message").getString("content"));
}
package main
﻿
import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)
﻿
func main() {
    body := map[string]interface{}{
        "model": "minimax-m2.7",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello, please introduce yourself"},
        },
        "max_tokens": 1024,
    }
    data, _ := json.Marshal(body)
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(data))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err) // resp is nil on failure, so resp.Body must not be touched
    }
    defer resp.Body.Close()
    respBody, _ := io.ReadAll(resp.Body)
﻿
    var result map[string]interface{}
    json.Unmarshal(respBody, &result)
    choices := result["choices"].([]interface{})
    msg := choices[0].(map[string]interface{})["message"].(map[string]interface{})
    fmt.Println(msg["content"])
}
General Calling Examples
Basic Conversation
Send a single-turn conversation request to obtain the model's response.
cURL
Python
Node.js
Java
Go
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "minimax-m2.7",
    "messages": [
      {"role": "user", "content": "Introduce large language models"}
    ],
    "max_tokens": 1024
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
response = client.chat.completions.create(
    model="minimax-m2.7",
    messages=[
        {"role": "user", "content": "Introduce large language models"}
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
import OpenAI from "openai";
﻿
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});
﻿
const response = await client.chat.completions.create({
  model: "minimax-m2.7",
  messages: [
    { role: "user", content: "Introduce large language models" }
  ],
  max_tokens: 1024,
});
console.log(response.choices[0].message.content);
import okhttp3.*;
import org.json.*;
﻿
OkHttpClient httpClient = new OkHttpClient();
﻿
JSONObject body = new JSONObject();
body.put("model", "minimax-m2.7");
body.put("max_tokens", 1024);
﻿
JSONArray messages = new JSONArray();
messages.put(new JSONObject().put("role", "user").put("content", "Introduce large language models"));
body.put("messages", messages);
﻿
Request request = new Request.Builder()
    .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
    .addHeader("Authorization", "Bearer YOUR_API_KEY")
    .addHeader("Content-Type", "application/json")
    .post(RequestBody.create(body.toString(), MediaType.get("application/json")))
    .build();
﻿
try (Response response = httpClient.newCall(request).execute()) {
    JSONObject result = new JSONObject(response.body().string());
    System.out.println(result.getJSONArray("choices")
        .getJSONObject(0).getJSONObject("message").getString("content"));
}
body := map[string]interface{}{
    "model": "minimax-m2.7",
    "messages": []map[string]string{
        {"role": "user", "content": "Introduce large language models"},
    },
    "max_tokens": 1024,
}
// ... The rest of the request code is the same as in the quick start example.
Streaming Output
Set stream to true to enable SSE streaming output. Streaming suits long-text generation: it helps avoid request timeouts and lets users see output as it is generated.
cURL
Python
Node.js
Java
Go
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "minimax-m2.7",
    "messages": [
      {"role": "user", "content": "Write a short poem about spring"}
    ],
    "max_tokens": 512,
    "stream": true
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
stream = client.chat.completions.create(
    model="minimax-m2.7",
    messages=[
        {"role": "user", "content": "Write a short poem about spring"}
    ],
    max_tokens=512,
    stream=True,
)
﻿
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
import OpenAI from "openai";
﻿
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});
﻿
const stream = await client.chat.completions.create({
  model: "minimax-m2.7",
  messages: [
    { role: "user", content: "Write a short poem about spring" }
  ],
  max_tokens: 512,
  stream: true,
});
﻿
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
import okhttp3.*;
import okhttp3.sse.*;
import org.json.*;
﻿
OkHttpClient httpClient = new OkHttpClient();
﻿
JSONObject body = new JSONObject();
body.put("model", "minimax-m2.7");
body.put("max_tokens", 512);
body.put("stream", true);
body.put("messages", new JSONArray()
    .put(new JSONObject().put("role", "user").put("content", "Write a short poem about spring")));
﻿
Request request = new Request.Builder()
    .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
    .addHeader("Authorization", "Bearer YOUR_API_KEY")
    .addHeader("Content-Type", "application/json")
    .post(RequestBody.create(body.toString(), MediaType.get("application/json")))
    .build();
﻿
EventSources.createFactory(httpClient).newEventSource(request, new EventSourceListener() {
    @Override
    public void onEvent(EventSource source, String id, String type, String data) {
        if ("[DONE]".equals(data)) return;
        try {
            JSONObject json = new JSONObject(data);
            String content = json.getJSONArray("choices").getJSONObject(0)
                .getJSONObject("delta").optString("content", "");
            if (!content.isEmpty()) System.out.print(content);
        } catch (JSONException ignored) {}
    }
});
import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)
﻿
body := map[string]interface{}{
    "model":      "minimax-m2.7",
    "messages":   []map[string]string{{"role": "user", "content": "Write a short poem about spring"}},
    "max_tokens": 512,
    "stream":     true,
}
data, _ := json.Marshal(body)
﻿
req, _ := http.NewRequest("POST",
    "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
    bytes.NewBuffer(data))
req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
req.Header.Set("Content-Type", "application/json")
﻿
resp, err := http.DefaultClient.Do(req)
if err != nil {
    panic(err) // resp is nil on failure, so resp.Body must not be touched
}
defer resp.Body.Close()
﻿
scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
    line := scanner.Text()
    if !strings.HasPrefix(line, "data: ") || line == "data: [DONE]" {
        continue
    }
    var chunk map[string]interface{}
    json.Unmarshal([]byte(strings.TrimPrefix(line, "data: ")), &chunk)
    choices := chunk["choices"].([]interface{})
    delta := choices[0].(map[string]interface{})["delta"].(map[string]interface{})
    if content, ok := delta["content"].(string); ok {
        fmt.Print(content)
    }
}
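The Java and Go examples above read the SSE stream by hand. The same line-level handling can be sketched in Python as a pure function that takes raw lines, which makes it easy to test without a network call. `iter_content` is a hypothetical helper, and the sample lines are hand-written to mirror the `data: {...}` and `data: [DONE]` framing those examples rely on.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from raw SSE lines of a streamed response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # tolerate chunks that carry no choices
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


# Hand-written sample lines, not captured from a real response.
sample = [
    'data: {"choices": [{"delta": {"content": "Spring "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "rain"}}]}',
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # prints "Spring rain"
```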
System Prompt
Use system role messages to set the model's behavioral instructions and background information.
cURL
Python
Node.js
Java
Go
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "minimax-m2.7",
    "messages": [
      {"role": "system", "content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses."},
      {"role": "user", "content": "How to read a CSV file"}
    ],
    "max_tokens": 512
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
response = client.chat.completions.create(
    model="minimax-m2.7",
    messages=[
        {
            "role": "system",
            "content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses.",
        },
        {"role": "user", "content": "How to read a CSV file"},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
import OpenAI from "openai";
﻿
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});
﻿
const response = await client.chat.completions.create({
  model: "minimax-m2.7",
  messages: [
    {
      role: "system",
      content: "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses.",
    },
    { role: "user", content: "How to read a CSV file" },
  ],
  max_tokens: 512,
});
console.log(response.choices[0].message.content);
JSONObject body = new JSONObject();
body.put("model", "minimax-m2.7");
body.put("max_tokens", 512);
body.put("messages", new JSONArray()
    .put(new JSONObject().put("role", "system")
        .put("content", "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses."))
    .put(new JSONObject().put("role", "user")
        .put("content", "How to read a CSV file")));
// ... The request sending code is the same as above.
body := map[string]interface{}{
    "model": "minimax-m2.7",
    "messages": []map[string]string{
        {"role": "system", "content": "You are a professional Python programming assistant, answering only Python-related questions with concise and clear responses."},
        {"role": "user", "content": "How to read a CSV file"},
    },
    "max_tokens": 512,
}
// ... The request sending code is the same as in the quick start.
Multi-turn Conversation
Pass the conversation history into the messages array to enable multi-turn dialogue with context memory.
cURL
Python
Node.js
Java
Go
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "minimax-m2.7",
    "messages": [
      {"role": "user", "content": "My name is Xiaoming, and I like playing basketball."},
      {"role": "assistant", "content": "Hello, Xiaoming! Playing basketball is a great sport."},
      {"role": "user", "content": "Do you remember my name and hobbies?"}
    ],
    "max_tokens": 256
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
# Maintain Conversation History
conversation = [
    {"role": "system", "content": "You are a friendly AI assistant."},
]
﻿
def chat(user_input):
    conversation.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="minimax-m2.7",
        messages=conversation,
        max_tokens=1024,
    )
    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})
    return reply
﻿
print(chat("My name is Xiaoming, and I like playing basketball."))
print(chat("Do you remember my name and hobbies?"))
import OpenAI from "openai";
﻿
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});
﻿
const conversation = [
  { role: "system", content: "You are a friendly AI assistant." },
];
﻿
async function chat(userInput) {
  conversation.push({ role: "user", content: userInput });
  const response = await client.chat.completions.create({
    model: "minimax-m2.7",
    messages: conversation,
    max_tokens: 1024,
  });
  const reply = response.choices[0].message.content;
  conversation.push({ role: "assistant", content: reply });
  return reply;
}
﻿
console.log(await chat("My name is Xiaoming, and I like playing basketball."));
console.log(await chat("Do you remember my name and hobbies?"));
JSONArray messages = new JSONArray();
messages.put(new JSONObject().put("role", "system").put("content", "You are a friendly AI assistant."));
messages.put(new JSONObject().put("role", "user").put("content", "My name is Xiaoming, and I like playing basketball."));
messages.put(new JSONObject().put("role", "assistant").put("content", "Hello, Xiaoming! Playing basketball is a great sport."));
messages.put(new JSONObject().put("role", "user").put("content", "Do you remember my name and hobbies?"));
﻿
JSONObject body = new JSONObject();
body.put("model", "minimax-m2.7");
body.put("messages", messages);
body.put("max_tokens", 1024);
// ... The request sending code is the same as above.
body := map[string]interface{}{
    "model": "minimax-m2.7",
    "messages": []map[string]string{
        {"role": "system", "content": "You are a friendly AI assistant."},
        {"role": "user", "content": "My name is Xiaoming, and I like playing basketball."},
        {"role": "assistant", "content": "Hello, Xiaoming! Playing basketball is a great sport."},
        {"role": "user", "content": "Do you remember my name and hobbies?"},
    },
    "max_tokens": 1024,
}
// ... The request sending code is the same as in the quick start.
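Because the whole history is resent on every call, a long conversation eventually approaches the context window. One simple mitigation, sketched below, is to keep the system messages plus only the most recent turns. `trim_history` is a hypothetical helper, and counting turns rather than tokens is a simplifying assumption; a production version would more likely budget by token count.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_turns: int) -> List[Message]:
    """Keep all system messages plus the last `max_turns` user turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if max_turns <= 0:
        return system
    # Positions in `rest` where a user message starts a new turn.
    starts = [i for i, m in enumerate(rest) if m["role"] == "user"]
    if len(starts) <= max_turns:
        return system + rest
    return system + rest[starts[-max_turns]:]


history = [
    {"role": "system", "content": "You are a friendly AI assistant."},
    {"role": "user", "content": "My name is Xiaoming, and I like playing basketball."},
    {"role": "assistant", "content": "Hello, Xiaoming! Playing basketball is a great sport."},
    {"role": "user", "content": "Do you remember my name and hobbies?"},
]
print(len(trim_history(history, 1)))  # prints 2: the system message and the last user turn
```

Note that trimming this aggressively drops the turn in which the name was given, so the model could no longer answer the final question; the right `max_turns` depends on the application.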
Function Calling (Tool Invocation)
Function Calling lets the model invoke external tools, for example to obtain real-time data. The model does not execute functions itself; it returns the names and arguments of the functions to call. Your code executes them and returns the results to the model, which then generates a natural language response.
Calling Process:
1. When a user asks a question, the model returns tool_calls (which contain the function name and parameters).
2. User code executes the function, and the result is returned as a role: tool message.
3. The model generates the final natural language response based on the function result.
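Step 2 of the process above can be sketched as a small dispatcher that maps the function name returned by the model to a local Python function. `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, `get_weather` returns canned data, and the tool call is represented as a plain dict in the shape of the raw JSON response; with SDK objects the same fields are read as attributes.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    """Stand-in tool; a real implementation would query a weather service."""
    return f"Sunny in {city}, temperature 28°C, humidity 50%"


# Hypothetical registry mapping tool names to local Python functions.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_call(tool_call: Dict[str, Any]) -> Dict[str, str]:
    """Execute one tool call and build the role: tool message for round 2."""
    name = tool_call["function"]["name"]
    # The arguments field arrives as a JSON-encoded string, not a dict.
    args = json.loads(tool_call["function"]["arguments"] or "{}")
    if name not in TOOL_REGISTRY:
        result = f"Error: unknown tool {name}"
    else:
        result = TOOL_REGISTRY[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}


# Hand-written tool call in the shape the model returns in tool_calls.
call = {
    "id": "call_xxx",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'},
}
print(run_tool_call(call))
```

Returning an error string for an unknown tool, instead of raising, is a design choice: it lets the model see the failure and respond to it in the next round.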
cURL
Python
Node.js
Java
Go
# Round 1: Send the Question + Tool Definitions
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "minimax-m2.7",
    "messages": [
      {"role": "user", "content": "What is the weather like in Beijing today?"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Obtain weather information for a specified city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name, such as Beijing"}
          },
          "required": ["city"]
        }
      }
    }]
  }'
﻿
# Round 2: Return the Tool Execution Result (Replace tool_call_id with the actual returned id)
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "minimax-m2.7",
    "messages": [
      {"role": "user", "content": "What is the weather like in Beijing today?"},
      {"role": "assistant", "tool_calls": [{"id": "call_xxx", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"}}]},
      {"role": "tool", "tool_call_id": "call_xxx", "content": "Sunny, temperature 28°C, humidity 50%"}
    ],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Obtain weather information for a specified city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}]
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
# Define Tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Obtain weather information for a specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, such as Beijing"}
                },
                "required": ["city"],
            },
        },
    }
]
﻿
# Round 1: Send the Question
messages = [{"role": "user", "content": "What is the weather like in Beijing today?"}]
response = client.chat.completions.create(
    model="minimax-m2.7",
    messages=messages,
    tools=tools,
)
assistant_message = response.choices[0].message
﻿
# The Model Initiates a Tool Call
if response.choices[0].finish_reason == "tool_calls":
    tool_call = assistant_message.tool_calls[0]
    print(f"Model calls tool: {tool_call.function.name}, parameters: {tool_call.function.arguments}")
﻿
    # Execute the Tool (This is a simulated return)
    tool_result = "Sunny, temperature 28°C, humidity 50%"
﻿
    # Round 2: Return the Tool Result to the Model
    messages.append(assistant_message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": tool_result,
    })
﻿
    final_response = client.chat.completions.create(
        model="minimax-m2.7",
        messages=messages,
        tools=tools,
    )
    print(final_response.choices[0].message.content)
import OpenAI from "openai";
﻿
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});
﻿
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Obtain weather information for a specified city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name, such as Beijing" },
        },
        required: ["city"],
      },
    },
  },
];
﻿
// Round 1
const messages = [{ role: "user", content: "What is the weather like in Beijing today?" }];
const response1 = await client.chat.completions.create({
  model: "minimax-m2.7",
  messages,
  tools,
});
﻿
const assistantMsg = response1.choices[0].message;
if (response1.choices[0].finish_reason === "tool_calls") {
  const toolCall = assistantMsg.tool_calls[0];
  console.log(`Tool call: ${toolCall.function.name}, parameters: ${toolCall.function.arguments}`);
﻿
  const toolResult = "Sunny, temperature 28°C, humidity 50%";
  messages.push(assistantMsg);
  messages.push({ role: "tool", tool_call_id: toolCall.id, content: toolResult });
﻿
  const response2 = await client.chat.completions.create({
    model: "minimax-m2.7",
    messages,
    tools,
  });
  console.log(response2.choices[0].message.content);
}
JSONObject toolFunc = new JSONObject()
    .put("name", "get_weather")
    .put("description", "Obtain weather information for a specified city")
    .put("parameters", new JSONObject()
        .put("type", "object")
        .put("properties", new JSONObject()
            .put("city", new JSONObject().put("type", "string").put("description", "City name")))
        .put("required", new JSONArray().put("city")));
﻿
JSONArray tools = new JSONArray()
    .put(new JSONObject().put("type", "function").put("function", toolFunc));
﻿
JSONObject body = new JSONObject();
body.put("model", "minimax-m2.7");
body.put("messages", new JSONArray()
    .put(new JSONObject().put("role", "user").put("content", "What is the weather like in Beijing today?")));
body.put("tools", tools);
// ... Send the request, parse tool_calls, execute the tool, and construct the second-round request
body := map[string]interface{}{
    "model": "minimax-m2.7",
    "messages": []map[string]string{
        {"role": "user", "content": "What is the weather like in Beijing today?"},
    },
    "tools": []map[string]interface{}{{
        "type": "function",
        "function": map[string]interface{}{
            "name":        "get_weather",
            "description": "Obtain weather information for a specified city",
            "parameters": map[string]interface{}{
                "type": "object",
                "properties": map[string]interface{}{
                    "city": map[string]string{"type": "string", "description": "City name"},
                },
                "required": []string{"city"},
            },
        },
    }},
}
// ... Send the request, parse tool_calls, and construct the second-round request
Thinking Mode
MiniMax M2.7 and M2.5 let you turn thinking mode on or off with the thinking parameter, without switching the model ID. When thinking mode is enabled, the model reasons internally before giving its final answer, which suits complex tasks that require precise reasoning.
thinking Parameter Description
| Field | Type | Default Value | Value Range | Description |
| --- | --- | --- | --- | --- |
| type | string | "enabled" | "enabled" / "disabled" | Controls the thinking mode switch. |
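Since thinking is an extension field, the OpenAI Python SDK needs it passed through `extra_body`. A small helper keeps the two allowed values in one place; `thinking_extra_body` is a hypothetical name used only for this sketch.

```python
from typing import Dict


def thinking_extra_body(enabled: bool = True) -> Dict[str, Dict[str, str]]:
    """Build the extra_body payload that carries the thinking parameter."""
    # Only the two values from the table above are ever produced.
    return {"thinking": {"type": "enabled" if enabled else "disabled"}}


print(thinking_extra_body(False))  # prints {'thinking': {'type': 'disabled'}}
```

The result is meant to be passed as `extra_body=thinking_extra_body(False)` to `client.chat.completions.create(...)`, for example to skip reasoning on simple requests.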
Enabling or Disabling Thinking
cURL
Python
Node.js
Java
Go
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "minimax-m2.7",
    "messages": [
      {"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"}
    ],
    "max_tokens": 2048,
    "thinking": {"type": "enabled"}
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
response = client.chat.completions.create(
    model="minimax-m2.7",
    messages=[{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"}],
    max_tokens=2048,
    extra_body={"thinking": {"type": "enabled"}},
)
﻿
msg = response.choices[0].message
﻿
# Obtain the reasoning process (a field exclusive to thinking mode)
reasoning = getattr(msg, "reasoning_content", None)
if reasoning:
    print("=== Reasoning Process ===")
    print(reasoning)
﻿
print("=== Final Answer ===")
print(msg.content)
import OpenAI from "openai";
﻿
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});
﻿
const response = await client.chat.completions.create({
  model: "minimax-m2.7",
  messages: [{ role: "user", content: "Solve the equation x^2 - 5x + 6 = 0" }],
  max_tokens: 2048,
  // @ts-ignore - thinking is an extension field
  thinking: { type: "enabled" },
});
﻿
const msg = response.choices[0].message;
const reasoning = (msg as any).reasoning_content;
if (reasoning) {
  console.log("=== Reasoning Process ===");
  console.log(reasoning);
}
console.log("=== Final Answer ===");
console.log(msg.content);
JSONObject body = new JSONObject();
body.put("model", "minimax-m2.7");
body.put("max_tokens", 2048);
body.put("messages", new JSONArray()
    .put(new JSONObject().put("role", "user").put("content", "Solve the equation x^2 - 5x + 6 = 0")));
body.put("thinking", new JSONObject().put("type", "enabled"));
﻿
// ... Send the request
try (Response response = httpClient.newCall(request).execute()) {
    JSONObject result = new JSONObject(response.body().string());
    JSONObject message = result.getJSONArray("choices")
        .getJSONObject(0).getJSONObject("message");
    String reasoning = message.optString("reasoning_content", "");
    String content = message.getString("content");
    System.out.println("Reasoning Process: " + reasoning);
    System.out.println("Final Answer: " + content);
}
body := map[string]interface{}{
    "model":      "minimax-m2.7",
    "max_tokens": 2048,
    "messages": []map[string]string{
        {"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"},
    },
    "thinking": map[string]string{"type": "enabled"},
}
// ... Send the request and parse the reasoning_content and content fields from the response.
Response Structure Examples
After thinking mode is enabled, the reasoning_content field is included in the response's message:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "I need to solve the quadratic equation x^2 - 5x + 6 = 0.\nFactorization: (x-2)(x-3) = 0\nTherefore, x = 2 or x = 3.",
      "content": "The solution to the equation x² - 5x + 6 = 0 is: **x = 2** or **x = 3**"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "completion_tokens": 120,
    "completion_tokens_details": {
      "reasoning_tokens": 80
    }
  }
}
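A response of this shape can be unpacked with a few dictionary lookups. The sketch below works on a decoded JSON dict rather than an SDK object; `split_reasoning` is a hypothetical helper, and the sample is a shortened copy of the structure shown above.

```python
from typing import Any, Dict, Optional, Tuple


def split_reasoning(response: Dict[str, Any]) -> Tuple[Optional[str], str, int]:
    """Return (reasoning, answer, reasoning_tokens) from a response dict."""
    message = response["choices"][0]["message"]
    details = response.get("usage", {}).get("completion_tokens_details") or {}
    return (
        message.get("reasoning_content"),  # None if the field is not present
        message.get("content") or "",
        details.get("reasoning_tokens", 0),
    )


# Shortened copy of the response structure shown above.
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "reasoning_content": "Factorization: (x-2)(x-3) = 0",
            "content": "x = 2 or x = 3",
        },
        "finish_reason": "stop",
    }],
    "usage": {
        "completion_tokens": 120,
        "completion_tokens_details": {"reasoning_tokens": 80},
    },
}
reasoning, answer, reasoning_tokens = split_reasoning(sample)
print(answer, reasoning_tokens)  # prints "x = 2 or x = 3 80"
```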
Streaming Thinking Output
When streaming output is enabled, the reasoning_content and content are both returned in incremental delta format and must be processed separately:
Python
Node.js
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
stream = client.chat.completions.create(
    model="minimax-m2.7",
    messages=[{"role": "user", "content": "Analyze the advantages and challenges of quantum computing."}],
    max_tokens=2048,
    stream=True,
    extra_body={"thinking": {"type": "enabled"}},
)
﻿
print("=== Reasoning Process (Real-time) ===")
answer_started = False
﻿
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
﻿
    reasoning_delta = getattr(delta, "reasoning_content", None)
    if reasoning_delta:
        print(reasoning_delta, end="", flush=True)
﻿
    if delta.content:
        if not answer_started:
            print("\n\n=== Final Answer (Real-time) ===")
            answer_started = True
        print(delta.content, end="", flush=True)
import OpenAI from "openai";
﻿
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});
﻿
const stream = await client.chat.completions.create({
  model: "minimax-m2.7",
  messages: [{ role: "user", content: "Analyze the advantages and challenges of quantum computing." }],
  max_tokens: 2048,
  stream: true,
  // @ts-ignore
  thinking: { type: "enabled" },
});
﻿
let answerStarted = false;
process.stdout.write("=== Reasoning Process (Real-time) ===\n");
﻿
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (!delta) continue;
﻿
  const reasoning = (delta as any).reasoning_content;
  if (reasoning) process.stdout.write(reasoning);
﻿
  if (delta.content) {
    if (!answerStarted) {
      process.stdout.write("\n\n=== Final Answer (Real-time) ===\n");
      answerStarted = true;
    }
    process.stdout.write(delta.content);
  }
}
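When the full text is needed after the stream ends, for example to store the answer, the two kinds of delta can be accumulated separately instead of printed. The sketch below assumes each delta is a plain dict, as decoded from a raw SSE chunk; `collect_stream` is a hypothetical helper, and the sample deltas are hand-written.

```python
from typing import Any, Dict, Iterable, Tuple


def collect_stream(deltas: Iterable[Dict[str, Any]]) -> Tuple[str, str]:
    """Accumulate reasoning text and answer text from a sequence of deltas."""
    reasoning_parts = []
    answer_parts = []
    for delta in deltas:
        # A delta may carry either field, both, or neither.
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)


# Hand-written deltas, not captured from a real stream.
deltas = [
    {"role": "assistant"},
    {"reasoning_content": "Factor the "},
    {"reasoning_content": "quadratic."},
    {"content": "x = 2 "},
    {"content": "or x = 3"},
]
print(collect_stream(deltas))  # prints ('Factor the quadratic.', 'x = 2 or x = 3')
```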
JSON Mode
Setting response_format to json_object makes the model output a valid JSON string, which suits scenarios that require structured data.
Note:
When using JSON mode, you must explicitly instruct the model to output JSON in the system or user message; otherwise, the model may keep returning empty content.
cURL
Python
Node.js
Java
Go
curl https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "minimax-m2.7",
    "messages": [
      {"role": "system", "content": "Return the result in JSON format."},
      {"role": "user", "content": "Return information for three Chinese cities, each containing the name, province, and population fields."}
    ],
    "max_tokens": 512,
    "response_format": {"type": "json_object"}
  }'
import json
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
response = client.chat.completions.create(
    model="minimax-m2.7",
    messages=[
        {"role": "system", "content": "Return the result in JSON format."},
        {
            "role": "user",
            "content": "Return information for three Chinese cities, each containing the name, province, and population fields.",
        },
    ],
    max_tokens=512,
    response_format={"type": "json_object"},
)
﻿
result = json.loads(response.choices[0].message.content)
print(json.dumps(result, ensure_ascii=False, indent=2))
import OpenAI from "openai";
﻿
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub-intl.tencentcloudmaas.com/v1",
});
﻿
const response = await client.chat.completions.create({
  model: "minimax-m2.7",
  messages: [
    { role: "system", content: "Return the result in JSON format." },
    {
      role: "user",
      content: "Return information for three Chinese cities, each containing the name, province, and population fields.",
    },
  ],
  max_tokens: 512,
  response_format: { type: "json_object" },
});
﻿
const result = JSON.parse(response.choices[0].message.content);
console.log(JSON.stringify(result, null, 2));
JSONObject body = new JSONObject();
body.put("model", "minimax-m2.7");
body.put("max_tokens", 512);
body.put("response_format", new JSONObject().put("type", "json_object"));
body.put("messages", new JSONArray()
    .put(new JSONObject().put("role", "system").put("content", "Return the result in JSON format."))
    .put(new JSONObject().put("role", "user").put("content",
        "Return information for three Chinese cities, each containing the name, province, and population fields.")));
// ... Send the request and parse the returned JSON string.
body := map[string]interface{}{
    "model":           "minimax-m2.7",
    "max_tokens":      512,
    "response_format": map[string]string{"type": "json_object"},
    "messages": []map[string]string{
        {"role": "system", "content": "Return the result in JSON format."},
        {"role": "user", "content": "Return information for three Chinese cities, each containing the name, province, and population fields."},
    },
}
// ... Send the request
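The JSON mode examples parse the returned string directly, which fails if the model returns empty content or the output is cut off. The following is a minimal defensive sketch, not part of the TokenHub SDK: the helper names mentions_json and parse_json_content are hypothetical, and the check for the word "json" is only a rough heuristic for the instruction requirement described in the note.

```python
import json
from typing import Any, Optional


def mentions_json(messages: list[dict]) -> bool:
    """Return True if any system or user message mentions JSON (rough heuristic)."""
    return any(
        m.get("role") in ("system", "user")
        and "json" in str(m.get("content", "")).lower()
        for m in messages
    )


def parse_json_content(content: Optional[str]) -> Any:
    """Parse the assistant's content; raise ValueError on empty or invalid output."""
    if content is None or not content.strip():
        raise ValueError("model returned empty content; check that the prompt asks for JSON")
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        # Output truncated by max_tokens is one possible cause of invalid JSON.
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
```

With the Python example, this could be used as parse_json_content(response.choices[0].message.content) in place of the bare json.loads call, with mentions_json(messages) checked before the request is sent.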
Key Differences from Other Models
Item	MiniMax M3/M2.7/M2.5	OpenAI / Claude / GLM, etc.
Reasoning Capability Switch	Explicitly controlled via the `thinking.type` parameter.	Typically controlled by switching models or through a separate reasoning parameter.
Reasoning Process Field	Returned separately in the response as `reasoning_content`.	Most models do not expose the reasoning process.
Access Reasoning Fields via OpenAI SDK	Use `hasattr` / `getattr` in Python, because the field is not part of the SDK type definitions.	-
`temperature` Range	0-1, default 0.9	Typically 0-2
Recommended Value for `max_tokens`	1024-4096 for general tasks; ≥ 2048 recommended for thinking mode.	Typically, 1024-4096 is sufficient.
Context Window	200K tokens (M2.7/M2.5); 1M tokens (M3)	Typically 128K tokens
Maximum Output	128K tokens (M2.7/M2.5)	Typically 16K tokens
Writing Back Messages in Multi-turn Conversations	Write back only `content`; `reasoning_content` does not need to be written back.	Typically, only `content` needs to be written back.
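Two rows of the comparison above concern how the reasoning field is read and what is written back into the conversation history. The sketch below illustrates both under stated assumptions: split_message and to_history_entry are hypothetical helper names, and a SimpleNamespace stands in for response.choices[0].message so that the example runs without calling the API.

```python
from types import SimpleNamespace
from typing import Optional


def split_message(msg) -> tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from a response message object."""
    # reasoning_content is not in the OpenAI SDK type definitions, so read it defensively.
    reasoning = getattr(msg, "reasoning_content", None)
    return reasoning, msg.content


def to_history_entry(msg) -> dict:
    """Build the assistant turn to append to `messages`: content only."""
    return {"role": "assistant", "content": msg.content}


# Stand-in for response.choices[0].message (assumption: no live API call here).
demo = SimpleNamespace(
    role="assistant",
    content="Paris.",
    reasoning_content="The user asks for a capital city.",
)

reasoning, answer = split_message(demo)
history = [{"role": "user", "content": "What is the capital of France?"}]
history.append(to_history_entry(demo))  # reasoning_content is deliberately left out
print(history[-1])
```

Keeping the reasoning text out of the history follows the table's guidance and avoids resending tokens that the model does not need on the next turn.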
Recommended Parameters and Best Practices
Parameter / Practice	Recommendation	Description
`max_tokens`	1024-4096 for general tasks; ≥ 2048 recommended for thinking mode.	Reasoning content and the answer share the token quota.
`thinking`	Can be omitted for simple Q&A (thinking is enabled by default); set `type` to `disabled` to turn it off explicitly.	Disabling thinking mode reduces token consumption and latency.
`stream`	Enable it for long text generation.	Avoids request timeouts and improves responsiveness.
`temperature`	Use the default of 0.9; raise it to 0.95-1 for creative writing; lower it to 0.2-0.5 for code generation.	The temperature range for MiniMax is 0-1.
Multi-turn Conversation	Write back only `content`; do not write back `reasoning_content`.	Reduces token consumption.
Access Reasoning Fields via SDK	Python: `getattr(msg, "reasoning_content", None)`; Node.js: `(msg as any).reasoning_content`.	This field is not defined in the OpenAI SDK type definitions.
Model Selection	`minimax-m3` is the latest flagship version with enhanced capabilities; `minimax-m2.7` suits cost-sensitive scenarios.	-
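The recommendations above can be collected into a small helper that produces keyword arguments for client.chat.completions.create. This is only a sketch: recommended_params and its task names are hypothetical, the specific values (0.95, 0.3, 1024, 2048) are single picks from the recommended ranges, and passing thinking through the OpenAI Python SDK's extra_body argument is an assumption about how the non-standard field reaches TokenHub.

```python
def recommended_params(task: str = "general", thinking: bool = True) -> dict:
    """Return request keyword arguments based on the recommendations in this guide."""
    # One value picked from each recommended range; adjust as needed.
    temperature_by_task = {"general": 0.9, "creative": 0.95, "code": 0.3}
    params = {
        "temperature": temperature_by_task.get(task, 0.9),
        # Reasoning and answer share the quota, so leave more room when thinking is on.
        "max_tokens": 2048 if thinking else 1024,
    }
    if not thinking:
        # Assumption: non-standard fields are sent via the SDK's extra_body argument.
        params["extra_body"] = {"thinking": {"type": "disabled"}}
    return params


print(recommended_params("code", thinking=False))
```

A call might then look like client.chat.completions.create(model="minimax-m2.7", messages=messages, **recommended_params("code", thinking=False)).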
Use Limits
Restriction Item	Description
`temperature` Range	The temperature range for MiniMax is 0-1, which differs from OpenAI's 0-2. Passing a value greater than 1 may cause an error.
Timeout Risk	Responses take longer when thinking mode is enabled. Combine it with `stream=true` to avoid timeouts.
Thinking Mode and JSON Mode	Enabling `thinking.type=enabled` and `response_format.type=json_object` at the same time is not recommended.
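These limits can be checked on the client before a request is sent. The sketch below takes a raw request body (the same shape as the cURL payloads) and returns a list of warnings; check_request is a hypothetical helper, not a TokenHub API, and it treats a missing thinking field as enabled because that is the documented default.

```python
def check_request(body: dict) -> list[str]:
    """Return warnings for a request body that conflicts with the documented limits."""
    problems = []

    temperature = body.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        problems.append("temperature must be within 0-1")

    # Thinking is enabled by default, so a missing `thinking` field counts as enabled.
    thinking_on = body.get("thinking", {}).get("type", "enabled") == "enabled"
    json_mode = body.get("response_format", {}).get("type") == "json_object"

    if thinking_on and json_mode:
        problems.append("thinking mode and JSON mode should not be enabled together")
    if thinking_on and not body.get("stream", False):
        problems.append("thinking mode without stream=true risks a timeout")
    return problems


print(check_request({"model": "minimax-m2.7", "temperature": 1.5}))
```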
References
Language Model Invocation Overview: the general TokenHub guide to language model invocation, covering the BaseURL, API Key, multi-turn conversations, Function Calling, the Anthropic protocol, and more.