Kimi API Guide

Last updated: 2026-06-11 18:02:54
The Kimi series of models is compatible with the OpenAI / Anthropic protocols and can be called directly through the OpenAI SDK or standard HTTP APIs. This document gives examples for common scenarios in five forms (cURL, Python, Node.js, Java, Go) and explains Kimi-specific features such as its thinking (reasoning) mode and multimodal capabilities.
Note:
This document uses the TokenHub gateway as an example: all calls go through https://tokenhub-intl.tencentcloudmaas.com/v1 and are authenticated with a TokenHub-specific API key.
Supported Models
TokenHub currently supports the following Kimi models (for specifics, refer to the model list):
| Model ID | Type | Reasoning Capability | Visual Capability | Video Capability |
| --- | --- | --- | --- | --- |
| kimi-k2.6 | General conversation model | Configurable (enabled by default) | Supported | Supported |
| kimi-k2.5 | General conversation model | Configurable (enabled by default) | Supported | Not supported |
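If an application routes image or video input to different models, a small client-side capability table can reject unsupported combinations before a request is sent. The sketch below simply hard-codes the two rows of the table above; the names MODEL_CAPABILITIES and check_input are hypothetical, and the data should be kept in sync with the model list rather than trusted indefinitely.

```python
# Hypothetical client-side copy of the "Supported Models" table above.
MODEL_CAPABILITIES = {
    "kimi-k2.6": {"image": True, "video": True},
    "kimi-k2.5": {"image": True, "video": False},
}


def check_input(model: str, has_image: bool = False, has_video: bool = False) -> None:
    """Raise ValueError if the model is unknown or lacks a required input modality."""
    caps = MODEL_CAPABILITIES.get(model)
    if caps is None:
        raise ValueError(f"unknown model: {model}")
    if has_image and not caps["image"]:
        raise ValueError(f"{model} does not accept image input")
    if has_video and not caps["video"]:
        raise ValueError(f"{model} does not accept video input")


check_input("kimi-k2.6", has_video=True)  # passes silently
```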
Note:
Kimi K2.6 / K2.5 is both a chat model and a reasoning model. Unlike some other providers, you do not switch model IDs between a standard model and a reasoning model; you simply enable or disable reasoning with the thinking parameter.
Key Differences from Other Models
| Item | Kimi K2.6 / K2.5 | OpenAI / Claude / GLM, etc. |
| --- | --- | --- |
| Reasoning switch | Explicitly controlled via the thinking.type parameter | Typically controlled by switching models or via a separate reasoning parameter |
| Reasoning process field | Returned separately in the response as reasoning_content | Most models do not expose the reasoning process |
| Reading the reasoning field via the OpenAI SDK | Requires hasattr / getattr | - |
| Retaining reasoning across multi-turn conversations | thinking.keep controls whether historical reasoning_content is passed through | - |
| temperature | Fixed at 1.0 in thinking mode and 0.6 in non-thinking mode | Typically freely adjustable between 0 and 2 |
| Recommended max_tokens | At least 16000 (shared between reasoning and the response) | Typically 1024 to 4096 is sufficient |
| Multimodal image input | Supports both Base64 encoding and public URLs | Generally supports public URLs |
| Video input | Supported only by K2.6 | Not supported by most models |
| Writing back messages in multi-turn conversations | When thinking is enabled, both content and reasoning_content must be written back | Typically only content needs to be written back |
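The last row of the table is the one most often missed when porting code from other models. The following sketch assumes responses are handled as plain JSON dicts (for example parsed with requests or httpx) and turns an assistant message into the entry you append to messages, keeping reasoning_content only when thinking was enabled for that turn. The helper name history_entry is hypothetical; keeping tool_calls follows the standard OpenAI protocol for tool-calling turns.

```python
from typing import Any, Dict


def history_entry(message: Dict[str, Any], thinking_enabled: bool) -> Dict[str, Any]:
    """Turn a response's choices[0].message (as a dict) into a history entry."""
    entry: Dict[str, Any] = {"role": "assistant", "content": message.get("content")}
    # With thinking enabled, reasoning_content must be written back with content.
    if thinking_enabled and message.get("reasoning_content"):
        entry["reasoning_content"] = message["reasoning_content"]
    # Keep tool calls so that later role="tool" messages still match a call id.
    if message.get("tool_calls"):
        entry["tool_calls"] = message["tool_calls"]
    return entry


reply = {"role": "assistant", "reasoning_content": "Compare the options...", "content": "Try X."}
messages = [{"role": "user", "content": "What should I try?"}]
messages.append(history_entry(reply, thinking_enabled=True))
```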
General Calling Examples
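The following examples cover common invocation scenarios for Kimi models, all using the OpenAI-compatible protocol. Unless otherwise specified, the examples do not set the thinking parameter, so you can start with the simplest possible request. Keep in mind that thinking is enabled by default on K2.6 / K2.5 (see the tables above); to receive a plain answer without reasoning, add "thinking": {"type": "disabled"} to the request body. For details on reasoning, see Kimi Thinking Mode.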
Basic Conversation
Please replace YOUR_API_KEY with the API Key you created.
cURL
Python
Node.js
Java
Go
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.6",
    "messages": [
      {"role": "user", "content": "Hello, please introduce yourself"}
    ]
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "user", "content": "Hello, please introduce yourself"},
    ],
)
print(response.choices[0].message.content)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});

const response = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages: [
    { role: 'user', content: 'Hello, please introduce yourself' },
  ],
});
console.log(response.choices[0].message.content);
// Using the OpenAI-compatible protocol, call the HTTP API directly with OkHttp
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;

public class BasicChat {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("messages", Arrays.asList(
            Map.of("role", "user", "content", "Hello, please introduce yourself")
        ));

        RequestBody requestBody = RequestBody.create(
            new Gson().toJson(body),
            MediaType.parse("application/json")
        );

        Request request = new Request.Builder()
            .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(requestBody)
            .build();

        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    body := map[string]interface{}{
        "model": "kimi-k2.6",
        "messages": []map[string]string{
            {"role": "user", "content": "Hello, please introduce yourself"},
        },
    }
    payload, _ := json.Marshal(body)

    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(payload))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err) // resp is nil on error, so exit before touching resp.Body
    }
    defer resp.Body.Close()

    data, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(data))
}
Streaming Output
Set stream: true to receive the response as a stream. This lets you render a typewriter effect and prevents long responses from triggering gateway timeouts.
cURL
Python
Node.js
Java
Go
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.6",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Please introduce yourself in one sentence"}
    ]
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Please introduce yourself in one sentence"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});

const stream = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages: [{ role: 'user', content: 'Please introduce yourself in one sentence' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
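import okhttp3.*;
import okhttp3.sse.*;
import com.google.gson.Gson;
import java.util.*;
import java.util.concurrent.CountDownLatch;

public class StreamChat {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("stream", true);
        body.put("messages", List.of(
            Map.of("role", "user", "content", "Please introduce yourself in one sentence")
        ));

        Request request = new Request.Builder()
            .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();

        // Callbacks run on OkHttp threads; the latch keeps main() waiting until the stream ends
        OkHttpClient client = new OkHttpClient();
        CountDownLatch done = new CountDownLatch(1);
        EventSources.createFactory(client).newEventSource(request,
            new EventSourceListener() {
                @Override public void onEvent(EventSource es, String id, String type, String data) {
                    if ("[DONE]".equals(data)) return; // end-of-stream sentinel, not JSON
                    System.out.println(data);           // each event carries one JSON chunk
                }
                @Override public void onClosed(EventSource es) { done.countDown(); }
                @Override public void onFailure(EventSource es, Throwable t, Response response) {
                    if (t != null) t.printStackTrace();
                    done.countDown();
                }
            });
        done.await();
        client.dispatcher().executorService().shutdown();
    }
}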
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "strings"
)

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model":  "kimi-k2.6",
        "stream": true,
        "messages": []map[string]string{
            {"role": "user", "content": "Please introduce yourself in one sentence"},
        },
    })

    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err) // resp is nil on error, so exit before touching resp.Body
    }
    defer resp.Body.Close()

    // Each SSE event arrives as a "data: <json>" line; "[DONE]" marks the end of the stream
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "data: ") {
            continue
        }
        payload := strings.TrimPrefix(line, "data: ")
        if payload == "[DONE]" {
            break
        }
        fmt.Println(payload)
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}
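The cURL, Java, and Go streaming examples above print each raw data: payload. Turning those payloads into text means skipping non-data lines, stopping at the [DONE] sentinel, and concatenating choices[0].delta.content. The sketch below does this over any iterable of already-decoded lines (for example requests' response.iter_lines(decode_unicode=True)); the function names are hypothetical, and it assumes the stream ends with data: [DONE] as in the OpenAI protocol.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each JSON chunk from an SSE stream of chat completion events."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # blank keep-alive lines, comments, or other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


def collect_content(lines: Iterable[str]) -> str:
    """Concatenate delta.content across all chunks, skipping chunks without choices."""
    parts = []
    for chunk in iter_chunks(lines):
        choices = chunk.get("choices") or []
        if choices and choices[0].get("delta", {}).get("content"):
            parts.append(choices[0]["delta"]["content"])
    return "".join(parts)
```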
System Prompt
You can set global instructions, personas, response styles, and more for the model through system role messages.
cURL
Python
Node.js
Java
Go
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.6",
    "messages": [
      {"role": "system", "content": "You are Kimi, a rigorous physics teaching assistant, and your answers should incorporate simple analogies."},
      {"role": "user", "content": "Please explain what quantum entanglement is"}
    ]
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "system", "content": "You are Kimi, a rigorous physics teaching assistant, and your answers should incorporate simple analogies."},
        {"role": "user", "content": "Please explain what quantum entanglement is"},
    ],
)
print(response.choices[0].message.content)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});

const response = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages: [
    { role: 'system', content: 'You are Kimi, a rigorous physics teaching assistant, and your answers should incorporate simple analogies.' },
    { role: 'user', content: 'Please explain what quantum entanglement is' },
  ],
});
console.log(response.choices[0].message.content);
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;

public class SystemPromptChat {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("messages", List.of(
            Map.of("role", "system", "content", "You are Kimi, a rigorous physics teaching assistant, and your answers should incorporate simple analogies."),
            Map.of("role", "user", "content", "Please explain what quantum entanglement is")
        ));

        Request request = new Request.Builder()
            .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();

        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model": "kimi-k2.6",
        "messages": []map[string]string{
            {"role": "system", "content": "You are Kimi, a rigorous physics teaching assistant, and your answers should incorporate simple analogies."},
            {"role": "user", "content": "Please explain what quantum entanglement is"},
        },
    })

    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err) // resp is nil on error, so exit before touching resp.Body
    }
    defer resp.Body.Close()

    data, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(data))
}
Multi-Turn Conversation (Basic Version)
The Kimi API is stateless, like other mainstream model APIs: every request must carry the complete messages history. The following example writes back only the assistant's content field, which is sufficient when thinking is disabled. When thinking is enabled (the default), reasoning_content must be written back as well; for details, see Multi-turn Conversation Writing Back reasoning_content.
cURL
Python
Node.js
Java
Go
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.6",
    "messages": [
      {"role": "system", "content": "You are Kimi."},
      {"role": "user", "content": "Recommend a popular science book"},
      {"role": "assistant", "content": "I recommend 'A Brief History of Time'."},
      {"role": "user", "content": "Recommend another advanced one"}
    ]
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

messages = [
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "Recommend a popular science book"},
    {"role": "assistant", "content": "I recommend 'A Brief History of Time'."},
    {"role": "user", "content": "Recommend another advanced one"},
]
response = client.chat.completions.create(model="kimi-k2.6", messages=messages)
print(response.choices[0].message.content)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});

const messages = [
  { role: 'system', content: 'You are Kimi.' },
  { role: 'user', content: 'Recommend a popular science book' },
  { role: 'assistant', content: "I recommend 'A Brief History of Time'." },
  { role: 'user', content: 'Recommend another advanced one' },
];
const response = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages,
});
console.log(response.choices[0].message.content);
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;

public class MultiTurnChat {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("messages", List.of(
            Map.of("role", "system", "content", "You are Kimi."),
            Map.of("role", "user", "content", "Recommend a popular science book"),
            Map.of("role", "assistant", "content", "I recommend 'A Brief History of Time'."),
            Map.of("role", "user", "content", "Recommend another advanced one")
        ));

        Request request = new Request.Builder()
            .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();

        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model": "kimi-k2.6",
        "messages": []map[string]string{
            {"role": "system", "content": "You are Kimi."},
            {"role": "user", "content": "Recommend a popular science book"},
            {"role": "assistant", "content": "I recommend 'A Brief History of Time'."},
            {"role": "user", "content": "Recommend another advanced one"}, // trailing comma required in Go
        },
    })

    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err) // resp is nil on error, so exit before touching resp.Body
    }
    defer resp.Body.Close()

    data, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(data))
}
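Because the API is stateless, a client typically keeps the history in a list and appends both the assistant reply and the next user message before each request. A minimal sketch of that bookkeeping follows; the Conversation class is hypothetical, it makes no network calls, and the reply text would come from response.choices[0].message.content.

```python
from typing import Dict, List, Optional


class Conversation:
    """Client-side message history for a stateless chat completions API."""

    def __init__(self, system_prompt: Optional[str] = None) -> None:
        self.messages: List[Dict[str, str]] = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, text: str) -> List[Dict[str, str]]:
        """Append a user turn and return the full history to send as `messages`."""
        self.messages.append({"role": "user", "content": text})
        return self.messages

    def record_reply(self, content: str) -> None:
        """Write the assistant's content back so the next request includes it.

        Only content is stored, which is enough when thinking is disabled; with
        thinking enabled, reasoning_content must be written back as well.
        """
        self.messages.append({"role": "assistant", "content": content})


conv = Conversation("You are Kimi.")
conv.ask("Recommend a popular science book")
conv.record_reply("I recommend 'A Brief History of Time'.")
messages = conv.ask("Recommend another advanced one")
# `messages` now holds the same four entries as the examples above
```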
Function Calling (Tool Invocation)
Kimi supports the standard OpenAI Function Calling protocol. The following example registers a weather-lookup tool and, once the model decides to call it, sends the tool's result back to the model to obtain the final answer.
cURL
Python
Node.js
Java
Go
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.6",
    "messages": [
      {"role": "user", "content": "What is the weather like in Beijing today?"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Query the weather for a specified city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
import json
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Query the weather for a specified city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather like in Beijing today?"}]

# Round 1: the model decides whether to call the tool
resp = client.chat.completions.create(
    model="kimi-k2.6", messages=messages, tools=tools, tool_choice="auto",
)
msg = resp.choices[0].message
messages.append(msg.model_dump(exclude_none=True))

# If the model chooses to call a tool, execute the tool and send the result back
if msg.tool_calls:
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        # Replace with actual business logic here
        result = {"city": args["city"], "temperature": "22°C", "weather": "Sunny"}
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result, ensure_ascii=False),
        })

    # Round 2: return the tool result to the model and obtain the final response
    final = client.chat.completions.create(model="kimi-k2.6", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});

const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Query the weather for a specified city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string', description: 'City name' } },
      required: ['city'],
    },
  },
}];

const messages = [{ role: 'user', content: 'What is the weather like in Beijing today?' }];

// Round 1: the model decides whether to call the tool
const resp = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages,
  tools,
  tool_choice: 'auto',
});
const msg = resp.choices[0].message;
messages.push(msg);

if (msg.tool_calls) {
  for (const call of msg.tool_calls) {
    const args = JSON.parse(call.function.arguments);
    // Replace with actual business logic here
    const result = { city: args.city, temperature: '22°C', weather: 'Sunny' };
    messages.push({
      role: 'tool',
      tool_call_id: call.id,
      content: JSON.stringify(result),
    });
  }
  // Round 2: return the tool result to the model
  const final = await client.chat.completions.create({
    model: 'kimi-k2.6',
    messages,
    tools,
  });
  console.log(final.choices[0].message.content);
} else {
  console.log(msg.content);
}
import okhttp3.*;
import com.google.gson.*;
import java.util.*;

public class FunctionCallingDemo {
    static final String URL = "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions";
    static final String API_KEY = "YOUR_API_KEY";
    static final OkHttpClient HTTP = new OkHttpClient();
    static final Gson GSON = new Gson();

    static String chat(List<Map<String, Object>> messages, List<Map<String, Object>> tools) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("messages", messages);
        body.put("tools", tools);
        body.put("tool_choice", "auto");

        Request req = new Request.Builder()
            .url(URL)
            .header("Authorization", "Bearer " + API_KEY)
            .post(RequestBody.create(GSON.toJson(body), MediaType.parse("application/json")))
            .build();
        try (Response resp = HTTP.newCall(req).execute()) {
            return resp.body().string();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Map<String, Object>> tools = List.of(Map.of(
            "type", "function",
            "function", Map.of(
                "name", "get_weather",
                "description", "Query the weather for a specified city",
                "parameters", Map.of(
                    "type", "object",
                    "properties", Map.of("city", Map.of("type", "string", "description", "City name")),
                    "required", List.of("city")
                )
            )
        ));

        List<Map<String, Object>> messages = new ArrayList<>();
        messages.add(Map.of("role", "user", "content", "What is the weather like in Beijing today?"));

        // Round 1: the model decides whether to call the tool
        String r1 = chat(messages, tools);
        JsonObject msg = JsonParser.parseString(r1).getAsJsonObject()
            .getAsJsonArray("choices").get(0).getAsJsonObject()
            .getAsJsonObject("message");
        // Write the assistant message (including any tool_calls) back into the history
        messages.add(GSON.fromJson(msg, Map.class));

        // tool_calls may be absent or JSON null when the model answers directly
        if (msg.has("tool_calls") && !msg.get("tool_calls").isJsonNull()) {
            for (JsonElement el : msg.getAsJsonArray("tool_calls")) {
                JsonObject call = el.getAsJsonObject();
                JsonObject argsObj = JsonParser.parseString(
                    call.getAsJsonObject("function").get("arguments").getAsString()
                ).getAsJsonObject();
                // Replace with actual business logic here
                Map<String, String> result = Map.of(
                    "city", argsObj.get("city").getAsString(),
                    "temperature", "22°C",
                    "weather", "Sunny"
                );
                messages.add(Map.of(
                    "role", "tool",
                    "tool_call_id", call.get("id").getAsString(),
                    "content", GSON.toJson(result)
                ));
            }
            // Round 2: return the tool result to the model
            System.out.println(chat(messages, tools));
        } else {
            System.out.println(msg.get("content").getAsString());
        }
    }
}
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
)

const (
    URL    = "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions"
    APIKEY = "YOUR_API_KEY"
)

func chat(messages []map[string]interface{}, tools []map[string]interface{}) (map[string]interface{}, error) {
    body, err := json.Marshal(map[string]interface{}{
        "model":       "kimi-k2.6",
        "messages":    messages,
        "tools":       tools,
        "tool_choice": "auto",
    })
    if err != nil {
        return nil, err
    }
    req, err := http.NewRequest("POST", URL, bytes.NewBuffer(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Authorization", "Bearer "+APIKEY)
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    data, err := io.ReadAll(resp.Body)
    if err != nil {
        return nil, err
    }
    var out map[string]interface{}
    if err := json.Unmarshal(data, &out); err != nil {
        return nil, err
    }
    return out, nil
}

func main() {
    tools := []map[string]interface{}{{
        "type": "function",
        "function": map[string]interface{}{
            "name":        "get_weather",
            "description": "Query the weather for a specified city",
            "parameters": map[string]interface{}{
                "type": "object",
                "properties": map[string]interface{}{
                    "city": map[string]string{"type": "string", "description": "City name"},
                },
                "required": []string{"city"},
            },
        },
    }}

    messages := []map[string]interface{}{
        {"role": "user", "content": "What is the weather like in Beijing today?"},
    }

    // Round 1: the model decides whether to call the tool
    r1, err := chat(messages, tools)
    if err != nil {
        log.Fatal(err)
    }
    msg := r1["choices"].([]interface{})[0].(map[string]interface{})["message"].(map[string]interface{})
    messages = append(messages, msg)

    if calls, ok := msg["tool_calls"].([]interface{}); ok {
        for _, c := range calls {
            call := c.(map[string]interface{})
            argsStr := call["function"].(map[string]interface{})["arguments"].(string)
            var args map[string]string
            json.Unmarshal([]byte(argsStr), &args)
            // Replace with actual business logic here
            result, _ := json.Marshal(map[string]string{
                "city":        args["city"],
                "temperature": "22°C",
                "weather":     "Sunny",
            })
            messages = append(messages, map[string]interface{}{
                "role":         "tool",
                "tool_call_id": call["id"],
                "content":      string(result),
            })
        }
        // Round 2: return the tool result to the model
        r2, err := chat(messages, tools)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%+v\n", r2)
    } else {
        fmt.Println(msg["content"])
    }
}
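In real applications, the "Replace with actual business logic here" step is usually a lookup from the tool name to a local function. The sketch below assumes tool calls are available as plain dicts in the OpenAI shape used above (id, function.name, and function.arguments as a JSON string); the TOOLS registry, the get_weather stub, and run_tool_calls are hypothetical. Unknown tools and malformed arguments are reported back to the model as an error payload instead of raising, which is one common design choice rather than a protocol requirement.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> dict:
    # Hypothetical stub; replace with a real weather lookup.
    return {"city": city, "temperature": "22°C", "weather": "Sunny"}


TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: List[dict]) -> List[dict]:
    """Execute each tool call and build the matching role="tool" messages."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        fn = TOOLS.get(name)
        if fn is None:
            output = {"error": f"unknown tool: {name}"}
        else:
            try:
                output = fn(**json.loads(call["function"]["arguments"] or "{}"))
            except (json.JSONDecodeError, TypeError) as exc:
                output = {"error": f"bad arguments for {name}: {exc}"}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must echo the id from the assistant message
            "content": json.dumps(output, ensure_ascii=False),
        })
    return results
```

With the Python SDK, the inner loop of the example above could become messages.extend(run_tool_calls([c.model_dump() for c in msg.tool_calls])).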
Kimi Thinking Mode
Kimi K2.6 / K2.5 is both a chat model and a reasoning model. You enable or disable its reasoning capability with the thinking field; this is the key difference from OpenAI / GLM models.
1. thinking Parameter
The thinking field resides at the top level of the request body and has the following structure:
"thinking": {
    "type": "enabled",
    "keep": "all"
}
| Field | Type | Default Value | Allowed Values | Description |
| --- | --- | --- | --- | --- |
| type | string | "enabled" | "enabled" / "disabled" | Whether thinking is enabled for the current request |
| keep | string or null | null | "all" / not passed | Whether historical reasoning_content is passed through in multi-turn conversations |
Note:
thinking is not a standard OpenAI field. With the official Python SDK, pass it through extra_body; with the Node.js SDK, pass it directly at the top level of the request object. For direct HTTP calls, place it at the top level of the request body.
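For the Python SDK, a small helper can keep the thinking configuration in one place. This sketch only builds keyword arguments for client.chat.completions.create; the function name thinking_kwargs and the 32768 default for max_tokens (taken from the streaming examples in this section, above the recommended 16000 minimum) are assumptions.

```python
from typing import Any, Dict, List


def thinking_kwargs(messages: List[Dict[str, Any]], enabled: bool = True,
                    keep_all: bool = False, model: str = "kimi-k2.6",
                    max_tokens: int = 32768) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create().

    thinking is not a standard OpenAI parameter, so it goes in extra_body,
    which the Python SDK merges into the top level of the JSON request body.
    """
    thinking: Dict[str, Any] = {"type": "enabled" if enabled else "disabled"}
    if keep_all:
        thinking["keep"] = "all"  # pass historical reasoning_content through
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,  # shared by reasoning and answer; keep it >= 16000
        "extra_body": {"thinking": thinking},
    }
```

Usage: response = client.chat.completions.create(**thinking_kwargs(messages)).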
2. reasoning_content Field
When thinking is enabled, the response message gains a reasoning_content field, a sibling of content, that carries the model's reasoning process:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "First, we need to analyze...",
      "content": "The final answer is ..."
    }
  }]
}
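When parsing the raw JSON yourself, reading both fields with defaults keeps the same code working whether or not a reasoning_content field is present (for example, when thinking is disabled). A minimal sketch, assuming the non-streaming response shape shown above; the function name split_reply is hypothetical:

```python
import json
from typing import Tuple


def split_reply(raw: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a non-streaming chat completion JSON body."""
    message = json.loads(raw)["choices"][0]["message"]
    # Treat a missing or null field as empty text.
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer
```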
OpenAI SDK Access Restrictions (Important)
The ChatCompletionMessage / ChoiceDelta types in the official OpenAI SDK do not declare a reasoning_content attribute. Reading it directly as obj.reasoning_content is flagged by static type checkers and raises AttributeError on any message or chunk that does not carry the field, so check for it first:
# ❌ Wrong
reasoning = message.reasoning_content

# ✅ Correct
if hasattr(message, "reasoning_content"):
    reasoning = getattr(message, "reasoning_content")
If you call the API directly over HTTP, or parse the JSON with a general-purpose HTTP client such as requests or httpx, this restriction does not apply and you can read the sibling field directly.
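The hasattr / getattr pair can also be written as a single getattr call with a default, which works the same way on messages and on streaming deltas. In this sketch the helper name reasoning_of is hypothetical, and types.SimpleNamespace objects stand in for SDK objects purely for illustration.

```python
from types import SimpleNamespace
from typing import Any


def reasoning_of(obj: Any) -> str:
    """Return reasoning_content from an SDK message or delta, or "" if absent.

    getattr with a default swallows AttributeError, just like hasattr does.
    """
    return getattr(obj, "reasoning_content", None) or ""


# Stand-ins for SDK objects, used here only to illustrate both cases.
with_reasoning = SimpleNamespace(content="42", reasoning_content="6 * 7 = 42")
without_reasoning = SimpleNamespace(content="42")
print(reasoning_of(with_reasoning))           # 6 * 7 = 42
print(repr(reasoning_of(without_reasoning)))  # ''
```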
3. Streaming Thought Output
When thinking is enabled, streaming calls (stream: true) are strongly recommended:
Avoid timeouts: reasoning content can be long, which lengthens the overall response time, and non-streaming calls are prone to triggering gateway timeouts.
Clear ordering: in streaming mode, reasoning_content is always output in full before content, so the UI can distinguish the thinking phase from the answering phase.
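The ordering guarantee means a client can treat the stream as two phases and switch from the thinking view to the answer view at the first non-empty content delta. The following sketch assumes each delta has already been decoded into a dict (for example with json.loads on each data: payload); the function name split_stream is hypothetical, and raising on an out-of-order reasoning delta is a deliberately strict choice you may want to relax.

```python
from typing import Iterable, Tuple


def split_stream(deltas: Iterable[dict]) -> Tuple[str, str]:
    """Accumulate streamed deltas into (reasoning, answer) text.

    Relies on reasoning_content arriving before content; a reasoning delta that
    shows up after the answer has started raises ValueError.
    """
    reasoning, answer = [], []
    for delta in deltas:
        r = delta.get("reasoning_content")
        c = delta.get("content")
        if r:
            if answer:
                raise ValueError("reasoning_content arrived after content")
            reasoning.append(r)
        if c:
            answer.append(c)
    return "".join(reasoning), "".join(answer)
```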
cURL
Python
Node.js
Java
Go
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.6",
    "max_tokens": 32768,
    "stream": true,
    "thinking": {"type": "enabled"},
    "messages": [
      {"role": "user", "content": "Explain the Fourier transform in one sentence."}
    ]
  }'
# The response is an SSE stream: each `data:` line contains a chunk,
# delta.reasoning_content always appears before delta.content
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "Explain the Fourier transform in one sentence."}],
    max_tokens=32768,
    stream=True,
    extra_body={"thinking": {"type": "enabled"}},  # thinking is not a standard OpenAI parameter
)

thinking = False
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Thinking phase: reasoning_content is not declared on ChoiceDelta, so use hasattr / getattr
    if hasattr(delta, "reasoning_content") and getattr(delta, "reasoning_content"):
        if not thinking:
            print("=== Start Thinking ===")
            thinking = True
        print(getattr(delta, "reasoning_content"), end="", flush=True)
    # Answering phase
    if delta.content:
        if thinking:
            print("\n=== Thinking Completed ===")
            thinking = False
        print(delta.content, end="", flush=True)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});

// The Node.js SDK does not natively support the thinking / extra_body fields, so place thinking directly at the top level.
// If you use TypeScript, add `as any` after the request object to bypass type checking.
const stream = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages: [{ role: 'user', content: 'Explain the Fourier transform in one sentence.' }],
  max_tokens: 32768,
  stream: true,
  thinking: { type: 'enabled' },
});

let thinking = false;
for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta;
  if (!delta) continue;
  // Thinking phase
  if (delta.reasoning_content) {
    if (!thinking) { console.log('=== Start Thinking ==='); thinking = true; }
    process.stdout.write(delta.reasoning_content);
  }
  // Answering phase
  if (delta.content) {
    if (thinking) { console.log('\n=== Thinking Completed ==='); thinking = false; }
    process.stdout.write(delta.content);
  }
}
import okhttp3.*;
import okhttp3.sse.*;
import com.google.gson.*;
import java.util.*;
﻿
public class ThinkingStream {
    public static void main(String[] args) {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("max_tokens", 32768);
        body.put("stream", true);
        body.put("thinking", Map.of("type", "enabled"));
        body.put("messages", List.of(
            Map.of("role", "user", "content", "Explain the Fourier transform in one sentence.")
        ));
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .header("Content-Type", "application/json")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();
﻿
        EventSources.createFactory(new OkHttpClient()).newEventSource(request,
            new EventSourceListener() {
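                @Override public void onEvent(EventSource es, String id, String type, String data) {
                    if ("[DONE]".equals(data)) return;
                    JsonArray choices = JsonParser.parseString(data).getAsJsonObject()
                        .getAsJsonArray("choices");
                    // Skip chunks without choices (e.g., a trailing usage chunk)
                    if (choices == null || choices.size() == 0) return;
                    JsonObject delta = choices.get(0).getAsJsonObject().getAsJsonObject("delta");
                    if (delta == null) return;
                    // Fields may be absent or JSON null; check both before calling getAsString()
                    if (delta.has("reasoning_content") && !delta.get("reasoning_content").isJsonNull()) {
                        System.out.print(delta.get("reasoning_content").getAsString());
                    }
                    if (delta.has("content") && !delta.get("content").isJsonNull()) {
                        System.out.print(delta.get("content").getAsString());
                    }
                }

                @Override public void onFailure(EventSource es, Throwable t, Response response) {
                    // Report connection or HTTP errors instead of failing silently
                    System.err.println("Stream failed: " + (t != null ? t : response));
                }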
            });
    }
}
package main
﻿
import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)
﻿
func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model":      "kimi-k2.6",
        "max_tokens": 32768,
        "stream":     true,
        "thinking":   map[string]string{"type": "enabled"},
        "messages": []map[string]string{
            {"role": "user", "content": "Explain the Fourier transform in one sentence."},
        },
    })
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, _ := http.DefaultClient.Do(req)
    defer resp.Body.Close()
﻿
    scanner := bufio.NewScanner(resp.Body)
    thinking := false
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "data: ") {
            continue
        }
        data := strings.TrimPrefix(line, "data: ")
        if data == "[DONE]" {
            break
        }
        var chunk map[string]interface{}
        if err := json.Unmarshal([]byte(data), &chunk); err != nil {
            continue
        }
        choices, _ := chunk["choices"].([]interface{})
        if len(choices) == 0 {
            continue
        }
        delta, _ := choices[0].(map[string]interface{})["delta"].(map[string]interface{})
        if rc, ok := delta["reasoning_content"].(string); ok && rc != "" {
            if !thinking {
                fmt.Println("=== Start Thinking ===")
                thinking = true
            }
            fmt.Print(rc)
        }
        if c, ok := delta["content"].(string); ok && c != "" {
            if thinking {
                fmt.Println("\n=== Thinking Completed ===")
                thinking = false
            }
            fmt.Print(c)
        }
    }
}
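The Go and Java examples above parse the raw SSE stream by hand. The same logic can be factored into a small, testable function. The following is a minimal Python sketch that uses only the standard library. Like the examples above, it assumes that each event arrives as a single data: line and that the stream ends with data: [DONE]. The function name iter_deltas is hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_deltas(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ("reasoning" | "content", text) pairs from raw SSE lines."""
    for line in lines:
        # Only "data:" lines carry payloads; skip blank lines and ":" comments.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):]
        if data.startswith(" "):
            data = data[1:]  # SSE allows one optional space after the colon
        if data == "[DONE]":
            break  # end-of-stream sentinel used by OpenAI-compatible APIs
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a usage-only chunk at the end of the stream
        delta = choices[0].get("delta") or {}
        # Reasoning tokens arrive in reasoning_content, the answer in content.
        if delta.get("reasoning_content"):
            yield "reasoning", delta["reasoning_content"]
        if delta.get("content"):
            yield "content", delta["content"]
```

Pass it the decoded text lines of the HTTP response body, then print or accumulate each (kind, text) pair. When kind switches from "reasoning" to "content", the thinking phase has ended.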
4. Preserved Thinking (Multi-Turn Preserved Thinking)
thinking.keep controls whether the reasoning_content from earlier turns is passed into the next round of reasoning:
Value	Meaning	Use Cases
Not passed / `null` (default)	Reasoning content from earlier turns is not passed through, which keeps the context shorter and lowers cost.	General multi-turn conversation
`"all"`	Retains the reasoning from earlier turns in full, so the model can continue its previous line of thought.	Complex multi-step reasoning, Agent tool calling, long-horizon coding tasks
Note:
keep only controls whether historical reasoning is passed to the model. It does not affect whether the current turn generates reasoning. For scenarios that require continuous reasoning, use it together with type: "enabled".
5. Writing Back reasoning_content in Multi-Turn Conversations
When thinking is enabled, write the reasoning_content and content returned by the previous API call back into messages together, as a single assistant message. Otherwise, the model loses its line of reasoning in subsequent turns.
cURL
Python
Node.js
Java
Go
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.6",
    "stream": true,
    "thinking": {"type": "enabled", "keep": "all"},
    "messages": [
      {"role": "system", "content": "You are Kimi."},
      {"role": "user", "content": "The first question..."},
      {
        "role": "assistant",
        "reasoning_content": "<reasoning_content returned by the previous API call>",
        "content": "<content returned by the previous API call>"
      },
      {"role": "user", "content": "Please continue the deduction based on the previous analysis."}
    ]
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
messages = [
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "The first question..."},
    {
        "role": "assistant",
        "reasoning_content": "<reasoning_content returned by the previous API call>",
        "content": "<content returned by the previous API call>",
    },
    {"role": "user", "content": "Please continue the deduction based on the previous analysis."},
]
﻿
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=messages,
    stream=True,
    extra_body={"thinking": {"type": "enabled", "keep": "all"}},
)
﻿
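for chunk in response:
    # Skip chunks without choices (e.g., a trailing usage chunk)
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # getattr with a default: reasoning_content is absent from some chunks
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
print()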
import OpenAI from 'openai';
﻿
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});
﻿
const messages = [
  { role: 'system', content: 'You are Kimi.' },
  { role: 'user', content: 'The first question...' },
  {
    role: 'assistant',
    reasoning_content: '<reasoning_content returned by the previous API call>',
    content: '<content returned by the previous API call>',
  },
  { role: 'user', content: 'Please continue the deduction based on the previous analysis.' },
];
﻿
const stream = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages,
  stream: true,
  thinking: { type: 'enabled', keep: 'all' },
});
﻿
for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta;
  if (!delta) continue;
  if (delta.reasoning_content) process.stdout.write(delta.reasoning_content);
  if (delta.content) process.stdout.write(delta.content);
}
console.log();
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;
﻿
public class MultiTurnWithThinking {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("stream", true);
        body.put("thinking", Map.of("type", "enabled", "keep", "all"));
        body.put("messages", List.of(
            Map.of("role", "system", "content", "You are Kimi."),
            Map.of("role", "user", "content", "The first question..."),
            Map.of(
                "role", "assistant",
                "reasoning_content", "<reasoning_content returned by the previous API call>",
                "content", "<content returned by the previous API call>"
            ),
            Map.of("role", "user", "content", "Please continue the deduction based on the previous analysis.")
        ));
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();
﻿
        try (Response response = new OkHttpClient().newCall(request).execute();
             java.io.BufferedReader r = new java.io.BufferedReader(
                 new java.io.InputStreamReader(response.body().byteStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.startsWith("data: ")) System.out.println(line.substring(6));
            }
        }
    }
}
package main
﻿
import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)
﻿
func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model":    "kimi-k2.6",
        "stream":   true,
        "thinking": map[string]string{"type": "enabled", "keep": "all"},
        "messages": []map[string]interface{}{
            {"role": "system", "content": "You are Kimi."},
            {"role": "user", "content": "The first question..."},
            {
                "role":              "assistant",
                "reasoning_content": "<reasoning_content returned by the previous API call>",
                "content":           "<content returned by the previous API call>",
            },
            {"role": "user", "content": "Please continue the deduction based on the previous analysis."},
        },
    })
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, _ := http.DefaultClient.Do(req)
    defer resp.Body.Close()
﻿
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if strings.HasPrefix(line, "data: ") {
            fmt.Println(strings.TrimPrefix(line, "data: "))
        }
    }
}
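The examples above paste the previous turn in by hand. In a real streaming session, you can rebuild that assistant message from the deltas as they arrive. The following is a minimal Python sketch. The helper name collect_assistant_message is hypothetical, and it assumes the delta objects expose reasoning_content and content as attributes, as the OpenAI Python SDK's streaming deltas do.

```python
from typing import Dict, Iterable, List


def collect_assistant_message(deltas: Iterable[object]) -> Dict[str, str]:
    """Join streamed deltas into one assistant message for write-back."""
    reasoning_parts: List[str] = []
    content_parts: List[str] = []
    for delta in deltas:
        # getattr with a default: some deltas carry no reasoning_content at all.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            content_parts.append(content)
    # Both fields go back together, as required when thinking is enabled.
    return {
        "role": "assistant",
        "reasoning_content": "".join(reasoning_parts),
        "content": "".join(content_parts),
    }
```

With the Python example above, pass a generator such as (chunk.choices[0].delta for chunk in response if chunk.choices) to the helper. Append the result to messages before adding the next user turn. A stream can only be iterated once, so if you also want to print while streaming, collect the deltas into a list inside your printing loop and pass that list instead.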
Multimodal Invocation
1. Image Input: Base64 Encoding
Suitable for local files and images on a private network:
cURL
Python
Node.js
Java
Go
# Encode the image as Base64 and remove line breaks (some base64 implementations wrap their output)
IMAGE_B64=$(base64 < image.jpg | tr -d '\n')
﻿
# Use a temporary file to pass the body, avoiding the "Argument list too long" error triggered by an excessively large base64 string.
cat > /tmp/req.json <<EOF
{
  "model": "kimi-k2.6",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Please describe this picture."},
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,${IMAGE_B64}"}}
    ]
  }]
}
EOF
﻿
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d @/tmp/req.json
import base64
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
﻿
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Please describe this picture"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
import fs from 'node:fs';
import OpenAI from 'openai';
﻿
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});
﻿
const imageB64 = fs.readFileSync('image.jpg').toString('base64');
﻿
const response = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Please describe this picture' },
      { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${imageB64}` } },
    ],
  }],
});
console.log(response.choices[0].message.content);
import okhttp3.*;
import com.google.gson.Gson;
import java.nio.file.*;
import java.util.*;
﻿
public class ImageBase64Chat {
    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get("image.jpg"));
        String imageB64 = Base64.getEncoder().encodeToString(bytes);
﻿
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("messages", List.of(Map.of(
            "role", "user",
            "content", List.of(
                Map.of("type", "text", "text", "Please describe this picture"),
                Map.of("type", "image_url", "image_url", Map.of(
                    "url", "data:image/jpeg;base64," + imageB64
                ))
            )
        )));
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();
﻿
        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main
﻿
import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "os"
)
﻿
func main() {
    img, _ := os.ReadFile("image.jpg")
    imageB64 := base64.StdEncoding.EncodeToString(img)
﻿
    body, _ := json.Marshal(map[string]interface{}{
        "model": "kimi-k2.6",
        "messages": []map[string]interface{}{
            {
                "role": "user",
                "content": []map[string]interface{}{
                    {"type": "text", "text": "Please describe this picture"},
                    {
                        "type": "image_url",
                        "image_url": map[string]string{
                            "url": "data:image/jpeg;base64," + imageB64,
                        },
                    },
                },
            },
        },
    })
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, _ := http.DefaultClient.Do(req)
    defer resp.Body.Close()
﻿
    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
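The examples above hard-code image/jpeg in the data URL. If you send files of several types, a small helper can derive the MIME type from the file extension using Python's standard mimetypes module. This is a sketch only. The helper name to_data_url is hypothetical, and it assumes the gateway accepts the standard MIME types that mimetypes returns (for example image/png or video/mp4).

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Read a local file and return a data:<mime>;base64,... URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        # Refuse to guess: a wrong MIME type is harder to debug than an error here.
        raise ValueError(f"cannot infer a MIME type from the file name: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result drops into the content part unchanged, for example {"type": "image_url", "image_url": {"url": to_data_url("image.jpg")}}. For video input on K2.6, use {"type": "video_url", "video_url": {"url": to_data_url("demo.mp4")}}.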
2. Image Input: Public Network URL Direct Link
Suitable for images already hosted on a CDN or COS. Using a URL significantly reduces the request size:
cURL
Python
Node.js
Java
Go
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kimi-k2.6",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Please describe this picture"},
        {"type": "image_url", "image_url": {"url": "https://www.gstatic.com/webp/gallery/1.jpg"}}
      ]
    }]
  }'
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Please describe this picture"},
            {"type": "image_url", "image_url": {"url": "https://www.gstatic.com/webp/gallery/1.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
import OpenAI from 'openai';
﻿
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});
﻿
const response = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Please describe this picture' },
      { type: 'image_url', image_url: { url: 'https://www.gstatic.com/webp/gallery/1.jpg' } },
    ],
  }],
});
console.log(response.choices[0].message.content);
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;
﻿
public class ImageUrlChat {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("messages", List.of(Map.of(
            "role", "user",
            "content", List.of(
                Map.of("type", "text", "text", "Please describe this picture"),
                Map.of("type", "image_url", "image_url", Map.of("url", "https://www.gstatic.com/webp/gallery/1.jpg"))
            )
        )));
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();
﻿
        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main
﻿
import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)
﻿
func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model": "kimi-k2.6",
        "messages": []map[string]interface{}{
            {
                "role": "user",
                "content": []map[string]interface{}{
                    {"type": "text", "text": "Please describe this picture"},
                    {
                        "type":      "image_url",
                        "image_url": map[string]string{"url": "https://www.gstatic.com/webp/gallery/1.jpg"},
                    },
                },
            },
        },
    })
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, _ := http.DefaultClient.Do(req)
    defer resp.Body.Close()
﻿
    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
Note:
When using a direct URL, make sure it is publicly accessible and requires no authentication. For images on a private network or behind signed URLs, download them first and send them as Base64.
3. Video Input (Supported Only by K2.6)
Video input works like image input: encode the video in Base64, pass it in a video_url content part, and use a data:video/<format>;base64,... URL:
cURL
Python
Node.js
Java
Go
# Encode the video as Base64 and remove line breaks (some base64 implementations wrap their output)
VIDEO_B64=$(base64 < demo.mp4 | tr -d '\n')
﻿
# Use a temporary file to pass the body, avoiding the "Argument list too long" error triggered by an excessively large base64 string.
cat > /tmp/req.json <<EOF
{
  "model": "kimi-k2.6",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Please summarize the video content"},
      {"type": "video_url", "video_url": {"url": "data:video/mp4;base64,${VIDEO_B64}"}}
    ]
  }]
}
EOF
﻿
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d @/tmp/req.json
import base64
from openai import OpenAI
﻿
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)
﻿
with open("demo.mp4", "rb") as f:
    video_b64 = base64.b64encode(f.read()).decode()
﻿
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Please summarize the video content"},
            {"type": "video_url", "video_url": {"url": f"data:video/mp4;base64,{video_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
import fs from 'node:fs';
import OpenAI from 'openai';
﻿
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});
﻿
const videoB64 = fs.readFileSync('demo.mp4').toString('base64');
﻿
const response = await client.chat.completions.create({
  model: 'kimi-k2.6',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Please summarize the video content' },
      { type: 'video_url', video_url: { url: `data:video/mp4;base64,${videoB64}` } },
    ],
  }],
});
console.log(response.choices[0].message.content);
import okhttp3.*;
import com.google.gson.Gson;
import java.nio.file.*;
import java.util.*;
﻿
public class VideoChat {
    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get("demo.mp4"));
        String videoB64 = Base64.getEncoder().encodeToString(bytes);
﻿
        Map<String, Object> body = new HashMap<>();
        body.put("model", "kimi-k2.6");
        body.put("messages", List.of(Map.of(
            "role", "user",
            "content", List.of(
                Map.of("type", "text", "text", "Please summarize the video content"),
                Map.of("type", "video_url", "video_url", Map.of(
                    "url", "data:video/mp4;base64," + videoB64
                ))
            )
        )));
﻿
        Request request = new Request.Builder()
            .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
            .header("Authorization", "Bearer YOUR_API_KEY")
            .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
            .build();
﻿
        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
package main
﻿
import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "os"
)
﻿
func main() {
    video, _ := os.ReadFile("demo.mp4")
    videoB64 := base64.StdEncoding.EncodeToString(video)
﻿
    body, _ := json.Marshal(map[string]interface{}{
        "model": "kimi-k2.6",
        "messages": []map[string]interface{}{
            {
                "role": "user",
                "content": []map[string]interface{}{
                    {"type": "text", "text": "Please summarize the video content"},
                    {
                        "type": "video_url",
                        "video_url": map[string]string{
                            "url": "data:video/mp4;base64," + videoB64,
                        },
                    },
                },
            },
        },
    })
﻿
    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")
﻿
    resp, _ := http.DefaultClient.Do(req)
    defer resp.Body.Close()
﻿
    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
Note:
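The gateway limits the size of a single request body (typically to 100 MB). Base64 encoding enlarges data by about one third, so a video of roughly 75 MB already approaches this limit once encoded. Compress oversized videos or extract the key segments before uploading.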
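To check a file against this limit before sending it, compute its encoded size. Standard Base64 turns every 3 input bytes into 4 output characters, with padding. The following Python sketch makes two assumptions, both marked in the code. First, it reads "100 MB" as 100,000,000 bytes, the stricter interpretation. Second, it uses an arbitrary 4 KB allowance for the rest of the JSON body. The function names are hypothetical.

```python
# Assumption: "100 MB" is read as 100,000,000 bytes (the stricter interpretation).
REQUEST_LIMIT_BYTES = 100_000_000


def base64_size(raw_size: int) -> int:
    """Length of standard Base64 output (with padding) for raw_size input bytes."""
    return 4 * ((raw_size + 2) // 3)


def fits_in_request(raw_size: int, overhead: int = 4096,
                    limit: int = REQUEST_LIMIT_BYTES) -> bool:
    """True if the encoded file plus the rest of the body stays within limit.

    overhead is a rough, hypothetical allowance for the JSON envelope,
    the prompt text and the data: URL prefix.
    """
    return base64_size(raw_size) + overhead <= limit
```

For example, fits_in_request(os.path.getsize("demo.mp4")) tells you whether to send the file as-is or compress it first. Under these assumptions, a 70 MiB file fits and a 75 MiB file does not.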
Recommended Parameters and Best Practices
Parameter / Practice	Recommendation	Description
`max_tokens`	16000 or more (32768 recommended)	Reasoning and the final answer share the `max_tokens` quota. If the quota is too small, the output is likely to be truncated.
`temperature`	Do not set explicitly.	The K2.6 / K2.5 series uses a fixed temperature (1.0 in thinking mode, 0.6 in non-thinking mode). Passing any other value returns a 400 error (for example, `invalid temperature: only 0.6 is allowed`). Omit this parameter entirely.
`stream`	Enable (recommended).	Thinking mode produces longer output. Streaming improves responsiveness and avoids gateway timeouts.
Multimodal input	URL direct links > Base64	Use URLs for publicly accessible images and Base64 for local files. A single request body must not exceed 100 MB.
Writing back messages in multi-turn conversations	Write back the complete assistant message when thinking is enabled.	Write both `reasoning_content` and `content` back to messages together. Do not omit either field.
Accessing reasoning via the OpenAI SDK	Use `hasattr` / `getattr`	Accessing `.reasoning_content` directly raises an attribute error when a response or chunk does not include the field.
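The recommendations above can be bundled into one place so that every call applies them consistently. The following is a minimal Python sketch that returns keyword arguments for client.chat.completions.create. The helper name build_chat_kwargs and its defaults are illustrative assumptions drawn from the table. The thinking object travels in extra_body because it is not a named parameter of the OpenAI Python SDK.

```python
from typing import Any, Dict, List

MIN_MAX_TOKENS = 16000  # lower bound recommended in the table above


def build_chat_kwargs(
    messages: List[Dict[str, Any]],
    *,
    model: str = "kimi-k2.6",
    max_tokens: int = 32768,
    thinking: bool = True,
    keep_history: bool = False,
    stream: bool = True,
) -> Dict[str, Any]:
    """Keyword arguments for client.chat.completions.create(...)."""
    if max_tokens < MIN_MAX_TOKENS:
        raise ValueError(f"max_tokens={max_tokens} is below {MIN_MAX_TOKENS}; "
                         "reasoning and the answer share this quota")
    thinking_obj: Dict[str, Any] = {"type": "enabled" if thinking else "disabled"}
    if keep_history:
        thinking_obj["keep"] = "all"  # pass historical reasoning_content through
    # temperature is deliberately omitted: the model uses a fixed value.
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
        "extra_body": {"thinking": thinking_obj},
    }
```

Usage: client.chat.completions.create(**build_chat_kwargs(messages, keep_history=True)). Raising an error for a small max_tokens is a design choice: it surfaces truncation risk at call time instead of as a cut-off answer.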
﻿