tencent cloud

GLM API Guide

Last updated: 2026-06-18 17:42:19

Overview

The GLM series models from Zhipu AI are now integrated into the TokenHub platform. You can invoke the GLM series models using the OpenAI-compatible protocol to access capabilities such as general conversation, deep reasoning, tool calling, and multimodal understanding.
For general invocation methods (such as BaseURL, API Key, message concatenation, Function Calling, and streaming output), see Language Model API Overview. This document only covers the unique capabilities and usage methods of the GLM series.

Prerequisites

You have registered a Tencent Cloud account and activated the TokenHub service.
You have obtained the API Key in the TokenHub console.
You have installed the SDK for your programming language (the examples below use the OpenAI SDK for Python and Node.js), or you can send HTTP requests directly.

Quick Start

The following example demonstrates how to invoke the glm-5.1 model to complete a basic conversation. Replace YOUR_API_KEY with the API Key you created.
The examples below are given for cURL, Python, Node.js, Java, and Go, in that order.
cURL:
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role": "user", "content": "hello"}],
    "max_tokens": 1024
  }'
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
Node.js:
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});

const response = await client.chat.completions.create({
  model: 'glm-5.1',
  messages: [{ role: 'user', content: 'hello' }],
  max_tokens: 1024,
});
console.log(response.choices[0].message.content);
Java:
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;

public class GlmQuickStart {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "glm-5.1");
        body.put("messages", List.of(Map.of("role", "user", "content", "hello")));
        body.put("max_tokens", 1024);

        Request request = new Request.Builder()
                .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
                .header("Authorization", "Bearer YOUR_API_KEY")
                .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
                .build();

        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
Go:
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model":      "glm-5.1",
        "messages":   []map[string]string{{"role": "user", "content": "hello"}},
        "max_tokens": 1024,
    })

    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    // Check the error before using resp: resp is nil when the request fails.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()

    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
Note:
GLM series models have thinking mode enabled by default, so responses contain the reasoning_content field. If you do not need the reasoning process, disable it as described in the Thinking Mode section below.

Supported Models

| model Parameter Value | Positioning | Context / Max Output | Multimodal | Recommended Scenario |
| --- | --- | --- | --- | --- |
| glm-5.2 | Flagship (latest) | 1M tokens / 128K tokens | Not supported | Agent and coding, long-running tasks, full-stack development, code translation, research reproduction |
| glm-5.1 | Flagship | 200K tokens / 128K tokens | Not supported | General conversation, content creation, knowledge Q&A, complex reasoning |
| glm-5 | Previous-generation flagship | 200K tokens / 128K tokens | Not supported | Scenarios prioritizing stability |
| glm-5-turbo | Agent-optimized | 200K tokens / 128K tokens | Not supported | Tool calling, long-chain Agent tasks |
| glm-5v-turbo | Multimodal | 200K tokens / 128K tokens | Supports images, videos, and files | Image understanding, video analysis, document parsing |

Thinking Mode

The GLM series integrates conversational and reasoning capabilities. You can control whether to enable the thinking capability using the thinking parameter, without switching the model parameter.

Enabling or Disabling Thinking

Control the thinking behavior using the thinking field. This field is an object that contains a type attribute:
| Field | Type | Value | Default Value | Description |
| --- | --- | --- | --- | --- |
| type | String | enabled / disabled | enabled | Controls whether thinking capability is enabled for the current request. |
The following example shows how to disable thinking in a request:
The examples below are given for cURL, Python, Node.js, Java, and Go, in that order.
cURL:
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role": "user", "content": "hello"}],
    "thinking": {"type": "disabled"}
  }'
Python:
response = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "hello"}],
    extra_body={"thinking": {"type": "disabled"}},
)
Node.js:
const response = await client.chat.completions.create({
  model: 'glm-5.1',
  messages: [{ role: 'user', content: 'hello' }],
  // @ts-ignore - thinking is a GLM extension field
  thinking: { type: 'disabled' },
});
Java:
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;

public class GlmThinkingDisabled {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "glm-5.1");
        body.put("messages", List.of(Map.of("role", "user", "content", "hello")));
        body.put("thinking", Map.of("type", "disabled"));

        Request request = new Request.Builder()
                .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
                .header("Authorization", "Bearer YOUR_API_KEY")
                .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
                .build();

        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
Go:
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model":    "glm-5.1",
        "messages": []map[string]string{{"role": "user", "content": "hello"}},
        "thinking": map[string]string{"type": "disabled"},
    })

    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    // resp is nil when the request fails, so check the error first.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()

    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
Note:
thinking is not a standard OpenAI field. When using the OpenAI SDK, you must pass it through the SDK's mechanism for extra fields (use extra_body for Python, or pass the field directly for Node.js). For direct HTTP calls, place it at the top level of the request body.

Obtaining Thinking Content

After thinking is enabled, the response includes a new reasoning_content field, which is at the same level as content:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "Let me analyze this problem...",
      "content": "The final answer is..."
    }
  }]
}
Because reasoning_content is not a standard OpenAI field, typed SDK response objects may not declare it; check that the field is present and non-empty before reading it. For direct HTTP calls, read it from the response JSON.
The examples below are given for Python, Node.js, Java, and Go, in that order.
Python:
message = response.choices[0].message
if hasattr(message, "reasoning_content") and message.reasoning_content:
    print("Thinking process:", message.reasoning_content)
print("Answer:", message.content)
Node.js:
const message = response.choices[0].message;
// @ts-ignore - reasoning_content is a GLM extension field
if (message.reasoning_content) {
  console.log('Thinking process:', message.reasoning_content);
}
console.log('Answer:', message.content);
Java:
// After obtaining the response string from an HTTP call, parse the reasoning_content and content fields using Gson.
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

String respBody = response.body().string();
JsonObject json = JsonParser.parseString(respBody).getAsJsonObject();
JsonObject message = json.getAsJsonArray("choices")
        .get(0).getAsJsonObject()
        .getAsJsonObject("message");

if (message.has("reasoning_content") && !message.get("reasoning_content").isJsonNull()) {
    System.out.println("Thinking process: " + message.get("reasoning_content").getAsString());
}
System.out.println("Answer: " + message.get("content").getAsString());
Go:
// Continues inside main from the earlier examples: data holds the response body.
type Message struct {
    Role             string `json:"role"`
    Content          string `json:"content"`
    ReasoningContent string `json:"reasoning_content,omitempty"`
}

type Choice struct {
    Index   int     `json:"index"`
    Message Message `json:"message"`
}

type ChatResponse struct {
    Choices []Choice `json:"choices"`
}

var result ChatResponse
// Guard against decode errors and empty choices before indexing.
if err := json.Unmarshal(data, &result); err != nil || len(result.Choices) == 0 {
    fmt.Println("unexpected response:", string(data))
    return
}

if result.Choices[0].Message.ReasoningContent != "" {
    fmt.Println("Thinking process:", result.Choices[0].Message.ReasoningContent)
}
fmt.Println("Answer:", result.Choices[0].Message.Content)

Multi-Turn Conversation Processing

When constructing messages for subsequent rounds, do not write back the reasoning_content. Only pass the content field as the assistant message.
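A minimal sketch of this rule, assuming the assistant message is available as a plain dict (for example, parsed from the response JSON). The helper name append_assistant_turn is hypothetical and not part of any SDK. It copies only role and content into the history, so reasoning_content is never sent back:

```python
from typing import Any, Dict, List


def append_assistant_turn(
    history: List[Dict[str, Any]], message: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Return a new history with the assistant reply appended.

    Only role and content are kept; reasoning_content is dropped so that
    it is not written back to the model in the next round.
    """
    reply = {"role": "assistant", "content": message.get("content") or ""}
    return history + [reply]


history = [{"role": "user", "content": "hello"}]
assistant_message = {
    "role": "assistant",
    "reasoning_content": "Let me analyze this problem...",
    "content": "The final answer is...",
}
history = append_assistant_turn(history, assistant_message)
history.append({"role": "user", "content": "Can you elaborate?"})
print(history)
```

This sketch covers plain text turns only. If an assistant turn carries tool_calls, those would presumably need to be carried over as well.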

Streaming Invocation

When thinking mode is enabled, use streaming calls (stream=True in the Python SDK, "stream": true in the request body). The reasoning content can be lengthy, and non-streaming calls are prone to gateway timeouts.
In streaming mode, the reasoning_content is fully output before the content. Client processing logic: accumulate the delta.reasoning_content to output the thinking process, then accumulate the delta.content to output the final answer.
The examples below are given for cURL, Python, Node.js, Java, and Go, in that order.
cURL:
curl -N -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
    "stream": true,
    "stream_options": {"include_usage": true},
    "thinking": {"type": "enabled"}
  }'
Python:
stream = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "Explain quantum entanglement"}],
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"thinking": {"type": "enabled"}},
)

is_answering = False
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(delta.reasoning_content, end="", flush=True)
    if hasattr(delta, "content") and delta.content:
        if not is_answering:
            print("\n--- Answer ---\n")
            is_answering = True
        print(delta.content, end="", flush=True)
Node.js:
const stream = await client.chat.completions.create({
  model: 'glm-5.1',
  messages: [{ role: 'user', content: 'Explain quantum entanglement' }],
  stream: true,
  stream_options: { include_usage: true },
  // @ts-ignore - thinking is a GLM extension field
  thinking: { type: 'enabled' },
});

let isAnswering = false;
for await (const chunk of stream) {
  if (!chunk.choices?.length) continue;
  const delta = chunk.choices[0].delta;
  // @ts-ignore - reasoning_content is a GLM extension field
  if (delta.reasoning_content) {
    process.stdout.write(delta.reasoning_content);
  }
  if (delta.content) {
    if (!isAnswering) {
      process.stdout.write('\n--- Answer ---\n');
      isAnswering = true;
    }
    process.stdout.write(delta.content);
  }
}
Java:
import okhttp3.*;
import com.google.gson.*;
import java.util.*;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class GlmStream {
    public static void main(String[] args) throws Exception {
        Map<String, Object> body = new HashMap<>();
        body.put("model", "glm-5.1");
        body.put("messages", List.of(Map.of("role", "user", "content", "Explain quantum entanglement")));
        body.put("stream", true);
        body.put("stream_options", Map.of("include_usage", true));
        body.put("thinking", Map.of("type", "enabled"));

        Request request = new Request.Builder()
                .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
                .header("Authorization", "Bearer YOUR_API_KEY")
                .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
                .build();

        try (Response response = new OkHttpClient().newCall(request).execute();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(response.body().byteStream()))) {
            String line;
            boolean isAnswering = false;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("data: ")) continue;
                String data = line.substring(6);
                if (data.equals("[DONE]")) break;

                JsonObject chunk = JsonParser.parseString(data).getAsJsonObject();
                JsonArray choices = chunk.getAsJsonArray("choices");
                if (choices == null || choices.size() == 0) continue;
                JsonObject delta = choices.get(0).getAsJsonObject().getAsJsonObject("delta");

                if (delta.has("reasoning_content") && !delta.get("reasoning_content").isJsonNull()) {
                    System.out.print(delta.get("reasoning_content").getAsString());
                }
                if (delta.has("content") && !delta.get("content").isJsonNull()) {
                    if (!isAnswering) {
                        System.out.println("\n--- Answer ---");
                        isAnswering = true;
                    }
                    System.out.print(delta.get("content").getAsString());
                }
            }
        }
    }
}
Go:
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)

type StreamDelta struct {
    Content          string `json:"content,omitempty"`
    ReasoningContent string `json:"reasoning_content,omitempty"`
}

type StreamChoice struct {
    Delta StreamDelta `json:"delta"`
}

type StreamChunk struct {
    Choices []StreamChoice `json:"choices"`
}

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model":          "glm-5.1",
        "messages":       []map[string]string{{"role": "user", "content": "Explain quantum entanglement"}},
        "stream":         true,
        "stream_options": map[string]bool{"include_usage": true},
        "thinking":       map[string]string{"type": "enabled"},
    })

    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    // resp is nil when the request fails, so check the error first.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()

    isAnswering := false
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "data: ") {
            continue
        }
        data := strings.TrimPrefix(line, "data: ")
        if data == "[DONE]" {
            break
        }
        var chunk StreamChunk
        if err := json.Unmarshal([]byte(data), &chunk); err != nil {
            continue
        }
        if len(chunk.Choices) == 0 {
            continue
        }
        delta := chunk.Choices[0].Delta
        if delta.ReasoningContent != "" {
            fmt.Print(delta.ReasoningContent)
        }
        if delta.Content != "" {
            if !isAnswering {
                fmt.Println("\n--- Answer ---")
                isAnswering = true
            }
            fmt.Print(delta.Content)
        }
    }
}
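The line-by-line handling in the Java and Go examples can be condensed into a small pure function, which is convenient for unit testing without network access. The sketch below uses only the Python standard library and assumes the stream has already been split into text lines of the form data: {...}, ending with data: [DONE], as in those examples. The helper name collect_stream and the sample lines are illustrative, not captured output.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate reasoning_content and content from SSE data lines."""
    reasoning_parts = []
    answer_parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # chunks without choices carry no delta
        delta = choices[0].get("delta") or {}
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)


# Made-up sample lines that mimic the chunk shape used in the examples above.
sample = [
    'data: {"choices": [{"delta": {"reasoning_content": "Let me "}}]}',
    "",
    'data: {"choices": [{"delta": {"reasoning_content": "think."}}]}',
    'data: {"choices": [{"delta": {"content": "Done."}}]}',
    'data: {"choices": [], "usage": {"total_tokens": 12}}',
    "data: [DONE]",
]
reasoning, answer = collect_stream(sample)
print("Thinking process:", reasoning)
print("Answer:", answer)
```

Collecting the parts in lists and joining once at the end avoids repeated string concatenation on long reasoning output.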

Reasoning Effort (reasoning_effort)

In addition to the thinking field, glm-5.2 provides the reasoning_effort parameter to control reasoning depth, allowing a trade-off between answer quality and latency. This parameter is supported only by glm-5.2; for other GLM models, continue to use the thinking field (see Enabling or Disabling Thinking above). If the parameter is omitted, the default value is max.
| Value | Description |
| --- | --- |
| high | Enhances reasoning. Suitable for scenarios such as general reasoning and code generation. |
| max | Deep reasoning (default value). Suitable for scenarios such as complex reasoning, long-running tasks, and deep code analysis. |
reasoning_effort is not a standard OpenAI field. When using the OpenAI SDK, you must pass it through the SDK's mechanism for extra fields (use extra_body for Python, or pass the field directly for Node.js). For direct HTTP calls, place it at the top level of the request body. An example is provided below:
The examples below are given for cURL, Python, and Node.js, in that order.
cURL:
curl --location -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  --header "Authorization: Bearer ${API_KEY}" \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "glm-5.2",
    "messages": [
      {"role": "user", "content": "Analyze the performance bottlenecks in the following code and provide optimization suggestions."}
    ],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "max",
    "temperature": 1.0,
    "max_tokens": 65536
  }'
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub-intl.tencentcloudmaas.com/v1",
)

response = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is an irrational number."}
    ],
    extra_body={
        "thinking": {"type": "enabled"},
        "reasoning_effort": "max",
    },
    temperature=1.0,
    max_tokens=65536,
)

print(response.choices[0].message.content)
Node.js:
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://tokenhub-intl.tencentcloudmaas.com/v1',
});

const completion = await client.chat.completions.create({
  model: 'glm-5.2',
  messages: [
    { role: 'user', content: 'Analyze the complexity of this code and provide optimization suggestions.' },
  ],
  // @ts-ignore - thinking / reasoning_effort are GLM extension fields
  thinking: { type: 'enabled' },
  // @ts-ignore
  reasoning_effort: 'max',
  temperature: 1.0,
  max_tokens: 65536,
});

console.log(completion.choices[0].message.content);
Note:
thinking and reasoning_effort work together: thinking controls whether reasoning is enabled (enabled / disabled), while reasoning_effort controls the reasoning depth on glm-5.2 (high / max). For deep reasoning scenarios, set both thinking: {"type": "enabled"} and reasoning_effort: "max".
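Since reasoning_effort is accepted only by glm-5.2 and only with the values high and max, it can help to build the extension fields in one place and reject invalid combinations before a request is sent. The following is a sketch under those assumptions; build_extra_body is a hypothetical helper whose result is meant to be passed as extra_body in the Python SDK.

```python
from typing import Any, Dict, Optional

# Taken from this section: allowed values, and the only model that accepts them.
REASONING_EFFORT_VALUES = {"high", "max"}
REASONING_EFFORT_MODELS = {"glm-5.2"}


def build_extra_body(
    model: str, thinking: bool = True, reasoning_effort: Optional[str] = None
) -> Dict[str, Any]:
    """Build the GLM extension fields for a chat completion request."""
    extra: Dict[str, Any] = {
        "thinking": {"type": "enabled" if thinking else "disabled"}
    }
    if reasoning_effort is not None:
        if model not in REASONING_EFFORT_MODELS:
            raise ValueError(f"reasoning_effort is not supported by {model}")
        if reasoning_effort not in REASONING_EFFORT_VALUES:
            raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
        extra["reasoning_effort"] = reasoning_effort
    return extra


print(build_extra_body("glm-5.2", reasoning_effort="high"))
print(build_extra_body("glm-5.1", thinking=False))
```

When reasoning_effort is left as None, the field is omitted and the server-side default (max on glm-5.2) applies.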

Tool Call

Function Calling for the GLM series follows the OpenAI standard protocol (tools / tool_choice). For general usage, see Language Model API Overview.

Parameter Concatenation in Streaming Scenarios

During streaming invocation of the GLM model, tool_call.arguments are returned incrementally in multiple chunks. The client needs to accumulate and concatenate them based on tool_call.index:
The examples below are given for Python, Node.js, Java, and Go, in that order.
Python:
# For the definition of tools, see the Function Calling section in Language Model API Overview.
completion = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "Query Shenzhen weather"}],
    tools=tools,
    stream=True,
    extra_body={"thinking": {"type": "disabled"}},
)

# A single buffer is enough when the model returns one tool call.
# With several calls in one response, keep one buffer per tc.index.
arg_buffer = ""
tool_name = ""
for chunk in completion:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if hasattr(delta, "tool_calls") and delta.tool_calls:
        for tc in delta.tool_calls:
            if tc.function and tc.function.name:
                tool_name = tc.function.name
            if tc.function and tc.function.arguments:
                arg_buffer += tc.function.arguments

print(f"Calling tool: {tool_name}")
print(f"Complete parameters: {arg_buffer}")  # e.g. {"city": "Shenzhen"}
Node.js:
const completion = await client.chat.completions.create({
  model: 'glm-5.1',
  messages: [{ role: 'user', content: 'Query Shenzhen weather' }],
  tools: tools,
  stream: true,
  // @ts-ignore - thinking is a GLM extension field
  thinking: { type: 'disabled' },
});

let argBuffer = '';
let toolName = '';
for await (const chunk of completion) {
  if (!chunk.choices?.length) continue;
  const delta = chunk.choices[0].delta;
  if (delta.tool_calls) {
    for (const tc of delta.tool_calls) {
      if (tc.function?.name) toolName = tc.function.name;
      if (tc.function?.arguments) argBuffer += tc.function.arguments;
    }
  }
}

console.log(`Calling tool: ${toolName}`);
console.log(`Complete parameters: ${argBuffer}`); // e.g. {"city": "Shenzhen"}
Java:
import okhttp3.*;
import com.google.gson.*;
import java.util.*;
import java.io.BufferedReader;
import java.io.InputStreamReader;

// For the definition of tools, see the Function Calling section in Language Model API Overview.
Map<String, Object> body = new HashMap<>();
body.put("model", "glm-5.1");
body.put("messages", List.of(Map.of("role", "user", "content", "Query Shenzhen weather")));
body.put("tools", tools);
body.put("stream", true);
body.put("thinking", Map.of("type", "disabled"));

Request request = new Request.Builder()
        .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
        .header("Authorization", "Bearer YOUR_API_KEY")
        .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
        .build();

StringBuilder argBuffer = new StringBuilder();
String toolName = "";

try (Response response = new OkHttpClient().newCall(request).execute();
     BufferedReader reader = new BufferedReader(
             new InputStreamReader(response.body().byteStream()))) {
    String line;
    while ((line = reader.readLine()) != null) {
        if (!line.startsWith("data: ")) continue;
        String data = line.substring(6);
        if (data.equals("[DONE]")) break;

        JsonObject chunk = JsonParser.parseString(data).getAsJsonObject();
        JsonArray choices = chunk.getAsJsonArray("choices");
        if (choices == null || choices.size() == 0) continue;
        JsonObject delta = choices.get(0).getAsJsonObject().getAsJsonObject("delta");

        if (delta.has("tool_calls") && delta.get("tool_calls").isJsonArray()) {
            for (JsonElement tcEl : delta.getAsJsonArray("tool_calls")) {
                JsonObject fn = tcEl.getAsJsonObject().getAsJsonObject("function");
                if (fn != null) {
                    if (fn.has("name") && !fn.get("name").isJsonNull()) {
                        toolName = fn.get("name").getAsString();
                    }
                    if (fn.has("arguments") && !fn.get("arguments").isJsonNull()) {
                        argBuffer.append(fn.get("arguments").getAsString());
                    }
                }
            }
        }
    }
}
System.out.println("Calling tool: " + toolName);
System.out.println("Complete parameters: " + argBuffer);
Go:
package main

import (
    "bufio"
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "strings"
)

type ToolCallFunction struct {
    Name      string `json:"name,omitempty"`
    Arguments string `json:"arguments,omitempty"`
}

type ToolCall struct {
    Index    int              `json:"index"`
    Function ToolCallFunction `json:"function"`
}

type ToolStreamDelta struct {
    ToolCalls []ToolCall `json:"tool_calls,omitempty"`
}

type ToolStreamChoice struct {
    Delta ToolStreamDelta `json:"delta"`
}

type ToolStreamChunk struct {
    Choices []ToolStreamChoice `json:"choices"`
}

func main() {
    // tools must be defined before this point. For its definition, see the
    // Function Calling section in Language Model API Overview.
    body, _ := json.Marshal(map[string]interface{}{
        "model":    "glm-5.1",
        "messages": []map[string]string{{"role": "user", "content": "Query Shenzhen weather"}},
        "tools":    tools,
        "stream":   true,
        "thinking": map[string]string{"type": "disabled"},
    })

    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    // resp is nil when the request fails, so check the error first.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()

    var argBuffer strings.Builder
    var toolName string
    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        if !strings.HasPrefix(line, "data: ") {
            continue
        }
        data := strings.TrimPrefix(line, "data: ")
        if data == "[DONE]" {
            break
        }
        var chunk ToolStreamChunk
        if err := json.Unmarshal([]byte(data), &chunk); err != nil {
            continue
        }
        if len(chunk.Choices) == 0 {
            continue
        }
        for _, tc := range chunk.Choices[0].Delta.ToolCalls {
            if tc.Function.Name != "" {
                toolName = tc.Function.Name
            }
            if tc.Function.Arguments != "" {
                argBuffer.WriteString(tc.Function.Arguments)
            }
        }
    }
    fmt.Printf("Calling tool: %s\nComplete parameters: %s\n", toolName, argBuffer.String())
}
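The examples above collect arguments into a single buffer, which is sufficient when the model returns one tool call. When a response contains several tool calls, their fragments can interleave, so the buffers need to be keyed by tool_call.index as described at the start of this section. A minimal sketch, assuming each delta is available as a plain dict in the OpenAI streaming shape; merge_tool_call_deltas, the get_weather tool name, and the call IDs are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def merge_tool_call_deltas(deltas: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Merge streamed tool_call fragments into complete calls, keyed by index."""
    calls: Dict[int, Dict[str, str]] = {}
    for delta in deltas:
        for tc in delta.get("tool_calls") or []:
            call = calls.setdefault(
                tc.get("index", 0), {"id": "", "name": "", "arguments": ""}
            )
            if tc.get("id"):
                call["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                call["name"] = fn["name"]
            if fn.get("arguments"):
                call["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]


# Made-up deltas: two calls whose argument fragments arrive interleaved.
deltas = [
    {"tool_calls": [{"index": 0, "id": "call_a", "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 1, "id": "call_b", "function": {"name": "get_weather", "arguments": '{"city": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": "Shen'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": 'zhen"}'}}]},
    {"tool_calls": [{"index": 1, "function": {"arguments": '"Beijing"}'}}]},
    {"content": "deltas without tool_calls are ignored"},
]
merged = merge_tool_call_deltas(deltas)
for call in merged:
    # Parse only after the stream ends; partial fragments are not valid JSON.
    print(call["name"], json.loads(call["arguments"]))
```

Parsing the arguments is deferred until the stream has ended because an individual fragment is generally not valid JSON on its own.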

tool_stream Parameter

The GLM series supports the tool_stream parameter (Boolean) to control the streaming chunk granularity of tool call arguments. Like thinking, it is not a standard OpenAI field; with the Python SDK, pass it through extra_body:
extra_body={"tool_stream": True, "thinking": {"type": "disabled"}}

Multimodal Invocation

glm-5v-turbo is the only model in the GLM series that supports multimodal input, accepting images, videos, and files as input and outputting text.

Use Limits

Images, videos, and files cannot be mixed in a single request.
File input supports only URLs and does not support Base64.
Image input supports both URL and Base64 formats.
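These limits can be checked on the client before a request is sent. The sketch below assumes the content-part types image_url, video_url, and file_url used in the examples in this document; check_multimodal_content is a hypothetical helper, and it only detects Data URIs in file_url, since a raw Base64 string cannot be recognized reliably.

```python
from typing import Any, Dict, List

MEDIA_TYPES = {"image_url", "video_url", "file_url"}


def check_multimodal_content(content: List[Dict[str, Any]]) -> None:
    """Raise ValueError if the content parts break the limits listed above."""
    kinds = {part.get("type") for part in content if part.get("type") in MEDIA_TYPES}
    if len(kinds) > 1:
        raise ValueError(f"mixed media types in one request: {sorted(kinds)}")
    for part in content:
        if part.get("type") == "file_url":
            url = (part.get("file_url") or {}).get("url", "")
            if url.startswith("data:"):
                raise ValueError("file_url must be a URL, not a Data URI")


ok = [
    {"type": "text", "text": "Please describe this picture"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
]
check_multimodal_content(ok)  # passes silently

mixed = ok + [{"type": "video_url", "video_url": {"url": "https://example.com/demo.mp4"}}]
try:
    check_multimodal_content(mixed)
except ValueError as exc:
    print("rejected:", exc)
```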

Image Input

Verified supported formats: PNG, JPG, JPEG, WebP. If you need to use other formats, first perform a small-sample test for confirmation.
The examples below are given for cURL, Python, Node.js, Java, and Go, in that order.
cURL:
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "glm-5v-turbo",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Please describe this picture"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}}
      ]
    }],
    "max_tokens": 1024,
    "thinking": {"type": "disabled"}
  }'
Python:
response = client.chat.completions.create(
    model="glm-5v-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Please describe this picture"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
        ],
    }],
    max_tokens=1024,
    extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Node.js:
const response = await client.chat.completions.create({
  model: 'glm-5v-turbo',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Please describe this picture' },
      { type: 'image_url', image_url: { url: 'https://example.com/photo.png' } },
    ],
  }],
  max_tokens: 1024,
  // @ts-ignore - thinking is a GLM extension field
  thinking: { type: 'disabled' },
});
console.log(response.choices[0].message.content);
Java:
import okhttp3.*;
import com.google.gson.Gson;
import java.util.*;

public class GlmImageInput {
    public static void main(String[] args) throws Exception {
        List<Map<String, Object>> content = List.of(
                Map.of("type", "text", "text", "Please describe this picture"),
                Map.of("type", "image_url", "image_url",
                        Map.of("url", "https://example.com/photo.png"))
        );

        Map<String, Object> body = new HashMap<>();
        body.put("model", "glm-5v-turbo");
        body.put("messages", List.of(Map.of("role", "user", "content", content)));
        body.put("max_tokens", 1024);
        body.put("thinking", Map.of("type", "disabled"));

        Request request = new Request.Builder()
                .url("https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions")
                .header("Authorization", "Bearer YOUR_API_KEY")
                .post(RequestBody.create(new Gson().toJson(body), MediaType.parse("application/json")))
                .build();

        try (Response response = new OkHttpClient().newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
Go:
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "model": "glm-5v-turbo",
        "messages": []map[string]interface{}{
            {
                "role": "user",
                "content": []map[string]interface{}{
                    {"type": "text", "text": "Please describe this picture"},
                    {"type": "image_url", "image_url": map[string]string{
                        "url": "https://example.com/photo.png",
                    }},
                },
            },
        },
        "max_tokens": 1024,
        "thinking":   map[string]string{"type": "disabled"},
    })

    req, _ := http.NewRequest("POST",
        "https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions",
        bytes.NewBuffer(body))
    req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
    req.Header.Set("Content-Type", "application/json")

    // resp is nil when the request fails, so check the error first.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()

    data, _ := io.ReadAll(resp.Body)
    fmt.Println(string(data))
}
Base64 method (Python example):
After a local image is read as Base64, pass it in Data URI format to image_url.url:
import base64

with open("local.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="glm-5v-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in the picture?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
    max_tokens=1024,
    extra_body={"thinking": {"type": "disabled"}},
)
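The example above hard-codes image/jpeg in the Data URI. For other formats the MIME type has to match the file, so a small helper can derive it from the file extension. This is a sketch limited to the four image formats listed as verified in this section; to_data_uri and the mapping table are illustrative and not part of the API.

```python
import base64
import os

# MIME types for the image formats listed as verified in this section.
IMAGE_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a Data URI, choosing the MIME type by extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in IMAGE_MIME_TYPES:
        raise ValueError(f"unverified image format: {ext or filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{IMAGE_MIME_TYPES[ext]};base64,{encoded}"


# The first eight bytes of any PNG file, used here as stand-in image data.
uri = to_data_uri(b"\x89PNG\r\n\x1a\n", "photo.PNG")
print(uri)
```

The returned string can be placed in image_url.url exactly as the f-string is in the example above.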

Video Input

Verified supported formats: MP4, MPEG, MOV, AVI, WebM, WMV, 3GPP. If you need to use other formats, first perform a small-sample test for confirmation.
To pass a video by URL, use a video_url content part in place of the image_url part from the image example above (both the type value and the field name change).
The examples below are given for cURL, Python, Node.js, Java, and Go, in that order.
cURL:
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "glm-5v-turbo",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Please summarize the content of this video"},
        {"type": "video_url", "video_url": {"url": "https://example.com/demo.mp4"}}
      ]
    }],
    "max_tokens": 2048,
    "thinking": {"type": "disabled"}
  }'
Python:
response = client.chat.completions.create(
    model="glm-5v-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Please summarize the content of this video"},
            {"type": "video_url", "video_url": {"url": "https://example.com/demo.mp4"}},
        ],
    }],
    max_tokens=2048,
    extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Node.js:
const response = await client.chat.completions.create({
  model: 'glm-5v-turbo',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Please summarize the content of this video' },
      { type: 'video_url', video_url: { url: 'https://example.com/demo.mp4' } },
    ],
  }],
  max_tokens: 2048,
  // @ts-ignore - thinking is a GLM extension field
  thinking: { type: 'disabled' },
});
console.log(response.choices[0].message.content);
Java:
List<Map<String, Object>> content = List.of(
        Map.of("type", "text", "text", "Please summarize the content of this video"),
        Map.of("type", "video_url", "video_url",
                Map.of("url", "https://example.com/demo.mp4"))
);

Map<String, Object> body = new HashMap<>();
body.put("model", "glm-5v-turbo");
body.put("messages", List.of(Map.of("role", "user", "content", content)));
body.put("max_tokens", 2048);
body.put("thinking", Map.of("type", "disabled"));
// The rest of the HTTP request logic is the same as in the image example.
Go:
body, _ := json.Marshal(map[string]interface{}{
    "model": "glm-5v-turbo",
    "messages": []map[string]interface{}{
        {
            "role": "user",
            "content": []map[string]interface{}{
                {"type": "text", "text": "Please summarize the content of this video"},
                {"type": "video_url", "video_url": map[string]string{
                    "url": "https://example.com/demo.mp4",
                }},
            },
        },
    },
    "max_tokens": 2048,
    "thinking":   map[string]string{"type": "disabled"},
})
// The rest of the HTTP request logic is the same as in the image example.

File Input

Verified supported formats: PDF, TXT, DOC. If you need to use other formats, first perform a small-sample test for confirmation.
File input is supported only via URL and does not support Base64 encoding. If you have a local file that needs to be parsed, first upload it to an object storage service (such as Tencent Cloud COS), and then use the generated URL.
The examples below are given for cURL, Python, Node.js, Java, and Go, in that order.
cURL:
curl -X POST 'https://tokenhub-intl.tencentcloudmaas.com/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "glm-5v-turbo",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Please extract the key points of this document"},
        {"type": "file_url", "file_url": {"url": "https://example.com/report.pdf"}}
      ]
    }],
    "max_tokens": 4096,
    "thinking": {"type": "disabled"}
  }'
Python:
response = client.chat.completions.create(
    model="glm-5v-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Please extract the key points of this document"},
            {"type": "file_url", "file_url": {"url": "https://example.com/report.pdf"}},
        ],
    }],
    max_tokens=4096,
    extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Node.js:
const response = await client.chat.completions.create({
  model: 'glm-5v-turbo',
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'Please extract the key points of this document' },
      { type: 'file_url', file_url: { url: 'https://example.com/report.pdf' } },
    ],
  }],
  max_tokens: 4096,
  // @ts-ignore - thinking is a GLM extension field
  thinking: { type: 'disabled' },
});
console.log(response.choices[0].message.content);
Java:
List<Map<String, Object>> content = List.of(
        Map.of("type", "text", "text", "Please extract the key points of this document"),
        Map.of("type", "file_url", "file_url",
                Map.of("url", "https://example.com/report.pdf"))
);

Map<String, Object> body = new HashMap<>();
body.put("model", "glm-5v-turbo");
body.put("messages", List.of(Map.of("role", "user", "content", content)));
body.put("max_tokens", 4096);
body.put("thinking", Map.of("type", "disabled"));
// The rest of the HTTP request logic is the same as in the image example.
Go:
body, _ := json.Marshal(map[string]interface{}{
    "model": "glm-5v-turbo",
    "messages": []map[string]interface{}{
        {
            "role": "user",
            "content": []map[string]interface{}{
                {"type": "text", "text": "Please extract the key points of this document"},
                {"type": "file_url", "file_url": map[string]string{
                    "url": "https://example.com/report.pdf",
                }},
            },
        },
    },
    "max_tokens": 4096,
    "thinking":   map[string]string{"type": "disabled"},
})
// The rest of the HTTP request logic is the same as in the image example.

Use Limits

| Restriction Item | Description |
| --- | --- |
| Thinking mode enabled by default | When the thinking parameter is not passed, thinking is enabled and the response contains the reasoning_content field. Explicitly disable it when not needed. |
| Timeout risk for non-streaming calls | The output is longer in thinking mode. Use stream=True. |
| Multimodal support limited to glm-5v-turbo | The other GLM models do not support image, video, or file input. |
| Multimodal inputs cannot be mixed | A single request can contain only one of image, video, or file input. |
| File input supports URL only | file_url does not support Base64 or Data URI. |
| Tool parameters returned incrementally | During streaming calls, tool_call.arguments are returned in multiple chunks and need to be concatenated by the client. |
| Request body size limit | The body of a single request must not exceed 100 MB. |
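Because Base64 encoding inflates data by about one third, an inline image can reach the body limit well before the raw file size suggests. The sketch below is a rough pre-flight check. It assumes that "100 MB" means 100,000,000 bytes (the more conservative reading; the source does not say whether MB is decimal or binary) and that the limit applies to the serialized JSON body. Both helper names are hypothetical.

```python
import json
from typing import Any, Dict

MAX_BODY_BYTES = 100 * 1000 * 1000  # assumption: "100 MB" read as decimal megabytes


def base64_length(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def body_size_ok(body: Dict[str, Any], limit: int = MAX_BODY_BYTES) -> bool:
    """Check the UTF-8 size of the JSON-serialized request body against the limit."""
    return len(json.dumps(body).encode("utf-8")) <= limit


# A raw file of 75,000,000 bytes already fills the assumed limit once encoded.
print(base64_length(75_000_000))
print(body_size_ok({"model": "glm-5v-turbo", "messages": []}))
```

In practice the usable size for an inline image is somewhat lower still, since the rest of the JSON body counts toward the same limit; passing a URL avoids the overhead entirely.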

References

Language Model API Overview: The general TokenHub language model invocation guide, covering BaseURL, API Key, multi-turn conversations, Function Calling, the Anthropic protocol, and more.
TokenHub Console: The entry point for creating and managing API Keys.

