Glossary

Last updated: 2026-04-22 10:57:28

Token Usage Per Minute
Tokens Per Minute (TPM) is the upper limit on the total number of tokens (input plus output) that a service can process within one minute. It is a key quota metric that caps service throughput.
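As a rough illustration of how a client might track its own usage against a TPM quota, the sketch below keeps a trailing 60-second window of token counts. The class name `TokenWindow`, the sliding-window behavior, and the explicit `now` timestamps are assumptions made for this example; the glossary does not say how the service itself measures the minute.

```python
from collections import deque


class TokenWindow:
    """Hypothetical client-side tracker for tokens used in a trailing window."""

    def __init__(self, tpm_limit: int, window_s: float = 60.0):
        self.tpm_limit = tpm_limit
        self.window_s = window_s
        self._events = deque()  # (timestamp, tokens) pairs, oldest first
        self._used = 0

    def _expire(self, now: float) -> None:
        # Drop requests that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def try_consume(self, now: float, input_tokens: int, output_tokens: int) -> bool:
        """Record a request if it fits in the remaining budget; return whether it did."""
        self._expire(now)
        total = input_tokens + output_tokens  # TPM counts input and output together
        if self._used + total > self.tpm_limit:
            return False
        self._events.append((now, total))
        self._used += total
        return True

    def used(self, now: float) -> int:
        """Tokens still counted against the quota at time `now`."""
        self._expire(now)
        return self._used
```

Passing timestamps in explicitly, rather than reading the system clock inside the class, keeps the sketch easy to test and makes the window arithmetic visible.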
RPM
Requests Per Minute (RPM) is the upper limit on the number of independent requests (API calls) that a service can process within one minute. It is a key quota metric that caps the service's request rate, which in turn limits how much concurrent traffic it can absorb.
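Because RPM and TPM apply at the same time, whichever quota is exhausted first sets the real ceiling. The following is a minimal sketch of that arithmetic; the function name and the idea of a fixed average request size are assumptions for illustration, not part of the service's API.

```python
def effective_requests_per_minute(
    rpm_limit: int, tpm_limit: int, avg_tokens_per_request: int
) -> int:
    """Estimate how many requests per minute fit under both quotas."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # Requests the token budget alone would allow (input + output per request).
    tpm_bound = tpm_limit // avg_tokens_per_request
    # The tighter of the two quotas is the one that actually binds.
    return min(rpm_limit, tpm_bound)


print(effective_requests_per_minute(60, 100_000, 500))   # 60: RPM binds
print(effective_requests_per_minute(60, 100_000, 5000))  # 20: TPM binds
```

With small requests the RPM quota is reached first; with large prompts or long outputs the TPM quota becomes the limit well before RPM does.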
Per-Output Token Latency
Time Per Output Token (TPOT) is the average time the model takes to generate each output token after the first one; the first token is excluded from the average. This metric determines how smooth streaming output feels.
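One way to compute this average from a streamed response is to record the arrival time of every output token and divide the span after the first token by the number of subsequent tokens. The helper below is a sketch under that assumption; `arrival_times` is a hypothetical list of timestamps in seconds collected by the caller.

```python
from typing import Sequence


def time_per_output_token(arrival_times: Sequence[float]) -> float:
    """Average gap between output tokens, excluding the wait for the first one."""
    if len(arrival_times) < 2:
        raise ValueError("need at least two tokens to compute TPOT")
    # Time spent generating tokens 2..n, measured from the first token's arrival.
    generation_span = arrival_times[-1] - arrival_times[0]
    return generation_span / (len(arrival_times) - 1)
```

For example, four tokens arriving at 0.50 s, 0.54 s, 0.58 s, and 0.62 s give a span of 0.12 s over three subsequent tokens, or roughly 0.04 s per token.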
First Token Latency
Time To First Token (TTFT) is the time from when a user sends a complete request to when the model returns the first token. This metric directly affects how responsive the service feels to users.
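The definition above can be turned into a simple client-side measurement: start a timer just before sending the request and stop it when the first streamed token arrives. In this sketch, `send_request` is a hypothetical callable that returns an iterable of tokens, and `clock` is injectable so the timing logic can be checked without a real service; neither name comes from the service's SDK.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_ttft(
    send_request: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, str]:
    """Return (seconds until the first token, the first token itself)."""
    start = clock()                # the complete request is about to be sent
    stream = iter(send_request())  # hypothetical streaming call
    try:
        first_token = next(stream)  # blocks until the first token is returned
    except StopIteration:
        raise ValueError("the stream ended without producing a token")
    ttft = clock() - start
    return ttft, first_token
```

Measured this way, TTFT includes network and queueing time as seen by the client, so it may be larger than a figure measured inside the service.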
Token
A token is the basic unit of text that a large language model processes. In Chinese, a word, a single character, or even a punctuation mark may be split into one or more tokens. Tokens are the core unit for measuring model processing volume and computational cost.
