Glossary

Last updated: 2026-04-22 10:57:28

TPM

Tokens Per Minute (TPM) is the upper limit on the total number of tokens (input + output) that a service can process within one minute. It is a key quota metric that caps service throughput.

RPM

Requests Per Minute (RPM) is the upper limit on the number of independent requests (API calls) that a service can process within one minute. It is a key quota metric that caps the rate at which a service accepts requests.
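
As an illustration of how the two quotas interact, the following is a minimal sketch of a client-side check that admits a request only if both the RPM and TPM limits would still hold. The class name `MinuteQuota`, the sliding-window approach, and the limit values are assumptions made for this example; the service's actual enforcement algorithm is not described here and may differ.

```python
from collections import deque


class MinuteQuota:
    """Sliding one-minute window with hypothetical RPM and TPM limits."""

    def __init__(self, rpm_limit: int, tpm_limit: int, window_s: float = 60.0):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.window_s = window_s
        self._events = deque()  # (timestamp, tokens) for each admitted request
        self._tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop admitted requests that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_admit(self, now: float, tokens: int) -> bool:
        """Admit a request of `tokens` (input + output) if both quotas allow it."""
        self._evict(now)
        if len(self._events) + 1 > self.rpm_limit:
            return False  # would exceed RPM
        if self._tokens_in_window + tokens > self.tpm_limit:
            return False  # would exceed TPM
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True


# Assumed limits for illustration only: 3 requests and 1000 tokens per minute.
quota = MinuteQuota(rpm_limit=3, tpm_limit=1000)
results = [
    quota.try_admit(0.0, 400),   # admitted
    quota.try_admit(1.0, 400),   # admitted
    quota.try_admit(2.0, 400),   # rejected: 1200 tokens would exceed TPM
    quota.try_admit(3.0, 100),   # admitted: 3 requests, 900 tokens
    quota.try_admit(4.0, 50),    # rejected: a 4th request would exceed RPM
    quota.try_admit(61.0, 400),  # admitted: requests from t=0 and t=1 expired
]
print(results)
```

The third and fifth calls show that a request can be rejected by either quota independently: a few large requests exhaust TPM first, while many small requests exhaust RPM first.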

TPOT

Time Per Output Token (TPOT) is the latency per output token, excluding the first token. It is the average time the model takes to generate each subsequent output token after the first token has been produced. This metric determines how smooth streaming output feels.

TTFT

Time To First Token (TTFT) is the first-token latency: the time from when a user sends a complete request to when the model returns the first token. This metric directly affects how responsive the service feels to users.
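
To make the two latency metrics concrete, here is a small sketch that derives both from timestamps recorded on the client while reading a streamed response. The function names `ttft` and `tpot` and the sample timestamps are hypothetical, chosen only for this example.

```python
from typing import Sequence


def ttft(request_sent_s: float, token_times_s: Sequence[float]) -> float:
    """Time from sending the request to receiving the first token."""
    return token_times_s[0] - request_sent_s


def tpot(token_times_s: Sequence[float]) -> float:
    """Average gap between consecutive tokens, excluding the first token."""
    if len(token_times_s) < 2:
        raise ValueError("TPOT needs at least two output tokens")
    elapsed_after_first = token_times_s[-1] - token_times_s[0]
    return elapsed_after_first / (len(token_times_s) - 1)


# Hypothetical arrival times (seconds) of five streamed tokens,
# for a request sent at t = 0.0.
arrivals = [0.5, 0.75, 1.0, 1.25, 1.5]
first_token_latency = ttft(0.0, arrivals)
per_token_latency = tpot(arrivals)
print(first_token_latency, per_token_latency)  # 0.5 0.25
```

Under these definitions, the total time to receive a response with n output tokens is roughly TTFT + TPOT × (n − 1), which is 0.5 + 0.25 × 4 = 1.5 seconds for the sample above.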

Token

A token is the basic unit of text that a large language model processes. In Chinese, a word, a character, or even a punctuation mark may be split into one or more tokens. It is the core unit for measuring model processing volume and computational cost.

