Model Monitoring

Download

Mode fokus

Ukuran font

Terakhir diperbarui: 2026-06-11 17:48:40

Feature Overview
Model Monitoring provides a real-time model inference performance monitoring dashboard, helping you track core metrics such as request concurrency, Token throughput, and response latency for each model, enabling prompt identification of performance bottlenecks.
Category Filters
Category
Monitoring Content
Language Models
Metrics for text generation models, such as RPM (Requests Per Minute), TTFT (Time To First Token), TPOT (Time Per Output Token), and other metrics.
Filter Conditions
Filter Options
Description
Filter Dimension
Supports filtering by service/model and switching to different aggregation views.
Service Selection
Dropdown to select specific inference services (default displays All Services).
Time Range
1 hour/Today/Last 3 days/Last 7 days/Last 30 days, or Custom time range.
Metric Details
Text Generation
Display the following core performance metrics in the form of a time-series line chart:
Metric
Full Name
Unit
Description
Requests Per Minute (RPM)
Requests Per Minute
reqs/min
Number of requests per minute, reflecting request throughput.
Time To First Token (TTFT)
Time To First Token
ms
Response time from when a request is sent to when the first Token is received (average TTFT for streaming requests within a unit time, excluding non-streaming requests).
Time Per Output Token (TPOT)
Time Per Output Token
ms
Average time to generate each output token.
Non-streaming interface latency
Non-Stream Latency
ms
latency for non-streaming requests within a unit time, representing end-to-end duration.
Rate Limited Ratio
Rate Limited Ratio
-
Number of rate-limited requests / total number of calls.
Request Canceled Ratio
Request Canceled Ratio
-
Number of canceled requests triggered by timeout or other reasons / total number of calls.
Error Rate
Request Error Ratio
-
Number of requests with non-200 HTTP status codes / total number of requests.
Cache hit rate
Cache Hit Ratio
-
Number of request cache hit tokens / total number of input tokens
Each metric provides maximum, minimum, average three statistical summaries to help users quickly locate performance abnormalities.
Example:
﻿
﻿
﻿

Bantuan dan Dukungan

Apakah halaman ini membantu?

Anda juga dapat Menghubungi Penjualan atau Mengirimkan Tiket untuk meminta bantuan.

masukan

tencent cloud

LLM Service TokenHub

Model Monitoring

Feature Overview

Category Filters

Filter Conditions

Metric Details

Text Generation

Bantuan dan Dukungan

Category	Monitoring Content
Language Models	Metrics for text generation models, such as RPM (Requests Per Minute), TTFT (Time To First Token), TPOT (Time Per Output Token), and other metrics.

Filter Options	Description
Filter Dimension	Supports filtering by service/model and switching to different aggregation views.
Service Selection	Dropdown to select specific inference services (default displays All Services).
Time Range	1 hour/Today/Last 3 days/Last 7 days/Last 30 days, or Custom time range.

Metric	Full Name	Unit	Description
Requests Per Minute (RPM)	Requests Per Minute	reqs/min	Number of requests per minute, reflecting request throughput.
Time To First Token (TTFT)	Time To First Token	ms	Response time from when a request is sent to when the first Token is received (average TTFT for streaming requests within a unit time, excluding non-streaming requests).
Time Per Output Token (TPOT)	Time Per Output Token	ms	Average time to generate each output token.
Non-streaming interface latency	Non-Stream Latency	ms	latency for non-streaming requests within a unit time, representing end-to-end duration.
Rate Limited Ratio	Rate Limited Ratio	-	Number of rate-limited requests / total number of calls.
Request Canceled Ratio	Request Canceled Ratio	-	Number of canceled requests triggered by timeout or other reasons / total number of calls.
Error Rate	Request Error Ratio	-	Number of requests with non-200 HTTP status codes / total number of requests.
Cache hit rate	Cache Hit Ratio	-	Number of request cache hit tokens / total number of input tokens