tencent cloud

Model Monitoring

Unduh
Mode fokus
Ukuran font
Terakhir diperbarui: 2026-06-11 17:48:40

Feature Overview

Model Monitoring provides a real-time model inference performance monitoring dashboard, helping you track core metrics such as request concurrency, Token throughput, and response latency for each model, enabling prompt identification of performance bottlenecks.

Category Filters

Category
Monitoring Content
Language Models
Metrics for text generation models, such as RPM (Requests Per Minute), TTFT (Time To First Token), TPOT (Time Per Output Token), and other metrics.

Filter Conditions

Filter Options
Description
Filter Dimension
Supports filtering by service/model and switching to different aggregation views.
Service Selection
Dropdown to select specific inference services (default displays All Services).
Time Range
1 hour/Today/Last 3 days/Last 7 days/Last 30 days, or Custom time range.

Metric Details

Text Generation

Display the following core performance metrics in the form of a time-series line chart:
Metric
Full Name
Unit
Description
Requests Per Minute (RPM)
Requests Per Minute
reqs/min
Number of requests per minute, reflecting request throughput.
Time To First Token (TTFT)
Time To First Token
ms
Response time from when a request is sent to when the first Token is received (average TTFT for streaming requests within a unit time, excluding non-streaming requests).
Time Per Output Token (TPOT)
Time Per Output Token
ms
Average time to generate each output token.
Non-streaming interface latency
Non-Stream Latency
ms
latency for non-streaming requests within a unit time, representing end-to-end duration.
Rate Limited Ratio
Rate Limited Ratio
-
Number of rate-limited requests / total number of calls.
Request Canceled Ratio
Request Canceled Ratio
-
Number of canceled requests triggered by timeout or other reasons / total number of calls.
Error Rate
Request Error Ratio
-
Number of requests with non-200 HTTP status codes / total number of requests.
Cache hit rate
Cache Hit Ratio
-
Number of request cache hit tokens / total number of input tokens
Each metric provides maximum, minimum, average three statistical summaries to help users quickly locate performance abnormalities.
Example:




Bantuan dan Dukungan

Apakah halaman ini membantu?

masukan