Category | Monitoring Content |
Language Models | Metrics for text generation models, such as RPM (Requests Per Minute), TTFT (Time To First Token), TPOT (Time Per Output Token), and other metrics. |
Filter Options | Description |
Filter Dimension | Supports filtering by service/model and switching to different aggregation views. |
Service Selection | Dropdown to select specific inference services (default displays All Services). |
Time Range | 1 hour/Today/Last 3 days/Last 7 days/Last 30 days, or Custom time range. |
Metric | Full Name | Unit | Description |
Requests Per Minute (RPM) | Requests Per Minute | reqs/min | Number of requests per minute, reflecting request throughput. |
Time To First Token (TTFT) | Time To First Token | ms | Response time from when a request is sent to when the first Token is received (average TTFT for streaming requests within a unit time, excluding non-streaming requests). |
Time Per Output Token (TPOT) | Time Per Output Token | ms | Average time to generate each output token. |
Non-streaming interface latency | Non-Stream Latency | ms | latency for non-streaming requests within a unit time, representing end-to-end duration. |
Rate Limited Ratio | Rate Limited Ratio | - | Number of rate-limited requests / total number of calls. |
Request Canceled Ratio | Request Canceled Ratio | - | Number of canceled requests triggered by timeout or other reasons / total number of calls. |
Error Rate | Request Error Ratio | - | Number of requests with non-200 HTTP status codes / total number of requests. |
Cache hit rate | Cache Hit Ratio | - | Number of request cache hit tokens / total number of input tokens |

Apakah halaman ini membantu?
Anda juga dapat Menghubungi Penjualan atau Mengirimkan Tiket untuk meminta bantuan.
masukan