Tencent Cloud

Online Inference

Last updated: 2026-06-11 17:48:40

Feature Overview

The online inference service manages how models are used, including free quota usage, whether pay-per-token billing is enabled, security policies, and rate limiting policies.

Service Types

Online inference services are categorized into two types:

1. Default

The platform automatically creates an online inference service for all supported models by default. To get started quickly, users can claim a free trial package on the Model Hub or the model details page, or click Free Trial in the service list on the Online Inference page.

2. Custom

If you need to customize the billing policy for model services, or to create multiple services so that you can track usage statistics and manage permissions by team, you can create custom inference services in Online Inference.
Custom inference services support a wider range of billing options, such as TPM Reservation. In the future, the platform will also support capabilities such as intelligent routing, rate limiting rules, and enabling or disabling plug-ins in custom services, helping you manage and govern services more flexibly.


Service Status

Each online inference service has a status, as described below:
Status | Description
Not Enabled | A default inference service stays in the Not Enabled state until the user starts using it. It changes to Running after the user begins the free trial.
Activating | When a service is enabled for the first time, it briefly enters the Activating state and is expected to change to Running within 5 seconds.
Running | The service is accessible.
Stopped | The service becomes Stopped in either of two cases: (1) The account has an overdue payment, which stops pay-as-you-go services; once the account balance is replenished, the service automatically returns to Running. (2) The free quota is exhausted and postpaid is not enabled, or the user manually disables postpaid for the service; to resume the service, the user must manually enable postpaid on the Online Inference page.
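
The status rules above can be read as a small decision procedure. The following Python sketch is illustrative only and is not a platform API: `ServiceState`, its field names, and `resolve_status` are hypothetical. It also makes two assumptions that the descriptions leave open: an overdue account stops a service only when postpaid is enabled, and disabling postpaid stops a service only once the free quota is gone. TPM Reservation is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NOT_ENABLED = "Not Enabled"
    ACTIVATING = "Activating"
    RUNNING = "Running"
    STOPPED = "Stopped"


@dataclass
class ServiceState:
    """Hypothetical snapshot of the inputs that drive a service's status."""
    enabled: bool = False           # user has started using the service
    activation_done: bool = False   # first-time activation has finished
    account_overdue: bool = False   # account has an overdue payment
    free_quota_left: bool = True    # free trial package not yet exhausted
    postpaid_enabled: bool = False  # postpaid (pay-as-you-go) is switched on


def resolve_status(s: ServiceState) -> Status:
    """Map a state snapshot to one of the four documented statuses."""
    if not s.enabled:
        return Status.NOT_ENABLED
    if not s.activation_done:
        return Status.ACTIVATING
    # Case 1: an overdue account stops pay-as-you-go services (assumed scope).
    if s.postpaid_enabled and s.account_overdue:
        return Status.STOPPED
    # Case 2: free quota exhausted and postpaid not enabled.
    if not s.free_quota_left and not s.postpaid_enabled:
        return Status.STOPPED
    return Status.RUNNING
```

For example, `resolve_status(ServiceState(enabled=True, activation_done=True, free_quota_left=False))` evaluates to `Status.STOPPED`, and setting `postpaid_enabled=True` on the same state yields `Status.RUNNING`, mirroring the manual recovery step described for the Stopped status.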

Billing Mode

The billing mode indicates the payment status of the current service, as described below:
Billing Mode | Description
Free Trial | The service is using a free trial package. Usage within the free trial package is not billed.
Pay-as-you-go | The service has postpaid billing enabled and is billed by token usage.
TPM Reservation | The service has TPM Reservation enabled. Traffic exceeding the TPM limit is billed by token.
None | The user's free trial package is exhausted and postpaid billing is not enabled, so no billing mode applies and the service becomes Stopped.
Note:
Users can enable postpaid before the free trial package is fully consumed. Once postpaid is enabled, both the Free Trial and Pay-as-you-go billing modes are displayed. The platform consumes the free trial quota first and begins billing by token usage only after the free trial package is exhausted.
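
The consumption order described in the note can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the platform's billing logic: `Charge`, `settle_usage`, and `displayed_modes` are hypothetical names, quota is counted in tokens, and TPM Reservation is left out.

```python
from dataclasses import dataclass


@dataclass
class Charge:
    free_tokens_used: int  # covered by the free trial package, not billed
    billed_tokens: int     # billed by token under pay-as-you-go
    stopped: bool          # True if the service would become Stopped


def settle_usage(tokens: int, free_remaining: int, postpaid_enabled: bool) -> Charge:
    """Consume free trial quota first; bill only the overflow by token."""
    free_used = min(tokens, free_remaining)
    overflow = tokens - free_used
    quota_exhausted = free_used == free_remaining
    if postpaid_enabled:
        return Charge(free_used, overflow, stopped=False)
    # Without postpaid, overflow is not billed and an exhausted quota stops the service.
    return Charge(free_used, 0, stopped=quota_exhausted)


def displayed_modes(free_remaining: int, postpaid_enabled: bool) -> list[str]:
    """Billing modes shown for a service (hypothetical helper)."""
    modes = []
    if free_remaining > 0:
        modes.append("Free Trial")
    if postpaid_enabled:
        modes.append("Pay-as-you-go")
    return modes or ["None"]
```

With 1,000 free tokens remaining and postpaid enabled, a request for 1,500 tokens gives `settle_usage(1500, 1000, True)`, which returns 1,000 unbilled tokens and 500 billed tokens. The same request without postpaid bills nothing and marks the service as stopped.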





