Module Overview
The online services module of TI-ONE lets you deploy models as online inference services that business applications can integrate through API calls. Online services support virtualization of heterogeneous compute and elastic scaling, helping address common problems such as complex model deployment, wasted resources, and inefficient manual scaling. They also support multiple model formats, service traffic allocation, and rolling updates to meet diverse requirements in online inference scenarios.
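Once a model is deployed, a business application typically calls it over HTTP with a JSON payload. The sketch below illustrates the general pattern with Python's standard library only; the endpoint URL, bearer-token auth scheme, and request schema are placeholders, not the platform's actual API — substitute the values shown for your own service.

```python
import json
import urllib.request

# Placeholder endpoint; replace with your deployed service's invocation URL.
ENDPOINT = "https://example.com/services/my-service/invoke"

def build_request(payload: dict, token: str) -> urllib.request.Request:
    """Build a JSON POST request for the inference endpoint."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder auth scheme
        },
        method="POST",
    )

def invoke(payload: dict, token: str) -> dict:
    """Send the inference request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(payload, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the call easy to test and lets you swap in your HTTP client of choice.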
Module Features
Compute Virtualization: The module can allocate compute in increments as small as 0.1 of a GPU card per service. This fine-grained allocation lets you match compute to actual demand for a cost-effective service experience.
Auto Scaling: You can adjust the scaling policy of elastic instances manually or automatically. The deployment dynamically adjusts the number of instances in real time based on business load, serving each workload with the optimal instance count and freeing you from manual capacity management.
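To make the load-based adjustment concrete, here is a minimal target-tracking sketch: the desired instance count is whatever brings per-instance load back toward a target, clamped to configured bounds. This is an illustration of the general technique, not the platform's actual scaling algorithm; all parameter names are assumptions.

```python
import math

def desired_replicas(current: int, observed_load: float, target_load: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Target-tracking rule: scale so average per-instance load approaches
    target_load, clamped to [min_replicas, max_replicas]."""
    if observed_load <= 0:
        return min_replicas  # idle service shrinks to its floor
    desired = math.ceil(current * observed_load / target_load)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 instances at 90% utilization with a 60% target would scale out to 3; 4 instances at 20% would scale in toward 2, never dropping below the floor.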
Abundant Management Capabilities: The module supports multiple models, multi-version management, traffic distribution, and rolling updates. It also provides multi-dimensional monitoring of service and call metrics, as well as event viewing, to safeguard your business.
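Traffic distribution across model versions commonly works as weighted routing: each request is routed to a version with probability proportional to its configured weight (for example, a 90/10 split during a canary rollout). The sketch below shows the idea in generic Python; it is not the platform's routing implementation, and the weight format is an assumption.

```python
import random

def pick_version(weights: dict, rng: random.Random) -> str:
    """Route one request: choose a model version with probability
    proportional to its traffic weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]

# Example: 90% of traffic to v1, 10% to the new v2 during a rollout.
SPLIT = {"v1": 90, "v2": 10}
```

Because routing is probabilistic, the split holds in aggregate over many requests rather than per individual call.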
Application Scenarios
You can deploy models as online services for various machine learning scenarios, including recommendation, image processing, natural language processing, and automatic speech recognition.