AI Transcription/Translation 2.0

Last updated: 2026-05-18 16:29:10

This document explains how to integrate TRTC AI Transcription 2.0 for real-time speech-to-text and translation using server-side configuration. Compared with version 1.0, TRTC AI Transcription/Translation 2.0 optimizes configuration and callback methods.
Note:
Starting March 17, 2026, new users should follow this guide to access the 2.0 API.
If you have already integrated the legacy AI transcription and translation API (StartAITranscription), refer to Legacy Backend Integration for AI Transcription & Translation for relevant documentation.
Overview
TRTC AI Transcription and Translation enables real-time conversion of each user's audio stream in a room to text, with support for multilingual translation.
Transcription: Uses an ASR (Automatic Speech Recognition) engine to convert speech to text. The system supports multiple languages, hotword weighting, VAD (Voice Activity Detection), and real-time streaming transcription.
Translation: Optionally enable translation for transcribed text. The system uses an LLM (Large Language Model) translation engine to process transcription results, delivering both the original text and translation via callback.
Note:
Each application supports up to 100 concurrent tasks by default. The maximum total ASR concurrency is 200, and the maximum translation concurrency is 100. The API timeout is 6 seconds, and the API QPS limit is 20. To raise the concurrency or QPS limit, please contact us.
Scenario Overview
Single Language Room
In this scenario, multiple hosts in a TRTC room are communicating in Chinese. You can start a task to subscribe to all users' Chinese audio streams, select the Chinese ASR model, and transcribe audio to Chinese text in real time. Transcription results are sent via callback. You can also specify a translation target language; the translation engine will translate the transcribed Chinese text into the selected language and return the translation via callback.
Note:
Callback results can be delivered to your backend SDK. For integration details, see the callback interface documentation.
Multi-Language Room
In this scenario, two hosts in a TRTC room are communicating in Chinese and French. Since each ASR task supports only one language, you need to start two tasks:
1. Task 1: Subscribe to Host 1's Chinese audio, select the Chinese ASR model for real-time transcription, generate Chinese text, and return via callback. Set the translation target language to French; the system will translate the transcribed text to French and return the translation via callback.
2. Task 2: Subscribe to Host 2's French audio, select the French ASR model for real-time transcription, generate French text, and return via callback. Set the translation target language to Chinese; the system will translate the transcribed text to Chinese and return the translation via callback.
With these two tasks, Host 1 and Host 2 can communicate across languages in real time.
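The two-task setup above can be sketched as a small planning helper. This is only an illustration: `plan_tasks` and its dictionary keys (`subscribe`, `asr_lang`, `translate_to`) are hypothetical names, not fields of the TRTC API, and the language codes are used loosely. It creates one task per host, as in the example above, and uses the other hosts' languages as translation targets.

```python
def plan_tasks(host_langs):
    """Return one task plan per host.

    host_langs maps a host's user ID to the language that host speaks,
    e.g. {"host1": "zh", "host2": "fr"}.
    """
    plans = []
    for user_id, lang in host_langs.items():
        # Translate into every other language spoken in the room.
        targets = sorted({other for other in host_langs.values() if other != lang})
        plans.append({"subscribe": [user_id], "asr_lang": lang, "translate_to": targets})
    return plans


plans = plan_tasks({"host1": "zh", "host2": "fr"})
for plan in plans:
    print(plan)
```

Each resulting plan would then be turned into its own CreateCloudTranscription call.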
﻿
Integration Guide
Prerequisites
Log in to the TRTC console, activate the TRTC service, and create an RTC-Engine application.
Purchasing an RTC-Engine package (Lite version or above) unlocks the speech-to-text and real-time translation features.
Note:
The speech-to-text and real-time translation features are billed based on usage. For details, see Pricing.
Step 1: Integrate TRTC SDK
Import the TRTC SDK into your project, join a TRTC room, and enable local microphone capture and publishing. 
Step 2: Start Transcription Task via RESTful API
Use your server to call the REST API (CreateCloudTranscription) to start a transcription task. The TaskId returned by the call uniquely identifies the task; save it for subsequent API operations. Follow these steps:
1. Configure Basic Parameters
Set the required parameters for the transcription and translation task, including Application ID (sdkappid), Room Information (RoomId), and Room Type (RoomIdType).
2. Configure Room Subscription Parameters
When using AI transcription and translation, a transcription bot joins the room as a virtual audience member and subscribes to the audio streams for recognition. Use TranscriptionParams to specify the bot parameters and the subscription parameters. Bot parameters include UserId, User Signature (UserSig), and Max Idle Time. Use the Allowlist (SubscribeUids) to specify which hosts to transcribe and translate, the Blocklist (UnSubscribeList) to exclude hosts, and the SendCustomMode parameter to define how transcription and translation text is delivered.
3. Configure ASR Parameters
Use the AsrParam Lang parameter to select the ASR model engine for your scenario. Multiple ASR engines are available for different use cases; see the table below for versions and model parameters.
Note:
To use the 16k Chinese-English large model engine, set Lang to 16k_zh_en.
The AsrParam VADSilenceTime parameter configures VAD silence time for speech recognition.
ASR engines are available in Trial Edition, Standard Edition, and Advanced Edition, each with different pricing. 
Edition Type	Overview	Language & Model Code
Trial Edition Language Engine	Basic speech recognition model. Provides good response speed and accuracy in near-field, low-noise environments.	"zh": 8k sampling rate Chinese ASR model, primarily used for telephony audio.
Standard Edition Language Engine	Large model engine with significantly improved speech recognition, especially in noisy, echo-prone, or distant voice environments. Ideal for meetings, live streaming, voice chat, gaming, real-time captions, and transcription records. Highly suitable for RTC real-time interaction scenarios.	"8k_zh_large": 8k Chinese large model engine, optimized for telephone audio; "16k_zh_large": 16k large model engine, supports Chinese, English, and various Chinese dialects; "16k_zh_en": latest 16k Chinese-English large model engine, supports Chinese, English, and multiple Chinese dialects, and excels in mixed Chinese-English scenarios.
Advanced Edition Language Engine	Accurate recognition for minor languages and dialects.	"vi": Vietnamese; "ja": Japanese; "ko": Korean; "id": Indonesian; "th": Thai; "pt": Portuguese; "tr": Turkish; "ar": Arabic; "es": Spanish; "hi": Hindi; "fr": French; "ms": Malay; "fil": Filipino; "de": German; "it": Italian; "ru": Russian; "sv": Swedish; "da": Danish; "no": Norwegian; "zh-yue": Cantonese
If you need additional languages, contact us for evaluation.
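Because pricing depends on the edition, it can help to check which edition a given Lang code belongs to before starting a task. The lookup below is a minimal sketch: the codes are copied from the table above, while `ENGINE_EDITIONS` and `edition_for` are hypothetical helper names, not part of any SDK.

```python
# Lang codes copied from the ASR engine table, grouped by edition.
ENGINE_EDITIONS = {
    "Trial": {"zh"},
    "Standard": {"8k_zh_large", "16k_zh_large", "16k_zh_en"},
    "Advanced": {
        "vi", "ja", "ko", "id", "th", "pt", "tr", "ar", "es", "hi",
        "fr", "ms", "fil", "de", "it", "ru", "sv", "da", "no", "zh-yue",
    },
}


def edition_for(lang):
    """Return the edition name for an ASR Lang code, or raise ValueError."""
    for edition, codes in ENGINE_EDITIONS.items():
        if lang in codes:
            return edition
    raise ValueError(f"unknown ASR Lang code: {lang!r}")


print(edition_for("16k_zh_en"))  # Standard
print(edition_for("fr"))         # Advanced
```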
4. Enable Translation (Optional)
To enable real-time translation of transcribed text, set translation parameters via TranslationParam. Specify the translation target language. Supported languages are listed below:
Translation Target Language Code	Language
"zh"	Chinese
"en"	English
"es"	Spanish
"pt"	Portuguese
"fr"	French
"de"	German
"ru"	Russian
"ar"	Arabic
"ja"	Japanese
"ko"	Korean
"vi"	Vietnamese
"ms"	Malay
"id"	Indonesian
"it"	Italian
"th"	Thai
Note:
Translation parameters are optional. If you only need speech-to-text, you do not need to set translation parameters; this does not affect AI transcription functionality.
Real-time translation supports 15 languages for both source and target: Chinese, English, Spanish, Portuguese, French, German, Russian, Arabic, Japanese, Korean, Vietnamese, Malay, Indonesian, Italian, Thai. If the ASR transcription language is not among these, translation cannot be enabled. For other languages, please contact technical support.
AI translation results are for reference only and should not be used as the sole basis for professional advice or conclusions.
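The four configuration steps can be sketched as one request-building helper. This is an assumption-heavy illustration, not the API schema: the parameter names RoomId, RoomIdType, TranscriptionParams, UserId, UserSig, AsrParam, Lang, and TranslationParam come from this guide, but the nesting, the `SdkAppId` casing, the `TargetLangs` field, and all sample values are assumptions. Check the CreateCloudTranscription API reference for the real layout.

```python
import json

# Translation target codes listed in the table above.
TRANSLATION_LANGS = {
    "zh", "en", "es", "pt", "fr", "de", "ru", "ar",
    "ja", "ko", "vi", "ms", "id", "it", "th",
}


def build_request(sdk_app_id, room_id, room_id_type, bot_user_id, bot_user_sig,
                  asr_lang, target_langs=()):
    """Assemble a request body for CreateCloudTranscription (assumed layout)."""
    unsupported = sorted(set(target_langs) - TRANSLATION_LANGS)
    if unsupported:
        raise ValueError(f"unsupported translation target(s): {unsupported}")

    body = {
        "SdkAppId": sdk_app_id,        # casing assumed
        "RoomId": room_id,
        "RoomIdType": room_id_type,    # must match the room type used by the client SDK
        "TranscriptionParams": {
            "UserId": bot_user_id,     # bot identity; must not collide with a real user
            "UserSig": bot_user_sig,
        },
        "AsrParam": {"Lang": asr_lang},
    }
    if target_langs:
        # "TargetLangs" is a hypothetical field name.
        body["TranslationParam"] = {"TargetLangs": list(target_langs)}
    return body


request = build_request(
    sdk_app_id=1400000000,             # placeholder values throughout
    room_id="room-1",
    room_id_type=1,
    bot_user_id="transcription_bot",
    bot_user_sig="<UserSig generated on your server>",
    asr_lang="16k_zh_en",
    target_langs=["fr"],
)
print(json.dumps(request, indent=2))
```

Validating the translation targets locally avoids a round trip that would otherwise end in an InvalidParameter error.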
Step 3: Receive Transcription and Translation Callback Results
Method 1: Receive via Server-side callback
The speech-to-text service provides a server-side Event callback for real-time dialogue messages. See AI Transcription & Translation 2.0 Event callback.
Method 2: Receive via Client SDK callback
Use the following code to subscribe to the reactive data exposed by AITranscriberStore and update the UI.
Android
Add the dependencies to your build.gradle and run Gradle Sync.
implementation 'io.trtc.uikit:atomicx-core:4.0.0.110' 
implementation "com.tencent.liteav:LiteAVSDK_Professional:13.1.0.19861"
implementation "com.tencent.imsdk:imsdk-plus:8.7.7201"
Then use the following code to listen for message list callback updates and update the UI. messages is a list of TranscriberMessage.
Kotlin
// collect is a suspend function, so call it from a coroutine (for example, lifecycleScope in an Activity or Fragment).
lifecycleScope.launch {
    // If displaying the message list with RecyclerView, use submitList to update messages.
    AITranscriberStore.shared.transcriberState.realtimeMessageList.collect { messages ->
        adapter.submitList(messages.toList())
    }
}
TranscriberMessage Parameter Description:
Parameter Name	Type	Description
segmentId	String	Unique identifier for each sentence from the user.
speakerUserId	String	ID of the speaking user.
speakerUserName	String	Nickname of the speaking user.
sourceText	String	Speech-to-text result for the user.
translationTexts	Map	Translation text for the user's speech, can be translated into multiple languages.
timestamp	Long	Timestamp of the current sentence.
isCompleted	Boolean	Indicates whether the current sentence is completed.
iOS
Add pod 'AtomicXCore' to your Podfile and run pod install.
target 'xxxx' do
  pod 'AtomicXCore'
end
Then use the following code to listen for message list callback updates and update the UI. messages is a list of TranscriberMessage.
Swift
// If displaying the message list with UITableView, reloadData after updating messages.
AITranscriberStore.shared.state
    .subscribe(StatePublisherSelector(keyPath: \.realtimeMessageList))
    .receive(on: RunLoop.main)
    .sink { [weak self] in self?.updateMessages($0) }
    .store(in: &cancellables)
TranscriberMessage Parameter Description:
Parameter Name	Type	Description
segmentId	String	Unique identifier for each sentence from the user.
speakerUserId	String	ID of the speaking user.
speakerUserName	String	Nickname of the speaking user.
sourceText	String	Speech-to-text result for the user.
translationTexts	[TranslationLanguage: String]	Translation text for the user's speech, can be translated into multiple languages.
timestamp	Int64	Timestamp of the current sentence.
isCompleted	Bool	Indicates whether the current sentence is completed.
Practical Guide
Transcription Task Creation and Error Handling Tutorial
To ensure high availability for AI transcription and translation, follow these best practices when integrating the RESTful API:
After calling CreateCloudTranscription, check the HTTP response. If the request fails, choose a retry strategy based on the returned error code. Error codes consist of a primary and a secondary code separated by a dot, e.g., InvalidParameter.SDKAppId. See the table below for details:
Returned Error Code	Issue Description	Resolution
InvalidParameter.xxxxx	Incorrect input parameters.	Check parameter values based on the specific error message.
InternalError.xxxxx	Server-side error encountered.	Retry with the same parameters until successful and you receive the Task ID. Recommended exponential backoff: first retry after 3s, second after 6s, third after 12s, etc.
FailedOperation.RestrictedConcurrency	Concurrent transcription tasks exceeded reserved resources (default is 100).	Contact technical support to request a higher concurrency limit.
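The handling in the table can be sketched as a retry wrapper. This is a minimal sketch under stated assumptions: `create_task` is a caller-supplied function (hypothetical) that performs the CreateCloudTranscription request and returns a `(task_id, error_code)` pair, with `error_code` set to `None` on success. Only InternalError codes are retried, with delays of 3s, 6s, 12s, and so on.

```python
import time


def create_with_retry(create_task, max_retries=5, base_delay=3.0, sleep=time.sleep):
    """Call create_task() until it yields a task ID or a non-retryable error."""
    for attempt in range(max_retries + 1):
        task_id, error_code = create_task()
        if error_code is None:
            return task_id
        primary = error_code.split(".", 1)[0]
        if primary != "InternalError":
            # InvalidParameter.* needs a parameter fix and
            # FailedOperation.RestrictedConcurrency needs a higher quota;
            # retrying the same request will not help.
            raise RuntimeError(f"not retryable: {error_code}")
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 3s, 6s, 12s, ...
    raise RuntimeError(f"gave up after {max_retries} retries")


# Simulated responses: two server-side errors, then success.
responses = iter([
    (None, "InternalError.xxxxx"),
    (None, "InternalError.xxxxx"),
    ("task-123", None),
])
delays = []
task_id = create_with_retry(lambda: next(responses), sleep=delays.append)
print(task_id, delays)  # task-123 [3.0, 6.0]
```

The table recommends retrying until successful; the `max_retries` cap here is a design choice so a prolonged outage surfaces as an error instead of an endless loop.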
When calling CreateCloudTranscription, the UserId/UserSig you specify identifies the transcription bot that joins the room. This UserId must not match that of any other user in the TRTC room. The room type used by the TRTC client must match the room type specified in the transcription API. For example, if the SDK creates a room with a string room number, the transcription task must also use a string room number.
To query transcription status, you can obtain task information in the following ways:
- About 15 seconds after successfully initiating CreateCloudTranscription, call DescribeCloudTranscription to query task information. If the status is Idle, it means the transcription bot did not receive upstream audio; check if there is a host streaming audio in the room.
- Transcription task information will be sent to you via callback. See callback event documentation for details.
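The delayed status check can be sketched as follows. `describe_status` is a hypothetical caller-supplied function that calls DescribeCloudTranscription and returns the task status as a string; only the Idle value is taken from this guide, and the response shape is an assumption.

```python
import time


def bot_is_receiving_audio(describe_status, task_id, wait_seconds=15, sleep=time.sleep):
    """Wait, query the task once, and report whether the bot is hearing audio."""
    sleep(wait_seconds)  # give the bot time to join the room and subscribe
    status = describe_status(task_id)
    if status == "Idle":
        print(f"{task_id}: no upstream audio; check that a host is publishing audio")
        return False
    return True


# Simulated query that always reports Idle; the sleep is stubbed out for the demo.
ok = bot_is_receiving_audio(lambda task_id: "Idle", "task-123", sleep=lambda s: None)
print(ok)  # False
```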
API Rate Limit Guide
Tencent Cloud API services enforce rate limits for each user to maintain system stability and fair resource allocation. If your request rate exceeds the threshold, the system returns a rate limit error. The default transcription API QPS limit is 20 requests/sec. Contact technical support to request a higher limit. Typically, QPS is set at a 1:20 ratio to maximum concurrent online tasks; for example, with 2000 concurrent transcription tasks, QPS can be raised to 100. Adjust as needed based on your business requirements.
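As a worked example of the 1:20 ratio, the helper below estimates the QPS to request for a given number of concurrent tasks. `suggested_qps` is a hypothetical name, and keeping the default of 20 as a lower bound is an assumption rather than a documented rule.

```python
import math

DEFAULT_QPS = 20  # default transcription API limit stated above


def suggested_qps(max_concurrent_tasks, tasks_per_qps=20):
    """Estimate the QPS to request, using a 1:20 QPS-to-concurrency ratio."""
    return max(DEFAULT_QPS, math.ceil(max_concurrent_tasks / tasks_per_qps))


print(suggested_qps(2000))  # 100
print(suggested_qps(100))   # 20
```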
If you encounter rate limit errors, take these actions:
- Reduce request frequency to stay within the limit.
- Implement a request queue in your business logic.
- Add appropriate intervals between requests.
For long-term solutions:
- Use exponential backoff retry, e.g., first retry after 3s, second after 6s, third after 12s, etc., until successful.
- Optimize business logic by having transcription bots join rooms early to reduce concurrent API calls.
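One way to add intervals between requests is a small client-side throttle, sketched below. `Throttle` is a hypothetical helper, not part of any Tencent Cloud SDK. It spaces calls so that no more than `qps` requests start per second from a single process; a multi-process deployment would need a shared limiter instead.

```python
import time


class Throttle:
    """Space out calls so that at most `qps` calls start per second."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval


throttle = Throttle(qps=20)
for room in ["room-1", "room-2", "room-3"]:
    throttle.wait()
    print("start transcription for", room)  # replace with the real API call
```

Setting `qps` slightly below the account limit leaves headroom for other callers that share the same quota.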
