This document explains how to integrate TRTC AI Transcription 2.0 for real-time speech-to-text and translation using server-side configuration. Compared with version 1.0, TRTC AI Transcription/Translation 2.0 streamlines the usage configuration and callback methods.
Note:
Starting March 17, 2026, new users should follow this guide to access the 2.0 API.
Overview
TRTC AI Transcription and Translation enables real-time conversion of each user's audio stream in a room to text, with support for multilingual translation.
Transcription: Uses an ASR (Automatic Speech Recognition) engine to convert speech to text. The system supports multiple languages, hotword weighting, VAD (Voice Activity Detection), and real-time streaming transcription.
Translation: Optionally enable translation for transcribed text. The system uses an LLM (Large Language Model) translation engine to process transcription results, delivering both the original text and translation via callback.
Note:
Each application supports up to 100 concurrent tasks by default. The total ASR maximum concurrency limit is 200, and the translation maximum concurrency limit is 100. The API timeout is 6 seconds, and the API QPS limit is 20. To request higher concurrency or a higher QPS limit, please contact us.
Scenario Overview
Single Language Room
In the TRTC room below, multiple hosts are communicating in Chinese. You can start a task to subscribe to all users' Chinese audio streams, select the Chinese ASR model, and transcribe audio to Chinese text in real time. Transcription results are sent via callback. You can also specify a translation target language; the translation engine will translate the transcribed Chinese text into the selected language and return the translation via callback.
Note:
Callback results can be delivered to your backend SDK. For integration details, see the callback interface documentation.
Multi-Language Room
In the TRTC room below, two hosts are communicating in Chinese and French. Since each ASR task only supports one language, you need to initiate two tasks:
1. Task 1: Subscribe to Host 1's Chinese audio, select the Chinese ASR model for real-time transcription, generate Chinese text, and return via callback. Set the translation target language to French; the system will translate the transcribed text to French and return the translation via callback.
2. Task 2: Subscribe to Host 2's French audio, select the French ASR model for real-time transcription, generate French text, and return via callback. Set the translation target language to Chinese; the system will translate the transcribed text to Chinese and return the translation via callback.
With these two tasks, Host 1 and Host 2 can communicate across languages in real time.
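The one-language-per-task rule above can be sketched as a small planning helper. This is a minimal illustration, not part of the TRTC API: `TaskPlan` and `plan_tasks` are hypothetical names, and the helper assumes you already know which language each host speaks. It groups hosts by language, creates one plan per language, and uses the other languages in the room as translation targets.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TaskPlan:
    """One transcription task: a single ASR language plus translation targets."""
    subscribe_user_ids: Tuple[str, ...]
    asr_lang: str
    translate_to: Tuple[str, ...]


def plan_tasks(host_langs: Dict[str, str]) -> List[TaskPlan]:
    """Group hosts by spoken language; each language becomes one task."""
    by_lang: Dict[str, List[str]] = {}
    for user_id, lang in host_langs.items():
        by_lang.setdefault(lang, []).append(user_id)

    plans = []
    for lang in sorted(by_lang):
        # Translate into every other language spoken in the room.
        targets = tuple(other for other in sorted(by_lang) if other != lang)
        plans.append(TaskPlan(tuple(sorted(by_lang[lang])), lang, targets))
    return plans
```

For the room described above, `plan_tasks({"host1": "zh", "host2": "fr"})` yields two plans: one for the French speaker with Chinese as the target, and one for the Chinese speaker with French as the target.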
Integration Guide
Prerequisites
Purchasing an RTC-Engine package (Lite version or above) unlocks the speech-to-text and real-time translation features.
Note:
The speech-to-text and real-time translation features are billed based on usage. For details, see Pricing.
Step 1: Integrate TRTC SDK
Import the TRTC SDK into your project, join a TRTC room, and enable local microphone capture and publishing.
Step 2: Start Transcription Task via RESTful API
Use your server to call the REST API (CreateCloudTranscription) to start a transcription task. The TaskId parameter is the unique identifier for the task; save it for subsequent API operations. Follow these steps:
1. Set the required parameters for the transcription and translation task, including Application ID (sdkappid), Room Information (RoomId), and Room Type (RoomIdType).
2. Configure the transcription bot. When using AI transcription and translation, a transcription bot joins the room as a virtual audience member and subscribes to the audio streams for recognition. Use TranscriptionParams to specify bot parameters and subscription user parameters. Bot parameters include UserId, User Signature (UserSig), and Max Idle Time. Use the Allowlist (SubscribeUids) to specify hosts for transcription and translation, the Blocklist (UnSubscribeList) to exclude hosts, and the SendCustomMode parameter to define how transcription and translation text is delivered.
3. Select the ASR engine. Use the AsrParam Lang parameter to select the ASR model engine for your scenario. The AsrParam VADSilenceTime parameter configures the VAD silence time for speech recognition. ASR engines are available in Trial Edition, Standard Edition, and Advanced Edition, each with different pricing. Multiple ASR engines are available for different use cases; see the table below for versions and model parameters.
Note:
To use the 16k Chinese-English large model engine, set Lang to 16k_zh_en.
| ASR engine edition | Description | Lang values |
| Trial Edition Language Engine | Basic speech recognition model. Provides good response speed and accuracy in near-field, low-noise environments. | "zh": 8k sampling rate Chinese ASR model, primarily used for telephony audio. |
| Standard Edition Language Engine | Large model engine with significantly improved speech recognition, especially in noisy, echo-prone, or distant voice environments. Ideal for meetings, live streaming, voice chat, gaming, real-time captions, and transcription records. Highly suitable for RTC real-time interaction scenarios. | "8k_zh_large": 8k Chinese large model engine, optimized for telephone audio. "16k_zh_large": 16k large model engine, supports Chinese, English, and various Chinese dialects. "16k_zh_en": latest 16k Chinese-English large model engine, supports Chinese, English, and multiple Chinese dialects; excels in mixed Chinese-English scenarios. |
| Advanced Edition Language Engine | Accurate recognition for less widely spoken languages and dialects. | "vi": Vietnamese, "ja": Japanese, "ko": Korean, "id": Indonesian, "th": Thai, "pt": Portuguese, "tr": Turkish, "ar": Arabic, "es": Spanish, "hi": Hindi, "fr": French, "ms": Malay, "fil": Filipino, "de": German, "it": Italian, "ru": Russian, "sv": Swedish, "da": Danish, "no": Norwegian, "zh-yue": Cantonese. If you need additional languages, contact us for evaluation. |
4. Enable Translation (Optional)
To enable real-time translation of transcribed text, set translation parameters via TranslationParam. Specify the translation target language. Supported languages are listed below:
| Value | Language |
| "zh" | Chinese |
| "en" | English |
| "es" | Spanish |
| "pt" | Portuguese |
| "fr" | French |
| "de" | German |
| "ru" | Russian |
| "ar" | Arabic |
| "ja" | Japanese |
| "ko" | Korean |
| "vi" | Vietnamese |
| "ms" | Malay |
| "id" | Indonesian |
| "it" | Italian |
| "th" | Thai |
Note:
Translation parameters are optional. If you only need speech-to-text, you do not need to set translation parameters; this does not affect AI transcription functionality.
Real-time translation supports 15 languages for both source and target: Chinese, English, Spanish, Portuguese, French, German, Russian, Arabic, Japanese, Korean, Vietnamese, Malay, Indonesian, Italian, Thai. If the ASR transcription language is not among these, translation cannot be enabled. For other languages, please contact technical support.
AI translation results are for reference only and should not be used as the sole basis for professional advice or conclusions.
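As an illustration of Step 2, the sketch below assembles a request body from the parameters named in this guide and rejects unsupported translation targets before any request is sent. It is a minimal sketch under stated assumptions: the nesting and casing of the fields are assumed, `TargetLang` is a hypothetical placeholder for the target-language field inside TranslationParam, and `build_create_request` is not part of any SDK. Check the CreateCloudTranscription API reference for the real schema.

```python
from typing import Any, Dict, Optional, Sequence

# The 15 translation languages listed in this guide.
TRANSLATION_LANGS = frozenset({
    "zh", "en", "es", "pt", "fr", "de", "ru", "ar",
    "ja", "ko", "vi", "ms", "id", "it", "th",
})


def build_create_request(
    sdk_app_id: int,
    room_id: str,
    room_id_type: int,
    bot_user_id: str,
    bot_user_sig: str,
    asr_lang: str,
    subscribe_uids: Sequence[str] = (),
    target_lang: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a CreateCloudTranscription body (field layout is assumed)."""
    transcription_params: Dict[str, Any] = {
        "UserId": bot_user_id,    # must not collide with a real room user
        "UserSig": bot_user_sig,
    }
    if subscribe_uids:
        transcription_params["SubscribeUids"] = list(subscribe_uids)

    body: Dict[str, Any] = {
        "SdkAppId": sdk_app_id,
        "RoomId": room_id,
        "RoomIdType": room_id_type,
        "TranscriptionParams": transcription_params,
        "AsrParam": {"Lang": asr_lang},
    }
    if target_lang is not None:
        if target_lang not in TRANSLATION_LANGS:
            raise ValueError(f"unsupported translation target: {target_lang!r}")
        # "TargetLang" is a hypothetical field name used for illustration only.
        body["TranslationParam"] = {"TargetLang": target_lang}
    return body
```

Leaving `target_lang` unset produces a transcription-only body, which matches the note that translation parameters are optional.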
Step 3: Receive Transcription and Translation Callback Results
Method 1: Receive via Server-side callback
The speech-to-text service provides a server-side Event callback for real-time dialogue messages. See AI Transcription & Translation 2.0 Event callback.
Method 2: Receive via Client SDK callback
Use the following code to listen for callback updates from TranscriberStore's reactive data and update the UI.
On Android, add the dependencies to your build.gradle and run a Gradle sync.
implementation 'io.trtc.uikit:atomicx-core:4.0.0.110'
implementation "com.tencent.liteav:LiteAVSDK_Professional:13.1.0.19861"
implementation "com.tencent.imsdk:imsdk-plus:8.7.7201"
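Then use the following code to listen for message list callback updates and update the UI. messages is a list of TranscriberMessage.
// If realtimeMessageList is a Kotlin Flow, collect suspends: call it from a coroutine (for example, inside lifecycleScope.launch { ... }).
AITranscriberStore.shared.transcriberState.realtimeMessageList.collect { messages ->
    // Pass a copy so the adapter receives a new list instance on every update.
    adapter.submitList(messages.toList())
}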
TranscriberMessage Parameter Description:
| Parameter | Type | Description |
| segmentId | String | Unique identifier for each sentence from the user. |
| speakerUserId | String | ID of the speaking user. |
| speakerUserName | String | Nickname of the speaking user. |
| sourceText | String | Speech-to-text result for the user. |
| translationTexts | Map | Translation text for the user's speech, can be translated into multiple languages. |
| timestamp | Long | Timestamp of the current sentence. |
| isCompleted | Boolean | Indicates whether the current sentence is completed. |
On iOS, add pod 'AtomicXCore' to your Podfile and run pod install.
target 'xxxx' do
  pod 'AtomicXCore'
end
Then use the following code to listen for message list callback updates and update the UI. messages is a list of TranscriberMessage.
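Swift
// Uses Combine operators, so the file needs `import Combine`.
// cancellables is a Set<AnyCancellable> owned by the caller; it keeps the subscription alive.
AITranscriberStore.shared.state
    .subscribe(StatePublisherSelector(keyPath: \.realtimeMessageList))
    .receive(on: RunLoop.main)
    .sink { [weak self] in self?.updateMessages($0) }
    .store(in: &cancellables)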
TranscriberMessage Parameter Description
| Parameter | Type | Description |
| segmentId | String | Unique identifier for each sentence from the user. |
| speakerUserId | String | ID of the speaking user. |
| speakerUserName | String | Nickname of the speaking user. |
| sourceText | String | Speech-to-text result for the user. |
| translationTexts | [TranslationLanguage: String] | Translation text for the user's speech, can be translated into multiple languages. |
| timestamp | Int64 | Timestamp of the current sentence. |
| isCompleted | Bool | Indicates whether the current sentence is completed. |
Practical Guide
Transcription Task Creation and Error Handling Tutorial
To ensure high availability for AI transcription and translation, follow these best practices when integrating the RESTful API:
After calling CreateCloudTranscription, check the HTTP response. If the request fails, implement a retry strategy based on the returned Status code. Error codes include a "primary" and "secondary" code, e.g., InvalidParameter.SDKAppId. See the table below for details:
| Error code | Description | Handling |
| InvalidParameter.xxxxx | Incorrect input parameters. | Check parameter values based on the specific error message. |
| InternalError.xxxxx | Server-side error encountered. | Retry with the same parameters until successful and you receive the Task ID. Recommended exponential backoff: first retry after 3s, second after 6s, third after 12s, etc. |
| FailedOperation.RestrictedConcurrency | Concurrent transcription tasks exceeded reserved resources (default is 100). | |
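The retry guidance in the table can be sketched as follows. This is an illustrative helper, not an official client: `send` is a hypothetical callable that performs one CreateCloudTranscription request and returns `(task_id, error_code)`, and the action labels returned by `classify` are names chosen for this example. The table recommends retrying until success; the sketch caps the number of retries so a persistent outage surfaces as an error instead of looping forever.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical transport: one API call returning (task_id, error_code).
SendFn = Callable[[], Tuple[Optional[str], Optional[str]]]


def backoff_delays(retries: int, base_s: float = 3.0) -> List[float]:
    """Delays of 3s, 6s, 12s, ... as recommended for InternalError."""
    return [base_s * (2 ** i) for i in range(retries)]


def classify(error_code: str) -> str:
    """Map a 'Primary.Secondary' error code to a handling action."""
    primary = error_code.split(".", 1)[0]
    if primary == "InternalError":
        return "retry"
    if primary == "InvalidParameter":
        return "fix_request"
    if error_code == "FailedOperation.RestrictedConcurrency":
        return "over_capacity"
    return "unknown"


def create_with_retry(send: SendFn, max_retries: int = 5, sleep=time.sleep) -> str:
    """Retry only on InternalError; raise for every other failure."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        task_id, error_code = send()
        if error_code is None and task_id is not None:
            return task_id
        action = classify(error_code or "")
        if action != "retry" or attempt == max_retries:
            raise RuntimeError(
                f"CreateCloudTranscription failed: {error_code} ({action})"
            )
        sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.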
When calling CreateCloudTranscription, the specified UserId/UserSig identifies the transcription bot that joins the room. Do not reuse a UserId that belongs to another user in the TRTC room.
The room type used by the TRTC client must match the room type specified in the transcription API. For example, if the SDK creates a room with a string room number, the transcription task must also use a string room number.
To query transcription status, you can obtain task information in the following ways:
About 15 seconds after successfully initiating CreateCloudTranscription, call DescribeCloudTranscription to query task information. If the status is Idle, it means the transcription bot did not receive upstream audio; check if there is a host streaming audio in the room.
Transcription task information will be sent to you via callback. See callback event documentation for details.
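The delayed status check can be sketched like this. `describe` is a hypothetical callable that wraps DescribeCloudTranscription and returns the task status as a string; only the "Idle" status is taken from this guide, so any other value is passed through without interpretation.

```python
import time
from typing import Callable, Tuple


def check_task(
    describe: Callable[[str], str],
    task_id: str,
    delay_s: float = 15.0,
    sleep=time.sleep,
) -> Tuple[str, str]:
    """Wait, query the task once, and return (status, hint)."""
    sleep(delay_s)  # give the bot time to join and subscribe
    status = describe(task_id)
    if status == "Idle":
        hint = "Bot received no upstream audio; check that a host is publishing audio in the room."
        return status, hint
    return status, ""
```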
API Rate Limit Guide
Tencent Cloud API services enforce rate limits for each user to maintain system stability and fair resource allocation. If your request rate exceeds the threshold, the system returns a rate limit error.
The default transcription API QPS limit is 20 requests/sec. Contact technical support to request a higher limit. Typically, QPS is set at a 1:20 ratio to the maximum number of concurrent online tasks; for example, with 2000 concurrent transcription tasks, QPS can be raised to 100. Adjust as needed based on your business requirements.
If you encounter rate limit errors, take these actions:
Reduce request frequency to stay within the limit.
Implement a request queue in your business logic.
Add appropriate intervals between requests.
For long-term solutions:
Use exponential backoff retry, e.g., first retry after 3s, second after 6s, third after 12s, etc., until successful.
Optimize business logic by having transcription bots join rooms early to reduce concurrent API calls.
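Two of the suggestions above, spacing requests out and sizing QPS against concurrency, can be sketched together. `Pacer` and `qps_for_concurrency` are hypothetical helpers written for this example; the 1:20 ratio comes from this guide, and the pacer is a simple client-side throttle that enforces a minimum interval between calls rather than a full token bucket.

```python
import math
import time


def qps_for_concurrency(concurrent_tasks: int, ratio: int = 20) -> int:
    """QPS suggested by the 1:20 rule (2000 concurrent tasks -> 100 QPS)."""
    return math.ceil(concurrent_tasks / ratio)


class Pacer:
    """Blocks callers so that requests are at least 1/qps seconds apart."""

    def __init__(self, qps: float, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_at = None  # earliest time the next request may start

    def wait(self) -> None:
        now = self._clock()
        if self._next_at is not None and now < self._next_at:
            self._sleep(self._next_at - now)
            now = self._next_at
        self._next_at = now + self._interval
```

Call `pacer.wait()` immediately before each API request. With `Pacer(qps=20)`, consecutive requests are spaced at least 50 ms apart, which keeps a single process within the default limit.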