Speech Recognition

    Feature Overview

    This API is used to submit a speech recognition job.

    Request

    Sample request

    POST /jobs HTTP/1.1
    Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
    Date: <GMT Date>
    Authorization: <Auth String>
    Content-Length: <length>
    Content-Type: application/xml
    
    <body>
    Note:
    Authorization: Auth String. For more information, see Request Signature.
    When this feature is used by a sub-account, relevant permissions must be granted as instructed in Authorization Granularity Details.
    Fast speech recognition only supports audio files of up to 100 MB and up to 2 hours in length.
    General speech recognition supports audio files of up to 512 MB and up to 5 hours in length.
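
    The request above can be issued with any HTTP client once a valid signature is available. Below is a minimal Python sketch, assuming the third-party requests library; the host, object paths, template ID, and the Auth String are placeholders taken from the samples in this document (in practice the signature is usually generated by a COS SDK, as described in Request Signature).

    import requests

    # Placeholders: substitute your own bucket host and signed auth string.
    host = "test-1234567890.ci.ap-beijing.myqcloud.com"
    auth = "<Auth String>"  # see Request Signature

    body = (
        "<Request>"
        "<Tag>SpeechRecognition</Tag>"
        "<Input><Object>input/test.mp3</Object></Input>"
        "<Operation>"
        "<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>"
        "<Output>"
        "<Region>ap-chongqing</Region>"
        "<Bucket>test-123456789</Bucket>"
        "<Object>output/asr.txt</Object>"
        "</Output>"
        "</Operation>"
        "</Request>"
    )

    # Submit the speech recognition job and print the raw XML response.
    resp = requests.post(
        f"https://{host}/jobs",
        data=body.encode("utf-8"),
        headers={"Authorization": auth, "Content-Type": "application/xml"},
    )
    print(resp.status_code)
    print(resp.text)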

    Request headers

    This API only uses Common Request Headers.

    Request body

    This request requires the following request body:
    <Request>
    <Tag>SpeechRecognition</Tag>
    <Input>
    <Object>input/test.mp3</Object>
    </Input>
    <Operation>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Output>
    <Region>ap-chongqing</Region>
    <Bucket>test-123456789</Bucket>
    <Object>output/asr.txt</Object>
    </Output>
    <UserData>This is my data.</UserData>
    <JobLevel>0</JobLevel>
    </Operation>
    <CallBack>http://callback.demo.com</CallBack>
    <CallBackFormat>JSON</CallBackFormat>
    </Request>
    The nodes are described as follows:

    | Node Name (Keyword) | Parent Node | Description | Type | Required |
    | --- | --- | --- | --- | --- |
    | Request | None | Request container | Container | Yes |
    Request has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type | Required |
    | --- | --- | --- | --- | --- |
    | Tag | Request | Job type: SpeechRecognition | String | Yes |
    | Input | Request | Information about the object to be operated on | Container | Yes |
    | Operation | Request | Operation rule | Container | Yes |
    | CallBackFormat | Request | Job callback format, which can be JSON or XML (default value). It takes priority over that of the queue. | String | No |
    | CallBackType | Request | Job callback type, which can be Url (default value) or TDMQ. It takes priority over that of the queue. | String | No |
    | CallBack | Request | Job callback address, which takes priority over that of the queue. If it is set to no, no callbacks will be sent to the callback address of the queue. | String | No |
    | CallBackMqConfig | Request | TDMQ configuration for the job callback, as described in Structure. Required if CallBackType is TDMQ. | Container | No |
    Input has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type | Required |
    | --- | --- | --- | --- | --- |
    | Object | Request.Input | File path | String | No |
    | Url | Request.Input | URL that supports downloading over the public network. At least one of Url and Object must be passed in; if both are passed in, Object is preferred. | String | No |
    Operation has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type | Required |
    | --- | --- | --- | --- | --- |
    | TemplateId | Request.Operation | Template ID | String | No |
    | SpeechRecognition | Request.Operation | Speech recognition parameters, the same as Request.SpeechRecognition in the speech recognition template creation API | Container | No |
    | Output | Request.Operation | Result output configuration | Container | Yes |
    | UserData | Request.Operation | User information passed through, consisting of printable ASCII characters and up to 1,024 characters in length | String | No |
    | JobLevel | Request.Operation | Job priority. The greater the value, the higher the priority. Valid values: 0 (default), 1, 2. | String | No |

    Note:
    TemplateId is used first; if no TemplateId is specified, SpeechRecognition is used.
    Output has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type | Required |
    | --- | --- | --- | --- | --- |
    | Region | Request.Operation.Output | Bucket region | String | Yes |
    | Bucket | Request.Operation.Output | Result storage bucket | String | Yes |
    | Object | Request.Operation.Output | Result file name | String | Yes |
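
    For programmatic use, the request body can be assembled from the nodes described above rather than hand-written. A minimal sketch with Python's standard xml.etree.ElementTree follows; all values are placeholders taken from the samples in this document.

    import xml.etree.ElementTree as ET

    # Build <Request> with the required Tag, Input, and Operation nodes.
    request = ET.Element("Request")
    ET.SubElement(request, "Tag").text = "SpeechRecognition"

    inp = ET.SubElement(request, "Input")
    ET.SubElement(inp, "Object").text = "input/test.mp3"  # or <Url> for a public URL

    op = ET.SubElement(request, "Operation")
    ET.SubElement(op, "TemplateId").text = "t1460606b9752148c4ab182f55163ba7cd"
    out = ET.SubElement(op, "Output")
    ET.SubElement(out, "Region").text = "ap-chongqing"
    ET.SubElement(out, "Bucket").text = "test-123456789"
    ET.SubElement(out, "Object").text = "output/asr.txt"
    ET.SubElement(op, "JobLevel").text = "0"

    print(ET.tostring(request, encoding="unicode"))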

    Response

    Response headers

    This API only returns Common Response Headers.

    Response body

    The response body is application/xml data. The sample below contains all the nodes:
    <Response>
    <JobsDetail>
    <Code>Success</Code>
    <CreationTime>2021-08-05T15:43:50+0800</CreationTime>
    <EndTime>-</EndTime>
    <Input>
    <BucketId>test-1234567890</BucketId>
    <Object>input/test.mp3</Object>
    <Region>ap-chongqing</Region>
    </Input>
    <JobId>s58ccb634149211ed84ce2b1cd7fbb14a</JobId>
    <Message/>
    <Operation>
    <Output>
    <Bucket>test-1234567890</Bucket>
    <Object>output/asr.txt</Object>
    <Region>ap-chongqing</Region>
    </Output>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <TemplateName>speech_demo</TemplateName>
    <UserData>This is my data.</UserData>
    <JobLevel>0</JobLevel>
    </Operation>
    <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
    <StartTime>-</StartTime>
    <State>Submitted</State>
    <Tag>SpeechRecognition</Tag>
    </JobsDetail>
    </Response>
    The nodes are as described below:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | Response | None | Result storage container | Container |
    Response has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | JobsDetail | Response | Job details | Container |
    JobsDetail has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | Code | Response.JobsDetail | Error code, which is returned only if State is Failed | String |
    | Message | Response.JobsDetail | Error message, which is returned only if State is Failed | String |
    | JobId | Response.JobsDetail | Job ID | String |
    | Tag | Response.JobsDetail | Job type: SpeechRecognition | String |
    | State | Response.JobsDetail | Job status. Valid values: Submitted, Running, Success, Failed, Pause, Cancel. | String |
    | CreationTime | Response.JobsDetail | Job creation time | String |
    | StartTime | Response.JobsDetail | Job start time | String |
    | EndTime | Response.JobsDetail | Job end time | String |
    | QueueId | Response.JobsDetail | ID of the queue the job is in | String |
    | Input | Response.JobsDetail | Same as the Request.Input node in the request | Container |
    | Operation | Response.JobsDetail | Task rule | Container |
    Operation has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | TemplateId | Response.JobsDetail.Operation | Job template ID | String |
    | TemplateName | Response.JobsDetail.Operation | Job template name, which is returned when TemplateId exists | String |
    | SpeechRecognition | Response.JobsDetail.Operation | Same as Request.Operation.SpeechRecognition in the request | Container |
    | Output | Response.JobsDetail.Operation | Same as Request.Operation.Output in the request | Container |
    | UserData | Response.JobsDetail.Operation | The user information passed through | String |
    | JobLevel | Response.JobsDetail.Operation | Job priority | String |
    | SpeechRecognitionResult | Response.JobsDetail.Operation | Speech recognition job result, which is not returned when there is no result | Container |
    SpeechRecognitionResult has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | AudioTime | Response.JobsDetail.Operation.SpeechRecognitionResult | Audio duration (seconds) | String |
    | Result | Response.JobsDetail.Operation.SpeechRecognitionResult | Speech recognition result | String |
    | FlashResult | Response.JobsDetail.Operation.SpeechRecognitionResult | Result of fast speech recognition | Container array |
    | ResultDetail | Response.JobsDetail.Operation.SpeechRecognitionResult | Recognition result details, including the time offsets of words in each sentence (generally used to generate subtitles). This field is not empty if ResTextFormat in the speech recognition request is 1. Note: this field may be empty, indicating that no valid values can be obtained. | Container array |
    FlashResult has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | channel_id | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult | Sound channel ID, which starts from 0 and corresponds to the number of audio channels | Integer |
    | text | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult | Complete recognition result of the sound channel | String |
    | sentence_list | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult | List of sentence/paragraph-level recognition results | Container array |
    sentence_list has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | text | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | Sentence/paragraph-level text | String |
    | start_time | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | Start time | Integer |
    | end_time | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | End time | Integer |
    | speaker_id | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | Speaker ID. If speaker_diarization is set in the request, speakers can be distinguished by speaker_id. | Integer |
    | word_list | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list | List of word-level recognition results | Container array |
    word_list has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | word | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list.word_list | Word-level text | String |
    | start_time | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list.word_list | Start time | Integer |
    | end_time | Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list.word_list | End time | Integer |
    ResultDetail has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | FinalSentence | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Final recognition result of a single sentence | String |
    | SliceSentence | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Intermediate recognition result of a single sentence, split into multiple words by spaces | String |
    | StartMs | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Start time of a single sentence (milliseconds) | String |
    | EndMs | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | End time of a single sentence (milliseconds) | String |
    | WordsNum | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Number of words in a single sentence | String |
    | SpeechSpeed | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Speed of a single sentence. Unit: words/second | String |
    | SpeakerId | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Sound channel or speaker ID. If speaker_diarization is set or ChannelNum is set to Dual in the request, speakers or channels can be distinguished. | String |
    | Words | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail | Word details in a single sentence | Container array |
    Words has the following sub-nodes:

    | Node Name (Keyword) | Parent Node | Description | Type |
    | --- | --- | --- | --- |
    | Word | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail.Words | Word text | String |
    | OffsetStartMs | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail.Words | Start time offset within the sentence | String |
    | OffsetEndMs | Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail.Words | End time offset within the sentence | String |
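
    Note that the submit response carries State=Submitted and no SpeechRecognitionResult yet; the result nodes above appear once the job succeeds (via the job query API or the callback). A minimal Python parsing sketch, assuming the raw XML body is held in a string, is shown below; the per-sentence StartMs/EndMs values in ResultDetail are what subtitle generation typically uses.

    import xml.etree.ElementTree as ET

    def summarize(xml_text: str) -> None:
        # Pull the job summary fields out of the <Response> body.
        job = ET.fromstring(xml_text).find("JobsDetail")
        print("JobId:", job.findtext("JobId"), "State:", job.findtext("State"))
        if job.findtext("State") == "Failed":
            print("Error:", job.findtext("Code"), job.findtext("Message"))
            return
        result = job.find("Operation/SpeechRecognitionResult")
        if result is None:
            return  # job not finished yet, or no result produced
        # Per-sentence details exist when ResTextFormat is 1 in the request.
        for i, s in enumerate(result.findall("ResultDetail"), start=1):
            start, end = s.findtext("StartMs"), s.findtext("EndMs")
            print(f"{i}: [{start}ms-{end}ms] {s.findtext('FinalSentence')}")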

    Error codes

    No special error message will be returned for this request. For the common error messages, see Error Codes.

    Examples

    Request 1: Using a speech recognition template ID

    POST /jobs HTTP/1.1
    Authorization: q-sign-algorithm=sha1&q-ak=AKIDZfbOAo7cllgPvF9cXFrJD0a1ICvR****&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=28e9a4986df11bed0255e97ff90500557e0ea057
    Host: test-1234567890.ci.ap-beijing.myqcloud.com
    Content-Length: 166
    Content-Type: application/xml
    
    <Request>
    <Tag>SpeechRecognition</Tag>
    <Input>
    <Object>input/test.mp3</Object>
    </Input>
    <Operation>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Output>
    <Region>ap-chongqing</Region>
    <Bucket>test-123456789</Bucket>
    <Object>output/asr.txt</Object>
    </Output>
    <UserData>This is my data.</UserData>
    <JobLevel>0</JobLevel>
    </Operation>
    <CallBack>http://callback.demo.com</CallBack>
    <CallBackFormat>JSON</CallBackFormat>
    </Request>

    Response

    HTTP/1.1 200 OK
    Content-Type: application/xml
    Content-Length: 230
    Connection: keep-alive
    Date: Mon, 28 Jun 2022 15:23:12 GMT
    Server: tencent-ci
    x-ci-request-id: NTk0MjdmODlfMjQ4OGY3XzYzYzhf****
    
    <Response>
    <JobsDetail>
    <Code>Success</Code>
    <CreationTime>2021-08-05T15:43:50+0800</CreationTime>
    <EndTime>-</EndTime>
    <Input>
    <BucketId>test-1234567890</BucketId>
    <Object>input/test.mp3</Object>
    <Region>ap-chongqing</Region>
    </Input>
    <JobId>s58ccb634149211ed84ce2b1cd7fbb14a</JobId>
    <Message/>
    <Operation>
    <JobLevel>0</JobLevel>
    <Output>
    <Bucket>test-1234567890</Bucket>
    <Object>output/asr.txt</Object>
    <Region>ap-chongqing</Region>
    </Output>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <TemplateName>speech_demo</TemplateName>
    <UserData>This is my data.</UserData>
    <JobLevel>0</JobLevel>
    </Operation>
    <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
    <StartTime>-</StartTime>
    <State>Submitted</State>
    <Tag>SpeechRecognition</Tag>
    </JobsDetail>
    </Response>

    Request 2: Using the speech recognition parameter

    POST /jobs HTTP/1.1
    Authorization: q-sign-algorithm=sha1&q-ak=AKIDZfbOAo7cllgPvF9cXFrJD0a1ICvR****&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=28e9a4986df11bed0255e97ff90500557e0ea057
    Host: test-1234567890.ci.ap-beijing.myqcloud.com
    Content-Length: 166
    Content-Type: application/xml
    
    <Request>
    <Tag>SpeechRecognition</Tag>
    <Input>
    <Object>input/test.mp3</Object>
    </Input>
    <Operation>
    <SpeechRecognition>
    <EngineModelType>16k_zh_video</EngineModelType>
    <ChannelNum>1</ChannelNum>
    <FilterDirty>1</FilterDirty>
    <FilterModal>1</FilterModal>
    </SpeechRecognition>
    <Output>
    <Region>ap-chongqing</Region>
    <Bucket>test-123456789</Bucket>
    <Object>output/asr.txt</Object>
    </Output>
    <UserData>This is my data.</UserData>
    <JobLevel>0</JobLevel>
    </Operation>
    <CallBack>http://callback.demo.com</CallBack>
    <CallBackFormat>JSON</CallBackFormat>
    </Request>

    Response

    HTTP/1.1 200 OK
    Content-Type: application/xml
    Content-Length: 230
    Connection: keep-alive
    Date: Mon, 28 Jun 2022 15:23:12 GMT
    Server: tencent-ci
    x-ci-request-id: NTk0MjdmODlfMjQ4OGY3XzYzYzhf****
    
    
    <Response>
    <JobsDetail>
    <Code>Success</Code>
    <CreationTime>2021-08-05T15:43:50+0800</CreationTime>
    <EndTime>-</EndTime>
    <Input>
    <BucketId>test-1234567890</BucketId>
    <Object>input/test.mp3</Object>
    <Region>ap-chongqing</Region>
    </Input>
    <JobId>s58ccb634149211ed84ce2b1cd7fbb14a</JobId>
    <Message/>
    <Operation>
    <Output>
    <Bucket>test-1234567890</Bucket>
    <Object>output/asr.txt</Object>
    <Region>ap-chongqing</Region>
    </Output>
    <SpeechRecognition>
    <ChannelNum>1</ChannelNum>
    <ConvertNumMode>0</ConvertNumMode>
    <EngineModelType>16k_zh_video</EngineModelType>
    <FilterDirty>0</FilterDirty>
    <FilterModal>0</FilterModal>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
    <ResTextFormat>0</ResTextFormat>
    <SpeakerDiarization>0</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    </SpeechRecognition>
    <UserData>This is my data.</UserData>
    <JobLevel>0</JobLevel>
    </Operation>
    <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
    <StartTime>-</StartTime>
    <State>Submitted</State>
    <Tag>SpeechRecognition</Tag>
    </JobsDetail>
    </Response>
    