Video content recognition is an offline task that uses AI to recognize video content, including faces, text, opening and closing credits, and speech. See the table below for details:
Feature | Description | Use Case |
---|---|---|
Face recognition | Recognizes faces in the video image | |
Full speech recognition | Recognizes all phrases in the speech | |
Full text recognition | Recognizes all text in the video image | Data analysis on text in the video image |
Speech keyword recognition | Recognizes keywords in the speech | |
Text keyword recognition | Recognizes keywords in the video image | |
Opening and closing credits recognition | Recognizes the opening and closing credits in the video | |
Some content recognition features depend on a material library. There are two types of libraries: the public library and custom libraries.

Recognition Type | Public Library | Custom Library |
---|---|---|
Face recognition | Supported. The library covers celebrities in the sports and entertainment industries, as well as other public figures. | Supported. Call a server API to manage the custom face library. |
Speech recognition | Not supported yet | Supported. Call a server API to manage the keyword library. |
Text recognition | Not supported yet | Supported. Call a server API to manage the keyword library. |
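As a sketch of custom library management, the snippet below builds a request body for adding a person to the custom face library. The API name (`CreatePersonSample`) and field names are illustrative assumptions and should be verified against the server API reference:

```python
import base64
import json

# Sketch: request body for adding a person to the custom face library
# through a server API (e.g. CreatePersonSample). The field names below
# are assumptions for illustration; check the server API reference.
def build_create_person_sample(name, image_bytes_list):
    return {
        "Action": "CreatePersonSample",
        "Name": name,                    # person name returned in recognition results
        "Usages": ["Recognition.Face"],  # assumed flag: use the sample for face recognition
        "FaceContents": [                # face images, Base64-encoded
            base64.b64encode(img).decode("ascii") for img in image_bytes_list
        ],
    }

body = build_create_person_sample("John Smith", [b"<jpeg bytes>"])
print(json.dumps(body, indent=2))
```

The request body itself carries no binary data; each face image is Base64-encoded before being placed in the JSON payload.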
Video content recognition integrates a number of recognition features, each controlled by fine-grained parameters. VOD provides preset video content recognition templates for common parameter combinations, and you can also create and manage custom templates through a server API.
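A custom template groups these per-feature parameters. The sketch below assumes a server API such as `CreateAIRecognitionTemplate`; the configuration names mirror the feature table above but are assumptions to check against the API reference:

```python
import json

# Sketch: parameters for a custom content recognition template, assuming a
# server API such as CreateAIRecognitionTemplate. Each sub-structure toggles
# one recognition feature; exact field names should be verified against the
# API reference.
template = {
    "Name": "my-recognition-template",
    "FaceConfigure": {"Switch": "ON", "FaceLibrary": "All"},  # face recognition, public + custom library
    "AsrFullTextConfigure": {"Switch": "ON"},                 # full speech recognition
    "OcrFullTextConfigure": {"Switch": "ON"},                 # full text recognition
    "OcrWordsConfigure": {"Switch": "OFF"},                   # text keyword recognition disabled
}
print(json.dumps(template, indent=2))
```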
You can initiate a video content recognition task by calling a server API, via the console, or by specifying the task when uploading videos. For details, see Task Initiation.
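For instance, a server-API initiation request might carry the `AiRecognitionTask` parameter as sketched below (assuming the ProcessMedia API; request signing and transport are omitted, and 10 is an assumed recognition template ID):

```python
import json

# Sketch: request parameters for initiating video content recognition on a
# stored video, assuming the ProcessMedia server API. Signing and the HTTP
# call are omitted; only the parameter layout is shown.
request = {
    "Action": "ProcessMedia",
    "FileId": "5285890784363430543",          # ID of the stored video
    "AiRecognitionTask": {"Definition": 10},  # content recognition template ID
}
print(json.dumps(request))
```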
Below are instructions for initiating video content recognition tasks in these ways:

- Through a server API: call the task initiation API, specifying the `AiRecognitionTask` parameter in the request.
- Via the console: create a task flow (containing `MediaProcessTask.AiRecognitionTask`), and use it to process videos in the console.
- Upon upload from a server: create a task flow (containing `MediaProcessTask.AiRecognitionTask`), and set `procedure` to the name of the task flow when calling ApplyUpload.
- Upon upload from a client: create a task flow (containing `MediaProcessTask.AiRecognitionTask`), and set `procedure` to the name of the task flow in the signature for upload from the client.
- Upon upload via the console: create a task flow (containing `MediaProcessTask.AiRecognitionTask`) and, when uploading videos via the console, choose Automatic Processing After Upload and select the task flow created.

After initiating a video content recognition task, you can wait for the result notification asynchronously or query the task synchronously to get its execution result. Below is an example of the result notification received in normal callback mode after a content recognition task is initiated (fields with null values are omitted):
{
  "EventType":"ProcedureStateChanged",
  "ProcedureStateChangeEvent":{
    "TaskId":"1400155958-Procedure-2e1af2456351812be963e309cc133403t0",
    "Status":"FINISH",
    "FileId":"5285890784363430543",
    "FileName":"Collection",
    "FileUrl":"http://1400155958.vod2.myqcloud.com/xxx/xxx/aHjWUx5Xo1EA.mp4",
    "MetaData":{
      "AudioDuration":243,
      "AudioStreamSet":[
        {
          "Bitrate":125599,
          "Codec":"aac",
          "SamplingRate":48000
        }
      ],
      "Bitrate":1459299,
      "Container":"mov,mp4,m4a,3gp,3g2,mj2",
      "Duration":243,
      "Height":1080,
      "Rotate":0,
      "Size":44583593,
      "VideoDuration":243,
      "VideoStreamSet":[
        {
          "Bitrate":1333700,
          "Codec":"h264",
          "Fps":29,
          "Height":1080,
          "Width":1920
        }
      ],
      "Width":1920
    },
    "AiRecognitionResultSet":[
      {
        "Type":"FaceRecognition",
        "FaceRecognitionTask":{
          "Status":"SUCCESS",
          "ErrCode":0,
          "Message":"",
          "Input":{
            "Definition":10
          },
          "Output":{
            "ResultSet":[
              {
                "Id":183213,
                "Type":"Default",
                "Name":"John Smith",
                "SegmentSet":[
                  {
                    "StartTimeOffset":10,
                    "EndTimeOffset":12,
                    "Confidence":97,
                    "AreaCoordSet":[830,783,1030,599]
                  },
                  {
                    "StartTimeOffset":12,
                    "EndTimeOffset":14,
                    "Confidence":97,
                    "AreaCoordSet":[844,791,1040,614]
                  }
                ]
              },
              {
                "Id":236099,
                "Type":"Default",
                "Name":"Jane Smith",
                "SegmentSet":[
                  {
                    "StartTimeOffset":120,
                    "EndTimeOffset":122,
                    "Confidence":96,
                    "AreaCoordSet":[579,903,812,730]
                  }
                ]
              }
            ]
          }
        }
      }
    ],
    "TasksPriority":0,
    "TasksNotifyMode":""
  }
}
In the callback result, `ProcedureStateChangeEvent.AiRecognitionResultSet` contains the result of face recognition (`Type` is `FaceRecognition`). According to `Output.ResultSet`, two people are recognized: John Smith and Jane Smith. `SegmentSet` indicates when (from `StartTimeOffset` to `EndTimeOffset`) and where (the coordinates in `AreaCoordSet`) each person appears in the video.
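A server receiving this callback can walk the result set as sketched below, using a trimmed-down payload that keeps only the fields read here:

```python
import json

# Walk a (trimmed) content recognition callback and print when each
# recognized person appears. `payload` stands in for the notification
# body delivered to your callback URL.
payload = json.dumps({
    "ProcedureStateChangeEvent": {
        "AiRecognitionResultSet": [{
            "Type": "FaceRecognition",
            "FaceRecognitionTask": {"Output": {"ResultSet": [
                {"Name": "John Smith",
                 "SegmentSet": [{"StartTimeOffset": 10, "EndTimeOffset": 12}]},
                {"Name": "Jane Smith",
                 "SegmentSet": [{"StartTimeOffset": 120, "EndTimeOffset": 122}]},
            ]}},
        }]
    }
})

event = json.loads(payload)["ProcedureStateChangeEvent"]
for task in event["AiRecognitionResultSet"]:
    if task["Type"] == "FaceRecognition":
        for person in task["FaceRecognitionTask"]["Output"]["ResultSet"]:
            for seg in person["SegmentSet"]:
                print(f"{person['Name']}: {seg['StartTimeOffset']}s-{seg['EndTimeOffset']}s")
# prints:
# John Smith: 10s-12s
# Jane Smith: 120s-122s
```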