Video Content Recognition

Last updated: 2020-10-20 14:52:14

    Video content recognition is an offline task that uses AI to intelligently recognize video content. It recognizes faces, text, opening and closing credits, and speech in a video, helping you manage your videos accurately and efficiently. Specifically, it includes the following features:

    Feature Name | Description | Use Case
    Face recognition | Recognizes faces in the video image | Marks where celebrities appear in the video image; checks for sensitive figures in the video image
    Full speech recognition | Recognizes all phrases in speech | Generates subtitles for speech content; performs data analysis on video speech content
    Full text recognition | Recognizes all text in the video image | Performs data analysis on text in the video image
    Speech keyword recognition | Recognizes keywords in speech | Checks for sensitive words in speech; retrieves specific keywords in speech
    Text keyword recognition | Recognizes keywords in the video image | Checks for sensitive words in the video image; retrieves specific keywords in the video image
    Opening and closing credits recognition | Recognizes opening and closing credits in the video | Marks the positions of the opening credits, closing credits, and feature in the progress bar; removes opening and closing credits from videos in batches

    Some content recognition features depend on a material library. There are two types of libraries: public library and custom library.

    • Public library: VOD's preset material library.
    • Custom library: a material library created and managed by the user.
    Recognition Type | Public Library | Custom Library
    Face recognition | Supported. Figures in the library mainly include entertainment celebrities, sports celebrities, and politically sensitive figures | Supported. Call a server API to manage the custom face library
    Speech recognition | Not supported yet | Supported. Call a server API to manage the keyword library
    Text recognition | Not supported yet | Supported. Call a server API to manage the keyword library
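
    For example, a figure can be added to the custom face library with the CreatePersonSample server API. Below is a minimal sketch assuming the TencentCloud Python SDK (tencentcloud-sdk-python); the credentials, region, figure name, and image file are placeholders:

    import base64
    import json

    from tencentcloud.common import credential
    from tencentcloud.vod.v20180717 import vod_client, models

    # Placeholder credentials and region; replace with your own.
    cred = credential.Credential("SECRET_ID", "SECRET_KEY")
    client = vod_client.VodClient(cred, "ap-guangzhou")

    # The API expects face images as base64-encoded strings.
    with open("john_smith.jpg", "rb") as f:  # hypothetical sample image
        face = base64.b64encode(f.read()).decode()

    # Create a figure sample to be used for content recognition.
    req = models.CreatePersonSampleRequest()
    req.from_json_string(json.dumps({
        "Name": "John Smith",
        "Usages": ["Recognition"],
        "FaceContents": [face]
    }))
    resp = client.CreatePersonSample(req)
    print(resp.to_json_string())  # contains the ID of the new figure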


    Video Content Recognition Template

    Video content recognition integrates a number of recognition features, which are controlled at a fine granularity through the following parameters:

    • Recognition types enabled: which content recognition features are enabled.
    • Material library used: whether the public library or a custom library is used for face recognition.
    • Filter score: the minimum confidence score a face recognition result must reach to be returned.
    • Filter tags: the range of face tags within which results are returned.

    For common combinations of these parameters, VOD provides preset video content recognition templates. You can also create and manage custom video content recognition templates by calling a server API, as sketched below.
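
    For illustration, below is a minimal sketch of creating a custom template with the CreateAIRecognitionTemplate server API, assuming the TencentCloud Python SDK; the credentials, region, and template name are placeholders, and only face recognition is enabled here:

    import json

    from tencentcloud.common import credential
    from tencentcloud.vod.v20180717 import vod_client, models

    # Placeholder credentials and region; replace with your own.
    cred = credential.Credential("SECRET_ID", "SECRET_KEY")
    client = vod_client.VodClient(cred, "ap-guangzhou")

    # Enable face recognition against the public (preset) library and
    # return only results with a confidence score of at least 85.
    req = models.CreateAIRecognitionTemplateRequest()
    req.from_json_string(json.dumps({
        "Name": "face-recognition-demo",
        "FaceConfigure": {
            "Switch": "ON",
            "FaceLibrary": "Default",  # use the public library
            "Score": 85                # filter score
        }
    }))
    resp = client.CreateAIRecognitionTemplate(req)
    print(resp.Definition)  # ID of the new template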

    Task Initiation

    There are three ways to initiate a video content recognition task: calling a server API directly, initiating it through the console, and specifying a task upon upload. For more information, see Task Initiation for video processing.

    Below are instructions for initiating video content recognition tasks in these ways:

    • Call the ProcessMedia server API to initiate a task: specify the video content recognition template ID in the AiRecognitionTask parameter of the request (see the sketch after this list).
    • Call the ProcessMediaByUrl server API to initiate a task: specify the video content recognition template ID in the AiRecognitionTask parameter of the request.
    • Initiate a task on a video through the console: call a server API to create a task flow, configure a video content recognition task in it (by specifying MediaProcessTask.AiRecognitionTask), and use the task flow to initiate video processing in the console.
    • Specify a task upon upload from the server: call a server API to create a task flow, configure a video content recognition task in it (by specifying MediaProcessTask.AiRecognitionTask), and specify the task flow as the procedure in the ApplyUpload request.
    • Specify a task upon upload from the client: call a server API to create a task flow, configure a video content recognition task in it (by specifying MediaProcessTask.AiRecognitionTask), and specify the task flow as the procedure in the signature for upload from the client.
    • Upload through the console: call a server API to create a task flow, configure a video content recognition task in it (by specifying MediaProcessTask.AiRecognitionTask), upload a video through the console, select Process Video During Upload, and choose this task flow to be executed when the upload completes.
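
    Below is the sketch referenced above: a minimal example of calling ProcessMedia, assuming the TencentCloud Python SDK; the credentials, region, file ID, and template ID are placeholders:

    import json

    from tencentcloud.common import credential
    from tencentcloud.vod.v20180717 import vod_client, models

    # Placeholder credentials and region; replace with your own.
    cred = credential.Credential("SECRET_ID", "SECRET_KEY")
    client = vod_client.VodClient(cred, "ap-guangzhou")

    # Initiate content recognition on an uploaded video, using the
    # video content recognition template with ID 10.
    req = models.ProcessMediaRequest()
    req.from_json_string(json.dumps({
        "FileId": "5285890784363430543",         # FileId of the video
        "AiRecognitionTask": {"Definition": 10}  # template ID
    }))
    resp = client.ProcessMedia(req)
    print(resp.TaskId)  # use this ID to query the task result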

    Getting the Result

    After initiating a video content recognition task, you can wait for the result notification asynchronously or query the task synchronously to get the execution result. Below is an example of a result notification received in normal callback mode after a content recognition task completes (fields with null values are omitted):

    {
        "EventType":"ProcedureStateChanged",
        "ProcedureStateChangeEvent":{
            "TaskId":"1400155958-Procedure-2e1af2456351812be963e309cc133403t0",
            "Status":"FINISH",
            "FileId":"5285890784363430543",
            "FileName":"Collection",
            "FileUrl":"http://1400155958.vod2.myqcloud.com/xxx/xxx/aHjWUx5Xo1EA.mp4",
            "MetaData":{
                "AudioDuration":243,
                "AudioStreamSet":[
                    {
                        "Bitrate":125599,
                        "Codec":"aac",
                        "SamplingRate":48000
                    }
                ],
                "Bitrate":1459299,
                "Container":"mov,mp4,m4a,3gp,3g2,mj2",
                "Duration":243,
                "Height":1080,
                "Rotate":0,
                "Size":44583593,
                "VideoDuration":243,
                "VideoStreamSet":[
                    {
                        "Bitrate":1333700,
                        "Codec":"h264",
                        "Fps":29,
                        "Height":1080,
                        "Width":1920
                    }
                ],
                "Width":1920
            },
            "AiRecognitionResultSet":[
                {
                    "Type":"FaceRecognition",
                    "FaceRecognitionTask":{
                        "Status":"SUCCESS",
                        "ErrCode":0,
                        "Message":"",
                        "Input":{
                            "Definition":10
                        },
                        "Output":{
                            "ResultSet":[
                                {
                                    "Id":183213,
                                    "Type":"Default",
                                    "Name":"John Smith",
                                    "SegmentSet":[
                                        {
                                            "StartTimeOffset":10,
                                            "EndTimeOffset":12,
                                            "Confidence":97,
                                            "AreaCoordSet":[
                                                830,
                                                783,
                                                1030,
                                                599
                                            ]
                                        },
                                        {
                                            "StartTimeOffset":12,
                                            "EndTimeOffset":14,
                                            "Confidence":97,
                                            "AreaCoordSet":[
                                                844,
                                                791,
                                                1040,
                                                614
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "Id":236099,
                                    "Type":"Default",
                                    "Name":"Jane Smith",
                                    "SegmentSet":[
                                        {
                                            "StartTimeOffset":120,
                                            "EndTimeOffset":122,
                                            "Confidence":96,
                                            "AreaCoordSet":[
                                                579,
                                                903,
                                                812,
                                                730
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    }
                }
            ],
            "TasksPriority":0,
            "TasksNotifyMode":""
        }
    }
    

    In the callback result, ProcedureStateChangeEvent.AiRecognitionResultSet contains one recognition result whose Type is FaceRecognition, i.e., a face recognition result.

    This result shows that Output.ResultSet contains two recognized figures, John Smith and Jane Smith. For each figure, SegmentSet indicates the time periods (given by StartTimeOffset and EndTimeOffset) during which the face appears in the video and its coordinates (given by AreaCoordSet) in the video image.
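
    A callback like this can be parsed mechanically. Below is a minimal sketch in Python (standard library only) that extracts the recognized figures and the segments in which they appear from a callback body such as the one above:

    import json
    import sys

    def extract_faces(event):
        """Yield (name, start, end, confidence) for each face segment."""
        if event.get("EventType") != "ProcedureStateChanged":
            return
        results = event["ProcedureStateChangeEvent"].get("AiRecognitionResultSet") or []
        for result in results:
            if result.get("Type") != "FaceRecognition":
                continue
            task = result["FaceRecognitionTask"]
            if task.get("Status") != "SUCCESS":
                continue
            for person in task["Output"]["ResultSet"]:
                for seg in person["SegmentSet"]:
                    yield (person["Name"], seg["StartTimeOffset"],
                           seg["EndTimeOffset"], seg["Confidence"])

    # Read the callback body from stdin, e.g.:
    # python parse_callback.py < callback.json
    for name, start, end, conf in extract_faces(json.load(sys.stdin)):
        print(f"{name}: {start}s-{end}s (confidence {conf})")

    For the example above, this prints two segments for John Smith (10s-12s and 12s-14s) and one for Jane Smith (120s-122s). If you query the task synchronously instead, the DescribeTaskDetail server API returns the task result in a similar structure.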
