A container format (Format) specifies how the encoded and compressed video and audio streams are packaged into one file. For online VOD, a more appropriate term is "streaming network transport protocol". The most widely used protocols on the internet are as follows:
A video codec is a program or device that can compress (video encoding) or decompress (video decoding) digital video. Common encoding methods include:
Bitrate refers to the number of bits that are required in playback of continuous media (e.g., compressed audio or video) per unit time. It is measured in bit/s or bps.
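As an illustration of the definition above, the following minimal sketch relates bitrate, duration, and stream size. The function names and the 2 Mbps, 60-second figures are hypothetical and chosen only for the example; container overhead is ignored.

```python
def estimated_size_bytes(bitrate_bps: float, duration_s: float) -> float:
    """Approximate stream size: bits per second times seconds, divided by 8 bits per byte."""
    return bitrate_bps * duration_s / 8


def average_bitrate_bps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate of a stream, given its size in bytes and its duration in seconds."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return size_bytes * 8 / duration_s


# Hypothetical example: a 2 Mbps stream that lasts 60 seconds.
print(estimated_size_bytes(2_000_000, 60))  # 15000000.0 bytes, i.e. about 15 MB
```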
Frame rate refers to the number of frames in a video per unit time. It is measured in FPS (frames per second) or Hz.
Resolution determines how much detail a video can show and is expressed as the number of pixels in each dimension, e.g., 640 × 480.
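To show how resolution and frame rate combine, the sketch below estimates the data rate of uncompressed video. The 24 bits per pixel (8-bit RGB) and 30 FPS values are assumptions made for the example, not values taken from this document, and the function name is hypothetical.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed data rate: pixels per frame x bits per pixel x frames per second."""
    return width * height * bits_per_pixel * fps


# Assumed example: 640 x 480 at 30 FPS with 24 bits per pixel.
print(raw_bitrate_bps(640, 480, 30))  # 221184000 bit/s, i.e. roughly 221 Mbps
```

The gap between this figure and typical delivery bitrates is what video encoding is meant to close.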
A GOP (group of pictures) is a sequence of consecutive pictures within an encoded video stream. It starts at an I-frame and extends up to the next I-frame. A GOP can contain the following types of pictures:
The number of pictures within a GOP is called the GOP length.
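The GOP boundary rule can be sketched as follows, assuming the stream is given as a string of picture-type letters (I, P, B) in stream order. The helper name and the sample sequence are hypothetical.

```python
from typing import List


def split_into_gops(frame_types: str) -> List[str]:
    """Split a sequence of picture types into GOPs, starting a new GOP at every I picture."""
    gops: List[str] = []
    for picture in frame_types:
        if picture == "I" or not gops:
            # An I picture opens a new GOP; pictures before the first I picture
            # are collected into a leading partial group.
            gops.append(picture)
        else:
            gops[-1] += picture
    return gops


gops = split_into_gops("IBBPBBPIBBP")
print(gops)                    # ['IBBPBBP', 'IBBP']
print([len(g) for g in gops])  # GOP lengths: [7, 4]
```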
An IDR (instantaneous decoding refresh) picture is a special type of I picture. After an ordinary I picture, P and B pictures can still reference pictures that precede it; after an IDR picture, no picture can reference any picture before it.
In VOD scenarios, a player generally lets the user drag the progress bar to a desired position. The most convenient approach for the player is to start playback from the IDR picture closest to that position, because the player knows that no picture after the IDR picture references any picture before it. This avoids having to resolve references backward through the stream.
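A minimal sketch of this seek behavior is shown below. It assumes the player already has a sorted list of IDR timestamps in seconds (how that list is obtained is outside the scope of this example) and that it snaps to the latest IDR picture at or before the requested position; a real player may choose differently. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import List


def seek_target(idr_times: List[float], position: float) -> float:
    """Return the timestamp of the latest IDR picture at or before `position`."""
    if not idr_times:
        raise ValueError("no IDR pictures available")
    index = bisect_right(idr_times, position)
    if index == 0:
        # The requested position precedes the first IDR picture.
        return idr_times[0]
    return idr_times[index - 1]


# Hypothetical stream with an IDR picture every 2 seconds.
print(seek_target([0.0, 2.0, 4.0, 6.0], 5.3))  # 4.0
```

Decoding then starts at the returned timestamp, and the player can discard decoded pictures until it reaches the requested position.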
If IDR frame alignment is specified when a video is transcoded to multiple bitrates, the IDR pictures of all output videos are precisely aligned by time point and picture content, so that video players can switch smoothly among the videos at different bitrates without noticeable stuttering.
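One way to sanity-check the time-point part of this alignment is sketched below. It assumes the IDR timestamps of each output video are already known, and it compares timestamps only, not picture content. The function name, rendition labels, and tolerance are hypothetical.

```python
from typing import Dict, List


def idr_aligned(renditions: Dict[str, List[float]], tolerance: float = 1e-6) -> bool:
    """Check that every rendition has its IDR pictures at the same timestamps."""
    timelines = list(renditions.values())
    if not timelines:
        return True
    reference = timelines[0]
    for other in timelines[1:]:
        if len(other) != len(reference):
            return False
        if any(abs(a - b) > tolerance for a, b in zip(reference, other)):
            return False
    return True


aligned = {"360p": [0.0, 2.0, 4.0], "720p": [0.0, 2.0, 4.0]}
misaligned = {"360p": [0.0, 2.0, 4.0], "720p": [0.0, 2.5, 4.0]}
print(idr_aligned(aligned))     # True
print(idr_aligned(misaligned))  # False
```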
A profile is a set of encoding features defined for a specific application scenario. H.264 mainly supports three profiles:
A color space is an abstract mathematical model which simply describes the range of colors as tuples of numbers, typically as 3 or 4 values or color components.
Video noise is random variation of brightness or color in an image, produced by the sensor and circuitry of a scanner or digital camera. It can also originate in film grain and in the inherent shot noise of a photon detector. It is generally viewed as an undesirable by-product of image capture. Video noise reduction removes unwanted noise from a video while retaining useful information such as important details.
In the era of analog television, the processing speed of playback devices and the available transmission bandwidth were limited. In this context, interlacing was developed to deliver video at lower bitrates without reducing the source frame rate. It can reduce the video transmission bandwidth by 50% while largely retaining the source image quality. However, it has noticeable negative effects such as low definition, flickering, and jagged image edges.
Nowadays, video playback devices and network bandwidth have improved greatly, and interlacing has gradually become obsolete; some new device models no longer support it. Therefore, old videos that were processed with interlacing need to be "deinterlaced".
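To make the idea concrete, the sketch below shows one very simple deinterlacing approach, line doubling (often called "bob"), on a frame modeled as a list of pixel rows. This is an illustrative assumption, not the method used by any particular service; production deinterlacers typically use more elaborate interpolation. The helper names are hypothetical.

```python
from typing import List, Tuple

Row = List[int]


def split_fields(frame: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate an interlaced frame into its two fields (even rows, odd rows)."""
    return frame[0::2], frame[1::2]


def line_double(field: List[Row]) -> List[Row]:
    """Rebuild a full-height progressive frame by repeating each row of one field."""
    progressive: List[Row] = []
    for row in field:
        progressive.append(list(row))
        progressive.append(list(row))
    return progressive


interlaced = [[1, 1], [2, 2], [3, 3], [4, 4]]
top, bottom = split_fields(interlaced)
print(line_double(top))     # [[1, 1], [1, 1], [3, 3], [3, 3]]
print(line_double(bottom))  # [[2, 2], [2, 2], [4, 4], [4, 4]]
```

Each field becomes its own progressive frame, which removes the combing between fields at the cost of vertical detail.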
An audio codec is a method of converting analog audio signals to digital signals (or vice versa). Audio encoding is mainly divided into lossless and lossy encoding. According to sampling principles, encoded audio signals can only approximate natural signals, however closely; therefore, all audio codecs are lossy in essence. In computing, pulse code modulation (PCM), which achieves the highest fidelity, is generally regarded as lossless encoding. The audio codecs most widely used in internet services, such as MP3 and AAC, are lossy.
Sample rate refers to the number of discrete samples taken from a continuous signal per second. It is measured in Hz.
Please see the description of bitrate in the Video Encoding Terms section.
A sound channel is an independent audio signal that is captured or played back at a distinct spatial position when sound is recorded or reproduced. The number of sound channels is the number of sound sources during recording or the number of speakers during playback.
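Sample rate and channel count together determine the bitrate of uncompressed PCM audio, as the sketch below illustrates. The bit depth (bits per sample) is not defined in this glossary; the 16-bit value, like the 44,100 Hz stereo figures and the function name, is an assumption for the example.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM bitrate: samples per second x bits per sample x channels."""
    return sample_rate_hz * bit_depth * channels


# Assumed example: 44,100 Hz, 16 bits per sample, 2 channels (stereo).
print(pcm_bitrate_bps(44_100, 16, 2))  # 1411200 bit/s, i.e. about 1,411 kbps
```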
The ISO date format is the date and time format specified in ISO 8601. In VOD, unless otherwise specified, all time-related parameters use UTC time in the ISO 8601 format YYYY-MM-DDThh:mm:ssZ. For example, 2018-10-01T10:00:00Z represents 18:00:00 on October 1, 2018 Beijing Time (UTC+8).
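The conversion in the example above can be reproduced with the Python standard library, as in the following minimal sketch. The helper names `parse_utc` and `format_utc` are hypothetical, and `format_utc` expects a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

ISO_UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
BEIJING = timezone(timedelta(hours=8))  # UTC+8


def parse_utc(timestamp: str) -> datetime:
    """Parse a YYYY-MM-DDThh:mm:ssZ string into a timezone-aware UTC datetime."""
    return datetime.strptime(timestamp, ISO_UTC_FORMAT).replace(tzinfo=timezone.utc)


def format_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as a YYYY-MM-DDThh:mm:ssZ string in UTC."""
    return moment.astimezone(timezone.utc).strftime(ISO_UTC_FORMAT)


utc_time = parse_utc("2018-10-01T10:00:00Z")
print(utc_time.astimezone(BEIJING).isoformat())  # 2018-10-01T18:00:00+08:00

beijing_time = datetime(2018, 10, 1, 18, 0, 0, tzinfo=BEIJING)
print(format_utc(beijing_time))  # 2018-10-01T10:00:00Z
```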