Advanced Video Coding

What is H.264 or Advanced Video Coding? 

Video coding (video compression) is the process of converting digital video into a format that can be transmitted and stored efficiently. It is an essential component of many applications, such as high-definition (HD) TV broadcast, video conferencing, mobile services, internet video streaming, and HD video storage. Advanced Video Coding (AVC) is the most widely used video coding standard. It was jointly standardised by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) so that products from different manufacturers can interoperate.

AVC is also referred to as H.264 or MPEG-4 AVC and was first published in 2003. The standard defines a syntax for encoded video and a method for decoding that syntax back into a video sequence. It does not specify how to encode video; the choice of encoding algorithms is left to manufacturers. AVC builds on the concepts of previous standards and offers good video quality at substantially lower bit rates. The technology is protected by patents owned by various parties, and the majority of these patents are administered by MPEG LA. However, the use of AVC for streaming online video that is free to end users does not incur royalties.

How Advanced Video Coding Works

An AVC codec consists of an encoder, which converts video into a compressed format, and a decoder, which converts the compressed video back into a displayable video sequence. The encoder carries out prediction, transform, and encoding to produce a compressed H.264 bitstream. The decoder carries out the complementary steps of decoding, inverse transform, and reconstruction to produce the video output. The block diagram below gives a simple overview of the AVC encoder and decoder processes.

[Figure: block diagram of the AVC encoder and decoder processes]

Advanced Video Coding Encoder

Prediction: Encoding begins by splitting each video frame into macroblocks of 16×16 pixels, which can be further partitioned into blocks as small as 4×4. Each block is predicted by either intra prediction or inter prediction. Intra prediction uses block sizes of 4×4 and 16×16 to predict the current macroblock from surrounding, previously coded pixels within the same frame. Inter prediction uses block sizes ranging from 16×16 down to 4×4 to predict pixels in the current frame from similar regions in previously coded and transmitted frames. The first frame of a video is always encoded using intra prediction. The remaining pictures are typically coded mainly with inter prediction, which employs motion compensation (temporal prediction) with the help of motion estimation. The difference between the original input samples and the predicted samples (whether intra or inter) is called the prediction residual.
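The inter prediction step can be illustrated with a minimal block-matching sketch. It assumes a full search over a small window and the sum of absolute differences (SAD) as the matching cost; the standard does not prescribe either choice, and real encoders use much faster searches and sub-pixel accuracy. The function names and the synthetic frames are made up for this example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(cur, ref, top, left, size=4, search=2):
    """Full search in ref around (top, left); returns (dy, dx, sad)."""
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(ref, y, x, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


def residual(cur_block, pred_block):
    """Prediction residual: original samples minus predicted samples."""
    return [[c - p for c, p in zip(rc, rp)]
            for rc, rp in zip(cur_block, pred_block)]


# Hypothetical 8x8 reference frame with a non-repeating pattern.
ref = [[y * 16 + x * x for x in range(8)] for y in range(8)]
# Current frame: the same content shifted one pixel to the right.
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]

dy, dx, cost = motion_search(cur, ref, top=2, left=3)
pred = get_block(ref, 2 + dy, 3 + dx, 4)
res = residual(get_block(cur, 2, 3, 4), pred)
print((dy, dx, cost))  # prints (0, -1, 0)
```

Because the current frame is an exact shifted copy of the reference, the search finds a perfect match and the residual is all zeros. With real video the residual is small but non-zero, and it is this residual that goes on to the transform stage.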

Transform: The residual block is transformed using a 4×4 integer transform, which is an integer approximation of the discrete cosine transform (DCT). The transform outputs a set of coefficients, each of which acts as a weight for a standard basis pattern; at the decoder, the weighted basis patterns are combined to recreate the residual block. The transform coefficients are then quantized, which produces a block in which most coefficients are zero and only a few are non-zero. The choice of quantization parameter sets the trade-off between compression efficiency and decoded image quality.
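The transform and quantization steps can be sketched as follows. The matrix CF is the 4×4 forward core transform of H.264. Everything else is a simplification for illustration: the standard folds the normalisation into integer quantization tables driven by the quantization parameter, whereas this sketch normalises in floating point and takes a hypothetical step size qstep directly.

```python
import math

# Forward core transform matrix of the H.264 4x4 integer transform.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]
# Row norms of CF (2 and sqrt(10)); dividing by them makes the rows orthonormal.
NORMS = [math.sqrt(sum(v * v for v in row)) for row in CF]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(block):
    """Integer core transform CF * X * CF^T, then floating-point normalisation."""
    w = matmul(matmul(CF, block), transpose(CF))
    return [[w[i][j] / (NORMS[i] * NORMS[j]) for j in range(4)]
            for i in range(4)]


def quantize(coeffs, qstep):
    """Uniform quantization (simplified): divide by the step and round."""
    return [[int(round(c / qstep)) for c in row] for row in coeffs]


def dequantize(levels, qstep):
    """Re-scaling: multiply each quantized level by the step size."""
    return [[level * qstep for level in row] for row in levels]


def inverse_transform(coeffs):
    """Inverse of forward_transform, rounded back to integer samples."""
    # A = diag(1 / NORMS) * CF is orthonormal, so its inverse is its transpose.
    a = [[CF[i][j] / NORMS[i] for j in range(4)] for i in range(4)]
    x = matmul(matmul(transpose(a), coeffs), a)
    return [[int(round(v)) for v in row] for row in x]


flat = [[10] * 4 for _ in range(4)]
levels = quantize(forward_transform(flat), qstep=8)
print(levels)  # only the top-left (DC) level is non-zero
print(inverse_transform(dequantize(levels, qstep=8)))
```

A flat block puts all of its energy into the single DC coefficient, which is the "mostly zeros" outcome the paragraph describes. A larger qstep zeroes more coefficients and lowers the bit rate, at the cost of a larger reconstruction error.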

Bitstream encoding: The prediction and transform operations produce several values that need to be encoded, including the quantized transform coefficients, information about the structure of the compressed data, and information about the compression tools used during encoding. These values are converted into binary codes using entropy coding, either variable-length coding or arithmetic coding. The result is an efficient, compact binary representation of the video, which can then be stored or transmitted.
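As a small illustration of variable-length coding, the sketch below implements Exp-Golomb codes, which H.264 uses for many header and syntax elements. The coefficient data itself is coded with the more elaborate context-adaptive schemes (CAVLC or CABAC), which are not shown here. Bits are represented as Python strings purely for readability, and the function names are chosen for this example.

```python
def ue(code_num):
    """Unsigned Exp-Golomb code for a non-negative integer, as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]
    # Prefix of (length - 1) zeros, then code_num + 1 in binary.
    return "0" * (len(binary) - 1) + binary


def se(value):
    """Signed Exp-Golomb: positive v maps to 2v - 1, non-positive v to -2v."""
    return ue(2 * value - 1 if value > 0 else -2 * value)


def decode_ue(bits, pos=0):
    """Decode one unsigned code starting at bits[pos]; returns (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


for n in range(5):
    print(n, ue(n))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
```

Small values, which occur most often, get the shortest codes, and no code is a prefix of another, so a decoder can read consecutive codes from the stream without separators.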

Decoder

Bitstream Decoding: The decoder receives the compressed H.264 bitstream, decodes the syntax elements, and extracts information such as the quantized transform coefficients and the prediction information. This information is used to recreate a sequence of video images.

Inverse Transform: The quantized transform coefficients are re-scaled: each is multiplied by an integer to restore its original scale. The re-scaled coefficients weight the standard basis patterns, which are combined to recreate each residual block. These blocks together form a residual macroblock.

Reconstruction: The decoder forms the same prediction as the encoder and adds it to each residual macroblock. The decoded macroblock can then be displayed as part of a video frame.
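The reconstruction step amounts to a sample-wise addition. The sketch below assumes 8-bit video and clips the sum to the valid sample range; the function name and the example values are hypothetical.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add the decoded residual to the prediction and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]


prediction = [[250, 3], [128, 64]]
decoded_residual = [[10, -5], [-8, 4]]
print(reconstruct(prediction, decoded_residual))  # prints [[255, 0], [120, 68]]
```

Clipping matters because quantization error can push the sum slightly outside the range that the sample format can represent.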

Design 

Features: H.264 contains many new features compared to earlier standards. A few prominent ones are listed below.

  • Multi-picture inter-picture prediction
  • Variable block-size motion compensation
  • Lossless macroblock coding
  • Loss resilience
  • Switching slices
  • Supplemental enhancement information (SEI) 
  • Video usability information (VUI)

Profiles: The standard defines different sets of capabilities, called profiles, each targeting a specific class of applications. A profile is declared using a profile code and sometimes a set of additional constraints applied in the encoder. The declared profile lets a decoder recognise the requirements for decoding a specific bitstream. The most commonly used profile is the High Profile; a few other important profiles are listed below.

  • Constrained Baseline Profile
  • Extended Profile
  • Scalable Baseline Profile
  • Stereo High Profile
  • Multiview Depth High Profile

Level: A level is a specified set of constraints that indicates the decoder performance required for a profile. A decoder that conforms to a given level must be able to decode all bitstreams encoded for that level and for all lower levels.

Decoded picture buffering: AVC uses previously encoded pictures to predict other pictures. The decoder therefore needs previously decoded pictures to decode further pictures, and it stores them in the decoded picture buffer (DPB). Because the picture currently being decoded is generally not counted towards the DPB capacity, a decoder needs sufficient memory to handle at least one frame more than the maximum capacity of the DPB.
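The buffer requirement can be made concrete with a short calculation. This sketch assumes progressive video and the usual rule that the DPB capacity in frames is the level's macroblock budget for the DPB divided by the number of macroblocks per frame, capped at 16. The budget of 32768 macroblocks used below is the value recalled for Level 4 and should be checked against the level tables of the standard. The function names are made up for this example.

```python
def macroblocks(samples):
    """Number of 16-sample macroblocks needed to cover a picture dimension."""
    return (samples + 15) // 16


def dpb_capacity_frames(width, height, max_dpb_mbs):
    """DPB capacity in frames for a progressive picture of the given size."""
    frame_mbs = macroblocks(width) * macroblocks(height)
    return min(max_dpb_mbs // frame_mbs, 16)


# Assumed Level 4 budget; verify against the standard before relying on it.
LEVEL_4_MAX_DPB_MBS = 32768

capacity = dpb_capacity_frames(1920, 1080, LEVEL_4_MAX_DPB_MBS)
print(capacity)      # prints 4
print(capacity + 1)  # prints 5: frame stores needed, including the current picture
```

For 1920×1080 video the frame is 120 × 68 = 8160 macroblocks, so under these assumptions the DPB holds 4 frames and the decoder needs memory for at least 5.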

Advantages

The biggest advantage of H.264 over previous standards is its compression performance: it provides better image quality at the same compressed bit rate. This improvement comes at a greater computational cost, since compressing and decompressing video takes significantly more processing power. Along with improved compression, H.264 also offers better transmission support and the flexibility to select from a wide variety of compression tools. These advantages make it suitable for applications ranging from mobile transmission to high-definition consumer TV.

Applications

Today, AVC is used by more than 90% of the video industry. It has been adopted in many applications, a few of which are listed below.

  • HD DVD and Blu-ray Disc formats
  • HD-TV broadcasting
  • Mobile TV broadcasting
  • Internet Video