In the first part of this three-part series, the authors described the H.264 video compression standard, including its history. In part two, the authors discuss motion estimation, the phase during which H.264 really distinguishes itself from other MPEG standards.
|Image quality of objects in motion is greater with H.264 technology|
Differentiating from other MPEG standards: Adapting raster block sizes for more detailed images
H.264 distinguishes itself from the other MPEG standards mainly during motion estimation and its two components, motion compensation and motion vectors. Motion estimation is the process by which image information is assessed for similarities that can be reused in subsequent frames. This ultimately reduces the amount of data that is encoded and therefore reduces the bit rate.
Initially, an H.264 or MPEG-2/4 encoder receives either frames (progressive video) or fields (interlaced video) from a camera or other video source. At the start of motion estimation, these images are divided into a raster of macroblocks that are organised into arbitrarily shaped slices.
The raster is one aspect that sets H.264 apart. MPEG-2 predicts motion on fixed 16x16 macroblocks (MPEG-4 Part 2 can additionally split them into 8x8 blocks), and both transform the image in fixed 8x8 blocks, whereas H.264 allows block sizes to vary much more freely. An H.264 encoder's raster can therefore include block sizes of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, or 4x4 pixels. Less detailed areas, such as a clear blue sky, may use a 16x16 block, while more detailed areas, such as the edges of moving vehicles, are more likely to use the smaller 4x4 block size.
|H.264 divides macroblocks up into a raster of mixed block sizes|
Adjusting the block size as necessary not only makes H.264 encoding more efficient but also improves the perceived quality of the image. Fixed gridlines are more jarring to the eye than irregular block patterns. As a result, most viewers would agree that H.264 noticeably improves the apparent picture quality.
Deciding which block size to use and where is not something that is defined in the H.264 standard. This allows engineers to creatively compete for the most accurate and efficient motion estimation process. Consequently, a proficient motion estimation process can be what either makes or breaks an H.264 encoder.
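Because the standard leaves the partitioning decision to the encoder, any example can only illustrate one possible approach. The sketch below is a hypothetical heuristic, not a method taken from the standard or from any particular encoder: it splits a 16x16 macroblock into quadrants whenever the pixel variance exceeds a threshold, down to 4x4. The function names, the variance criterion, and the threshold value are all assumptions, and for brevity only the square sizes (16x16, 8x8, 4x4) are considered.

```python
def variance(frame, x, y, size):
    """Pixel variance of the size x size block whose top-left corner is (x, y)."""
    pixels = [frame[y + j][x + i] for j in range(size) for i in range(size)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def split_block(frame, x, y, size=16, threshold=100.0, min_size=4):
    """Return a list of (x, y, size) partitions for one macroblock.

    Hypothetical rule: keep a block whole if it is flat (low variance),
    otherwise split it into four quadrants and repeat.
    """
    if size == min_size or variance(frame, x, y, size) <= threshold:
        return [(x, y, size)]
    half = size // 2
    parts = []
    for dy in (0, half):
        for dx in (0, half):
            parts += split_block(frame, x + dx, y + dy, half, threshold, min_size)
    return parts


# A flat macroblock (like clear sky) with a detailed 4x4 checkerboard corner.
frame = [[128] * 16 for _ in range(16)]
for j in range(4):
    for i in range(4):
        frame[j][i] = 255 if (i + j) % 2 else 0

partitions = split_block(frame, 0, 0)
print(partitions)
```

In this toy input the detailed quadrant is broken down to 4x4 blocks while the three flat quadrants remain 8x8, which mirrors the sky-versus-vehicle-edge example above. A production encoder would weigh many more factors, such as the bits each choice costs.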
Motion compensation and reuse of image data
Another aspect of motion estimation is the process of motion compensation, during which the difference or change between the macroblocks is calculated. Each slice is examined in raster order with either intra- or inter-prediction.
Intra-prediction assesses I blocks in I slices using only the image data found within the current slice. Inter-prediction, used for P blocks in P slices, can additionally reference image data from previously encoded frames.
|Intra-prediction uses the current frame as a reference. Inter-prediction references previous frames in addition to the current frame|
The discrepancy between image data determined during motion compensation is used to produce a block containing residual information. Only this residual block is encoded, while the matching image data from previous frames is reused. As a result, only the dissimilarities between blocks are encoded and redundant aspects of images are recycled, thus reducing the bit rate.
Whereas MPEG-2/4 consult just one reference frame, H.264 has a number of previously encoded frames that it can check. This gives H.264 the potential to reuse even more image data than the preceding MPEG standards and, as a result, to reduce the bit rate even further. The trade-off is that searching several reference frames also increases the necessary processing power.
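The residual idea can be sketched in a few lines. This is a simplified illustration with hypothetical function names: it subtracts a predicted block from the current block and shows that the decoder can rebuild the current block from the prediction plus the residual. A real encoder would go on to transform and quantise the residual, which is omitted here.

```python
def residual_block(current, prediction):
    """Per-pixel difference between the current block and its prediction."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]


def reconstruct(prediction, residual):
    """Decoder side: prediction plus residual gives back the current block."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


# Prediction reused from an earlier frame; only one pixel has changed since.
prediction = [[100] * 4 for _ in range(4)]
current = [row[:] for row in prediction]
current[1][2] = 104

residual = residual_block(current, prediction)
nonzero = sum(1 for row in residual for v in row if v != 0)
print(nonzero)                                   # 1
print(reconstruct(prediction, residual) == current)  # True
```

Fifteen of the sixteen residual values are zero, which is why a good prediction leaves so little to encode.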
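As a rough illustration of why extra reference frames help and why they cost more, the sketch below compares the current block with a candidate block from each available reference and keeps the closest one. The function names and the use of the sum of absolute differences (SAD) as the matching cost are assumptions made for this example; the work grows with every reference that is checked.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def pick_reference(current_block, candidate_blocks):
    """Return (index, cost) of the candidate that best matches the current block."""
    costs = [sad(current_block, cand) for cand in candidate_blocks]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


current_block = [[10, 20], [30, 40]]
candidates = [
    [[0, 0], [0, 0]],      # most recent frame: the object was hidden
    [[10, 20], [30, 41]],  # an older frame: the object was visible
]
print(pick_reference(current_block, candidates))  # (1, 1)
```

Here the older frame is the better predictor, a case that an encoder limited to a single reference frame could not exploit.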
Data precision down to a quarter pixel
Motion vectors identify the direction and distance by which reused pixels should shift in the following frame or field, horizontally, vertically, or both. They indicate how best to situate data and are therefore a crucial factor in effectively reusing image information.
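One simple way to find a motion vector is an exhaustive block-matching search, sketched below at whole-pixel precision. This is a minimal illustration rather than how any particular encoder works: the function names, the SAD cost, the 4x4 block size, and the search range of two pixels are all assumptions, and practical encoders use faster search strategies.

```python
def block_sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def full_search(cur, ref, bx, by, size=4, search=2):
    """Try every displacement within +/- search pixels; return (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Reference frame with a distinctive pattern; the current frame is the
# same picture shifted one pixel to the right.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[row[x - 1] if x > 0 else 0 for x in range(8)] for row in ref]

print(full_search(cur, ref, bx=2, by=2))  # (0, -1, 0)
```

The result says the best match for the block lies one pixel to the left in the reference frame, with nothing left over to encode as a difference.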
MPEG-2 and MPEG-4 SP (Simple Profile) generate motion vectors using half-pixel resolution. This means that half pixel increments are used to accurately rearrange data. H.264 goes a step further, subdividing macroblocks and creating motion vectors that can reposition image data with the precision of a quarter of a pixel.
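Sub-pixel positions do not exist in the source image, so their values have to be interpolated from neighbouring whole pixels. The sketch below works along a single row only and assumes the six-tap filter weights (1, -5, 20, 20, -5, 1) that H.264 applies to luma samples for half-pixel positions, with quarter-pixel values formed by averaging two neighbours. The function names are hypothetical, and edge handling and the two-dimensional cases are left out.

```python
def clip8(value):
    """Clamp to the 8-bit pixel range."""
    return max(0, min(255, value))


def half_sample(row, i):
    """Value halfway between row[i] and row[i + 1]; needs row[i - 2] .. row[i + 3]."""
    a, b, c, d, e, f = row[i - 2:i + 4]
    return clip8((a - 5 * b + 20 * c + 20 * d - 5 * e + f + 16) >> 5)


def quarter_sample(row, i):
    """Value a quarter of the way from row[i] towards row[i + 1]."""
    return (row[i] + half_sample(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50]
print(half_sample(ramp, 2))     # 25
print(quarter_sample(ramp, 2))  # 23
```

On this smooth ramp the half-pixel value lands midway between 20 and 30, and the quarter-pixel value midway between 20 and that half-pixel value, after integer rounding.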
This precision further reduces the amount of data needing to be encoded, but compared with whole-pixel positioning it also increases the number of candidate positions 16-fold (four horizontal by four vertical positions per pixel). As a result, H.264 encoders typically interpolate quarter-pixel motion vectors only for areas where there is a lot of motion or the data is most detailed.
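The 16-fold figure follows from simple arithmetic, as the sketch below shows. It assumes a common convention in which each motion vector component is stored as an integer count of quarter pixels; the function name is hypothetical.

```python
def split_quarter_pel(component):
    """Split a quarter-pixel motion vector component into (whole pixels, quarter steps)."""
    return component >> 2, component & 3


# Every whole-pixel position gains 4 x 4 fractional offsets.
fractional_positions = [(fx, fy) for fy in range(4) for fx in range(4)]
print(len(fractional_positions))  # 16

print(split_quarter_pel(9))   # (2, 1): 2 pixels plus 1/4
print(split_quarter_pel(-5))  # (-2, 3): -2 pixels plus 3/4, i.e. -1.25
```

Storing the components as integers keeps the arithmetic exact, and the quarter-step part tells the encoder which interpolated sample to fetch.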
Estimating H.264 encoder excellence
Motion compensation and the creation of motion vectors take place concurrently during motion estimation. Together they select the best block size, calculate the difference, and generate motion vectors with quarter-pixel precision so as to minimise the residual difference between image frames, which is what makes H.264 encoders so efficient.
Reducing the bit rate comes at a cost, however: greater computational complexity and, therefore, higher processing power requirements. Engineers have to carefully implement statistical mechanisms to analyse the data flow and determine the most efficient way of using the tools and enhancements that H.264 makes possible. An encoder's quality can therefore be judged by how well it implements motion estimation.