Understanding transform, quantisation and entropy encoding in H.264 video compression
In the concluding part of this three-part review of the H.264 video compression standard, Kate Huber, Peter de Konink and Piet Nieuwets of Siqura discuss transform, quantisation and entropy encoding, the block-encoding steps that follow motion estimation.

Describing data in the transform matrix

In contrast to the motion estimation step, the transform phase of the encoding process is largely similar in H.264 and the MPEG standards. At this point in encoding, all the residual data collected during motion estimation is described using the Discrete Cosine Transform (DCT) method. Initially, the information from each residual block is represented as one 16x16-pixel brightness (luma) block and two 8x8-pixel colour (chroma) blocks. These image data blocks are analysed and replaced by a DCT pattern with corresponding coefficients that precisely represent the original information. The transform thus produces a matrix of coefficients reflecting the amount of data to be encoded: the fewer the matrix values and the lower their magnitudes, the less residual image data there is, and the better the image that can be delivered in fewer bits.

How the transform works is easy to understand if you think about it in terms of modern art. It is quite easy to describe a new painting that is just a canvas covered in solid blue paint; it is much more difficult to tell someone in detail about the intricacies of a Jackson Pollock painting. Similarly, DCT coefficients ideally describe a solid grey block, one with little or no residual data. The more coefficients that are needed, the more residual detail there is.
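The "solid blue canvas" intuition can be checked directly. The sketch below is a textbook 2-D DCT in plain Python (H.264 itself uses a 4x4 integer approximation of the DCT rather than this floating-point form, so this is illustrative only). Transforming a perfectly flat residual block yields a single non-zero coefficient, the DC term in the top-left corner; every other coefficient is zero.

```python
import math

def dct2(block):
    """2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def c(u):
        # Orthonormal scale factors for the DCT-II basis functions.
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

# A "solid blue canvas": a flat 4x4 residual block of identical values.
flat = [[128] * 4 for _ in range(4)]
coeffs = dct2(flat)
# Only coeffs[0][0] (the DC coefficient) is non-zero; all others are ~0,
# so this block can be described with a single value.
```

A block with texture and edges produces many non-zero coefficients instead, which is the Jackson Pollock case: more detail, more values to encode.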
After the transform, the coefficient matrix is quantised. In scalar quantisation, values within a predetermined range around zero are deemed inconsequential and are therefore reduced to zero. This lowers the bit rate without necessarily impacting the perceived quality of the image. Both the transform and quantisation stages depend on an adept motion estimation process: the advances H.264 makes in motion estimation are what ultimately lower the residual image data and allow high-quality images to be transformed and quantised to coefficients near zero. Improvements in motion estimation are therefore what ultimately allow better video quality at a lower bit rate. However, H.264 includes additional developments that add to the effectiveness of this streaming standard.
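A minimal sketch of such a dead-zone scalar quantiser is shown below. The step size and dead-zone width here are invented for illustration; in H.264 they are derived from the encoder's quantisation parameter rather than chosen freely like this.

```python
def quantise(coeffs, step, dead_zone):
    """Dead-zone scalar quantisation of a list of transform coefficients.

    Coefficients whose magnitude falls inside the dead zone around zero
    are deemed inconsequential and set to zero; the rest are divided by
    the step size and rounded to the nearest integer level.
    """
    out = []
    for c in coeffs:
        if abs(c) < dead_zone:
            out.append(0)              # small value: drop it entirely
        else:
            out.append(round(c / step))  # large value: keep a coarse level
    return out

# One large DC coefficient followed by small residual AC coefficients.
coeffs = [512.0, -3.1, 40.2, 1.8, -22.5, 0.4]
levels = quantise(coeffs, step=10, dead_zone=5)
# Most of the small values collapse to zero, which is exactly what makes
# the matrix cheap to entropy-encode afterwards.
```

The runs of zeros this produces are what the entropy encoder in the next section exploits.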
Recognising and reducing data repetition
H.264's variable-length codes (VLCs) ultimately make streaming redundant data more efficient, even though they increase the processing power required. Context-Adaptive Variable-Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC) reduce the bit rate by adapting to repeatedly received data sequences whenever that is statistically proven to be more efficient. Knowing how and when to apply a particular VLC is thus just another challenge put to H.264 engineers. A simple example may help to explain how CAVLC works. Suppose that every time you said, "I'd like a cup of coffee", you received one, and so, after a while, you started saying just "I'd like". While this is a very easy way to satiate your coffee craving, should you ever want just a glass of water, you would need to explain yourself without saying "I'd like". CAVLC works similarly: if the entropy encoder receives recurring data patterns, it replaces them with a short codeword, such as 1. Other sequences must then be described without using that codeword, which can sometimes lead to longer codewords for less common data streams.
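The coffee analogy maps onto a prefix-free variable-length code. In the toy sketch below (the code table is hypothetical, not one of the actual CAVLC tables from the H.264 specification), the most frequent symbol claims the one-bit codeword "1", which forces every other codeword to begin with "0", so rarer symbols cost more bits:

```python
# Hypothetical prefix-free code table: frequent symbols get short codewords.
CODE = {
    "coffee": "1",    # most frequent: the shortest possible codeword
    "water":  "01",   # must avoid the "1" prefix, so it is longer
    "tea":    "001",  # rarer still: longer again
}

def encode(symbols):
    """Concatenate the codeword for each symbol into one bitstring."""
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    """Walk the bitstring; a prefix-free code makes each match unambiguous."""
    inv = {v: k for k, v in CODE.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:
            out.append(inv[buf])
            buf = ""
    return out

msg = ["coffee", "coffee", "water", "coffee", "tea"]
bits = encode(msg)
assert decode(bits) == msg
# Five symbols fit in 8 bits here, versus 10 bits for a fixed 2-bit code,
# but a message full of "tea" would come out longer than the fixed code.
```

The last comment is the trade-off from the analogy: the code wins only when the statistics it was built for actually hold, which is why CAVLC adapts its tables to the data it has seen.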
