Thursday, April 1, 2010

Introduction to Signal (Audio/Video) Compression

What is Compression?
Multimedia material is limited in its quality by the capacity of the channel it has to pass through. For analog signals this translates to the bandwidth and the signal-to-noise ratio (SNR) limit of the channel. For digitised signals the limiting factors are the sampling rate and the sample bit-depth, which together determine the bit-rate. Compression can be defined as a technique which tries to produce a signal which is better than the channel it has passed through would normally allow. This means a coder is required at the transmitting end of the channel and a decoder at the receiving end.
In the digital world encoders accept digital (audio/video) signals at the source bit-rate and convert them to lower bit-rates before passing them through a channel.
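As a rough illustration of how sampling rate and bit-depth set the source bit-rate, the sketch below works through the arithmetic for uncompressed stereo audio at CD quality (44.1 kHz, 16 bits, 2 channels). The 128 kbit/s channel figure and the function names are assumptions chosen only for the example.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Bit-rate in bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def compression_factor(source_bps, channel_bps):
    """How many times smaller the coded stream must be than the source."""
    return source_bps / channel_bps


source = pcm_bit_rate(44_100, 16, 2)   # CD-quality stereo
channel = 128_000                      # example channel bit-rate (assumed)

print(source)                                         # 1411200
print(round(compression_factor(source, channel), 1))  # 11.0
```

In this example the encoder would have to reduce the bit-rate by a factor of about eleven to fit the channel.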
How does it work?
Shannon's theory states that any signal which is predictable does not contain any information. Compression can be achieved by sending only the useful information, also known as the entropy. The remaining part of the input signal is called the redundancy. It is redundant because it can be predicted from what the decoder has already been sent.
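The idea that a predictable signal carries no information can be made concrete with a small entropy estimate. The sketch below is a simplification: it looks only at how often each symbol occurs and ignores any dependence between neighbouring symbols, so it is a rough estimate rather than the true entropy of a source. The function name is hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Estimate entropy as sum(p * log2(1 / p)) over observed symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


predictable = "AAAAAAAA"   # every symbol is the same, so nothing is learned
varied = "ABCDABCD"        # four symbols, each equally frequent

print(entropy_bits_per_symbol(predictable))  # 0.0
print(entropy_bits_per_symbol(varied))       # 2.0
```

Under this estimate the first message needs no bits per symbol at all, while the second needs two. Whatever a fixed-length code spends above the estimate is redundancy.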
Some caution is required when using compression, because redundancy can be useful for reconstructing parts of the signal which are lost due to transmission errors. Clearly, if redundancy has been removed in a compressor, the resulting signal will be less resistant to errors unless a suitable protection scheme is applied.
Audio Compression
Audio compression relies on perceptual coding. The human auditory system is better at perceiving changes at low frequencies than at high frequencies. It also fails to register energy in some bands when there is more energy in a nearby band, an effect known as masking. Audio compressors work by raising the noise floor at frequencies where the noise will be masked. A detailed model of the masking properties of the ear is essential to their design. The greater the compression factor required, the more precise the model must be.
Predictive coding uses circuitry which draws on knowledge of previous samples to predict the value of the next one. It is then only necessary to send the difference between the prediction and the actual value. The receiver contains an identical predictor, to whose output the transmitted difference is added to give the original value. Predictive coders have the advantage that they work on the signal waveform in the time domain and need only a relatively short signal history to operate. As a result they cause a relatively short delay in the coding and decoding stages.
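The encoder and decoder loop described above can be sketched in a few lines. This is a minimal, lossless illustration that assumes the simplest possible predictor (the next sample is predicted to equal the previous one); practical coders use better predictors and quantise the difference. The function names are hypothetical.

```python
def encode(samples):
    """Send the difference between each sample and its predicted value."""
    prediction = 0                    # both ends agree to start from zero
    residuals = []
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample           # assumed predictor: next == previous
    return residuals


def decode(residuals):
    """Run the identical predictor and add each received difference to it."""
    prediction = 0
    samples = []
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples


samples = [100, 102, 105, 107, 108, 108, 106]
residuals = encode(samples)
print(residuals)                      # [100, 2, 3, 2, 1, 0, -2]
assert decode(residuals) == samples   # the receiver recovers the original
```

After the first value the differences are small numbers, which can be sent with fewer bits than the samples themselves.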
Sub-band coding splits the audio spectrum up into many different frequency bands to exploit the fact that most bands will contain lower level signals than the loudest one.
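A heavily simplified sketch of the band-splitting idea follows. It assumes only two bands, formed from the averages and half-differences of neighbouring sample pairs, whereas a real sub-band coder uses a proper filter bank with many more bands. The function names and sample values are made up for illustration.

```python
def split_two_bands(samples):
    """Split an even-length signal into a low band and a high band."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]    # pair averages: slow changes
    high = [(a - b) / 2 for a, b in pairs]   # half-differences: fast changes
    return low, high


def merge_two_bands(low, high):
    """Rebuild the original samples from the two bands."""
    samples = []
    for l, h in zip(low, high):
        samples.extend([l + h, l - h])
    return samples


def peak(band):
    """Largest absolute level in a band."""
    return max(abs(value) for value in band)


samples = [100, 104, 110, 112, 108, 102, 96, 94]
low, high = split_two_bands(samples)
print(low)    # [102.0, 111.0, 105.0, 95.0]
print(high)   # [-2.0, -1.0, 3.0, 1.0]
assert merge_two_bands(low, high) == samples
```

Here the high band peaks at 3 while the low band peaks at 111, so the high band could be described with far fewer bits per sample.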
In spectral coding, a transform of the waveform is computed periodically. Since the transform of an audio signal changes slowly, it need be sent much less often than audio samples. The receiver performs an inverse transform.
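To illustrate the transform and inverse transform steps, the sketch below uses a discrete cosine transform (DCT) on one short block. The choice of DCT, the block length of 8 and the test tone are assumptions made for the example; the text above does not name a particular transform.

```python
import math


def dct(block):
    """Unnormalised DCT-II of a block of samples."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        coeffs[0] / n
        + 2 / n * sum(c * math.cos(math.pi / n * (i + 0.5) * k)
                      for k, c in enumerate(coeffs) if k > 0)
        for i in range(n)
    ]


# One block of a steady tone that lines up with the k = 2 basis function.
N = 8
block = [10 * math.cos(math.pi / N * (i + 0.5) * 2) for i in range(N)]

coeffs = dct(block)
significant = [k for k, c in enumerate(coeffs) if abs(c) > 1e-9]
print(significant)                    # [2]

restored = idct(coeffs)               # what the receiver would compute
assert all(abs(a - b) < 1e-9 for a, b in zip(block, restored))
```

Eight samples are described by a single significant coefficient in this contrived case. Real audio is less tidy, but tonal material still tends to concentrate in a few coefficients.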
Video Compression
Video compression relies on two assumptions. The first is that human visual sensitivity to noise in the picture is highly dependent on the frequency of the noise. The second is that even in moving pictures there is a great deal of commonality between one picture and the next. Data can be conserved by raising the noise level where it cannot be detected and by sending only the difference between one picture and the next. Practical video compressors must perform a spatial frequency analysis on the input, and then truncate each frequency individually in a weighted manner. Such a spatial frequency analysis also reveals that in many areas of the picture only a few frequencies dominate and the remainder are largely absent. Clearly, where a frequency is absent no data need be transmitted at all.
For moving pictures, exploiting redundancy between pictures, known as inter-coding, gives a higher compression factor than coding each picture on its own (intra-coding). Starting with an intra-coded picture, the subsequent pictures are described only by the way in which they differ from the one before. The difference picture is produced by subtracting every pixel in one picture from the same pixel in the next picture. This difference picture is an image in its own right and can be compressed with an intra-coding process.
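The difference picture can be sketched with two tiny greyscale frames held as lists of rows. The frame contents and function names below are invented for illustration, and the sketch leaves out the intra-coding step that would follow.

```python
def difference_picture(previous, current):
    """Subtract every pixel in one picture from the same pixel in the next."""
    return [
        [cur - prev for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def reconstruct(previous, difference):
    """Decoder side: add the difference picture to the previous picture."""
    return [
        [prev + diff for prev, diff in zip(prev_row, diff_row)]
        for prev_row, diff_row in zip(previous, difference)
    ]


frame1 = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
# Same scene, with the bright patch moved one pixel to the right.
frame2 = [
    [10, 10, 10, 10],
    [10, 10, 50, 50],
    [10, 10, 10, 10],
]

diff = difference_picture(frame1, frame2)
for row in diff:
    print(row)
# [0, 0, 0, 0]
# [0, -40, 0, 40]
# [0, 0, 0, 0]
assert reconstruct(frame1, diff) == frame2
```

Most of the difference picture is zero, which is what makes it cheap to compress compared with sending the second frame in full.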