Parametric audio coding for dummies

In this week's tech note Professor Rumsey looks at the way audio coding has changed over the last three decades.

We’ve become quite familiar with low bit-rate audio coding systems in the past 15 years or so. As the “father of MP3”, Karlheinz Brandenburg, remarked in his recent Heyser Memorial Lecture at the 130th AES Convention, transmitting high-quality audio over phone lines seemed impossible 30 years ago, but the impossible has now become commonplace and the business model for music sales has changed dramatically.

From the iPod to digital television and the DVD, perceptual coding of audio signals is now used to reduce the bit rate, squeezing more channels or more playing time out of the available bandwidth or storage capacity. These remarkable systems requantise the digital audio signal, in most cases guided by a model of the human hearing system, so that the added quantising noise is masked by the audio signal itself. When the bit rate is high enough the results can be surprisingly good, but when it is pushed too low the quality begins to suffer.

Parametric coding is one way of squeezing more out of less when the bit rate is too low to represent the audio information using conventional psychoacoustic coding, as can happen on mobile networks, for example. Essentially, parametric coding involves measuring some of the vital statistics of the audio signals and transmitting them as descriptive information rather than as coded audio signals. This descriptive information is often sent as “side data” alongside a conventionally encoded core audio signal, so that the decoder can use it to enhance its reconstruction of the audio. This makes it possible to offer a valuable feature such as surround sound without transmitting lots of additional audio data. It is the basis of MPEG Surround, for example, which conventionally encodes a mono or stereo downmix of the surround channels, together with side parameters describing, in narrow frequency bands, the interchannel differences (time, level and correlation) that relate the original surround channels to the downmix.
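To make the downmix-plus-parameters idea concrete, here is a minimal two-channel sketch in Python. It is a simplified parametric-stereo analogue, not MPEG Surround itself. It assumes the left and right signals have already been split into narrow frequency bands by some filterbank (not shown), and that the hypothetical functions `encode_band` and `decode_band` are applied band by band, frame by frame. Per band, the encoder keeps a mono downmix plus two parameters, a level difference in dB and a correlation; interchannel time differences are ignored. The decoder uses those two numbers to recover each channel's energy and scales the downmix accordingly.

```python
import math

EPS = 1e-12  # guards against division by zero in silent bands


def energy(x):
    """Sum of squared samples in one band."""
    return sum(v * v for v in x)


def encode_band(left, right):
    """Downmix one band to mono and measure two side parameters.

    Hypothetical toy encoder. Returns (downmix, level_difference_db, correlation).
    """
    downmix = [(l + r) / 2 for l, r in zip(left, right)]
    e_l, e_r = energy(left), energy(right)
    cross = sum(l * r for l, r in zip(left, right))
    level_difference_db = 10 * math.log10((e_l + EPS) / (e_r + EPS))
    correlation = cross / math.sqrt((e_l + EPS) * (e_r + EPS))
    return downmix, level_difference_db, max(-1.0, min(1.0, correlation))


def decode_band(downmix, level_difference_db, correlation):
    """Rebuild left/right for one band from the downmix and side parameters.

    For M = (L + R) / 2, E_M = (E_L + E_R + 2 * rho * sqrt(E_L * E_R)) / 4,
    so the level ratio r = E_L / E_R and the correlation rho give back E_L and E_R.
    """
    e_m = energy(downmix)
    r = 10 ** (level_difference_db / 10)  # power ratio E_L / E_R
    denom = r + 1 + 2 * correlation * math.sqrt(r)
    if e_m < EPS or denom < EPS:
        silent = [0.0] * len(downmix)
        return silent, list(silent)
    e_r = 4 * e_m / denom
    e_l = r * e_r
    # Both outputs are scaled copies of the downmix, with the original energies.
    g_l, g_r = math.sqrt(e_l / e_m), math.sqrt(e_r / e_m)
    return [g_l * m for m in downmix], [g_r * m for m in downmix]


if __name__ == "__main__":
    left = [1.0, 2.0, 3.0]
    right = [0.5, 1.0, 1.5]  # same waveform, about 6 dB quieter
    m, ild, icc = encode_band(left, right)
    print("downmix", m, "level difference (dB)", ild, "correlation", icc)
    print("decoded", decode_band(m, ild, icc))
```

For fully correlated material, such as a panned mono source, both channels come back exactly; for less correlated signals only the band energies are restored, because both outputs are scaled copies of the same downmix. Practical systems typically add a decorrelated signal in the decoder to restore the sense of width, and quantise the parameters coarsely, which helps keep the side data to a few kilobits per second.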

An ordinary two-channel decoder simply ignores the side information and decodes the downmix, whereas a surround decoder can reconstruct the missing channels based on the descriptive information in the side data. Provided the parameters are updated often enough, and with enough frequency resolution to isolate individual dominant sources in the scene, the results can be remarkably effective for very little overhead. The side information usually needs only a few kilobits per second, as opposed to the few hundred that might be needed to transmit the surround channels as conventionally coded audio. This enables convincing surround to be transmitted at bit rates as low as 64 kilobits per second.

Another common use of parametric coding is “spectral band replication” (SBR). This technique is used in AAC Plus (HE-AAC), for example, to deliver audio that is coded at a much lower sampling frequency than would otherwise be necessary, while preserving something like the original high-frequency content. Here the side data describes the upper end of the audio frequency spectrum, which is the part omitted by the basic encoder. Assuming that there are usually similarities between the upper part of the audio spectrum (say above 10 kHz) and the lower part, the decoder reconstructs the missing upper part by transposing a chunk of the lower part and shaping it according to the descriptive side information. Amazingly, it usually works: you get the impression of a full-bandwidth system from one that only transmits signals up to around 10 kHz, again with only a few kilobits per second needed to send the extra parameters.
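The SBR principle of transposing and then shaping can be sketched in the same spirit. The Python below is a toy model, not the HE-AAC SBR tool. It assumes one frame of real-valued spectral magnitudes from some transform (not shown), treats the bins above a hypothetical `crossover` index as the part the core encoder omits, and describes that part only by its energy in a few equal-width bands. The decoder copies the top of the transmitted low band up into the gap and rescales each band to the transmitted energy. The real SBR tool works in a QMF filterbank domain and also sends further information, such as noise-floor and missing-sinusoid data, none of which is modelled here.

```python
import math


def band_energies(bins, n_bands):
    """Energy (sum of squares) in each of n_bands equal-width bands."""
    if len(bins) % n_bands:
        raise ValueError("number of bins must be a multiple of n_bands")
    width = len(bins) // n_bands
    return [sum(b * b for b in bins[k * width:(k + 1) * width]) for k in range(n_bands)]


def sbr_encode(spectrum, crossover, n_bands):
    """Toy encoder: keep bins below `crossover`; reduce the rest to an energy envelope."""
    low = spectrum[:crossover]
    envelope = band_energies(spectrum[crossover:], n_bands)
    return low, envelope


def sbr_decode(low, envelope, n_high):
    """Toy decoder: regenerate n_high upper bins from the low band and the envelope."""
    if n_high > len(low) or n_high % len(envelope):
        raise ValueError("need n_high <= len(low) and a multiple of the band count")
    # Transposition: copy the top n_high bins of the low band up into the gap.
    patch = low[len(low) - n_high:]
    width = n_high // len(envelope)
    high = []
    for k, target in enumerate(envelope):
        segment = patch[k * width:(k + 1) * width]
        current = sum(b * b for b in segment)
        # Envelope shaping: scale the copied segment to the transmitted energy.
        gain = math.sqrt(target / current) if current > 0 else 0.0
        high.extend(gain * b for b in segment)
    return low + high


if __name__ == "__main__":
    spectrum = [4.0, 3.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0]
    low, envelope = sbr_encode(spectrum, crossover=4, n_bands=2)
    rebuilt = sbr_decode(low, envelope, n_high=4)
    print("envelope sent:", envelope)
    print("rebuilt band energies:", band_energies(rebuilt[4:], 2))
```

The regenerated upper bins do not match the originals sample for sample, but their energy in each band does, and the envelope costs only a couple of numbers per frame rather than a full set of coded coefficients, which is why the extra parameters need so little bit rate.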