Audio Basics: Digital Compression



cc licensed flickr photo shared by juanpol

Digital audio compression was developed to reduce the size of audio files for transmission over computer networks such as the internet. When the internet, and later the web, emerged, bandwidth was very limited and it was not possible to exchange raw uncompressed audio files because they were simply too large. A one-hour stereo recording at 16 bit, 44.1 kHz is around 635 MB in size, which even by today's standards is a challenge to send over a network.
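The size of an uncompressed recording follows directly from its duration, sample rate, bit depth and channel count. The short Python sketch below works through that arithmetic for the one-hour CD-quality example; the function name pcm_size_bytes is made up for illustration and the figure ignores any file header.

```python
def pcm_size_bytes(duration_s, sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Size in bytes of raw (uncompressed) PCM audio, excluding any file header."""
    bytes_per_sample = bit_depth // 8
    return duration_s * sample_rate_hz * bytes_per_sample * channels


# One hour of 16 bit, 44.1 kHz stereo audio.
one_hour = pcm_size_bytes(3600)
print(f"{one_hour:,} bytes = {one_hour / 1_000_000:.0f} MB")
# prints: 635,040,000 bytes = 635 MB
```

Halving any one of the inputs (for example recording in mono rather than stereo) halves the result.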

Audio compression is performed by clever mathematical algorithms which remove data from a file to reduce its size. A surprising amount of information can be removed from an audio file before most humans can detect the difference. Increasingly, consumers now purchase compressed music files over the internet for use on their mobile players.

We have come to value mobility and the flexibility of digital media over pure audio fidelity.

There are two types of audio compression technique. Lossless techniques scan for patterns and redundancy within the audio data and do not affect the integrity of the audio. Lossy techniques remove information from the file in an endeavor to trick the human ear into believing no loss has been incurred, when in fact the audio has been irreparably altered. Using lossy algorithms involves a compromise: the more you compress the file the smaller it becomes, but the more the sound quality is degraded. With this method you take your original uncompressed file and render it out in a lossy format; if you later wish to improve the quality, you return to the original file and re-render it at a different compression rate.
For example, one compact disk (CD) holds approximately one hour of uncompressed high fidelity music, less than 2 hours of music using lossless compression, or 7 hours of music compressed in the lossy MP3 format at medium bit rates.
Audio compression (data), wikipedia
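The difference between the two approaches can be illustrated with a rough Python sketch. It is only a stand-in under stated assumptions: zlib is a general-purpose lossless compressor, not an audio codec, and throwing away the low 8 bits of each sample is a crude substitute for what a real lossy codec does. The test tone and all variable names are invented for the example.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44_100

# One second of a 440 Hz sine tone as 16-bit signed samples (assumed test signal).
samples = [round(12_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE)]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: compress, decompress, and get back exactly the same bytes.
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)
lossless_ok = restored == raw

# Lossy (crude stand-in): keep only the top 8 bits of every sample.
coarse = [(s >> 8) << 8 for s in samples]
lossy_changed = coarse != samples
max_error = max(abs(a - b) for a, b in zip(samples, coarse))

print("raw bytes:", len(raw), "compressed bytes:", len(packed))
print("lossless round trip identical:", lossless_ok)
print("lossy version differs:", lossy_changed, "largest sample error:", max_error)
```

The lossless round trip returns the original data bit for bit, while the discarded bits in the lossy version cannot be recovered from the reduced samples, which is why the original file should be kept.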

A wide range of compression techniques is available; we will look at a few of the most common types below. For a more detailed and comprehensive overview and comparison of audio formats, see the excellent Wikipedia reference at http://en.wikipedia.org/wiki/Comparison_of_audio_codecs

Common Lossy Audio Compression Formats

MPEG-1 (Audio Layer 3) Compression - (MP3)



cc licensed flickr photo shared by Alan Joyce

MP3 is the most commonly used compression format for the distribution of audio on the web, particularly in the podcasting arena. The compression works by reducing the accuracy of certain parts of the sound that are deemed beyond the auditory resolution ability of most people (mp3, Wikipedia). In a way this is similar to the JPEG compression format used for visual images, which removes information from the image in a way that is largely imperceptible to the human eye. As we get older the range of our hearing diminishes, especially at the higher frequencies, so for most people this content can be removed without much notice. Added to this is the fact that most podcasts, which use the human voice alone, don't need the full high-fidelity range anyway.

The degree of compression applied to an uncompressed file is commonly expressed as the bitrate in kilobits per second, or Kbps.

Another way to look at MP3 bitrate is as the ratio of compression applied, relative to CD-quality audio at roughly 1,411 Kbps.
320 Kbps = about 4:1, minimum quality loss. 128 Kbps = about 11:1, standard Internet music download. 64 Kbps = about 22:1, good voice quality. 32 Kbps = about 44:1, average voice quality.
In summary, the lower the sampling rate and the higher the MP3 compression, the smaller the file size but the poorer the sound quality, and vice versa.
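As a quick check on those ratios, the sketch below divides the bitrate of uncompressed CD audio (44.1 kHz, 16 bit, stereo) by each MP3 bitrate. The names CD_BITRATE_KBPS and compression_ratio are made up for the example, and the result is only approximate because it ignores container and header overhead.

```python
# Bitrate of uncompressed CD audio: sample rate x bit depth x channels.
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 Kbps


def compression_ratio(mp3_kbps, source_kbps=CD_BITRATE_KBPS):
    """Approximate ratio of source bitrate to compressed bitrate."""
    return source_kbps / mp3_kbps


for kbps in (320, 128, 64, 32):
    print(f"{kbps:>3} Kbps -> about {compression_ratio(kbps):.0f}:1")
```

Each halving of the bitrate doubles the ratio, which is why 64 Kbps and 32 Kbps come out at roughly 22:1 and 44:1.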

Bitrate (kilobits per second) | Sound Quality | Play Time with 64 MB Memory
128 Kbps & higher | near-CD | 1 hour
96 Kbps | Better than FM | 1.5 hours
64 Kbps | FM broadcast | 2 hours
32 Kbps | AM broadcast | 4 hours
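The play times in the table can be estimated from the memory size and the bitrate. The sketch below assumes a constant bitrate, 1 MB = 1,000 kilobytes and no space lost to the file system; play_time_hours is a hypothetical helper name.

```python
def play_time_hours(memory_mb, bitrate_kbps):
    """Hours of audio that fit in memory_mb megabytes at a constant bitrate."""
    total_kilobits = memory_mb * 1000 * 8  # assumes 1 MB = 1000 kilobytes
    seconds = total_kilobits / bitrate_kbps
    return seconds / 3600


for kbps in (128, 96, 64, 32):
    print(f"{kbps:>3} Kbps: {play_time_hours(64, kbps):.1f} hours in 64 MB")
```

The computed values come out slightly above the rounded figures in the table (for example about 1.1 hours at 128 Kbps), which leaves a little room for overhead.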

For recordings featuring mainly voice, anywhere between 32 Kbps (kilobits per second) and 64 Kbps provides acceptable quality.
To keep the file as small as possible while retaining a reasonable quality voice recording, 32 Kbps is recommended for podcasts.

Although very common, the MP3 format is patented by its producers, Fraunhofer, and is therefore subject to license arrangements (latest patent expiry: 2017). For users this has generally been dealt with by the writers of the software performing the encoding/decoding.

Advanced Audio Coding - (AAC)



cc licensed flickr photo shared by KhE 龙

AAC is a lossy format developed after the MP3 standard and designed to overcome some of the limitations inherent in the MP3 format. AAC has a wide level of industry support and is the default or standard audio format for Apple's iPhone, iPod and iPad, Nintendo DSi, iTunes, DivX Plus Web Player and Sony's PlayStation 3. It is also supported by Sony's PlayStation Portable, the latest generation of Sony Walkman, phones from Sony Ericsson, the latest S40 and S60 models from Nokia, Android-based phones and Nintendo's Wii.

Despite the numerous advantages AAC has over MP3, including support for sample rates up to 96 kHz and better handling of frequencies above 16 kHz, MP3 has remained remarkably resilient and is still the favored choice for distribution on the internet.

Apple's adoption of AAC in the iTunes music store has no doubt contributed to its success. Apple has also incorporated FairPlay Digital Rights Management (DRM) into the music purchased on the iTunes music store. Apple also produced some additional features for AAC which allow markers and hyperlinks to be added to audio files, giving an enhanced user experience for podcasts on Apple media players. Its use by podcasters has not been significant, however, and MP3 remains the format of choice for most podcasters.

Windows Media Audio Standard - (WMA)



cc licensed flickr photo shared by nino63004

Like MP3, WMA is a lossy audio codec based on psychoacoustics, where audio signals that are deemed to be imperceptible to the human ear are encoded with reduced resolution during the compression process. Developed by Microsoft in the late 1990s, the format is well established in the PC ecosystem and has wide support in both software and hardware media players. Interestingly, it is not well supported in the Apple ecosystem and will not play on iPods, iPhones or iPads, so it is not really an option for podcasting.

OGG Vorbis - (OGG)




OGG Vorbis is an open source lossy audio compression format produced by the Xiph.Org Foundation.

OGG is the container and the actual audio compression codec at work is Vorbis.

Being open source, there are no patent constraints on its use, and it is supported by software such as Audacity. It is, however, not well supported by media players and struggles to compete with the uptake of MP3 and AAC.