Sound File Formats

There are a large number of sound file formats that are in fairly common use. Having an understanding of their basic properties can help you pick the correct format for your particular application.

Basic Types of Sound File Format

As with digital image file formats, it is possible to divide all the different sound file formats into three basic groups. These are:

  1. Uncompressed: the audio data in the sound file is not compressed. This is the raw encoding of sound.
  2. Lossless compression: the audio data in the file is compressed, but all the information in the original sound is recovered when the file is played.
  3. Lossy compression: the audio data in the file is compressed and some of the information in the original is lost; when the file is played it may not sound as good as the original.

There are a number of different file formats in each of these categories.

Uncompressed Audio

If you are a digital photographer, this kind of audio file is like the raw image format that your camera offers. This is the type of format used on an audio CD to encode the music. The main way of encoding uncompressed audio data is Pulse Code Modulation (or PCM), which is used for Blu-ray, DVD and CD.

See the Wikipedia page on PCM for a more detailed description.

There are two factors which determine the quality of the audio recorded in a PCM file and these are:

  1. Sample rate
  2. Sample size

The sample rate is the number of times a second the sound is measured. The more often the sound is sampled, the higher the quality of the resulting recording. The sample rate for a standard audio CD is 44,100 times a second, usually written as 44.1 kHz. In contrast, audio on a DVD can have sample rates up to 192 kHz.

The sample size is the amount of information stored each time the sound is sampled. The more information that is recorded for each sample, the higher the quality of the resulting recording. The sample size for a standard audio CD is 16 bits, meaning each sample will be a number in the range -32,768 to 32,767. In contrast, the sample size for audio on a DVD can be as high as 24 bits, giving a number in the range -8,388,608 to 8,388,607.
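As a rough illustration of these two factors, the following Python sketch computes the range of values that a signed sample of a given size can hold, and then "measures" a sine wave at a chosen sample rate, rounding each measurement to the nearest storable value. The function names `sample_range` and `sample_sine` are made up for this example, and the sine wave simply stands in for a real sound.

```python
import math

def sample_range(bits):
    """Return the (lowest, highest) value a signed sample of `bits` bits can hold."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def sample_sine(freq_hz, sample_rate, bits, count):
    """Measure a full-scale sine wave `count` times at `sample_rate` samples
    per second, rounding each measurement to the nearest storable integer."""
    _, highest = sample_range(bits)
    return [
        round(highest * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(count)
    ]

print(sample_range(16))  # (-32768, 32767)
print(sample_range(24))  # (-8388608, 8388607)

# Four CD-quality samples of a tone at a quarter of the sample rate.
print(sample_sine(11025, 44100, 16, 4))  # [0, 32767, 0, -32767]
```

A larger sample size gives more possible values between silence and full volume, so each measurement can be stored more precisely.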

So, the higher the sample rate and sample size, the better the audio quality. However, this also means that the file will be larger. Obviously, recording in stereo has two channels, and therefore needs twice as much information as mono.

For example, a 3 minute stereo track encoded at standard CD quality would have 7,938,000 samples per channel (180 seconds at 44,100 samples a second), each sample being 16 bits, or 2 bytes, in size. With two channels, that gives a total file size of 30.3 MB (taking 1 MB as 1,048,576 bytes). In comparison, the same track at full DVD quality audio (192 kHz, 24 bits) would be 197.8 MB. Thus, the increase in audio quality comes at the expense of much larger files.
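The arithmetic behind these figures can be checked with a short Python sketch. The function names are invented for this example, and it assumes that 1 MB means 1,048,576 bytes, which is the convention that reproduces the figures quoted above. Any file header is ignored.

```python
def pcm_size_bytes(seconds, sample_rate, bits_per_sample, channels):
    """Size of the raw PCM data: one sample per channel, per sampling instant."""
    samples_per_channel = seconds * sample_rate
    bytes_per_sample = bits_per_sample // 8
    return samples_per_channel * bytes_per_sample * channels

def to_mb(size_bytes):
    """Convert bytes to MB, assuming 1 MB = 1024 * 1024 bytes."""
    return size_bytes / (1024 * 1024)

three_minutes = 3 * 60

cd = pcm_size_bytes(three_minutes, 44_100, 16, channels=2)
dvd = pcm_size_bytes(three_minutes, 192_000, 24, channels=2)

print(f"CD quality:  {cd:,} bytes = {to_mb(cd):.1f} MB")    # 31,752,000 bytes = 30.3 MB
print(f"DVD quality: {dvd:,} bytes = {to_mb(dvd):.1f} MB")  # 207,360,000 bytes = 197.8 MB
```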

There are two main file formats for representing PCM data on a computer. On Windows there is the WAV file and Apple have the AIFF format. They are very similar formats and both encode the PCM sample data without compression.
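To see how little a WAV file adds to the raw samples, here is a minimal sketch using the `wave` module from the Python standard library. It builds a one second, mono, 16-bit test tone in memory and then reads the sample rate and sample size back out of the file header. The helper names `make_wav` and `describe_wav` and the 440 Hz tone are choices made for this example only.

```python
import io
import math
import struct
import wave

def make_wav(seconds=1, sample_rate=44100, freq_hz=440.0):
    """Build a mono, 16-bit WAV file in memory and return its bytes."""
    frames = bytearray()
    for n in range(seconds * sample_rate):
        value = round(32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        frames += struct.pack("<h", value)  # little-endian signed 16-bit sample
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # sample size in bytes (16 bits)
        wav.setframerate(sample_rate)  # sample rate in Hz
        wav.writeframes(bytes(frames))
    return buffer.getvalue()

def describe_wav(data):
    """Read the PCM parameters back from the WAV header."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }

data = make_wav()
print(describe_wav(data))
print(len(data), "bytes in the file,", 44100 * 2, "of them sample data")
```

The file is only slightly larger than the sample data itself, because the header just records the parameters needed to interpret the samples.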

The large file sizes for audio prompted a search for techniques to reduce the amount of information that needs to be stored when recording the data. The techniques used are called compression techniques. An example of compression that you may already be familiar with is the ZIP format for compressing files on a disk. As mentioned above, there are lossless and lossy compression methods.
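The defining property of lossless compression can be shown with the `zlib` module from the Python standard library, which implements the DEFLATE method commonly used inside ZIP files. This is only a sketch of the idea: `zlib` is a general purpose compressor, not an audio codec, and the perfectly repeating test tone used here compresses far better than real music would. The helper name `tone_pcm` is invented for the example.

```python
import math
import struct
import zlib

def tone_pcm(seconds=1, sample_rate=44100, freq_hz=441.0):
    """Return raw 16-bit mono PCM bytes for a simple sine tone."""
    samples = (
        round(32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(seconds * sample_rate)
    )
    return b"".join(struct.pack("<h", s) for s in samples)

pcm = tone_pcm()
packed = zlib.compress(pcm, 9)       # 9 = strongest compression level
restored = zlib.decompress(packed)

print("original bytes:  ", len(pcm))
print("compressed bytes:", len(packed))
print("identical after decompression:", restored == pcm)  # True
```

Every byte of the original comes back after decompression, which is exactly what a lossy method gives up in exchange for smaller files.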

Lossless Compressed Audio

Lossless compression is much preferred by audiophiles as it does not reduce the quality of the sound in any way. The trade-off is that the resulting file sizes are larger than those of the corresponding lossy compressions.

Probably the most common lossless compression format is the Free Lossless Audio Codec (or FLAC) format. The usual compression rate cited for FLAC is between 30% and 50%. The file extension for FLAC files is .flac.

Apple have their own version of lossless audio compression called Apple Lossless Audio Codec (or ALAC). The compression rate cited for this is between 40% and 60%. The file extension for this format is .m4a

Lossy Compressed Audio

With lossless compression, the emphasis is on retaining the full sound quality, whilst trying to reduce the file size as much as possible. In contrast, with lossy compression, the emphasis is on reducing the size of the audio file whilst maintaining reasonable sound quality.

All forms of lossy compression lose data from the original sound, resulting in some loss of audio quality. The effort in designing this kind of compression goes into making the files as small as possible, with as little impact on the perceived sound quality as possible. To make this possible, the designers of lossy audio compression have to have a good understanding of the human psychology of hearing.

The key factor in determining the file size is the bit rate of the compression. The higher the bit rate, the higher the quality of the audio, but this also results in larger files. Lower bit rates give smaller files, but this eventually impacts on the audio quality when the file is played.
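The link between bit rate and file size is simple multiplication, as the following sketch shows. It assumes a constant bit rate, takes 1 kb/s to mean 1,000 bits per second, and ignores headers and metadata. The function name `lossy_size_bytes` is made up for this example.

```python
def lossy_size_bytes(seconds, bit_rate_kbps):
    """Approximate size of a constant bit rate file, ignoring headers and tags."""
    return seconds * bit_rate_kbps * 1000 // 8

three_minutes = 3 * 60
cd_pcm_bytes = three_minutes * 44_100 * 2 * 2  # CD quality stereo PCM, for comparison

for rate in (128, 56):
    size = lossy_size_bytes(three_minutes, rate)
    share = 100 * size / cd_pcm_bytes
    print(f"{rate} kb/s: {size:,} bytes, about {share:.0f}% of the PCM size")
# 128 kb/s: 2,880,000 bytes, about 9% of the PCM size
# 56 kb/s: 1,260,000 bytes, about 4% of the PCM size
```

Note that the length of the track and the bit rate fix the size; the content of the audio only affects how good the result sounds at that bit rate.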

The main lossy compression format is undoubtedly MPEG Audio Layer III (more commonly known as MP3). This provides much greater compression than lossless techniques, making reductions in file size of 90% or more possible. However, the greater the reduction in file size, the more likely it is that the perceived quality of the sound will suffer. The file extension for MP3 files is .mp3.

Like the GIF image format, the MP3 audio format was patent encumbered for many years. This drove the development of alternative, free audio formats, in particular the Ogg Vorbis audio format. This format is sometimes claimed to have better audio reproduction for a given compression setting than MP3. The file extension for this format is .ogg.

The Microsoft audio format Windows Media Audio (or WMA) exists in a number of different forms, some of which use lossy compression. Microsoft claim that the quality of lossy WMA is better than that of the equivalent MP3. However, this claim is controversial. See this Wiki article for some discussion of the claim. The file extension for this format is .wma.

Comparing Different Formats

The following table shows a comparison of the actual file sizes for the same music track encoded in different formats. This is clearly not a comprehensive study, and different tracks would give different results. In particular, a track containing only spoken word audio will compress much more than music with a large spectrum spread.

The figures are for a real music track. It is 3 minutes and 51 seconds long, with a reasonable dynamic range.

Format    Bit Rate (kb/s)    File Size (kB)    % of Original
WAV       -                  39,845            -
FLAC      -                  17,462            44%
MP3       128                3,616             9%
MP3       56                 1,582             4%
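The percentage column can be reproduced from the file sizes with a few lines of Python. The sizes are copied from the table; the dictionary keys and the function name `percent_of_original` are just labels chosen for this sketch.

```python
# File sizes in kB, as listed in the table.
sizes_kb = {
    "WAV": 39_845,
    "FLAC": 17_462,
    "MP3 128": 3_616,
    "MP3 56": 1_582,
}

def percent_of_original(sizes, original="WAV"):
    """Express each file size as a whole-number percentage of the original."""
    return {name: round(100 * size / sizes[original]) for name, size in sizes.items()}

for name, percent in percent_of_original(sizes_kb).items():
    print(f"{name:8} {percent:3}% of original, {100 - percent:3}% smaller")
```

For this track, the sketch gives 44% for FLAC, 9% for MP3 at 128 kb/s and 4% for MP3 at 56 kb/s, matching the table.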

The figures for this specific track are in agreement with the general range of file sizes discussed above.
