The Audio Buzz Blog

WAV or MP3 - An image showing a WAV file and MP3 file
by James Nugent August 28, 2024

WAV or MP3: What’s the difference?

Despite being around for many years, WAV and MP3 are the most commonly used audio formats in 2024; even casual music fans will recognize these initials. However, the meaning of each and the difference between the two still cause some confusion. So, we’re diving into the world of audio formats to discuss the difference between WAV and MP3, the advantages and disadvantages of each, and a few other noteworthy formats.

Before getting into all things WAV and MP3, let’s explain what an audio file format is.

What is an Audio File Format?

An audio file format is a digital format that stores audio data as binary information that digital devices can understand.

These formats are used to store audio data on computers, smart devices, CDs, etc.

To best explain digital formats, let’s go back a little further and start with a much-loved analog format, vinyl records.

WAV or MP3 - A photo of vinyl records in a music store

Photo by Mick Haupt on Unsplash

Vinyl records store the original audio waveform directly on the physical media.

While purists will tell you that nothing matches the warm character of a vinyl record (which is true), analog formats are prone to degradation, and even with the best care, that charm may not last forever.

Another shortcoming of vinyl is a limited dynamic range (the difference between the loudest and quietest parts) and a limited range of pitch.

Although you may never notice any pitch limitations of vinyl, lower frequencies take up more real estate, meaning you can’t fit as much audio on the average record.

Higher frequencies often cause tracking issues for the stylus, leading to distortion.

The advent of CD and digital audio formats alleviated many shortcomings of vinyl, providing:

  • A wider dynamic range
  • Consistent playback quality (no crackle or hissing)
  • Better low-end
  • Easier editing

  • The ability to skip/shuffle album tracks easily and send material instantly

Since digital devices don’t read analog audio signals, they use analog-to-digital converters to convert analog waveforms to binary information.

Let’s look at an analog sine wave and its digital representation.

An image showing an analog sine wave, and its digital representation

As you can see from the image, the analog sine wave is continuous.

In contrast, the digital version is made up of many discrete points, each one a measurement of the wave at a specific moment.

Converting and storing audio digitally involves techniques like pulse code modulation (PCM), quantization, perceptual coding, and compression.

When we compress audio files, it’s either through lossless or lossy compression.

We’ll go into more detail on each process below.

Many audiophiles suggest that CDs and digital audio formats sound harsh or cold compared to vinyl, which may be true.

However, the idea that vinyl records produce better sound quality because they hold the original audio waveform is flawed.

There’s an argument that you can’t perfectly reproduce the analog waveform digitally, but vinyl recording is by no means perfect either.

It’s OK to prefer vinyl for its character and nostalgia because those things are subjective.

But, in any objective/quantifiable way, digital formats produce consistently better sound quality.

To prove it, we have a huge library of high-quality, royalty-free music in just about every genre.

Anyway, that’s a debate for another article; we just wanted to give our two cents.

What is a WAV File?

A WAV file is a lossless audio format created by Microsoft and IBM in 1991 for use with the Windows operating system.

Lossless formats provide the highest audio quality because they preserve the most accurate mathematical representation of the analog waveform, covering the entire range of human hearing from 20 Hz to 20 kHz.

For that reason, WAV files are the industry standard for professional audio work.

Although the format started as a Windows extension, WAV files are universally supported, making cross-platform sharing easy.

A WAV file is a derivative of the Resource Interchange File Format (RIFF), which stores data in tagged chunks.

The image below shows the basic layout of a WAV file as taken from the original public specification published by Microsoft.

WAV or MP3 - An image of the basic WAV file format

We can see that the file has one main RIFF chunk and two sub-chunks.

The main chunk header shows that the file is in the RIFF format, and the RIFF Type ID shows it is the WAVE type.

The fmt (format) chunk specifies the format data, and the data chunk contains the actual sample data (raw uncompressed linear PCM values).

The fmt chunk will contain important metadata like:

  • Compression type
  • Number of channels
  • Sample rate
  • Bit rate
  • Encoding type
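
To make the chunk layout concrete, here is a minimal Python sketch that writes a tiny WAV into memory with the standard-library wave module and then walks its chunks by hand. The helper names (build_demo_wav, parse_wav_header) and the dictionary keys are our own illustrative choices, and the field order assumes the canonical 16-byte PCM fmt chunk.

```python
import io
import struct
import wave


def build_demo_wav(sample_rate=44100, channels=2, sample_width=2, frames=4):
    """Write a tiny silent WAV into memory using the standard library."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * frames * channels * sample_width)
    return buffer.getvalue()


def parse_wav_header(data):
    """Walk the RIFF chunks and collect the fmt fields (hypothetical helper)."""
    riff_id, _riff_size, riff_type = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF" or riff_type != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info = {"riff_type": riff_type.decode("ascii")}
    offset = 12  # first sub-chunk starts right after the 12-byte RIFF header
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            fields = struct.unpack_from("<HHIIHH", data, body)
            keys = ("format_code", "channels", "sample_rate",
                    "byte_rate", "block_align", "bits_per_sample")
            info.update(zip(keys, fields))
        elif chunk_id == b"data":
            info["data_bytes"] = chunk_size
        # Chunks are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size & 1)
    return info


info = parse_wav_header(build_demo_wav())
print(info)
```

A format code of 1 means uncompressed PCM. Note that the header stores a byte rate (bytes per second), so multiplying it by 8 gives the bit rate.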

We can insert additional metadata like artist, album, and track names, but metadata isn’t just about convenience; we need certain information to tell the computer how to read/play the audio data (otherwise, it’s just a bunch of meaningless 1’s and 0’s).

WAV files are typically uncompressed, but even in that case, the compression type field still holds a numerical value (1 for PCM) that marks the audio as uncompressed.

When we compress WAV files, we can do so through lossless compression, meaning we can reduce the file size somewhat without losing quality (more on that below).

WAV files are encoded through pulse code modulation (PCM) or linear pulse code modulation (LPCM).

The standard for CD production is WAV files encoded via stereo LPCM (two channels), sampled at 44.1 kHz/16-bit.

Nyquist Theorem: The Nyquist theorem states that we can digitize an analog signal without aliasing if the sample rate is more than twice the signal’s highest frequency component.

A sample rate of 44.1 kHz can therefore capture frequencies up to 22.05 kHz, comfortably covering the typical human hearing range.
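
As a rough illustration of that limit, the sketch below computes the highest representable frequency for a sample rate and shows where a tone above that limit would fold back to (alias) if it were sampled without filtering. The function names are our own, and the 30 kHz tone is just an example value.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling (no anti-alias filter)."""
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


print(nyquist_frequency(44100))       # 22050.0
print(alias_frequency(1000, 44100))   # 1000 (below the limit, unchanged)
print(alias_frequency(30000, 44100))  # 14100 (above the limit, folds back)
```

This is why A/D converters filter out content above half the sample rate before sampling.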

An average three-minute song in the WAV format would require around 33MB of disk space.
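
If you want to check that kind of figure yourself, the arithmetic is simple: seconds × samples per second × bytes per sample × channels. The sketch below assumes CD-quality settings and ignores the small file header; the function name is our own.

```python
def pcm_size_bytes(duration_s, sample_rate=44100, bit_depth=16, channels=2):
    """Size of the raw sample data for uncompressed PCM audio."""
    return duration_s * sample_rate * (bit_depth // 8) * channels


size = pcm_size_bytes(180)  # exactly three minutes
print(size)                 # 31752000
print(size / 1_000_000)     # 31.752 (MB)
```

Exactly three minutes comes to just under 32 MB, so a song that runs a few seconds longer lands in the same region as the figure above.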

WAV files support sample rates up to 192 kHz and bit depths up to 32-bit.

What is an MP3 File?

MP3, or MPEG-1 Audio Layer III (often shortened to MPEG Layer 3), is a lossy digital audio format developed by the Moving Picture Experts Group.

In 2020, MPEG co-founder Leonardo Chiariglione established a separate organization, Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI), focusing on data coding and intellectual property rights licensing in line with the latest AI technology.

The primary goal behind the development of MP3 was to replicate CD-quality audio in much smaller files with no perceivable difference.

MP3 is one of the most popular formats for streaming and sharing digital audio files.

WAV or MP3 - An image of the basic MP3 format

In contrast to a typical WAV file, MP3 files are compressed, and some audio information is lost in the process.

Converting to MP3 typically compresses an audio file by around 90%.

For example, MP3 conversion will reduce a 33 MB audio file to approximately 3 MB, meaning listeners can store significantly more music on their playback devices.

MP3 files have three defining elements: conversion speed, size, and quality.

Unlike WAV files, there is no bit depth, only a bit rate (kbps – the amount of data per second of audio) to determine the conversion speed.

The more data per second of audio (higher bit rate), the slower the conversion time.

The faster the conversion (lower bit rate), the smaller the file size and the lower the quality.

You can see there is a clear trade-off regarding size and quality.

MP3 files typically range from 96 kbps (fastest conversion) to 320 kbps (slowest conversion).
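
To put numbers on the trade-off, here is a small sketch that estimates the size of a three-minute MP3 at a few bit rates and compares each with CD-quality PCM. It assumes a constant bit rate and ignores tags and frame overhead, so treat the results as approximations; the names are our own.

```python
CD_BITRATE_KBPS = 44100 * 16 * 2 / 1000  # 1411.2 kbps for stereo 16-bit/44.1 kHz


def mp3_size_bytes(duration_s, bitrate_kbps):
    """Approximate size of a constant-bit-rate file (no tags or overhead)."""
    return duration_s * bitrate_kbps * 1000 // 8


for kbps in (96, 128, 320):
    size_mb = mp3_size_bytes(180, kbps) / 1_000_000
    saving = 1 - kbps / CD_BITRATE_KBPS
    print(f"{kbps} kbps: {size_mb:.2f} MB, {saving:.0%} smaller than CD-quality PCM")
```

At 128 kbps, a three-minute track comes to roughly 2.9 MB, which is where the "around 90%" reduction comes from. At 320 kbps, the saving is smaller.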

However, the lowest quality you can expect on any free streaming platform should be 128 kbps.

To provide a real-world context, Spotify Premium offers streaming at 320 kbps; while we spend much time discussing MP3 as a reduced-quality format, it remains a mainstream standard in 2024 and provides an excellent listening experience.

MP3 files utilize a process called perceptual coding based on psychoacoustic models.

Encoding and A/D Conversion

As mentioned, for our digital devices to read and play back audio files, they use A/D converters to create a digital representation of the analog sound wave.

We’ve used a few terms above that require further explanation, like lossless/lossy, PCM, perceptual coding, sample rate, bit rate, and bit depth.

Now that we know what WAV and MP3 files are, let’s go through those terms to understand better how WAV and MP3 files are created.

Pulse Code Modulation (PCM)

PCM is a technique, first devised in the late 1930s, that we use to create digital representations of analog signals.

It does so by sampling the amplitude of an analog signal at uniform intervals.

The sample rate, commonly 44.1 kHz, determines the number of audio samples taken per second.

Think of it like the frame rate of a video recording; the more frames per second, the clearer and smoother the video.

In this case, the higher the sample rate, the closer we get to the original analog sine wave (higher sample rate = higher fidelity).

Each sample is quantized to the nearest value within defined digital steps.
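
Here is a minimal sketch of those two steps, sampling and quantizing, using a generated sine wave in place of a real analog input. The 1 kHz tone, the function names, and the simple rounding scheme are illustrative assumptions; real converters do this in hardware.

```python
import math


def sample_sine(freq_hz, sample_rate, num_samples):
    """Measure a sine wave's amplitude (-1.0 to 1.0) at uniform intervals."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def quantize(sample, bit_depth=16):
    """Round an amplitude to the nearest signed integer step."""
    max_level = 2 ** (bit_depth - 1) - 1  # 32767 for 16-bit
    code = round(sample * max_level)
    return max(-max_level - 1, min(max_level, code))


samples = sample_sine(freq_hz=1000, sample_rate=44100, num_samples=8)
codes = [quantize(s) for s in samples]
print(codes)
```

Each number in the output is one sample: the continuous wave reduced to the nearest available step.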

The image we looked at earlier showed how a digital representation of an analog signal might look; now, we can look a little deeper.

WAV or MP3 - An image showing sample rate and bit-depth diagrams

Quantization maps the continuous range of possible amplitude values onto a finite number of discrete levels.

As you can see, the PCM data is the closest mathematical representation of the analog signal (according to the sample rate and bit depth).

If the sample rate dictates how many samples are taken per second, you can think of the bit depth as the size of each sample.

The CD standard bit depth (16-bit) provides 65,536 possible amplitude values.

A higher bit depth offers higher amplitude resolution.
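
The number of levels is simply 2 raised to the bit depth. The sketch below also prints the commonly quoted theoretical dynamic range, 20 × log10(levels), which works out to about 6 dB per bit; that figure describes an ideal quantizer and is an addition of ours, not a measurement of any real device.

```python
import math


def quantization_levels(bit_depth):
    """Number of distinct amplitude values at a given bit depth."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of an ideal quantizer, in decibels."""
    return 20 * math.log10(quantization_levels(bit_depth))


for bits in (8, 16, 24):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))
```

For 16-bit audio, that gives 65,536 levels and roughly 96 dB.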

The quantized values are encoded into binary code, creating a PCM stream.

One final element worth mentioning is that the left and right audio samples are interleaved in a stereo track with two channels.

WAV or MP3 - An image showing a stereo PCM stream
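
In code, interleaving looks like the sketch below: the left and right samples alternate (L, R, L, R, and so on) and are packed as little-endian 16-bit integers, the layout used in a WAV data chunk. The function name and sample values are made up for illustration.

```python
import struct


def interleave_stereo(left, right):
    """Pack two channels of 16-bit samples into one interleaved PCM stream."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    frames = []
    for left_sample, right_sample in zip(left, right):
        frames.extend((left_sample, right_sample))
    return struct.pack(f"<{len(frames)}h", *frames)


stream = interleave_stereo([100, 200], [-100, -200])
print(struct.unpack("<4h", stream))  # (100, -100, 200, -200)
```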

Linear pulse code modulation (LPCM) is a style of PCM in which the quantization levels are linearly uniform rather than varied as a function of amplitude.

LPCM is the standard for CD production, but it’s common for people to refer to it simply as PCM in conversation.

Perceptual Coding

In contrast to PCM, a process that aims to recreate a waveform as accurately as possible, perceptual coding focuses solely on the information you need.

In other words, it discards anything deemed inaudible based on psychoacoustic models.

Our ears and brains interpret sound in a way that doesn’t perfectly mirror the outside world.

How we perceive sound, particularly amplitude, depends on several elements, like the frequency content and surrounding background noise.

WAV or MP3 - An image of two graphs describing audio masking

You can see in the image above that a tone within the audible range of human hearing becomes inaudible because surrounding tones mask it.

The argument for this kind of encoding is, why include anything we can’t hear anyway?

In principle, we can get the same listening experience using a fraction of the disk space.

In practice, it’s imperfect by nature, and this kind of lossy encoding will never be used for professional audio work.

Data Compression

Data compression, in this case, is when we reduce the size of a file, usually at the expense of the sound quality.

Some audio files are uncompressed, providing the highest possible sound quality and the largest file size.

Compressed files fall into two categories: lossy and lossless.

Lossy compression is the most aggressive type; it uses lossy encoding algorithms, like perceptual coding, to remove inaudible audio information.

The upside of lossy compression is that it can reduce the file size by around 90%.

The downside is that sound quality gets lower as more information is removed; unlike a ZIP file, which, when opened, returns to its original state, you cannot recover any information removed during lossy compression.

Lossless compression is the almost perfect solution; it reduces the file size without degrading the quality.

Like a ZIP file, lossless files are compressed for storage and transfer, but they are decoded upon playback to produce the original quality.

We call it an almost perfect solution because the file size reduction is not nearly as significant as lossy compression, meaning it’s still not ideal when disk space is tight.
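
The difference is easy to demonstrate. In the sketch below, zlib (the general-purpose algorithm family behind ZIP) stands in for a lossless audio codec, and throwing away the lower 8 bits of each sample stands in for lossy compression. Both are deliberate simplifications: real codecs like FLAC and MP3 are far more sophisticated, and bit truncation is not perceptual coding.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit mono PCM.
samples = [round(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / 44100))
           for n in range(44100)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: compress, decompress, and get the exact same bytes back.
packed = zlib.compress(pcm, 9)
restored = zlib.decompress(packed)
print(len(pcm), len(packed), restored == pcm)

# Lossy stand-in: keep only the top 8 bits of every sample.
reduced = [s >> 8 for s in samples]
rebuilt = [v << 8 for v in reduced]
print(rebuilt == samples)  # False: the discarded detail is gone for good
```

The lossless round trip returns identical data, while the reduced version can only approximate the original.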

The Ups and Downs of Each Format

Each format has certain advantages and disadvantages; let’s run through them.

Advantages of MP3

Small File Size

The biggest advantage of MP3, by a long way, is the small file size.

The small file size makes it easy to distribute, transfer, and store far more music on your devices.

The image below shows the massive difference (on average) in required storage space for MP3 versus WAV files.

WAV or MP3 - An image showing audio file storage statistics

No Perceivable Difference in Quality

Although MP3 files can’t literally match WAV files in terms of sound quality, the average music fan won’t hear any difference in general listening.

For fun, check out this audio quality test from NPR; don’t worry if you get a few wrong answers; even the best ears do.

Easy Conversion

Many WAV-to-MP3 converters, like XLD for macOS and Exact Audio Copy for Windows, are available.

Disadvantages of MP3

Degraded Quality

There’s no escaping it: smaller files come at a cost.

Loss of Bandwidth

Different encoders have different algorithms, meaning different platforms often produce different results, even with the same settings.

A bit rate of 128 kbps or less won’t cut it anymore, despite previously being the standard for platforms like iTunes.

At 128 kbps, MP3 converters filter the higher frequencies very crudely, discarding frequency content anywhere above approx. 16 kHz.

To maintain full bandwidth through the iTunes MP3 encoder (and similar platforms), you must have a bit rate of 256 kbps or higher.

Pre and Post-Echoes

Pre/Post echoes are common artifacts of lossy compression.

A pre-echo is when you hear a sound before it occurs.

A post-echo is when you hear a sound after it occurs, but we don’t hear post-echoes often because forward temporal masking is far stronger than backward temporal masking.

It’s most prominent with percussive sounds but possible with any short bursts of noise.

The echoes are caused by quantization noise spread over the entire transform window of the codec.

In temporal masking, as shown earlier, a quiet sound is masked by a louder one if there is only a small interval between the two, even if the quiet sound happens first.

It’s a problem that can occur commonly, even at higher bitrates like 256 kbps. 

Converters can use filters to introduce phase distortion and temporal smearing to render any pre-echoes inaudible, but it’s still a problem to consider.

Double Track Effect

Lower bit rates sometimes cause audio content timing errors.

This problem is most noticeable on vocals, creating the illusion of the voice being double-tracked.

Dynamics and Phase Shift

By removing certain frequency content, perceptual coding can change our perception of the remaining frequency content.

You can end up with a very inconsistent dynamic range.

Some sounds may seem attenuated, making others sound like they have been boosted.

The relative phase or timing of frequency content can be changed, affecting stereo imaging or even the transparency and clarity of the material.

When frequency content is stretched over time, like pre-echoes and post-echoes, it can play havoc with the listeners’ perception of the audio.

Weak Low End

Although digital audio generally provides a wider dynamic range than analog formats, MP3 files have a reputation for a weak low end.

Lower frequencies are far more difficult for DSP (Digital Signal Processing) algorithms to analyze.

Lower frequencies have a longer period (each cycle takes more time), while the analysis windows are short.

Analysis windows won’t usually capture an entire cycle of a low frequency.

In many cases, the encoder will get less than half a cycle of any frequency under 114 Hz.
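
The arithmetic behind a figure like that is straightforward: a window of a given length holds half a cycle of the frequency whose period is twice the window's duration. The sketch below assumes a 192-sample window at 44.1 kHz. The window length is our assumption, chosen because it reproduces the number above; the exact windowing depends on the encoder.

```python
def half_cycle_cutoff_hz(sample_rate, window_samples):
    """Frequency below which a window holds less than half a cycle."""
    window_seconds = window_samples / sample_rate
    return 1 / (2 * window_seconds)


# Assumed values: 44.1 kHz audio and a 192-sample analysis window.
print(round(half_cycle_cutoff_hz(44100, 192), 1))  # 114.8
```

Any tone below that frequency completes less than half a cycle inside the window, which gives the encoder very little to analyze.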

Not Suitable for Professional Work

The lower quality and potential lossy compression artifacts mean MP3 isn’t suitable for professional audio work.

Advantages of WAV

Retains Full Quality

It is an accurate, lossless format; the quality remains the same as the original recording.

Not only is our extensive collection of royalty-free music high-quality (at least 16-bit/44.1 kHz), it also features composers who have worked for industry-leading clients, like FOX, Disney, and Sony.

Simplicity

Files are easy to edit and process with user-friendly software, from freeware to professional applications.

Advancements in Home Recording

Many popular home studio audio interfaces can now offer recording rates up to 192 kHz.

WAV is the perfect format for this high quality and huge dynamic range.

Disadvantages of WAV

Large File Size

The large size makes WAV files very impractical for portable devices and streaming.

Quality vs. Size

There are several advantages to using either format, but the debate of which is best will always come down to quality versus size.

If quality is the priority, for example, if you’re:

  • adding music to a film/video
  • releasing a CD
  • sharing files for professional work (media scoring, etc.)
  • selling songs or samples online

WAV files are always the best choice.

If you’re storing or sharing music for general listening, MP3 is the most convenient option.

Remember, while you can convert a WAV file to MP3, converting an MP3 to WAV won’t improve the quality in any way.

Alternative Formats

FLAC – Free lossless audio codec (lossless)

FLAC files are lossless; you can compress them to around half the original audio file size.

AIFF – Audio interchange file format (lossless)

Audio data in AIFF files is uncompressed PCM (the Macintosh version of a WAV file).

ALAC – Apple Lossless Audio Codec (lossless)

ALAC files are similar to FLAC files and compress to around 60% of the original size with no change in quality.

ALAC files use the extension .m4a, mainly used with iOS devices.

OGG – Ogg Vorbis (lossy)

OGG files are encoded using a variable bit rate system and are used by some music streaming platforms.

AAC – Advanced Audio Coding (lossy)

Although AAC files are lossy, they offer better quality than MP3 (sample rates up to 96 kHz), making them the choice for many streaming platforms.

WMA – Windows Media Audio (lossy)

The Windows version of an MP3 offers higher quality (similar to AAC), but the format isn’t as widely supported as others.

A Checkered History of the MP3

Karlheinz Brandenburg, a professor at the Fraunhofer Institute, was one of the lead developers of the MP3.

He was also one of the pioneers of psychoacoustics in professional audio.

By the late 1980s, the MP3 was almost ready but was still having issues dealing with the human voice.

The song Tom’s Diner by Suzanne Vega is a common choice amongst audiophiles for testing sound systems.

An a cappella version of Tom’s Diner was the first track chosen to test the MP3 format.

Initially, MP3 compression destroyed the track, leading to hundreds of revisions.

moDernisT is a project by Ryan Maguire, who created a track based on the discarded/leftover sounds from Tom’s Diner after compression.

 

The MP3 format has played a leading role in some of the music industry’s most celebrated and controversial movements.

The release of the Winamp media player for Windows in 1997 brought MP3 to the masses; in the late 1990s, I think everyone with a PC created Winamp playlists full of MP3s.

In the Winamp era, we realized how easy it was to share music with friends and family.

Inevitably, this realization prompted the emergence of illegal peer-to-peer file-sharing platforms.

By 1999, Napster was taking its place as the king of the P2P platforms, a position that came with countless lawsuits.

One year before Napster, in 1998, the release of the first portable MP3 player, the MPman, went relatively unnoticed.

However, a few years later, Apple released the first iPod, and how we listen to music changed forever.

These days, most of us listen to music on our smartphones daily; despite its flaws, the MP3 is still a massive part of our lives.

 
