Audio demo: Bit-depth and playback fidelity

What is the optimal bit-depth (resolution) for music downloads? Is 24-bit better than 16-bit? An audio demo with a fading tone: bit-depth, conversion method and playback fidelity.

Dec. 6, 2020

Many music streaming and download services offer recordings in 24-bit/96 kHz formats and promote high-resolution music as something really exciting, something that takes the musical experience to a higher level. Unfortunately, the marketing claims about bit-depth and sampling rate are misleading, and there is nothing wrong with distributing music in 16-bit/44.1 kHz FLAC format.

There are many reasons why 24-bit resolution makes no sense in a distribution format. First, a 16-bit/44.1 kHz recording made with noise shaping can cover a dynamic range of 120 decibels. That is more than enough: in fact, it is 10 decibels more than what is required for listening to music very loud, but not uncomfortably or painfully loud. Second, even without noise shaping, the maximum dynamic range of 16-bit/44.1 kHz audio is huge, and higher than the well-known (and wrong) figure of 96 dB: when the gain is set so that the loudest peak is ~103 decibels, the noise floor is just at the threshold of audibility in a two-channel system.

In addition, modern recordings can be converted not only to 16 bits without compromising playback fidelity (that is, without making the noise floor louder); the majority of them can even be converted to 14 or 13 bits without any loss of quality. The dynamic range of 16-bit audio is so large that it exceeds what much music needs. It's hard to find a recording and a listening situation that takes advantage of noise shaping at 16-bit resolution. (In a multichannel system, the higher number of sound sources makes the noise level a bit higher, so noise shaping plays a more important role.)

In a 'powerful' home stereo system (amplifier: 2 × 100 W, speakers: 88 dB/W/1m) the peak level is 114 decibels at 1 meter. At the listening position the peak level is about 105-108 decibels. In a stereo system calibrated to the 'EBU standard' (a broadcasting standard adopted by recording studios), the level of a sine wave with the largest amplitude is ~107 dB (101 dB per speaker). Simply put, people don't listen to music at home so loud that the peaks exceed 110 decibels. ('Normal loud' is about 90-95 dB peak.)
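The levels above follow from the usual loudspeaker arithmetic. The sketch below is a rough model under stated assumptions: each speaker's SPL is its sensitivity plus 10·log10 of the amplifier power, both speakers add coherently (+6 dB, as for a centred mono peak), and the level falls by the inverse-square law. Room reflections, which usually keep the level at the listening position higher than this free-field estimate, are ignored, and the 2 m and 3 m listening distances are assumptions.

```python
import math


def speaker_spl(sensitivity_db: float, power_w: float, distance_m: float = 1.0) -> float:
    """Peak SPL of one speaker.

    Assumption: free field, inverse-square law (-6 dB per doubling of distance),
    no room gain and no amplifier/speaker compression.
    """
    return sensitivity_db + 10 * math.log10(power_w) - 20 * math.log10(distance_m)


def stereo_peak_spl(sensitivity_db: float, power_w: float, distance_m: float = 1.0) -> float:
    """Assumption: both speakers play the same in-phase peak, so pressures add (+20*log10(2) ~ 6 dB)."""
    return speaker_spl(sensitivity_db, power_w, distance_m) + 20 * math.log10(2)


# The example system: 2 x 100 W amplifier, 88 dB/W/1m speakers.
for d in (1.0, 2.0, 3.0):
    print(f"{d:.0f} m: {stereo_peak_spl(88, 100, d):.1f} dB SPL")
```

This gives about 114 dB SPL at 1 m and roughly 104.5 to 108 dB SPL at 2 to 3 m, which brackets the 105-108 dB quoted for the listening position.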

Quantization, dither, noise shaping

Quantization is the process of converting an analog value to a discrete value (a binary number); usually it is applied to a sampled signal in an analog-to-digital converter (ADC). In a modern recording chain there is another type of quantization, called requantization: changing the word length (bit-depth) of the digital values. Requantization is entirely digital and implemented in software, so all the digital tricks are available.

Quantization, and requantization that lowers the number of bits, are lossy. The nature of the error signal (the difference between the original and the quantized sample) depends on the quantization method. When the samples are simply truncated, the error can be heard as non-linear distortion on low-level tones. The normal and proper way to requantize is therefore to apply either dither or noise shaping.
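A minimal sketch of truncation distortion, assuming samples are expressed in LSB units, so that dropping the low-order bits of a two's-complement integer is the same as rounding toward minus infinity (`math.floor`). The 500 Hz tone matches the test signal described below; the 2.3 LSB amplitude and the helper names are illustrative choices. Because the error is a fixed function of the signal, it appears as harmonics (here the 3rd) rather than as noise.

```python
import cmath
import math

FS = 44100   # sample rate (Hz)
F0 = 500     # tone frequency (Hz); 44100 samples hold exactly 500 periods
N = FS


def tone_amplitude(x, freq_hz):
    """Amplitude of the component at freq_hz (single-bin DFT; freq must fall on a bin)."""
    k = round(freq_hz * len(x) / FS)
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / len(x)) for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)


def truncate(x):
    """Assumption: x is in LSB units; truncation is modelled as floor (two's-complement bit drop)."""
    return [math.floor(v) for v in x]


amp_lsb = 2.3   # a very quiet tone, only a few quantization steps high
tone = [amp_lsb * math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]
truncated = truncate(tone)

print("3rd harmonic, original :", round(tone_amplitude(tone, 3 * F0), 6), "LSB")
print("3rd harmonic, truncated:", round(tone_amplitude(truncated, 3 * F0), 3), "LSB")
```

The original tone has no 3rd harmonic, while the truncated one has a clearly measurable one: the staircase error repeats with every period of the tone, which is why it is heard as distortion.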

Dither is low-level noise added to an audio signal prior to requantization or word-length reduction (e.g. 24-bit to 16-bit conversion). Dither improves the dynamic range and low-level linearity of a digital system. A digital system with the 'right dither' behaves like an analog system with a well-defined noise floor.
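To illustrate the 'analog-like' behaviour, the sketch below assumes TPDF (triangular) dither spanning ±1 LSB, a common choice; the article does not say which dither the demo files used. A 0.4 LSB tone vanishes completely when it is simply rounded, but with dither added before rounding the tone survives inside a constant noise floor, because the average output of a TPDF-dithered quantizer equals its input.

```python
import cmath
import math
import random

FS = 44100
F0 = 500
N = FS


def tone_amplitude(x, freq_hz):
    """Amplitude of the component at freq_hz (single-bin DFT)."""
    k = round(freq_hz * len(x) / FS)
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / len(x)) for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)


def requantize_plain(x):
    """Round to the nearest LSB, no dither (x in LSB units)."""
    return [round(v) for v in x]


def requantize_tpdf(x, rng):
    """Assumption: TPDF dither = difference of two uniform values, range +/-1 LSB."""
    return [round(v + rng.random() - rng.random()) for v in x]


rng = random.Random(1)   # fixed seed so the run is repeatable
tone = [0.4 * math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]   # 0.4 LSB peak

plain = requantize_plain(tone)
dithered = requantize_tpdf(tone, rng)

print("plain rounding, nonzero samples:", sum(1 for v in plain if v != 0))
print("dithered, 500 Hz amplitude:", round(tone_amplitude(dithered, F0), 3), "LSB")
```

Plain rounding turns the whole tone into digital silence; the dithered version is noisy sample by sample, yet the 500 Hz component is still there at close to its original 0.4 LSB.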

Noise shaping at a 44.1 kHz sampling rate extends the subjective dynamic range by 18 decibels (3 bits). A 12-bit/44.1 kHz audio file created with noise shaping is subjectively equivalent to a 15-bit/44.1 kHz audio file created with white-noise (flat-spectrum) dither.
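Gains of the size quoted above typically come from higher-order shapers whose noise spectrum follows the ear's sensitivity curve. The sketch below is only the simplest instance of the idea, chosen for illustration and not the shaper used for the demo files: a first-order error-feedback loop (noise transfer function 1 - z^-1) combined with TPDF dither. It moves requantization noise away from low frequencies, which the script measures by comparing block-averaged error (a crude low-pass) against plain TPDF dither.

```python
import math
import random


def requantize_tpdf(x, rng):
    """Plain TPDF dither + rounding: white noise floor. x is in LSB units."""
    return [round(v + rng.random() - rng.random()) for v in x]


def requantize_shaped(x, rng):
    """Illustrative first-order error feedback + TPDF dither; total error = e[n] - e[n-1]."""
    out, err = [], 0.0
    for v in x:
        w = v - err                                   # subtract the previous step's error
        q = round(w + rng.random() - rng.random())
        err = q - w                                   # this step's error, fed back next sample
        out.append(q)
    return out


def lowfreq_error_rms(y, x, block=256):
    """RMS of the error averaged over blocks: a rough measure of noise below ~FS/block."""
    means = []
    for i in range(0, len(x) - block + 1, block):
        means.append(sum(y[j] - x[j] for j in range(i, i + block)) / block)
    return math.sqrt(sum(m * m for m in means) / len(means))


rng = random.Random(2)
FS, F0, N = 44100, 500, 256 * 200
tone = [5.0 * math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]   # quiet tone, in LSB

white = requantize_tpdf(tone, rng)
shaped = requantize_shaped(tone, rng)
print("low-frequency error, TPDF only:", lowfreq_error_rms(white, tone))
print("low-frequency error, shaped   :", lowfreq_error_rms(shaped, tone))
```

The total noise power actually rises with this filter (1 - z^-1 doubles it, +3 dB); noise shaping only moves the noise toward high frequencies, where the ear is less sensitive.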

Although quantization is a lossy process, the quantization noise in a 16-bit/44.1 kHz recording stays below the absolute threshold of hearing at playback levels with peaks of up to ~120 decibels...

The test signal

Quantization errors are most easily heard on low-level pure tones. If you can't hear quantization on a low-level pure tone, you won't hear it in any music. The complex sound of real instruments masks quantization noise better than a single-frequency pure tone does.

Quantization and word-length reduction affect low-level signals more than high-level signals. Therefore, in a resolution test the musical peaks are not really important, except for setting the playback level. In this demo the test signal is a 500 Hz pure tone with an exponential decay. The peak level is -20 dBFS and the lowest level is -120 dBFS. On a calibrated stereo system, -20 dBFS should correspond to ~85 dB SPL and 0 dBFS to ~105 dB SPL.
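A sketch of how such a test signal could be generated, assuming a 10-second duration (the article does not state one) and taking 0 dBFS as a full-scale sine with peak amplitude 1.0. An exponential decay is a straight line in decibels, so the level is interpolated linearly in dB. The hypothetical `dbfs_to_spl` helper simply applies the calibration quoted above (0 dBFS = 105 dB SPL).

```python
import math

FS = 44100  # Hz


def fading_tone(freq_hz=500.0, start_dbfs=-20.0, end_dbfs=-120.0, duration_s=10.0, fs=FS):
    """Sine whose level falls linearly in dB, i.e. exponentially in amplitude.

    Assumption: 10 s duration; 0 dBFS = sine peak of 1.0.
    """
    n = int(duration_s * fs)
    samples = []
    for i in range(n):
        level_db = start_dbfs + (end_dbfs - start_dbfs) * i / (n - 1)
        amplitude = 10 ** (level_db / 20)           # dBFS -> linear peak amplitude
        samples.append(amplitude * math.sin(2 * math.pi * freq_hz * i / fs))
    return samples


def dbfs_to_spl(dbfs, full_scale_spl=105.0):
    """Hypothetical helper: playback SPL on a system calibrated so 0 dBFS = 105 dB SPL."""
    return dbfs + full_scale_spl


tone = fading_tone()
print(len(tone), "samples")
print("start -20 dBFS ->", dbfs_to_spl(-20.0), "dB SPL")
print("end -120 dBFS ->", dbfs_to_spl(-120.0), "dB SPL")
```

On the calibrated system the fade therefore starts at 85 dB SPL and ends at -15 dB SPL.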

In fact, when constant dither or noise shaping is applied, only one relevant question needs to be answered: can anyone hear the noise floor when no audio signal is present? If no one can hear it, adding bits doesn't improve playback fidelity.

Recommended setup

Listen to the audio samples on headphones in a quiet environment. The built-in speakers of notebooks and phones are not suitable for these demos. The volume should be at or near maximum (the samples are not loud); however, I suggest starting at a low volume.

Audio samples

The audio samples are 16 bit/44.1 kHz FLAC files. 12-bit and 14-bit files were converted to 16-bit before compression, because 12-bit and 14-bit audio is not supported on all devices.
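One lossless way to store 12- or 14-bit samples in a 16-bit file is to shift each signed integer sample left so that the unused low-order bits are zero. This is an assumption about the conversion, not the author's documented procedure; the function names are hypothetical. The level and waveform are unchanged, and the shift can be undone exactly.

```python
def to_16_bit(samples, bits):
    """Place `bits`-bit signed integer samples in a 16-bit container (low bits become zero).

    Assumption: one possible lossless conversion, not necessarily the one used for the demo.
    """
    if not 1 <= bits <= 16:
        raise ValueError("bits must be between 1 and 16")
    shift = 16 - bits
    return [s << shift for s in samples]


def from_16_bit(samples, bits):
    """Undo to_16_bit: an arithmetic right shift recovers the original values exactly."""
    shift = 16 - bits
    return [s >> shift for s in samples]


twelve_bit = [-2048, -1, 0, 1, 2047]   # extremes and small values of the 12-bit signed range
padded = to_16_bit(twelve_bit, 12)
print(padded)   # [-32768, -16, 0, 16, 32752]
```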

Don't expect a musical experience from these tests. Test tones may not be the most exciting sounds, but there are two reasons why they are better than music samples. First, it's easier to hear the difference in these samples than in music. Second, if you can't hear the quantization noise in a slowly fading tone at 16 bits, you won't hear it in any music, and increasing the number of bits will not improve playback fidelity.

1. Conversion method: truncation

When the original 32-bit file is truncated to 12 bits, the distortion is clearly audible. In the 14-bit version the distortion is still audible, although greatly reduced. In the 16-bit version the distortion at the end of the fade-out is extremely low, and dither removes even this small distortion.

These samples clearly demonstrate that quantization and word-length reduction affect low-level signals more than high-level signals.

2. Conversion method: dither

Dither removes the distortion and sets a constant noise level. The 12-bit version is noisy, but the 14-bit version sounds pretty clean: only a faint hiss can be heard in the background. The noise floor of analog recordings captured and mixed with the most advanced analog technology of the 1980s and later is close to the noise of this 14-bit dither (excluding room noise). The 16-bit version should sound clean and noise-free.

3. Conversion method: noise shaping

In this section, the 14-bit version has a higher dynamic range (lower noise floor) than the 16-bit sample in the previous section. The dynamic range of the 16-bit version is approximately 120 decibels. Both the 14-bit and 16-bit audio samples should sound clean and free of noise.

Troubleshooting (no sound):

Note: you should hear the noise floor in the 14-bit audio sample created with dither.

Possible flaws in the test (when the test results are invalid)

Anything that lowers the perceived dynamic range reduces the validity of the test and can lead to a so-called false negative result:

Recording noise may also contribute to false negative results; however, these audio samples were created with an audio editor, so they contain no recording noise.

More about quantization and conversion methods

When the word length (bit-depth) is reduced without dither or noise shaping, the lowest signal level and the dynamic range can be calculated with the well-known formulas DR = 6.02 * bits and SNR = DR + 1.76 (in dB). At 16 bits the level of the smallest signal is -96.32 dBFS. Everything below -96.32 dBFS is converted to zero.
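The formulas can be checked numerically. The sketch below uses 20·log10(2) ≈ 6.0206 dB per bit (the 6.02 in the text is this value rounded, so the 16-bit figure comes out as 96.33 rather than 96.32) and 10·log10(1.5) ≈ 1.76 dB, then measures the SNR of a full-scale sine that is simply rounded to 16 bits. The 997 Hz test frequency is an arbitrary assumption that avoids a short, repeating error pattern.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # ~6.0206 dB per bit


def dynamic_range_db(bits):
    return DB_PER_BIT * bits


def sine_snr_db(bits):
    """Theoretical SNR of a full-scale sine with undithered rounding: DR + 10*log10(3/2)."""
    return dynamic_range_db(bits) + 10 * math.log10(1.5)


def measured_snr_db(bits, n=100_000, freq_hz=997.0, fs=44100.0):
    """Round a full-scale sine to `bits` bits and compare signal power with error power.

    Assumption: 997 Hz is an arbitrary test frequency.
    """
    amp = 2 ** (bits - 1) - 1          # largest positive code, in LSB units
    err_power = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * freq_hz * i / fs)
        err_power += (round(x) - x) ** 2
    err_power /= n
    return 10 * math.log10((amp ** 2 / 2) / err_power)


for b in (12, 14, 16):
    print(b, "bits: DR", round(dynamic_range_db(b), 2), "dB, SNR", round(sine_snr_db(b), 2), "dB")
print("measured 16-bit SNR:", round(measured_snr_db(16), 2), "dB")
```

For a loud sine the rounding error behaves like uniform noise, so the measurement lands on the formula; for very quiet signals the same error turns into distortion, as the truncation samples show.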

However, the well-known formula does not apply to dither and noise shaping. The SNR is lower, but the 'subjective dynamic range' is higher than 6.02 * bits.
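Why the SNR is lower with dither can be shown with a short calculation. Assuming TPDF dither spanning ±1 LSB (a common choice; the article does not say which dither was used), the dither adds LSB²/6 of noise power to the LSB²/12 of rounding noise, so the noise floor rises by 10·log10(3) ≈ 4.77 dB and the full-scale sine SNR becomes about 6.02 × bits - 3.01 dB.

```python
import math


def tpdf_snr_db(bits):
    """SNR of a full-scale sine after TPDF-dithered rounding.

    Assumption: TPDF dither spanning +/-1 LSB, so total noise power = 1/12 + 1/6 = 1/4 LSB^2.
    """
    signal_power = (2 ** (bits - 1)) ** 2 / 2   # full-scale sine, in LSB units
    noise_power = 1 / 12 + 1 / 6                # rounding error + dither
    return 10 * math.log10(signal_power / noise_power)


for b in (12, 14, 16):
    print(f"{b} bits: {tpdf_snr_db(b):.1f} dB")
```

If TPDF dither was used for the demo, the 14-bit figure (about 81 dB below a full-scale sine) describes the hiss heard in the dithered 14-bit sample.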

The loudness of the noise floor of a recording or audio file depends on these factors:

Csaba Horvath

Revision history:
  2021-01-28 - the 1 kHz slowly fading pure tone was changed to a 500 Hz slowly fading pure tone. A 500 Hz pure tone is more pleasant and natural.
