Bit-depth and playback fidelity
Dec. 6, 2020
Many music streaming and download services offer recordings in 24-bit/96 kHz formats and promote high-resolution music as something really exciting, something that takes the musical experience to a higher level. Unfortunately, the marketing claims about bit depth and sampling rate are misleading, and there is nothing wrong with distributing music in 16-bit/44.1 kHz FLAC format.
There are many reasons why 24 bits makes no sense in a distribution format. First, a 16-bit/44.1 kHz recording with noise shaping can cover a 120-decibel dynamic range. That's more than enough (in fact, it is 10 decibels more than what is required for listening to music very loud, but not uncomfortably or painfully loud). However, even without noise shaping the maximum dynamic range of 16-bit/44.1 kHz audio is huge, and higher than the well-known but wrong '96 dB': when the gain is set so that the loudest peak is ~103 decibels, the noise floor is just at the threshold of audibility in a two-channel system.
In addition, not only can modern recordings be converted to 16 bits without compromising playback fidelity (without increasing the loudness of the noise floor), but the majority of recordings can be converted to 14 or even 13 bits without any loss of quality. 16-bit audio has a huge dynamic range, so huge that it is too large for much music. It’s hard to find a recording and a listening situation that takes advantage of noise shaping at 16-bit resolution. (In a multichannel system the noise level is a bit higher because of the larger number of sound sources, and noise shaping plays a more important role.)
In a 'powerful' home stereo system (amplifier: 2x100 Watt, speaker: 88dB/W/1m) the peak level is 114 decibels at 1 meter. In the listening position the peak level is about 105-108 decibels. In a stereo system calibrated to the 'EBU standard' (a broadcasting standard adopted by recording studios), the level of a sine wave with the largest amplitude is ~107 dB (101 dB per speaker). Simply put, people don't listen to music at home so loud that the peaks exceed 110 decibels. (The 'normal loud' is about 90-95 dB peak.)
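The 114 dB and 107 dB figures above follow from simple decibel arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the two speakers play identical, in-phase signals so that their sound pressures add coherently (+6 dB), which is the assumption that reproduces the stated numbers.

```python
import math

def speaker_peak_spl(sensitivity_db: float, power_w: float) -> float:
    """Peak SPL at 1 m for one speaker: sensitivity (dB/W/1m) + 10*log10(power)."""
    return sensitivity_db + 10.0 * math.log10(power_w)

def coherent_sum_spl(spl_db: float, n_speakers: int) -> float:
    """Assumption: identical in-phase signals, so pressures add (+6.02 dB per doubling)."""
    return spl_db + 20.0 * math.log10(n_speakers)

one = speaker_peak_spl(88.0, 100.0)    # 88 dB + 20 dB for 100 W
both = coherent_sum_spl(one, 2)        # two speakers, coherent sum
ebu = coherent_sum_spl(101.0, 2)       # 101 dB per speaker, as quoted above

print(f"one speaker at 1 m:  {one:.1f} dB")
print(f"two speakers at 1 m: {both:.1f} dB")
print(f"EBU-style full scale: {ebu:.1f} dB")
```

With uncorrelated signals in the two channels the sum would be about 3 dB lower, so these values are upper bounds for real stereo material.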
Quantization, dither, noise shaping
Quantization is the process of converting an analog value to a discrete value (a binary number); usually the value is a sample of a signal in an analog-to-digital converter (ADC). In a modern recording chain there is another type of quantization, called requantization: changing the word length (bit depth) of the digital values. Requantization is completely digital and implemented in software, so all the digital tricks are available.
Quantization, and requantization to a lower number of bits, is lossy. The nature of the error signal (the difference between the original and the quantized sample) depends on the quantization method. When the samples are simply truncated, the error can be heard as non-linear distortion with low-level tones. However, the normal and proper way to requantize is to either apply dither or use noise shaping.
Dither is a low level noise added to an audio signal prior to requantization or word-length reduction (e.g. 24 bit to 16 bit conversion). Dither improves the dynamic range and low-level linearity of a digital system. A digital system with the 'right dither' behaves like an analog system with a well-defined noise floor.
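The difference between truncation and dither can be sketched with a signal that is smaller than one quantization step. The code below is a minimal illustration, not the converter used for the audio samples: the helper names are hypothetical, and it assumes triangular (TPDF) dither with a peak of one step, which is a common choice but is not stated in this article.

```python
import math
import random

def lsb(bits: int) -> float:
    """Quantization step for signed samples scaled to the range [-1.0, 1.0)."""
    return 2.0 ** (1 - bits)

def truncate(x: float, bits: int) -> float:
    """Word-length reduction by truncation: discard everything below one step."""
    step = lsb(bits)
    return math.floor(x / step) * step

def dither_and_round(x: float, bits: int, rng: random.Random) -> float:
    """Add TPDF dither (an assumption, +/-1 step peak), then round to the nearest step."""
    step = lsb(bits)
    tpdf = (rng.random() - 0.5) + (rng.random() - 0.5)  # triangular, in step units
    return math.floor(x / step + tpdf + 0.5) * step

rng = random.Random(0)
bits = 16
x = 0.25 * lsb(bits)   # constant level of a quarter step
n = 20000

mean_trunc = sum(truncate(x, bits) for _ in range(n)) / n
mean_dith = sum(dither_and_round(x, bits, rng) for _ in range(n)) / n

print("input level (steps):   ", x / lsb(bits))
print("truncated mean (steps):", mean_trunc / lsb(bits))
print("dithered mean (steps): ", mean_dith / lsb(bits))
```

Truncation erases the signal completely, while the dithered output still carries it on average, buried in a constant noise floor. This is the sense in which a dithered digital system behaves like an analog one.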
Noise shaping at 44.1 kHz sampling rate extends the subjective dynamic range by 18 decibels (3 bits). A 12-bit/44.1 kHz audio file created by noise shaping is subjectively equivalent to a 15-bit/44.1 kHz audio file created by dither with a white noise spectrum.
Although quantization is a lossy process, the quantization noise in a 16-bit/44.1 kHz recording falls below the absolute threshold of hearing up to playback levels with ~120 decibel peaks...
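The basic mechanism of noise shaping is error feedback: the quantization error of each sample is subtracted from the next sample before it is quantized, which pushes the error energy toward high frequencies. The sketch below shows only this simplest first-order form, without dither. It is not the shaper used for the audio samples and does not reproduce the 18 dB figure, which relies on higher-order filters matched to the sensitivity of hearing. All names are hypothetical.

```python
import math

def lsb(bits: int) -> float:
    """Quantization step for signed samples scaled to the range [-1.0, 1.0)."""
    return 2.0 ** (1 - bits)

def shape_first_order(samples, bits):
    """Requantize with first-order error feedback.

    output[n] = input[n] + e[n] - e[n-1], so the error is high-pass filtered
    and its running sum never exceeds half a step.
    """
    step = lsb(bits)
    shaped = []
    prev_err = 0.0
    for x in samples:
        v = x - prev_err                        # feed back the previous error
        y = math.floor(v / step + 0.5) * step   # round to the nearest step
        prev_err = y - v                        # error made on this sample
        shaped.append(y)
    return shaped

bits = 12
step = lsb(bits)
signal = [0.3 * step] * 1000   # constant level well below one step

plain = [math.floor(x / step + 0.5) * step for x in signal]
shaped = shape_first_order(signal, bits)

print("plain rounding mean (steps):", sum(plain) / len(plain) / step)
print("noise-shaped mean (steps):  ", sum(shaped) / len(shaped) / step)
```

Plain rounding loses the sub-step level entirely, while the shaped output preserves it on average because the low-frequency part of the error has been removed.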
The test signal
Quantization errors are most easily heard with low-level pure tones. If you can't hear quantization with a low-level pure tone, then you won't hear it in any music. The complex sound of real instruments masks quantization noise better than a pure tone with a single frequency.
Quantization (word-length reduction) affects low-level signals more than high-level signals. Therefore, in a resolution test the musical peaks are not really important, except for setting the playback level. In this demo the test signal is a 500 Hz pure tone with an exponential decay. The peak level is -20 dBFS and the lowest level is -120 dBFS. On a calibrated stereo system -20 dBFS should correspond to ~85 dB SPL and 0 dBFS to 105 dB SPL.
In fact, when we apply constant dither or noise shaping, there is only one relevant question that needs to be answered: can anyone hear the noise floor when no audio signal is present? If no one can hear it, then adding bits doesn't improve the playback fidelity.
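A test signal of this kind can be sketched as follows. An exponential decay in amplitude is a straight line in decibels, so the level is interpolated linearly from -20 dBFS to -120 dBFS. The duration of the fade is not given in this article, so `duration_s` is a placeholder, and the function names are hypothetical. The dBFS-to-SPL mapping uses the calibration quoted above (0 dBFS = 105 dB SPL).

```python
import math

def dbfs_to_amplitude(level_db: float) -> float:
    """Convert a level in dBFS to a linear amplitude (full scale = 1.0)."""
    return 10.0 ** (level_db / 20.0)

def decaying_tone(freq_hz=500.0, rate_hz=44100, duration_s=30.0,
                  start_db=-20.0, end_db=-120.0):
    """Sine tone whose level falls linearly in dB (exponentially in amplitude)."""
    n = round(rate_hz * duration_s)
    samples = []
    for i in range(n):
        level_db = start_db + (end_db - start_db) * i / (n - 1)
        phase = 2.0 * math.pi * freq_hz * i / rate_hz
        samples.append(dbfs_to_amplitude(level_db) * math.sin(phase))
    return samples

def dbfs_to_spl(level_db: float, spl_at_full_scale: float = 105.0) -> float:
    """Playback level on a system where 0 dBFS produces spl_at_full_scale dB SPL."""
    return level_db + spl_at_full_scale

tone = decaying_tone(duration_s=1.0)   # short fade, for a quick check only
print(len(tone), "samples")
for level in (0.0, -20.0, -120.0):
    print(f"{level:7.1f} dBFS -> {dbfs_to_spl(level):6.1f} dB SPL")
```

The samples are floating-point values; reducing them to 12, 14 or 16 bits is a separate step.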
Listen to the audio samples on headphones in a quiet environment. The built-in speakers of notebooks and phones are not suitable for these demos. The volume should be at maximum or near maximum (the samples are not loud), however, I suggest starting with a low volume.
The audio samples are 16 bit/44.1 kHz FLAC files. 12-bit and 14-bit files were converted to 16-bit before compression, because 12-bit and 14-bit audio is not supported on all devices.
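The article does not say how the 12-bit and 14-bit samples were placed in 16-bit files. The usual lossless way is to shift each value left and fill the new low bits with zeros, and that is what the following sketch assumes. The function name is hypothetical.

```python
def pad_to_16(value: int, bits: int) -> int:
    """Store a signed `bits`-bit sample in a 16-bit word by zero-filling the low bits."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value << (16 - bits)

for bits, value in ((12, 2047), (12, -2048), (14, 1)):
    print(f"{bits}-bit {value:6d} -> 16-bit {pad_to_16(value, bits):6d}")
```

The padded file has the same noise floor and distortion as the original low-resolution data; only the container changes.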
Don't expect a musical experience from these tests. Test tones may not be the most exciting sounds, but there are two reasons why test tones are better than music samples. First, it's easier to hear the difference in these samples than in music. Second, if you can’t hear the quantization noise in a slowly fading tone at 16 bits, then you won’t hear it in any music, and increasing the number of bits will not improve the playback fidelity.
1. Conversion method: truncation
When the original 32-bit file is truncated to 12 bits, the distortion is clearly audible. In the 14-bit version the distortion is still audible, although it has been greatly reduced. In the 16-bit version the distortion is extremely low at the end of the fade-out, and dither will remove this small distortion.
These samples clearly demonstrate that quantization (word-length reduction) affects low-level signals more than high-level signals.
2. Conversion method: dither
Dither removes the distortion and sets a constant noise level. The 12-bit version is noisy, but the 14-bit version sounds pretty clean - only a small hiss can be heard in the background. The noise floor of analog recordings that were captured and mixed with the most advanced analog technology in the 1980s and later is close to the noise of this 14-bit dither (excluding room noise). The 16-bit version should sound clean and noise-free.
3. Conversion method: noise shaping
In this section the 14-bit version has a higher dynamic range (lower noise floor) than the 16-bit sample in the previous section. The dynamic range of the 16-bit version is approximately 120 decibels. Both the 14-bit and 16-bit audio samples should sound clean and free of noise.
Troubleshooting (no sound):
- Check that the browser hasn't muted the audio (common in Google Chrome).
- If the play button is disabled (greyed out), try this: select a track from the playlist and double-click on it.
- Check your master volume settings.
Note: you should hear the noise floor in the 14 bit audio sample created with dither.
Possible flaws in the test (when the test results are invalid)
Anything that lowers the perceived dynamic range lowers the validity of the test, and the result will be a so-called false negative:
- low volume (low amplifier gain)
- high environmental noise (fans, street noise)
- signal-to-noise ratio (SNR) of the DAC is less than 95 dB (older smartphones in the low price range have SNR of 85-90 dB, but notebooks typically have 95 dB or better)
Recording noise may also contribute to false negative results. However, these audio samples were created with an audio editor, so they don't contain recording noise.
More about quantization and conversion methods
When the word length (bit depth) is reduced without dither or noise shaping, the lowest signal level and the dynamic range can be calculated with the well-known formula (DR = 6.02 * bits and SNR = DR + 1.76). At 16 bits the level of the smallest signal is -96.32 dBFS. Everything below -96.32 dBFS will be converted to zero.
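The formula is easy to tabulate. The sketch below uses hypothetical function names. The 6.02 factor is 20*log10(2) rounded to two decimals, so the exact level of one quantization step differs from the rounded formula by about a hundredth of a decibel at 16 bits.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Dynamic range without dither or noise shaping: DR = 6.02 * bits."""
    return 6.02 * bits

def sine_snr_db(bits: int) -> float:
    """Signal-to-noise ratio from the same formula: SNR = DR + 1.76."""
    return dynamic_range_db(bits) + 1.76

def exact_step_dbfs(bits: int) -> float:
    """Level of one step relative to full scale, 20*log10(2**-bits), without rounding."""
    return 20.0 * math.log10(2.0 ** -bits)

for bits in (12, 14, 16, 24):
    print(f"{bits:2d} bits: DR = {dynamic_range_db(bits):6.2f} dB, "
          f"SNR = {sine_snr_db(bits):6.2f} dB, "
          f"exact step = {exact_step_dbfs(bits):8.2f} dBFS")
```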
However, the well-known formula does not apply to dither and noise shaping. The SNR is lower, but the 'subjective dynamic range' is higher than 6.02 * bits.
The loudness of the noise floor of a recording or audio file depends on these factors:
- The level of noise already present in the recording
The noise in a recording is the 'sum' of the quantization noise and the recording noise. If their levels are very different, then the noise floor is determined by the higher one. For example, the noise floor of analog recordings doesn't change when the 24-bit masters are converted to 16, 15 or even 14 bits by truncation.
An extra bit during quantization lowers the noise floor by 6 decibels and increases the dynamic range by the same amount. In other words, lowering the bit depth by one bit raises the noise floor by 6 decibels and decreases the dynamic range by 6 decibels. (Of course, this holds only if the noise floor of the original recording is lower than the noise added by quantization.)
- Sampling rate
Doubling the sampling rate lowers the noise floor by 3 dB for dither with a white noise spectrum, and noise shaping is easier at higher sampling rates. (Of course, this holds only if the noise floor of the original recording is lower than the quantization noise.)
- Type of the conversion method
Truncation, dither or noise shaping. Noise shaping at 44.1 kHz sampling rate extends the subjective dynamic range by 18 decibels (3 bits) compared to a non-shaped dither. However, noise shaping is only beneficial if the noise floor of the original recording is quieter than the shaped quantization noise.
- Level of the quietest 'sound' or passage
In a recording this is the recording noise or a combination of a long fade-out and the recording noise. In an audio test this can be the level of the signal or sound sample.
- Type of the test signal or sound in a demo, or the type of the fade out in a real recording
Complex tones mask the noise better than simple tones.
- Noise in the listening room
Noise in the listening room masks the low level sounds in the recording and lowers the overall dynamic range.
- Playback level
At low playback levels the SPL of low-level tones, noise, etc. falls below the absolute threshold of hearing.
- Number of channels
As the number of speaker channels increases, the level of quantization noise increases slightly.
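Several of the factors listed above reduce to simple decibel arithmetic. The sketch below uses hypothetical function names and assumes that the noise sources are uncorrelated, so their powers add. The -70 dBFS level for an analog recording is an illustrative value, not a measurement from this article.

```python
import math

def power_sum_db(levels_db):
    """Combine uncorrelated noise sources given in dB by adding their powers."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

def white_noise_floor_change_db(rate_ratio: float) -> float:
    """In-band level change of white noise when the sampling rate is multiplied by rate_ratio.

    The total noise power stays the same but is spread over a wider bandwidth.
    """
    return -10.0 * math.log10(rate_ratio)

# Recording noise far above the quantization noise: the louder source dominates.
print(f"-70 dB and -96 dB combined: {power_sum_db([-70.0, -96.0]):.2f} dB")

# Two equal, uncorrelated sources (for example two channels): about 3 dB higher.
print(f"-96 dB and -96 dB combined: {power_sum_db([-96.0, -96.0]):.2f} dB")

# Doubling the sampling rate with white-spectrum dither.
print(f"2x sampling rate: {white_noise_floor_change_db(2.0):.2f} dB")
```

The first result shows why truncating a noisy analog recording to fewer bits leaves its noise floor practically unchanged.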
• 2021-01-28 - the 1 kHz slowly fading pure tone was changed to a 500 Hz slowly fading pure tone. A 500 Hz pure tone is more pleasant and natural.