Quantization noise & bit-depth
There is no point in discussing sample rate and bit-depth without having a real experience of “lo-res” audio. Studying audio with reduced sample rate or bit-depth may help us better understand the relationship between audio fidelity and relevant parameters.
Dec. 20, 2021
Fortunately, investigating bit-depth and quantization doesn't require complicated and tiring ABX tests. In fact, it only takes a few minutes. Quantization and word-length reduction, when optimally dithered, add only noise to the original signal; this noise is called quantization noise. The only difference between bit-depths is that a higher bit-depth offers a lower noise floor. Therefore, we just have to listen to the quantization noise at a high volume setting in a quiet place. If we can’t hear the quantization noise at X bits, then we won’t hear it in music, and increasing the number of bits will not improve playback fidelity. That's all. Testing optimally dithered quantization and bit-depth with music samples is pointless (unless we want to test how quantization noise is masked by the music at low bit-depths).
This quantization noise audibility test lets you check the audibility of quantization noise at different bit-depth settings directly in your browser. High-quality headphones with head-on response close to the Harman target curve, or loudspeakers, are recommended. The built-in speakers of laptops and smartphones are not suitable for listening tests.
(2022-11-20: mimeType checking fixed, works in Google Chrome again.)
"What should I hear?" - First, let’s look at the case when the noise spectrum is white ('no noise shaping'). With 12 bits, the quantization noise should be clearly audible. You should also hear the quantization noise with 14 bits, but in this case the noise should be very quiet. When 16 bits is selected and the gain is set so that the peak SPL at the listening position is lower than 105 dB SPL (a typical value for a 'loud but not ear-bleeding' playback level), the quantization noise should be inaudible. Noise shaping lowers the audibility of the noise floor by 18 decibels, which means that the system gain can be turned up 18 decibels higher than in the version without noise shaping. 16 bits provide enough dynamic range to cover everything from the quietest sounds to the roar of a T-Rex in a cinema.
This audibility test can demonstrate:
- How dither sounds.
- The difference between 'flat' and shaped dither: how noise shaping can lower the loudness of the quantization noise and increase the subjective dynamic range.
- How the loudness of quantization noise changes as a function of bit-depth or system gain. This test clearly demonstrates that a bit-depth of 16 bits is more than enough for consumer delivery formats.
- That the noise floor of 14 bits can only be heard under extreme, almost unrealistic, conditions.
Quantization, dither and noise shaping
Quantization is the process of converting a sampled analog value to a discrete value (a binary number). There is another type of quantization, which is called requantization: changing the word-length (bit-depth) of the digital values. Word-length reduction (e.g. 24 bit to 16 bit conversion) is a type of requantization.
Dither is low-level noise added to an audio signal prior to word-length reduction (e.g. 24 bit to 16 bit conversion). Dither improves the dynamic range and low-level linearity of a digital system. A digital system with the 'right dither' behaves like an analog system with a well-defined noise floor.
In software, dither is created using a random number generator. The output of the random number generator is scaled and added to the original signal before truncation. A software implementation of dither consists of only a few lines of code.
Dither in Wikipedia ➚
TPDF dither is the most common type of dither. TPDF stands for triangular probability density function.
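As an illustration of the "few lines of code" mentioned above, here is a minimal sketch of TPDF-dithered word-length reduction. It assumes floating-point samples in the range -1.0 to +1.0, builds the triangular dither as the difference of two uniform random numbers, and rounds to the nearest step instead of truncating; the function name `requantize_tpdf` is hypothetical and not taken from any library.

```python
import math
import random


def requantize_tpdf(samples, bits, rng=None):
    """Reduce float samples (-1.0..+1.0) to `bits` bits with TPDF dither.

    Assumption: round-to-nearest is used after adding the dither.
    """
    rng = rng or random.Random()
    lsb = 2.0 ** (1 - bits)          # step size when full scale spans -1..+1
    max_code = 2 ** (bits - 1) - 1   # largest positive integer code
    min_code = -(2 ** (bits - 1))    # most negative integer code
    out = []
    for x in samples:
        # The difference of two uniform values has a triangular PDF
        # spanning -1 LSB .. +1 LSB.
        dither = (rng.random() - rng.random()) * lsb
        code = math.floor((x + dither) / lsb + 0.5)   # round to nearest step
        code = max(min_code, min(max_code, code))     # clamp to legal range
        out.append(code * lsb)
    return out


if __name__ == "__main__":
    lsb = 2.0 ** (1 - 16)
    # A constant signal of 0.25 LSB would simply vanish without dither.
    out = requantize_tpdf([0.25 * lsb] * 20000, 16, random.Random(1))
    print("mean output in LSB:", sum(out) / len(out) / lsb)
```

With dither, the individual output samples are still whole steps, but their average follows the input level even below one LSB. This is the low-level linearity that dither is meant to provide.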
Perceptual noise shaping
Perceptual noise shaping shifts noise from our most sensitive hearing range (roughly 2 to 6 kHz) to frequencies above 15 kHz, where noise is not audible. Noise shaping at a 44.1 kHz sampling rate extends the subjective dynamic range by 18 decibels (3 bits). The quantization noise in a 16 bit / 44.1 kHz file created with noise shaping is subjectively equivalent to the quantization noise of a 19 bit / 44.1 kHz file created with standard dither (dither with a white noise spectrum).
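The error-feedback idea behind noise shaping can be sketched in a few lines. The example below is only a first-order shaper (noise transfer function 1 - z^-1), an assumption made for brevity. It tilts the noise toward high frequencies in general; it is not the perceptually weighted, higher-order filter described above and should not be expected to reach the 18 dB figure. The function name `requantize_noise_shaped` is hypothetical, and clipping is omitted.

```python
import math
import random


def requantize_noise_shaped(samples, bits, rng=None):
    """First-order error-feedback requantizer with TPDF dither (sketch)."""
    rng = rng or random.Random()
    lsb = 2.0 ** (1 - bits)   # step size when full scale spans -1..+1
    prev_error = 0.0
    out = []
    for x in samples:
        # Subtract the previous total error (dither + rounding) from the input.
        v = x - prev_error
        dither = (rng.random() - rng.random()) * lsb   # TPDF, -1..+1 LSB
        y = math.floor((v + dither) / lsb + 0.5) * lsb  # round to nearest step
        prev_error = y - v
        out.append(y)
    return out


if __name__ == "__main__":
    lsb = 2.0 ** (1 - 16)
    x = [0.1 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(5000)]
    y = requantize_noise_shaped(x, 16, random.Random(2))
    # The accumulated error stays within a couple of LSB, i.e. the noise
    # has almost no energy at very low frequencies.
    print("accumulated error in LSB:", sum(b - a for a, b in zip(x, y)) / lsb)
```

Because each output equals the input plus the difference of two consecutive errors, the errors cancel in the long run. Real perceptual shapers replace the single feedback tap with a filter designed around the hearing threshold curve.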
Dynamic range calculations
When the word length (bit-depth) is reduced without dither and the signal doesn't contain any noise, the lowest signal level and the dynamic range can be calculated with the well-known formulas DR = 6.02 × bits and SNR = DR + 1.76 (both in dB). At 16 bits, the level of the smallest signal is -96.32 dBFS. Everything below -96.32 dBFS will be converted to zero. However, 96.32 dB is valid only when a pure tone is created with an audio editor and converted to 16 bits without dither; it does not hold for real-life content. The dynamic range of 16 bits is so high that noise in recordings, even in modern digital recordings, can function as dither.
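For reference, the two formulas can be evaluated for a few bit-depths with a short script. This is only a convenience sketch using the rounded constants quoted above (6.02 and 1.76); the function names are made up for the example.

```python
def dynamic_range_db(bits):
    """Undithered dynamic range in dB, using DR = 6.02 x bits."""
    return 6.02 * bits


def sine_snr_db(bits):
    """SNR of a full-scale sine in dB, using SNR = DR + 1.76."""
    return dynamic_range_db(bits) + 1.76


if __name__ == "__main__":
    for bits in (8, 12, 14, 16, 24):
        print(f"{bits:2d} bits: DR = {dynamic_range_db(bits):6.2f} dB, "
              f"SNR = {sine_snr_db(bits):6.2f} dB")
```

For 16 bits this gives DR = 96.32 dB and SNR = 98.08 dB.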
The well-known formula does not apply to dither and noise shaping either. When dither is used, the SNR is lowered, but the 'subjective dynamic range' is increased and becomes higher than 6.02 × bits. Digital systems are limited by quantization noise, and the real question is the loudness of this noise. The "noise-free" dynamic range of a noise-limited system can be determined with the following procedure: set the system gain to the highest value at which the quantization noise is still not audible; the SPL of a full-scale sinusoid (0 dBFS) at that gain then describes the dynamic range of the system.
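How much the SNR is lowered by dither can be estimated from the noise powers. The sketch below assumes non-subtractive TPDF dither with a peak-to-peak width of 2 LSB, so the total error power rises from LSB²/12 (rounding only) to LSB²/12 + LSB²/6 = LSB²/4. It says nothing about the subjective dynamic range, which depends on hearing thresholds. The function name is hypothetical.

```python
import math


def full_scale_sine_snr_db(bits, tpdf_dither=False):
    """SNR of a full-scale sine (amplitude 1.0) against the total error power."""
    lsb = 2.0 / 2 ** bits              # step size when full scale spans -1..+1
    noise_power = lsb ** 2 / 12        # rounding error of a uniform quantizer
    if tpdf_dither:
        noise_power += lsb ** 2 / 6    # variance of 2-LSB-wide TPDF dither
    signal_power = 0.5                 # mean square of a unit-amplitude sine
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    plain = full_scale_sine_snr_db(16)
    dithered = full_scale_sine_snr_db(16, tpdf_dither=True)
    print(f"undithered: {plain:.2f} dB")
    print(f"TPDF dither: {dithered:.2f} dB")
    print(f"penalty: {plain - dithered:.2f} dB")
```

Under these assumptions the penalty is 10·log10(3), about 4.77 dB, regardless of bit-depth. The undithered value comes out as 98.09 dB rather than 98.08 dB because the constants 6.02 and 1.76 are rounded.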
What are 12 bits enough for?
(added on 2022.12.22.)
It's well known that 8 bits is enough for heavily compressed pop or rock music. However, it is not so well known that even 12 bits can provide a sufficiently low noise floor for jazz, blues or any kind of music with acoustic instruments. Analog recordings are living proof of that. Recording noise in analog recordings is equivalent to the noise floor of a 12 to 13 bit digital system (without noise shaping, just TPDF dither!). We may only hear some background hiss at the beginning and at the end of the songs or during very quiet passages. The rest of the noise is masked by the music.
Calculated bit-depth of some popular recordings:
|Recording|Original year|Version year|dBre|Bits|
|---|---|---|---|---|
|The Dave Brubeck Quartet - Take Five|1959|2009|26.3|11.6|
|The Dave Brubeck Quartet - Take Five|1959|2011|35.8|10|
|Led Zeppelin - Stairway To Heaven ¹|1971|2014|27.9|11.4|
|Dire Straits - Water of Love|1978|1995|24.7|11.9|
|The Blues Brothers - She Caught The Katy|1980|1995|21.2|12.5|
|The Blues Brothers - Sweet Home Chicago|1980|1995|27.2|11.5|
|Hans Zimmer - Gladiator Soundtrack ²|2000|2000|13.7|13.7|
|16 bit / 44.1 kHz TPDF dither (reference)|-|-|0|16|

¹ The 1990 version is shorter; there is a fade-out at the end.
² This is a digital recording.
Calculation of bit-depth: the noise floor (recording noise) is compared to the noise floor of a 16 bit / 44.1 kHz digital system with TPDF dither. The difference in decibels (dBre) is then converted to bits: bits = 16 − dBre / 6.02.
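The conversion from decibels to bits can be reproduced with a short script. This is a sketch that assumes the usual 6.02 dB per bit and the 16-bit TPDF reference described above; the function and constant names are chosen for the example only.

```python
DB_PER_BIT = 6.02      # decibels per bit, as in DR = 6.02 x bits
REFERENCE_BITS = 16    # reference: 16 bit / 44.1 kHz with TPDF dither


def equivalent_bits(db_re):
    """Convert a noise floor difference (dB above the reference) to bits."""
    return REFERENCE_BITS - db_re / DB_PER_BIT


if __name__ == "__main__":
    measurements = [
        ("Take Five (2009)", 26.3),
        ("Take Five (2011)", 35.8),
        ("Stairway To Heaven (2014)", 27.9),
        ("Gladiator Soundtrack (2000)", 13.7),
    ]
    for title, db_re in measurements:
        print(f"{title}: {equivalent_bits(db_re):.1f} bits")
```

The results agree with the values in the table to within about 0.1 bit; small deviations are likely due to rounding of the published dBre values.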
Recording noise measurement graphs can be found here.
Even 12 bits can provide excellent playback fidelity...