Noise perception, detection threshold & dynamic range
What is the "ear-perceived" dynamic range of 16bit/44.1kHz and some older formats? Why is the typical answer ("dynamic range of 16 bits = 96 decibels") wrong? Why don't Signal-to-Noise Ratio and "dBA" tell the whole story?
Jan. 17, 2023
Measuring dynamic range requires a definite maximum and a definite minimum value. However, amplifiers, DACs and audio formats are noise-limited systems: the transmission of low-level signals is affected by the noise in the system. There is no lowest signal level, just more and more noise. At the other end of the range, the interpretation of the peak value in a multi-channel system depends on how the channels are treated and how the signals are added (correlated vs. uncorrelated summation, and how many channels contribute to the total sound pressure).
If we analyze the ear's dynamic range, we find a well-defined lower limit for pure tones (the Absolute Threshold Of Hearing, ATH). However, the ATH curve only shows the detection threshold for pure tones; the detection threshold for noise has to be calculated differently. Also, the maximum SPL for the ear is not as sharply defined as the clipping threshold of an amplifier.
Since these systems are noise limited, it's no surprise that the lower bound is related to noise perception. Noise perception, together with the ear's transfer function, determines the lower limit: the audibility of noise.
First, let's examine the two most frequently used dynamic range measurement methods and their limitations.
Traditional measurements of dynamic range
1. Formula for digital systems (6.02 × bits)
In a digital system, when the word length (bit depth) is reduced without dither and the signal contains no noise, the lowest signal level and the dynamic range can be calculated with the well-known formulas DR = 6.02 × bits and SNR = DR + 1.76 dB. At 16 bits the level of the smallest signal is -96.32 dBFS; everything below -96.32 dBFS is converted to zero. However, the 96.32 dB figure is only valid when a pure tone is generated in software and converted to 16 bits without dither, not for real-life content. The dynamic range of 16 bits is so high that the noise already present in recordings, even in modern digital recordings, can act as dither.
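A quick way to check the formula is to quantize a software-generated full-scale sine without dither and measure the rounding error. The sketch below is a minimal illustration under our own assumptions (a 997 Hz test tone, plain rounding to the nearest code, a symmetric code range of ±(2^(bits-1) - 1)); the function name is hypothetical. For 16 bits the measured SNR should land close to 6.02 × 16 + 1.76 ≈ 98.1 dB.

```python
import math

def undithered_sine_snr_db(bits: int, freq: float = 997.0, fs: float = 44100.0,
                           n: int = 44100) -> float:
    """Round a full-scale sine to `bits` without dither and return the measured SNR in dB."""
    scale = 2 ** (bits - 1) - 1             # largest positive code, e.g. 32767 for 16 bits
    sig_power = err_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i / fs)   # software-generated pure tone, peak 1.0
        xq = round(x * scale) / scale               # plain rounding: no dither, no noise
        sig_power += x * x
        err_power += (xq - x) ** 2
    return 10 * math.log10(sig_power / err_power)

for bits in (8, 16):
    formula = 6.02 * bits + 1.76
    print(bits, round(formula, 2), round(undithered_sine_snr_db(bits), 2))
```

The 997 Hz tone shares no short common period with 44.1 kHz, so the rounding error is spread evenly across the code steps and behaves like noise. With a tone that repeats every few samples, the error becomes a fixed pattern and the measurement drifts away from the formula.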
2. Signal-to-noise ratio (SNR)
Signal-to-noise ratio is the most widely used measurement. It is simple, and it gives acceptable results with white-spectrum noise if we tolerate an error of nearly 10 dB.
Signal-to-noise ratio is defined as the RMS of the largest sinusoidal signal divided by the RMS of the total noise, expressed in decibels. For audio measurements the noise RMS is band-limited to 20 kHz, and the noise is usually A-weighted. However, the loudness of the noise doesn't correlate with the RMS of the total noise, because the noise energy is distributed among the inner ear's filters (the inner ear behaves like a set of closely spaced filters with bandwidths roughly proportional to their center frequencies). Total RMS therefore overestimates the noise relative to the ear, so SNR underestimates the dynamic range relative to the ear. A further problem is that A-weighting is only a crude approximation of the ear's sensitivity.
An SNR of 96 dBA doesn't mean that the dynamic range is 96 decibels. It only means that the result of one particular measurement is 96 decibels, and that result is not the dynamic range.
In short: SNR measures the noise RMS from 20 Hz to 20 kHz, while the ear measures the noise in narrow (approx. 1/6 octave) segments. The incoming signal is also weighted by the outer-ear transfer function. In addition, there is a frequency-selective amplification of quiet sounds, which is responsible for the shape of the loudness curves below 1 kHz.
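The sketch below shows how much the band limit and the A-weighting change a plain SNR figure. It assumes a flat (white) noise floor with the power of an ideal TPDF-dithered quantizer (q²/12 from rounding plus q²/6 from the dither, i.e. q²/4), integrates it numerically over frequency, and optionally applies the analytic A-weighting curve given in IEC 61672-1. Function and parameter names are our own.

```python
import math

def a_weighting_db(f: float) -> float:
    """A-weighting gain in dB, analytic form from IEC 61672-1 (about 0 dB at 1 kHz)."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0

def white_floor_snr_db(bits=16, fs=44100.0, f_lo=0.0, f_hi=None,
                       a_weighted=False, steps=20000):
    """SNR of a full-scale sine over a flat TPDF-dithered noise floor."""
    f_hi = fs / 2 if f_hi is None else f_hi
    q = 2.0 / 2 ** bits                    # quantization step for the range [-1, 1)
    density = (q * q / 4) / (fs / 2)       # (q^2/12 rounding + q^2/6 dither) spread over 0..fs/2
    df = (f_hi - f_lo) / steps
    noise = 0.0
    for k in range(steps):                 # midpoint-rule integration over frequency
        f = f_lo + (k + 0.5) * df
        gain = 10 ** (a_weighting_db(f) / 10) if a_weighted else 1.0
        noise += density * gain * df
    return 10 * math.log10(0.5 / noise)    # 0.5 = power of a full-scale sine

print(round(white_floor_snr_db(), 1))                                     # 93.3
print(round(white_floor_snr_db(f_lo=20, f_hi=20000), 1))                  # 93.7
print(round(white_floor_snr_db(f_lo=20, f_hi=20000, a_weighted=True), 1))
```

For 16 bits the unweighted full-band figure is 93.3 dB and the A-weighted 20 Hz-20 kHz figure comes out at roughly 96 dBA, in line with the values usually quoted for 16bit/44.1kHz with TPDF dither. Neither number says anything about how the noise is distributed across the ear's filters.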
3. Using a reference system
If we don't want to deal with perceptual questions, we can use a simple and effective 'comparative' method: choose a reference system, measure its noise spectrum, and compare all other noise spectra to the reference. The noise floor of a 16bit/44.1kHz digital system with TPDF dither is a good reference, since it is easy to reproduce and has a flat frequency spectrum.
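As a sketch of the comparative method, the flat noise density of the 16bit/44.1kHz TPDF reference can be written down directly and subtracted from any other noise spectrum expressed in the same units (dB relative to a full-scale sine, per Hz). The ideal 24-bit/96 kHz floor and the `measured` spectrum are illustrations only: the former assumes a perfect TPDF-dithered converter, the latter uses made-up numbers, not measurements of real hardware.

```python
import math

def tpdf_floor_db_per_hz(bits: int, fs: float) -> float:
    """Flat noise density of an ideal TPDF-dithered PCM channel, dB re full-scale sine, per Hz."""
    q = 2.0 / 2 ** bits                      # quantization step for the range [-1, 1)
    noise_power = q * q / 4                  # rounding (q^2/12) + TPDF dither (q^2/6)
    return 10 * math.log10(noise_power / (fs / 2) / 0.5)  # 0.5 = power of a full-scale sine

REFERENCE = tpdf_floor_db_per_hz(16, 44100.0)

def compare_to_reference(spectrum):
    """Map {frequency_hz: density_db_per_hz} to dB above (+) or below (-) the reference."""
    return {f: level - REFERENCE for f, level in spectrum.items()}

print(round(REFERENCE, 2))                                        # -136.75
print(round(tpdf_floor_db_per_hz(24, 96000.0) - REFERENCE, 1))    # ideal 24/96 floor vs. reference
# hypothetical measured spectrum (made-up numbers, for illustration only)
measured = {100: -120.0, 1000: -130.0, 10000: -135.0}
print(compare_to_reference(measured))
```

Because the reference is flat, a positive difference at any frequency directly means "noisier than 16-bit TPDF at that frequency", without any perceptual model.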
Dynamic range of a noise limited system
"Noise-free" ear-perceived dynamic range of a noise limited system is defined in the following way (as in footnote #2):
- The system gain is set to the highest value at which the noise (quantization noise) is still not audible;
- The SPL of the full scale sine (0 dBFS) can be used to describe the dynamic range of the system.
A simple model for noise perception
Audibility of noise is more important than total RMS or signal-to-noise ratio. It can be calculated using the following method.
- The audible frequency range is divided into segments (critical bands). (The width of a critical band is the Equivalent Rectangular Bandwidth (ERB), the noise bandwidth of human hearing as a function of frequency. Above 1 kHz the width of an 'ERB band' is about 1/6 octave.)
- Noise level (RMS) is calculated in each band.
- Values are weighted with inverse ATH (or an ear transfer function measured at low SPL). The weighted curve is the noise audibility threshold.
The "noise-free" subjective dynamic range can be expressed with the absolute value of the maximum weighted noise level. Or more simply: the "noise-free" subjective dynamic range is defined by the top of the noise audibility curve.
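The three steps above, together with the "top of the curve" rule, can be sketched in a few lines. Assumptions: the ERB formula of Glasberg and Moore (1990); the threshold-in-quiet values of the ISO 226:2003 table, evaluated only at its 1/3-octave frequencies between 250 Hz and 12.5 kHz (so the true peak may fall between grid points); an ideal flat TPDF-dithered noise floor; and "weighted with inverse ATH" read as "band level minus threshold". This is a simplified illustration, not the Noise Perception Modeler.

```python
import math

# Threshold in quiet (dB SPL) from the ISO 226:2003 table, 1/3-octave centres.
ATH_ISO226_2003 = {
    250: 11.4, 315: 8.6, 400: 6.2, 500: 4.4, 630: 3.0, 800: 2.2,
    1000: 2.4, 1250: 3.5, 1600: 1.7, 2000: -1.3, 2500: -4.2, 3150: -6.0,
    4000: -5.4, 5000: -1.5, 6300: 6.0, 8000: 12.6, 10000: 13.9, 12500: 12.3,
}

def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def band_noise_db(f_hz: float, bits: int = 16, fs: float = 44100.0) -> float:
    """Steps 1-2: power of a flat TPDF-dithered floor inside one ERB, dB re full-scale sine."""
    q = 2.0 / 2 ** bits
    density = (q * q / 4) / (fs / 2)          # noise power per Hz
    return 10 * math.log10(density * erb_hz(f_hz) / 0.5)

def audibility_curve(bits: int = 16, fs: float = 44100.0) -> dict:
    """Step 3: weight each band by the inverted threshold in quiet."""
    return {f: band_noise_db(f, bits, fs) - ath for f, ath in ATH_ISO226_2003.items()}

curve = audibility_curve()
f_top = max(curve, key=curve.get)             # top of the noise audibility curve
print(round(curve[1000], 1))                  # -117.9
print(f_top, round(-curve[f_top], 1))         # peak frequency, "noise-free" dynamic range in dB
```

Under these assumptions the 1 kHz value lands at about -117.9 dBFS, close to the -118 dBFS read from the 16bit/44.1kHz curve, and the top of the curve sits at the 4 kHz grid point, at roughly -105 dBFS.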
The graph shows the noise audibility curve of a 16bit/44.1 kHz digital system with TPDF dither. The blue curve represents the critical-band noise level vs. frequency (band width = ERB). The blue curve also represents the level of a single tone that has the same loudness as the quantization noise in that band. The red curve (the weighted curve) has several meanings too. First of all, it represents the noise audibility of 16bit/44.1 kHz. It can also be interpreted as a segmented signal-to-noise ratio vs. frequency (since 0 dBFS is the RMS of the largest sine wave, the absolute value of the weighted noise RMS is the SNR in that band). The inverted ISO 226:2003 threshold in quiet was chosen as the weighting function (Equal-loudness level contours - ISO 226-2003 at sengpielaudio).
The red curve is also the audibility curve of a system with 96 dBA SNR and a white-spectrum noise floor. (The SNR of 16bit/44.1 kHz with TPDF dither is 96 dBA, or 93.3 dB without A-weighting.)
The following graph shows the noise audibility threshold of the most popular audio formats. In the case of vinyl, "HQ" means a high-quality pressing and a new LP, and "LQ" means a low-quality pressing or a worn-out LP.
Peak levels in audio systems
In a typical home stereo system (amplifier: 2 × 100 W, speakers: 88 dB at 1 W / 1 m) the peak level is 114 decibels at one meter. At the listening position the peak level is about 105-108 decibels. In a stereo system calibrated to the 'EBU standard' (a broadcasting standard adopted by recording studios), the level of the sine wave with the largest amplitude is ~107 dB (101 dB per speaker). However, even a peak level of 105 dB is too loud and unpleasant for most people.
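The 114 dB figure follows from simple level arithmetic: sensitivity plus 10·log10(power) for one speaker, plus 6 dB when two speakers play the same (correlated) signal, minus 20·log10(distance) under a free-field inverse-square assumption. The sketch below is only that arithmetic; the function name and the 3 m distance are hypothetical, and it ignores room reflections, which usually raise the level at the listening position above the free-field estimate.

```python
import math

def peak_spl_db(sensitivity_db: float, power_w: float, distance_m: float = 1.0,
                speakers: int = 1, correlated: bool = True) -> float:
    """Free-field peak SPL estimate from sensitivity (dB SPL at 1 W / 1 m), power and distance."""
    level = sensitivity_db + 10 * math.log10(power_w)          # one speaker at 1 m
    # identical (correlated) signals add in amplitude, independent ones in power
    level += (20 if correlated else 10) * math.log10(speakers)
    level -= 20 * math.log10(distance_m)                       # inverse-square law, no room gain
    return level

print(round(peak_spl_db(88, 100, speakers=2), 1))                   # 114.0 at 1 m
print(round(peak_spl_db(88, 100, distance_m=3.0, speakers=2), 1))   # 104.5 at 3 m, free field
```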
An alternative interpretation of the noise audibility threshold
The noise audibility curve (the ear-weighted curve) shows not only the audibility of noise but also the possible perception of pure tones, i.e. the dynamic range for pure tones! The so-called noise-masking-tone threshold is 3 decibels below the noise level in a critical band. For example, if we generate a 60 dBSPL narrow-band noise one critical band wide, together with a pure tone at the center frequency of the noise, then tones below 57 dBSPL will be masked by the noise.
If we look at the 16/44.1k noise audibility curve, we can see that the noise level is approx. -118 dBFS at 1 kHz. The "noise-free" dynamic range is therefore 118 dB at 1 kHz, but tones remain audible down to -121 dBFS (-118 - 3). The dynamic range for a pure tone is 121 dB at 1 kHz.
(It follows that the claim "the ear has an amazingly good ability to hear tones in noise" is just a myth. If there is a large amount of noise in a recording, we can only hear those harmonics that are not masked by the noise.)
How does the number of audio channels affect dynamic range?
Increasing the number of audio channels raises the noise floor. The total noise is the power sum of the channels' noise levels (doubling the number of channels raises the noise floor by 3 dB), so there is little room for interpretation here. However, the overall sound pressure can be calculated in several ways, so the dynamic range depends on how the peak SPL is calculated.
Stereo "case study":
- The peak SPL is the sum of the left and right channels' peak SPL; the left and right signals are correlated, so amplitudes are doubled: SPL_{2ch} = 6 dB + SPL_{1ch}. (Example: the stereo channels contain identical or almost identical audio tracks.)
- The peak SPL is the sum of the left and right channels' peak SPL; the left and right signals are uncorrelated, so powers are doubled: SPL_{2ch} = 3 dB + SPL_{1ch}. (Example: the stereo channels contain different tracks; this is the best model for an orchestra.)
- The peak SPL is the peak SPL of one channel. (Example: a loud drum in the left channel while the right channel is silent; this case is not common.)
In the first case, the dynamic range (in any interpretation) of a stereo system is 3 dB higher than that of a mono system. In the second case, the SNRs are identical. In the third case, the SNR of a stereo system is 3 dB lower than mono.
However, if we follow the standard stereo-to-mono conversion (Mono = 0.5*Left + 0.5*Right), a stereo system has an extra 3 dB of dynamic range compared to mono: side signals are mapped 6 dB higher into the stereo channels, mid (mono) signals keep the same level but are summed acoustically (a 6 dB increase in SPL), and the noise floor is only 3 dB higher.
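The three stereo cases reduce to one subtraction: the gain in peak SPL minus the gain in noise floor, where the noise of independent channels always adds in power. A minimal sketch, with hypothetical names for the three peak models:

```python
import math

def stereo_dr_change_db(peak_model: str, channels: int = 2) -> float:
    """Change in dynamic range vs. one channel: peak-SPL gain minus noise-floor gain."""
    noise_gain = 10 * math.log10(channels)              # independent noise adds in power
    peak_gain = {
        "correlated": 20 * math.log10(channels),        # identical signals: amplitudes add
        "uncorrelated": 10 * math.log10(channels),      # different signals: powers add
        "single": 0.0,                                  # the peak comes from one channel only
    }[peak_model]
    return peak_gain - noise_gain

for model in ("correlated", "uncorrelated", "single"):
    print(model, round(stereo_dr_change_db(model), 1))
# correlated 3.0, uncorrelated 0.0, single -3.0
```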
Alternative calculation (with peak SPL)
We can also illustrate the audibility of noise on the SPL scale. This method was published in a study from 1994 (see footnote #1). The results (noise-free dynamic range, segmented SNR values) are the same.
Simulation software
Web-based application: Noise Perception Modeler.
Csaba Horváth
More articles on this topic:
Listening test: Quantization noise & bit-depth
Footnote #1 (noise audibility in audio systems):
"Minimally Audible Noise Shaping", S. P. Lipshitz, J. Vanderkooy, and R. A. Wannamaker, 1991
"Noise: Methods for Estimating Detectability and Threshold", R. Stuart, 1994
Footnote #2 (critical bands, loudness of band-pass noise, threshold of complex tones):
"Psychoacoustics: Facts and Models", Zwicker, Fastl, 2007 (chapter 6: Critical Bands and Excitation; chapter 8: Loudness)
"An Introduction to the Psychology of Hearing", Brian C. J. Moore, 2013 (chapter 3: Frequency Selectivity, Masking and the Critical Band; chapter 4: The Perception of Loudness)
Footnote #3 - ERB vs. critical band summation:
The ERB scale is finer, so the calculated noise levels are slightly lower with ERB bands than with the "older" critical bands. Fortunately, the difference between the two scales is small in the most sensitive region of hearing (between 1 kHz and 10 kHz): with the "old" (Zwicker) critical bands, the noise power is about one decibel higher at 1 kHz, two decibels higher at 4 kHz and three decibels higher at 10 kHz.
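These figures can be reproduced by comparing two standard bandwidth formulas: the ERB of Glasberg and Moore (1990) and the critical-bandwidth approximation of Zwicker and Terhardt (1980). For a flat noise floor the band noise power scales with bandwidth, so the difference is 10·log10 of the bandwidth ratio. This is a sketch under that flat-spectrum assumption; the function names are our own.

```python
import math

def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth, Glasberg & Moore (1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def zwicker_cb_hz(f_hz: float) -> float:
    """Critical bandwidth approximation, Zwicker & Terhardt (1980)."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

def extra_noise_db(f_hz: float) -> float:
    """How much more flat-spectrum noise power falls into a Zwicker band than into an ERB."""
    return 10 * math.log10(zwicker_cb_hz(f_hz) / erb_hz(f_hz))

for f in (1000, 4000, 10000):
    print(f, round(extra_noise_db(f), 1))
```

The printed differences round to roughly 1, 2 and 3 dB, matching the footnote.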
Glossary
Critical band & critical bandwidth
The frequency selectivity of our hearing system can be approximated by subdividing the intensity of the sound into parts that fall into critical bands.
A band-pass noise sounds as loud as a pure tone of the same level as long as the noise bandwidth does not exceed a certain value; beyond it, loudness increases with bandwidth. This bandwidth is called the critical bandwidth.
Decibels relative to full-scale (dBFS)
Decibels relative to full-scale (dBFS) is a unit of measurement for amplitude levels in digital systems. The level of 0 dBFS is assigned to the RMS value of the full scale sine wave. All level values smaller than the maximum are negative.
dBFS in Wikipedia
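Because 0 dBFS is tied here to the RMS of a full-scale sine, a full-scale square wave reads about +3 dBFS. The helper below, whose name is hypothetical, applies that convention to a block of samples normalized to ±1.0.

```python
import math

def level_dbfs(samples) -> float:
    """RMS level in dBFS, with 0 dBFS = RMS of a full-scale sine (peak 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms * math.sqrt(2))   # full-scale sine RMS is 1/sqrt(2)

fs = 48000
sine = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(fs)]   # exactly 1000 cycles
square = [1.0 if s >= 0 else -1.0 for s in sine]
quiet = [0.001 * s for s in sine]
print(round(level_dbfs(square), 2))   # 3.01
print(round(level_dbfs(quiet), 1))    # -60.0
```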
Dither
Dither is a low level noise added to an audio signal prior to word-length reduction (e.g. 24 bit to 16 bit conversion). Dither improves the dynamic range and low-level linearity of a digital system. A digital system with the 'right dither' behaves like an analog system with a well-defined noise floor.
Dither in Wikipedia
TPDF dither
The most common type of dither. TPDF stands for triangular probability density function.
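TPDF dither can be generated as the difference of two independent uniform random numbers, which gives a triangular distribution spanning ±1 LSB. The sketch below (Python's `random` module, signal expressed in LSB units, hypothetical helper name) shows two properties that matter here: a sine of 0.3 LSB peak, which plain rounding turns into silence, is preserved on average after dithering, and the total error has a mean of about zero and a power of about 1/4 LSB² (q²/12 from rounding plus q²/6 from the dither).

```python
import math
import random

def tpdf_round(x: float, rng: random.Random) -> int:
    """Round x (in LSB units) to an integer code after adding TPDF dither."""
    d = rng.random() - rng.random()   # difference of two U(0,1): triangular PDF on (-1, 1) LSB
    return round(x + d)

rng = random.Random(2023)
n = 200_000
x = [0.3 * math.sin(2 * math.pi * i / 100) for i in range(n)]  # sine of 0.3 LSB peak
undithered = [round(v) for v in x]                             # plain rounding
dithered = [tpdf_round(v, rng) for v in x]

err = [q - v for q, v in zip(dithered, x)]
mean_err = sum(err) / n
err_power = sum(e * e for e in err) / n
# projection of the dithered output onto the original sine (1.0 = fully preserved)
recovered = sum(q * v for q, v in zip(dithered, x)) / sum(v * v for v in x)

print(any(undithered))                 # False: the sine is rounded away completely
print(round(mean_err, 3), round(err_power, 3), round(recovered, 2))
```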