Why 96 kHz and 192 kHz sampling rates don't make sense for music playback
Last edited: Jan. 16, 2019
(minor changes in 'Myth: Amplitude modulation (and other abnormalities) close to the Nyquist frequency' section)
There is an ongoing debate in the audio community about whether 96 kHz or 192 kHz sampling rates give any quality improvement over the audio CD standard (44.1 kHz/16 bit) for high fidelity music playback. Those who support the higher sampling rates usually misunderstand sampling theory and have incomplete or elementary knowledge of digital filters.
It is quite obvious that High-Resolution Audio advertisements target people who have no real experience in digital audio. These ads give a false impression about digital waveforms: if a waveform contains more sample points, then it looks nicer, and if it looks nicer, then it sounds better. So simple... But does a waveform sound better if it looks nicer? Sometimes it does, sometimes it doesn't... It depends... One thing is for sure: the laws of psychoacoustics (human hearing) and the laws of visual aesthetics are different, or at least not interchangeable. (This will be discussed in detail in the following sections.)
Of course, 96 kHz or 192 kHz sampling rates and 32 bit resolution make sense during recording and mixing, but they are overkill for music playback...
For those who are not familiar with digital audio and do not know terms such as sampling frequency, quantization or bit depth, I recommend the Digital Show & Tell video lecture on the Xiph.Org website. It demonstrates sampling, quantization, bit depth and dither on real audio equipment, presented by a real digital audio expert. 24/192 Music Downloads are very silly indeed is the best criticism of high resolution audio so far. Nevertheless, that article has not been updated since 2012, and some false beliefs and misconceptions about the higher sampling rates that live in the audio community were not mentioned in Monty Montgomery's article. I will analyse these misconceptions and explain why they are wrong. I will not deal with the issues of analog-to-digital and digital-to-analog conversion such as jitter and non-linearity, because they are not related to the subject.
Analog to digital conversion in a nutshell
Analog to digital conversion has three main stages: bandlimiting (filtering), sampling and quantization. Bandlimiting is low-pass filtering: frequencies above half of the sampling frequency have to be removed from the signal to avoid aliasing distortion. Sampling measures the band limited signal at discrete time intervals. Technically, this is called pulse modulation, or pulse amplitude modulation. The last step is quantization, and with 16 bits (the CD standard) and shaped dither the RMS noise level can be kept below -110 dB in the audible range.
And here comes the part that is hard to understand: sampling doesn't remove any information from the band limited analog signal. Sampling is a lossless process. The 'lossy' parts are the low-pass filtering and the quantization: information is removed from the signal, and noise is added to it, only during these stages. If the bit depth and the sampling frequency are high enough, the information lost (and the added noise) can be kept below the audible threshold. And this is the case with the 16 bit/44.1 kHz format.
The analog signal can be perfectly reconstructed below the Nyquist frequency (half of the sampling frequency) from the sampled data points if the sampled signal is band limited at the Nyquist frequency. Okay, this is the academic textbook version and not very useful for audio, because it ignores the effect of the anti-alias filter, so here is a modified and more practical formulation:
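To put a rough number on the quantization stage, here is a minimal sketch that rounds a full-scale sine to 16 bits and measures the resulting error. The 997 Hz test tone and the one-second length are my own illustrative choices. The sketch uses plain rounding with no dither and no noise shaping, so it only shows the baseline of roughly 98 dB for a full-scale sine; the lower audible-range figure quoted above depends on shaped dither, which is not modelled here.

```python
import math

FS = 44100                    # sampling rate in Hz
F = 997                       # test tone in Hz (assumed example value)
BITS = 16
SCALE = 2 ** (BITS - 1) - 1   # largest positive 16-bit sample value

# One second of a full-scale sine, before and after rounding to 16 bits.
signal = [math.sin(2 * math.pi * F * n / FS) for n in range(FS)]
quantized = [round(x * SCALE) / SCALE for x in signal]

def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# The quantization error is the only thing the rounding step added.
noise = [q - x for q, x in zip(quantized, signal)]
snr_db = 20 * math.log10(rms(signal) / rms(noise))
print(f"signal-to-quantization-noise ratio: {snr_db:.1f} dB")
```

The result should land close to the textbook estimate of 6.02 * 16 + 1.76 dB for an undithered full-scale sine.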
The analog signal can be perfectly reconstructed below the anti-alias filter's cut-off frequency from the sampled data points.
The limit of the perfect reconstruction is 20 kHz at a sampling rate of 44.1 kHz, and 22 kHz at a sampling rate of 48 kHz. Both values are higher than the top of the human hearing range (18 kHz). And the rule mentioned before applies not only to sine waves, but to all kinds of audio signals.
There is another myth, which is associated with sample rate conversion: that the resampling ratio has to be an integer. Fortunately, that's not the case... Converting a wav file from 44.1 kHz to 48 kHz is completely lossless and legal. Any upsampling is lossless, because sampling itself is lossless... Just stay away from linear, cubic, spline and any other kind of polynomial interpolation, because they don't work with audio. (Tip: Audacity with the 'best' quality settings works fine...)
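The reconstruction rule can be checked numerically. The sketch below samples a 15 kHz sine at 44.1 kHz and then evaluates the band limited interpolant at instants that fall between the sample points. To keep it exact and short, I assume a block of 441 samples that holds a whole number of cycles, so a plain DFT followed by trigonometric interpolation (the periodic form of sinc interpolation) can be used. The tone frequency, the phase offset and the probe instants are illustrative assumptions, and this is not how a real DAC is implemented.

```python
import cmath
import math

FS = 44100   # sampling rate in Hz
N = 441      # 10 ms block; holds exactly 150 cycles of the test tone
F = 15000    # test tone in Hz (assumed example value)

def tone(t):
    """The 'analog' signal: a sine with an arbitrary phase offset."""
    return math.sin(2 * math.pi * F * t + 0.3)

samples = [tone(n / FS) for n in range(N)]

# Plain DFT of the sampled block (O(N^2), fine for N = 441).
spectrum = [
    sum(x * cmath.exp(-2j * math.pi * k * n / N) for n, x in enumerate(samples))
    for k in range(N)
]

def reconstruct(t):
    """Evaluate the band limited interpolant at any time t (in seconds)."""
    total = 0j
    for k, coeff in enumerate(spectrum):
        signed_k = k if k <= N // 2 else k - N   # map bins to +/- frequencies
        total += coeff * cmath.exp(2j * math.pi * signed_k * FS * t / N)
    return (total / N).real

# Compare with the analog signal at instants that lie between the samples.
probe_times = [(n + 0.37) / FS for n in range(0, N, 20)]
max_error = max(abs(reconstruct(t) - tone(t)) for t in probe_times)
print(f"max reconstruction error: {max_error:.2e}")
```

The remaining error is floating point rounding only; the values between the sample points are fully determined by the samples.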
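Two things from this paragraph can be illustrated with a small sketch: the 44.1 kHz to 48 kHz ratio is a perfectly ordinary rational number, and linear interpolation is a poor substitute for band limited resampling. The 15 kHz test tone and the 10 ms length are assumptions made for illustration, and the `linear` helper is a hypothetical name, not part of any audio library.

```python
from fractions import Fraction
import math

FS_IN, FS_OUT = 44100, 48000
ratio = Fraction(FS_OUT, FS_IN)   # reduces to 160/147, clearly not an integer
print("resampling ratio:", ratio)

F = 15000                          # test tone in Hz (assumed example value)

def tone(t):
    """The band limited signal we would like to recover at the new rate."""
    return math.sin(2 * math.pi * F * t)

# 10 ms of input plus one extra sample so the last interval has a right edge.
samples = [tone(n / FS_IN) for n in range(441 + 1)]

def linear(t):
    """Linear interpolation between the two neighbouring input samples."""
    pos = t * FS_IN
    i = int(pos)
    frac = pos - i
    return samples[i] * (1 - frac) + samples[i + 1] * frac

# Evaluate on the 48 kHz grid (480 output samples cover the same 10 ms).
errors = [abs(linear(m / FS_OUT) - tone(m / FS_OUT)) for m in range(480)]
worst = max(errors)
print(f"worst linear interpolation error: {worst:.3f} (full scale is 1.0)")
```

An error that is a sizeable fraction of full scale is what 'doesn't work with audio' means in practice; a proper resampler uses a band limited (sinc type) filter instead.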
Myth: Amplitude modulation (and other abnormalities) close to the Nyquist frequency
If we increase the frequency of the sampled sine wave, we get fewer samples per period and the waveform looks more and more coarse. At a 44.1 kHz sampling rate we have fewer than 4 sample points per wave period between 11.025 kHz and 14.7 kHz, and fewer than 3 samples per period above 14.7 kHz.
According to this myth, we get beats above and below any whole division of the sample rate. The higher the frequency (or the lower the number of available sample points per wave period), the higher the amplitude of the beat. From approx. 18 kHz on we get a beat (amplitude modulation) of 100%. Because the stairstep representation of the PCM waveform looks 'ugly' close to the Nyquist frequency (Fs/2), some folks believe that the signal contains additional frequencies below the Nyquist frequency (which were not present in the original signal).
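The frequency limits above are simple divisions of the sampling rate. A tiny sketch (the function names are my own) makes the arithmetic explicit:

```python
FS = 44100  # sampling rate in Hz

def samples_per_period(freq_hz):
    """How many sample points fall into one period of a sine at freq_hz."""
    return FS / freq_hz

def highest_frequency_with(samples):
    """Frequency at which a sine has exactly `samples` points per period."""
    return FS / samples

print(highest_frequency_with(4))   # 11025.0
print(highest_frequency_with(3))   # 14700.0
print(samples_per_period(19000))   # a bit more than 2 samples per period
```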
For spectral analysis a spectrum analyzer is a much better tool than an oscilloscope. Short-time FFT (optimal window length: 10-20 msec) will tell us everything about the waveform: the spectral components, and any kind of possible amplitude modulation.
What will we see? Sampling creates new components (image frequencies). If the sampling is done according to sampling theory, these images lie above the Nyquist frequency and can be filtered out. If the reconstruction filter is missing (as in the case of the output of unfiltered non-oversampling DACs), then these image frequencies may cause 'amplitude modulation like' effects in the waveform. In other words, we can see some amplitude modulation in the waveform, but we can't hear it, because the image frequencies are above 22.05 kHz.
For example, in the close neighborhood of 14700 Hz the 'amplitude modulation like' effect (in the waveform) is very strong. But in fact the sampled signal only contains the original and the images ('harmonics') above the Nyquist frequency. For a 14760 Hz sine wave the first image frequency is at 29340 Hz (44100 - 14760), the second is at 58860 Hz (44100 + 14760), the third is at 73440 Hz (88200 - 14760) and so on... There is no audible amplitude modulation in the sampled waveform.
Sampled and unfiltered 14760 Hz sine wave (Fs: 44.1 kHz)
It's possible to reconstruct the original sine wave from the sampled waveform with a low pass filter
The FFT spectrum of a sampled and unfiltered 14760 Hz sine shows no problem
(green: original, blue: images due to sampling (they are inaudible), the dashed red line is the Nyquist frequency)
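The image frequencies quoted for the 14760 Hz example follow a simple pattern: they sit at k * Fs - f and k * Fs + f for k = 1, 2, 3... A minimal sketch (the function name is hypothetical) that lists them:

```python
def image_frequencies(f, fs, count):
    """Return the first `count` image frequencies of a tone f sampled at fs."""
    images = []
    k = 1
    while len(images) < count:
        images.append(k * fs - f)        # lower image around k * fs
        if len(images) < count:
            images.append(k * fs + f)    # upper image around k * fs
        k += 1
    return images

print(image_frequencies(14760, 44100, 3))   # [29340, 58860, 73440]
```

All of them are above the 22050 Hz Nyquist frequency, so none of them falls into the audible range.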
ADDED (01-16-2019): In oversampling DACs (in other words, interpolating DACs; today most DACs fall into this category) the reconstruction filter removes the image frequencies above the Nyquist frequency after the oversampling, and we get a nice 14760 Hz sine wave. There is no notable audible difference between the two types of DACs, except that non-oversampling DACs require some equalization, a slight 3 dB frequency response boost between 10 kHz and 20 kHz, but this is not related to the subject.
If we increase the frequency of our sine wave, we will see that the original and the first image get closer and closer. Because real FIR filters have finite length and finite slope, above 20 kHz the reconstruction filter (the interpolation filter in oversampling DACs) can't filter out the first image. As a result there is some amplitude modulation very close to half the sampling rate, but fortunately this region is inaudible. For example, if we sample a 22049 Hz sine wave at 44100 Hz, then in the sampled sine wave the first image component will be at 22051 Hz (44100 - 22049). If we could hear this region, then we would hear the amplitude modulation. But we can't.
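Why the sample points of the 22049 Hz example look amplitude modulated can be shown with a short identity: a sine that lies 1 Hz below Fs/2 produces sample values that alternate in sign under a slow 1 Hz sine envelope. The sketch below checks this numerically over one second; the helper names are my own, and the check only covers the raw sample values, not any particular filter.

```python
import math

FS = 44100            # sampling rate in Hz
F = 22049             # tone just below the Nyquist frequency, in Hz
OFFSET = FS / 2 - F   # distance from the Nyquist frequency: 1 Hz

def sampled_tone(n):
    """Value of the n-th sample of the 22049 Hz sine."""
    return math.sin(2 * math.pi * F * n / FS)

def alternating_envelope(n):
    """Sign-alternating samples under a slow sine envelope at OFFSET Hz."""
    sign = -1 if n % 2 == 0 else 1
    return sign * math.sin(2 * math.pi * OFFSET * n / FS)

worst_mismatch = max(
    abs(sampled_tone(n) - alternating_envelope(n)) for n in range(0, FS, 7)
)
print(f"largest difference between the two forms: {worst_mismatch:.2e}")
```

The two descriptions give the same sample values, so the slow envelope seen in the sample points is a property of how they are drawn, while the underlying band limited signal is still a constant amplitude 22049 Hz sine.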
Actually, this strange behaviour at Fs/2 depends on the anti-aliasing filter too. Usually anti-aliasing filters have full attenuation at Fs/2, so 22049 Hz and 22051 Hz will be completely attenuated. However, some filters have only 6 dB attenuation at Fs/2 (these are the 0.45/0.55 filters) and they will not filter out this range.
Myth: At 44.1 kHz sampling rate the anti-alias and reconstruction filters produce audible ringing
It's true that steep filters produce a high degree of ringing and have a poor impulse response, but if the low pass filter's cut-off frequency is above 16 kHz, then the ringing will not be audible - which is the case with the normal audio CD format. At a sampling rate of 44.1 kHz the frequency of the ringing of the anti-alias filter and the DAC reconstruction filter is between 20 kHz and 22 kHz (the actual frequency depends on the cut-off frequency and the order (slope)). In other words, the ringing of these filters has ultrasonic components only.
The highest frequency where the ringing of a steep filter is still audible is always lower than the highest frequency that a human can hear. The reason for this is that the ringing has low amplitude (SPL), and the sound pressure level of the ringing will fall below the absolute threshold of hearing (ATH) curve above 16 kHz. So even if someone has very good ears and can hear up to 18 kHz, this person will not hear the ringing of any type of filter above 16 kHz.
Not only is the ringing of the anti-alias (or resampling) filter inaudible at a sampling rate of 44.1 kHz, it cannot be heard at lower sampling rates either (e.g. 22.05 kHz or 16 kHz). The reason is that the impulse response of an optimal (and typical) anti-alias filter is about 6 msec long (from the first sample to the last sample) regardless of the sample rate, and the ringing is fortunately masked by the impulse.
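That the ringing sits at the cut-off frequency can be seen on the simplest possible model, the ideal (brick-wall) low-pass filter, whose impulse response is sin(2*pi*fc*t) / (pi*t). The sketch below locates the zero crossings of that response and derives the ringing frequency from their spacing. The 21 kHz cut-off is an assumed example value, and a real anti-alias or reconstruction filter with finite slope will differ in detail.

```python
import math

FC = 21000.0   # assumed cut-off frequency in Hz (illustrative value)
DT = 1e-7      # time step of the search grid in seconds

def ideal_lowpass(t):
    """Impulse response of an ideal low-pass filter with cut-off FC (t != 0)."""
    return math.sin(2 * math.pi * FC * t) / (math.pi * t)

# Walk along the tail of the impulse response and collect its zero crossings.
crossings = []
prev_t, prev_h = DT, ideal_lowpass(DT)
for i in range(2, 9000):
    t = i * DT
    h = ideal_lowpass(t)
    if (prev_h > 0 and h <= 0) or (prev_h < 0 and h >= 0):
        # Linear interpolation between the two grid points around the zero.
        crossings.append(prev_t + DT * prev_h / (prev_h - h))
    prev_t, prev_h = t, h

mean_spacing = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
ringing_hz = 1.0 / (2.0 * mean_spacing)   # two zero crossings per cycle
print(f"ringing frequency: {ringing_hz:.0f} Hz")
```

For this model the ringing frequency equals the cut-off frequency, which for a CD-rate filter is above 20 kHz.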
The ringing of the anti-alias filter and the reconstruction filter at a sample rate of 44.1 kHz or 48 kHz is a non-existent problem...
In the audible range and below ~15 kHz linear phase filters with long impulse response may produce audible ringing. According to my headphone listening tests, if the impulse response of a linear phase low-pass filter is shorter than 6 msec, then the ringing in the impulse response is never audible, because the ringing is masked by the impulse itself.
Myth: Higher sampling rates have better time resolution
Fanatics of 192 kHz audio think that the time resolution of a 44.1 kHz audio file is the same as the sampling period (22.67 usec), so raising the sampling frequency improves time resolution. This is completely wrong, because the time resolution of a 44.1 kHz audio file is much finer than the sampling interval. Even a 0.1 usec time delay can be reconstructed from a 44.1 kHz audio file. Even ONE (!) degree of phase shift between two 19 kHz sine waves can be represented in a 44.1 kHz audio file. How is it possible?
Some audio editors (like the old Cool Edit 2000 here) can display the reconstructed waveform from the sampled data points - According to the fanatics of 192 kHz audio this is not possible! (10 kHz sine wave, sampling frequency: 44.1 kHz)
Because sampling has nothing to do with time resolution. A sampled signal without quantization has limited bandwidth and infinite time resolution. Just a quick reminder: the bandlimited analog signal can be perfectly reconstructed below the Nyquist frequency from the sampled data points. Adding quantization to the sampled data raises the noise floor (and reduces dynamic range). As long as the quantization noise is not heard, the reconstructed signal sounds the same as the analog signal.
In other words, the time resolution of a 44.1 kHz audio file is the same as the time resolution of a 192 kHz audio file. The 192 kHz sampling rate only offers greater bandwidth.
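The claim that a 0.1 usec delay survives sampling at 44.1 kHz can be tested with a few lines of code. The sketch below delays a 10 kHz sine by 0.1 usec, samples it at 44.1 kHz (optionally rounding to 16 bits), and then recovers the delay from the samples by measuring the phase at 10 kHz. The amplitude, the block length of 441 samples (exactly 100 cycles) and the function names are assumptions for illustration; this is one simple estimator, not the only way to do it.

```python
import cmath
import math

FS = 44100   # sampling rate in Hz
F = 10000    # test tone in Hz
N = 441      # 10 ms block; holds exactly 100 cycles of the tone

def sampled_sine(delay_s, bits=None):
    """Sample a delayed sine; round to `bits` bits if a bit depth is given."""
    out = []
    for n in range(N):
        v = 0.5 * math.sin(2 * math.pi * F * (n / FS - delay_s))
        if bits is not None:
            scale = 2 ** (bits - 1)
            v = round(v * scale) / scale
        out.append(v)
    return out

def phase_at_f(samples):
    """Phase of the F component, from correlation with a complex exponential."""
    acc = sum(x * cmath.exp(-2j * math.pi * F * n / FS)
              for n, x in enumerate(samples))
    return cmath.phase(acc)

def estimated_delay(delay_s, bits=None):
    """Recover the delay from the phase difference to an undelayed sine."""
    reference = phase_at_f(sampled_sine(0.0, bits))
    delayed = phase_at_f(sampled_sine(delay_s, bits))
    return (reference - delayed) / (2 * math.pi * F)

true_delay = 0.1e-6   # 0.1 usec, far below the 22.67 usec sampling period
print(f"ideal samples : {estimated_delay(true_delay) * 1e6:.4f} usec")
print(f"16-bit samples: {estimated_delay(true_delay, bits=16) * 1e6:.4f} usec")
```

Without quantization the delay comes back essentially exactly; with 16-bit rounding only a small error from the quantization noise remains, which is what the paragraph above says: quantization limits the noise floor, while the sampling rate only limits the bandwidth.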
I created a simple experiment that proves - without using the sampling theory - that there is no time resolution difference between a 44.1 kHz and a 192 kHz audio format. The experiment requires an audio editor only and some experience with waveform editing.
1. Create a 16 bit/44.1 kHz mono audio track (or wav file) and generate a 19 kHz sine wave with 0 deg phase.
2. Create a second 16 bit/44.1 kHz mono audio track (or wav file) and generate a 19 kHz sine wave with a 1 deg phase shift. This is a 0.146 usec delay!
3. Invert the track that contains the sine wave with the one deg phase shift.
4. Add the two audio tracks. Let's call this file diff-Normal.
5. Repeat steps 1-4 with a 192 kHz sampling rate. Let's call this difference file diff-HiRes.
6. Convert diff-Normal to 192 kHz. This is a simple upsampling, no extra information is added to the file. This step is optional - I do it to avoid the problems with different FFT window sizes.
7. Do an FFT analysis and compare the FFT spectrum and amplitudes of the resulting waveforms of diff-Normal and diff-HiRes.
Result: there is no difference between the two 'difference' files! It means that the phase difference between the sine waves is the same in the 44.1 kHz wav file and in the 192 kHz wav file. (For -3 dBFS sine waves the result is a -38.2 dBFS sine wave.)
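For readers without an audio editor at hand, the core of the experiment can be approximated in code. The sketch below builds the difference of two 19 kHz sines that are 1 degree apart, once at 44.1 kHz and once at 192 kHz, and reports the level of the difference signal. It is a simplification under stated assumptions: there is no 16-bit quantization and no FFT, the level is derived from the RMS value, and the block lengths (441 and 192 samples) are chosen so that each block holds a whole number of cycles.

```python
import math

F = 19000                   # test tone in Hz
AMP = 10 ** (-3 / 20)       # peak amplitude of a -3 dBFS sine
SHIFT = math.radians(1.0)   # 1 degree phase shift (about 0.146 usec)

def difference_level_dbfs(fs, n_samples):
    """Peak level (dBFS) of 'sine minus phase shifted sine' at rate fs."""
    diff = [
        AMP * math.sin(2 * math.pi * F * n / fs)
        - AMP * math.sin(2 * math.pi * F * n / fs - SHIFT)
        for n in range(n_samples)
    ]
    rms = math.sqrt(sum(d * d for d in diff) / n_samples)
    peak = rms * math.sqrt(2)   # the difference of two sines is again a sine
    return 20 * math.log10(peak)

diff_normal = difference_level_dbfs(44100, 441)    # 190 full cycles
diff_hires = difference_level_dbfs(192000, 192)    # 19 full cycles
print(f"diff-Normal: {diff_normal:.2f} dBFS")
print(f"diff-HiRes : {diff_hires:.2f} dBFS")
```

Both levels agree with each other and with the closed form -3 dB + 20*log10(2*sin(0.5 deg)), which is about -38.2 dBFS.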
Myth: Hi-Res audio contains more detail and sounds much closer to the original recording
Audio files with sampling rates higher than 44.1 kHz may contain more detail in the ultrasonic range only, above the human hearing. An audio file with 96/192 kHz sampling rate doesn't contain more detail in the audible range, so it can not sound closer to the original recording.
To sum it up, there is no audible difference between 96/192 kHz audio and its downsampled 44.1 kHz version. Differences are in the ultrasonic range only, above the range of human hearing. It's hard to believe nowadays, when High-Resolution Audio is pushed so aggressively, that 16 bit/44.1 kHz (or 48 kHz) audio is still perfect for music playback (psychoacoustically shaped dither is required only for dynamic range 'extension').
Bad sound kills good music, but good sound is based on clever audio engineering concepts which involve an understanding of human hearing - and not on false beliefs. 96 kHz or 192 kHz sampling rates and 24 or 32 bit depth make sense in recording studios, where the audio signal goes through a lot of signal processing, but not for music playback. Apart from the recording process and mixing, Hi-Res audio is just a waste of memory and bandwidth.