Audio measurements & fidelity:
the ten basic rules


Principles, methods, misconceptions.


May 4, 2025
Last edited: 2026.02.16.

The validity of audio measurements is a subject of ongoing debate in audiophile blogs and forums. To outsiders and newbies, measurements seem quite chaotic, as it's difficult to see the connections between different disciplines and their different approaches. Without a compass it's easy to get lost in the noisy jungle of misconceptions, false analogies, unfounded claims and marketing fraud. The following article provides a brief overview of the most important ground rules covering the science and logic behind sound reproduction and audio measurement. I hope these principles not only dispel many misconceptions, but also shed new light on old knowledge.

In audio magazines and forums, it is very common to blur the line between subjective and objective, i.e. to claim that a characteristic or quality is subjective when in fact it is not. If we try to characterize the components of the audio chain by our emotional reaction to music, we will certainly fool ourselves. The musical experience, the enjoyment of music, is influenced by many factors besides audio fidelity - audio fidelity is only a part of the musical experience, although an important and determining one. (Problems usually start when the audible differences become minor, belief messes up judgment and the wire drama begins...)


1. Audio science is an applied science

Audio science (the science of sound reproduction) is an applied science, not a fundamental one. However, like other applied fields, it has roots in the fundamental sciences. - Any claim about amplifiers, DACs, audio formats or loudspeakers is a claim about hearing, physics or signal behavior.


2. How much can we trust our hearing?

We cannot blindly trust our hearing or our vision, because sooner or later they will deceive us. There are countless examples, from optical illusions and misread sentences to misheard song lyrics. Except in dangerous situations, we don't use the information from our senses directly; we take our past experiences into account. To a certain extent perception is a learning process, which means that we learn how to use our senses correctly. What we cannot improve through learning are the limits of our perception.

Almost everyone can recall the sound of a piano or guitar without much difficulty, but testing subtle differences requires special methods. This also applies to vision, with the difference that testing images is a somewhat more straightforward process, since the analytical and focusing ability of our vision is more refined. Separating one part of an image from the rest is a simple task, but separating one part of the music from the rest - for example, the sound of an instrument in an orchestral recording - can be extremely difficult. It's not surprising that comparing minor audible differences requires careful attention and special conditions. We need a method that eliminates the possibility of making a wrong judgment.

However, even if we can definitely hear a difference, that's not the end of the story. By testing with music we can only make claims about the sound, not about the technology or the real cause. Revealing the cause-effect relationship requires additional test conditions. For example, we need to rule out the possibility that the difference is caused by an unknown (hidden) variable (i.e. that the result is not a false positive or a false negative).

Key steps to do a proper listening test:

  1. audio samples should be short (maximum five seconds long),
  2. switching between the samples should be fast and silent (maximum two seconds),
  3. only one parameter should be changed at a time, otherwise the test is invalid,
  4. in addition to musical excerpts, test signals and solo instrumental recordings may be needed,
  5. the selection of test material is critical, as the chosen signals affect the sensitivity of the test and the variance of the results,
  6. in some tests, digital filters can also be applied to improve the sensitivity of the test (e.g. testing audio formats),
  7. the tests should be designed in such a way that they not only provide results, but also allow us to infer some kind of causal relationship.
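To make the switching protocol above concrete, here is a minimal, hypothetical ABX session skeleton in Python. The `listener` callable and the sample objects are placeholders for illustration, not a real playback API:

```python
import random

def run_abx_session(listener, a, b, trials=16):
    """Minimal ABX session skeleton: on each trial X is secretly either A
    or B, and the listener (a callable) must say which one it heard."""
    correct = 0
    for _ in range(trials):
        x_is_a = random.random() < 0.5
        x = a if x_is_a else b
        guess = listener(a, b, x)                  # returns 'A' or 'B'
        correct += guess == ('A' if x_is_a else 'B')
    return correct

# A hypothetical "perfect listener" that can always tell the samples apart:
perfect = lambda a, b, x: 'A' if x is a else 'B'
print(run_abx_session(perfect, "sample_A", "sample_B"))   # 16
```

A real test harness would additionally enforce the short samples and fast, silent switching described above.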


3. Blind test is a solution to only one problem!

Our senses can fool us, especially when the perceived differences are minor. The blind test, or its most commonly used form, the ABX test, is a universal tool for eliminating the self-deception that arises from our expectations and unreliable perception (unreliable auditory memory, confirmation bias). Blind testing doesn't tell us how to design a test free of flaws or how to control all variables (how to avoid false positives), and even if the test is free of errors, correlation still doesn't imply causation.

Blind testing methodology is the most misunderstood topic in audio. Rejecting blind testing is absurd; on the other hand, searching for the truth with double-blind tests and/or statistical methods alone is a hopeless and completely pointless endeavour. Testing with music only makes sense if we already have well-founded knowledge of the system to be tested. Blind tests shouldn't be used as a primary method, only as a secondary method to support audio measurements. Moreover, the exclusive use of such tests can lead not only to misconceptions but also to superficial knowledge.
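Statistics does have one well-defined job here: checking whether a run of ABX answers could plausibly be pure guessing. A minimal sketch (the 12-of-16 figures are arbitrary example numbers):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided binomial p-value: the probability of getting at least
    `correct` answers right out of `trials` by pure guessing (p = 1/2)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

# 12 correct answers out of 16 trials:
print(round(abx_p_value(12, 16), 3))   # 0.038 -> guessing is unlikely
```

Note what this does and doesn't establish: a low p-value says a difference was probably heard, nothing more; it says nothing about the cause.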

A key distinction in science is between knowledge without comprehension (~test results) and knowledge with comprehension (~mechanism). Numerous scientific tools (surveys, correlation analysis, data science in general) can at most contribute to knowledge without comprehension. Understanding causal relationships requires dedicated experiments.

Fortunately, our knowledge does not come from "listening tests", but from experiments based on measurement and modeling.


4. Why do we need measurements?

There are many reasons why we measure. The typical (and somewhat stereotypical) answer is that measurements are more accurate and reliable than our senses. This is true, but there are other reasons besides accuracy. The main arguments for measurements:

  1. Accuracy and reliability (objectivity) - with measurements we can greatly reduce human fallibility.
  2. In a sense, measurements are extensions of our senses, a window to the world. We can measure what we can't see or hear directly.
  3. Visual representation. We are visual thinkers: our vision and imagination evolved to navigate three-dimensional space and to make and use tools. Our hearing and auditory memory, by contrast, evolved for communication and to warn us of danger. We need a visual representation of a process to understand it. No wonder graphs are so helpful.
  4. Measurements are a way to create verifiable models by providing non-trivial predictions and non-ambiguous results. A model with ambiguous output or trivial predictions can't be verified (actually rejected) and improved.
  5. We can find quantitative relationships in nature. Models and measurements reflect these relationships.

Moreover, the role of measurements in applied science and fundamental science is different. In applied science the main role of measurements is to support the design process and quality testing. In fundamental science the main role of measurements is to support model validation. Consider how much confusion arises from mixing these roles.


5. Key to measurements - the six questions

The logic behind audio measurements isn't that complicated: audio measurements are signal measurements, measuring only those characteristics that can affect what we hear. The audio fidelity of a component is determined objectively by measuring the change(s) in the signal as it passes through the component and comparing them with the corresponding threshold(s).

Audio measurements are only useful if they correlate with perception. Unfortunately, some old-school measurement methods taken directly from electrical measurements with minor modifications correlate poorly with hearing (SNR, SINAD). However, understanding how we hear can be very helpful in interpreting less accurate measurements. If we know the limitations of SINAD and SNR, we will not fool ourselves with SINAD and SNR.
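As a concrete illustration of such a signal measurement, the following sketch estimates SINAD from an FFT. It assumes coherent sampling (an integer number of cycles in the buffer) so no window is needed and the fundamental sits in a single bin; the levels and frequencies are arbitrary example values:

```python
import numpy as np

def sinad_db(signal, fs, f0):
    """SINAD: fundamental power vs. everything else (noise + distortion).
    Assumes coherent sampling, so the fundamental occupies one FFT bin."""
    spec = np.abs(np.fft.rfft(signal))**2
    k0 = round(f0 * len(signal) / fs)
    fund = spec[k0]
    rest = spec[1:].sum() - fund          # skip the DC bin
    return 10 * np.log10(fund / rest)

fs, n, f0 = 48000, 4800, 1000             # 100 full cycles -> coherent
t = np.arange(n) / fs
tone = np.sin(2*np.pi*f0*t)
tone += 0.001 * np.sin(2*np.pi*2*f0*t)    # add a 2nd harmonic at -60 dB
print(f"{sinad_db(tone, fs, f0):.1f} dB") # 60.0 dB
```

Note how little hearing appears in this code: every spectral component is weighted equally, which is exactly the modeling weakness discussed above.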

All problems related to audio measurements can be grouped into six categories (or six "key points"):

  1. What is the physical process behind the technology used? What does the system do with the signal? (This is the level of physical explanation, or in other words, the physical model. In digital systems, the question is what the mathematical calculations do with the signal.)
  2. How can we measure signal changes, signal attributes, model variables? (I don't mean a specific audio measurement, but completely general tools.)
  3. What is the auditory process behind the perception of the "transmission error" or characteristic, and how can we create meaningful audio measurements? What is the nature of the signal change that we want to measure? (three main auditory processes: auditory masking, absolute threshold of hearing, loudness sensation)
  4. With which type of signal is the error most audible? (we look for the signal representing the "worst case")
  5. What are the individual differences between the thresholds? (mean, P90, P75, P10... )
  6. How do measurements, measurement signals and thresholds relate to the sound sources around us? (musical instruments, speech, dog barking, thunder, crickets chirping, etc.)

We can use this six-point framework for anything: quantization, resampling, lossy audio compression, nonlinear distortion, resonances, speaker cables, loudspeaker feet/spikes, etc.


6. Accuracy vs accuracy - two types of 'accuracy'

A huge advantage of measurements is that they are more accurate than our hearing. But what does this really mean? Unfortunately, life is full of ambiguous expressions... and accuracy is just one of them.

We must distinguish numerical accuracy from modeling accuracy (validity). All modern hardware and software provide exceptional numerical accuracy, but not all measurement methods provide modeling accuracy. For example, SINAD/THD and SNR are based on an oversimplified model of human hearing. As a result, SINAD/THD measurements and comparisons can be misleading if the result is lower than ~70 dB (when distortion can be audible). SNR measurements are also problematic when noise spectra differ significantly. Without a hearing model, FFT measurements (frequency spectrum analysis) provide low modeling accuracy for noise and transients.
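One small step toward modeling accuracy for noise is perceptual weighting. As a sketch, the standard A-weighting curve (the analytic form given in IEC 61672) can be computed directly; it is a crude approximation of loudness sensitivity, not a full hearing model:

```python
import math

def a_weight_db(f):
    """A-weighting gain in dB at frequency f (IEC 61672 analytic form)."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2))
    return 20 * math.log10(ra) + 2.00    # normalized to ~0 dB at 1 kHz

for f in (100, 1000, 10000):
    print(f"{f:>5} Hz: {a_weight_db(f):+6.1f} dB")
```

An A-weighted noise figure already correlates better with audibility than a flat SNR, though it still ignores masking.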

An audio measurement can only be considered accurate if it is numerically accurate and the model behind the measurement is also accurate.


7. What we can hear...

Psychoacoustics studies human hearing with special test tones, creates hearing models and determines various thresholds. What we can hear is determined by the absolute threshold of hearing (ATH) and auditory masking. Masking means that in the neighborhood of a (loud) tone the hearing threshold is raised. Signals below the threshold are inaudible.

Not only the audibility of pure tones or complex tones, but even the audibility of nonlinear distortion, noise or resonances is related to masking and the ATH. (In fact, there is a third mechanism: adaptation or compression, a shift in the non-masked threshold.)
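The ATH has a widely used analytic approximation due to Terhardt (the same curve used in psychoacoustic codec models); a sketch, with arbitrary example frequencies:

```python
import math

def ath_db_spl(f_hz):
    """Terhardt's analytic approximation of the absolute threshold of
    hearing in dB SPL, valid roughly from 20 Hz to 16 kHz."""
    f = f_hz / 1000.0
    return (3.64 * f**-0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3)**2)
            + 1e-3 * f**4)

for f in (100, 1000, 3500, 10000):
    print(f"{f:>5} Hz: {ath_db_spl(f):6.1f} dB SPL")
```

The curve reproduces the familiar shape: high thresholds at low frequencies, a dip (maximum sensitivity) around 3-4 kHz, and a rise again toward the top of the audible range.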


8. "Purity" of the signal is irrelevant

It is not the purity of the signal that matters... In an audio system, the goal is to preserve and transmit the signal in such a way that the accumulated errors cannot be heard, i.e. they are low enough not to affect playback fidelity.

The shape of the signal itself is also irrelevant (square wave response, impulse response). We can't assess fidelity by looking at the waveform.
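A simple demonstration: two signals with identical magnitude spectra can have visibly different waveforms, because only the phase of one harmonic differs (the frequencies and amplitudes are arbitrary example values):

```python
import numpy as np

t = np.arange(48000) / 48000.0
# Same harmonics, same amplitudes -- only the phase of the 3rd harmonic differs:
x1 = np.sin(2*np.pi*100*t) + 0.5*np.sin(2*np.pi*300*t)
x2 = np.sin(2*np.pi*100*t) + 0.5*np.sin(2*np.pi*300*t + np.pi/2)

mag1 = np.abs(np.fft.rfft(x1))
mag2 = np.abs(np.fft.rfft(x2))
print(np.allclose(mag1, mag2, atol=1e-4))       # True: identical magnitude spectra
print(float(x1.max()), float(x2.max()))         # visibly different peaks and shapes
```

The waveform on an oscilloscope tells us almost nothing about fidelity; only the measured deviations compared with thresholds do.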

This also means that today's typical aversion to software resampling and the hype around bit-perfect playback are just more nonsense.


9. Audio measurements & fidelity - the main categories

(A short overview.)

The audio fidelity of a component is determined by its frequency response, nonlinear distortion curves and noise (the latter is necessary for expressing dynamic range). Time-domain measurements (impulse response, phase shift, group delay) are secondary, as DACs and amplifiers have negligible phase distortion in the audible range. In multi-way loudspeakers the phase distortion is orders of magnitude higher, but even this is not audible, however visible it may be in the impulse response. Time-domain measurements are essential only in room acoustics, and those differ significantly from phase/group delay measurements.
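For completeness, phase-related quantities such as group delay are straightforward to compute from an impulse response. A sketch using finite differences of the phase; a linear-phase FIR is used here because its group delay is known exactly:

```python
import numpy as np

def group_delay_samples(h, n_fft=1024):
    """Group delay (in samples) of an FIR impulse response h,
    computed as -d(phase)/d(omega) via finite differences."""
    w = np.linspace(0, np.pi, n_fft, endpoint=False)
    H = np.fft.rfft(h, 2 * n_fft)[:n_fft]
    phase = np.unwrap(np.angle(H))
    return w[:-1], -np.diff(phase) / np.diff(w)

# A linear-phase (symmetric) FIR has constant group delay of (N-1)/2 samples:
h = np.array([0.25, 0.5, 0.25])      # N = 3 -> exactly 1 sample of delay
w, gd = group_delay_samples(h)
print(round(float(gd[10]), 3))       # 1.0
```

A constant group delay means all frequencies are delayed equally, i.e. zero phase distortion; deviations from a flat curve indicate frequency-dependent delay.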

In amplifiers and DACs, crosstalk between channels can also be considered an important parameter, although it's very rare to find a system with audible crosstalk. Jitter (fluctuation of the clock signal) doesn't require separate measurements, as it manifests itself as nonlinear distortion.

Audio fidelity measurements describe to what extent an audio system can reproduce the original performance (live or electronic). A poor frequency response can change the timbre (certain harmonics are emphasized while others are attenuated). Distortion can also change the timbre (new harmonics are generated), and a high level of distortion can hide low-level details. Noise can also hide low-level signals. Usually a system or component with poor measured performance will not sound good. Very poor audio fidelity can even be painful to listen to...

Audiophoolery from Ethan Winer is a great introduction to this topic.


10. Thresholds & "worst-case" thresholds

The just-noticeable distortion, just-noticeable noise and just-noticeable level difference vary according to the harmonic and temporal properties of the audio signal. There always exists a particular signal with the lowest threshold, which can be identified as the "worst case".

For example, nonlinear distortion is best heard with pure tones and two-tone signals. Detection is more difficult with complex, time-varying signals and pulses. In contrast, time-related errors are more audible in pulses or pulse series. (Yes, some measurement signals are worst-case signals.)
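A quick illustration of why a two-tone signal is so revealing: passing it through even a mild memoryless nonlinearity produces intermodulation products that are easy to spot in the spectrum. The tone frequencies and the cubic coefficient below are arbitrary example values:

```python
import numpy as np

fs = 48000
f1, f2 = 60, 7000                    # SMPTE-style low/high two-tone pair
t = np.arange(fs) / fs               # one second, so bin index == frequency in Hz
x = 0.8*np.sin(2*np.pi*f1*t) + 0.2*np.sin(2*np.pi*f2*t)

# A memoryless nonlinearity (a mild cubic term) creates intermodulation
# products at f2 +/- 2*f1 (6880 and 7120 Hz), among others:
y = x - 0.1 * x**3
spec = np.abs(np.fft.rfft(y))
print(spec[f2 - 2*f1] > 100, spec[f2 + 2*f1] > 100)   # True True
```

Sidebands around the high tone appear at frequencies where no input energy existed, which is exactly what makes two-tone signals a sensitive, worst-case-style probe of nonlinearity.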

The definition of "transparency" and of a transparent system is closely related to the concept of the worst case. For example, transparency is reached for nonlinearity when distortion is not audible with worst-case test signals. The only exception is the frequency response of loudspeakers and headphones, where the "worst case" (the just-noticeable level difference with a pure tone) is too strict a criterion.

The audibility threshold and the relationship between threshold and waveform are principles so important that they cannot be ignored in this field. Any question related to fidelity is a question about thresholds and their relationship to the waveform.

Csaba Horváth