We can probably hear the difference between 48 and 96 kHz sampling in a quiet, modern studio, but it is difficult to say whether record buyers can. And are 24/96 converters real anyway? Yes, but they are difficult to do well.
You may be able to get 24 or so bits to wiggle 96,000 times per second, but that doesn't mean that the data itself carries any additional real information. Clock jitter is more difficult to deal with, for example, and noise levels in the analog stages – more than in the digital circuitry – reduce the actual achievable dynamic range to well below the theoretical 144 dB. But switch a converter between 44.1 and 88.2 kHz sampling and see what you think: converters with higher sampling rates can still sound better. Let's see why.
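As a quick back-of-the-envelope check of where that theoretical 144 dB comes from, here is the usual 6-dB-per-bit arithmetic in a few lines of Python (a sketch of the textbook figure only; as noted, real converters land well below it):

```python
import math

# Each extra bit doubles the number of quantization steps, which is worth
# about 6 dB of dynamic range (20 * log10(2) ~= 6.02 dB per bit).
def theoretical_range_db(bits: int) -> float:
    return 20 * math.log10(2 ** bits)

print(round(theoretical_range_db(16), 1))   # 96.3 dB
print(round(theoretical_range_db(24), 1))   # 144.5 dB - the figure the analog stages never let you reach
```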
Filters
The usual rationale given for the need for higher sample rates involves anti-imaging and anti-aliasing filters. These are required to ensure that no audio information above the Nyquist limit (half the sample rate) passes through the system. You need at least two digital samples per analog cycle to capture a waveform accurately, and if you have only one (for example, because the waveform is at a higher frequency than the Nyquist limit), then you will still sample a value, but it will be meaningless.
A digital sample captured by an A/D converter is like a single frame of action captured by a movie camera. The faster the samples are taken – the higher the sample rate – the more information is acquired. In film, a higher frame rate makes the action smoother and more fluid – although above a certain speed we can't really notice the difference. In digital audio, the faster the sample rate, the better the frequency response – although above a certain point, we probably can't hear the difference.
Imagine, using the film analogy, that we make a movie of a flashing light (an independent art movie, evidently). We make the light flash on and off faster and faster as time goes by. At some point it will be flashing at half the film speed – say the film is running at 24 frames per second and the light is flashing 12 times per second. If we look at the individual film frames at this point we will see that one frame shows the light on, and the next shows the light off. The next frame, it's on again. This is fine. But now imagine the light has sped up so that it flashes 24 times per second. Now, each time a frame is shot, the light is on. Or maybe off. Oops – if we look at a succession of frames we will see that the light seems to be on or off all the time, depending on what part of the cycle the light was in when the shot was taken – which is not what the light is really doing.
Evidently, the film record is meaningless.
If you try sampling a waveform whose frequency is the same as the sample rate, the waveform behaves as if its frequency were 0 Hz! In more general terms, if you sample frequencies higher than the Nyquist limit, they behave like a mirror image of the frequencies below the limit. So if you're sampling at 44.1 kHz, a 30 kHz tone sounds the same as one at 14.1 kHz!
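A few lines of NumPy make the mirror-image effect concrete (the tones are hypothetical, chosen only to match the numbers above):

```python
import numpy as np

fs = 44_100                                  # sample rate (Hz)
t = np.arange(64) / fs                       # a handful of sample times

tone_30k  = np.cos(2 * np.pi * 30_000 * t)   # above the 22.05 kHz Nyquist limit
tone_14k1 = np.cos(2 * np.pi * 14_100 * t)   # its mirror image: 44.1 kHz - 30 kHz
print(np.allclose(tone_30k, tone_14k1))      # True - the samples are identical

tone_fs = np.cos(2 * np.pi * 44_100 * t)     # frequency equal to the sample rate...
print(np.allclose(tone_fs, 1.0))             # True - it samples just like 0 Hz
```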
This is obviously undesirable, and as a result, digital systems from time immemorial (well, the last 30 years, anyway) have included anti-imaging and anti-aliasing filters to stop this very problem.
Unfortunately, these filters have traditionally sounded horrible. They need to pass all frequencies up to as close to the Nyquist limit as possible, but not a bit more. This makes their rolloffs very steep, and when implemented in the analog domain, as early ones were, they introduced enormous phase shifts into the audio – 1000 degrees out at 10 kHz was not uncommon. No wonder, then, that the top end sounded clangy and harsh and people said digital would never catch on. Improvements in filter design, specifically gentler slopes, significantly improved the imaging and, consequently, the sound of digital audio.
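To get a feel for how much phase a brick-wall analog filter can smear, here is a rough sketch using SciPy. The 9th-order elliptic design is purely an assumption standing in for an early anti-aliasing filter; the exact figures depend entirely on the actual design:

```python
import numpy as np
from scipy import signal

# Hypothetical 9th-order analog elliptic low-pass: 20 kHz passband edge,
# 0.5 dB ripple, 80 dB stopband - a stand-in for an early brick-wall filter.
z, p, k = signal.ellip(9, 0.5, 80, 2 * np.pi * 20_000,
                       btype='low', analog=True, output='zpk')

f = np.linspace(100, 20_000, 2_000)              # audio-band frequencies (Hz)
w, h = signal.freqs_zpk(z, p, k, worN=2 * np.pi * f)
phase = np.degrees(np.unwrap(np.angle(h)))       # accumulated phase shift

print(f"phase at 10 kHz: {np.interp(10_000, f, phase):.0f} degrees")
print(f"phase at 20 kHz: {np.interp(20_000, f, phase):.0f} degrees")
```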
A more complete solution would be to sample at a higher rate, so the Nyquist limit would be well out of the way, and thus the filters could be smoother and operate way above anything audible. It sounds good in theory, and this may be where the original push for higher sample rates came from. We would probably have used higher sample rates earlier, had it been feasible outside the laboratory.
Today, however, such filters are implemented digitally, and phase errors aren't such a problem. In addition, the types of converters used today multiply the effective sample rate internally, so that the apparent Nyquist limit is much higher than half the “real” sample rate. As a result of this “oversampling” technique, the filter rolloffs and frequencies can be kept well out of the way of the audio. So there's no need for higher sample rates. After all, there's nothing higher than around 20 kHz to record. Is there...?
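Before answering that, here is a rough sketch of the oversampling arrangement just described. The 8x factor and the 255-tap filter are assumptions for illustration, not figures from any particular converter:

```python
import numpy as np
from scipy import signal

fs_out = 44_100
oversample = 8                        # hypothetical 8x oversampled converter
fs_internal = oversample * fs_out     # 352.8 kHz internal rate

# The brick-wall job is done by a digital FIR decimation filter.  A symmetric
# FIR delays every frequency by the same amount, so it adds no phase distortion.
taps = signal.firwin(255, cutoff=20_000 / (fs_internal / 2))
print(np.allclose(taps, taps[::-1]))  # True: symmetric taps, i.e. linear phase

# The analog filter in front of the converter now only needs to be attenuating
# by around fs_internal - 20 kHz (roughly 333 kHz) rather than by 22.05 kHz,
# so a gentle slope is plenty.

# Decimating a test tone from the internal rate down to 44.1 kHz:
t = np.arange(fs_internal) / fs_internal
x = np.sin(2 * np.pi * 1_000 * t)                   # 1 kHz test tone
y = signal.resample_poly(x, up=1, down=oversample)  # digital filter + 8:1 decimation
print(len(x), len(y))                               # 352800 -> 44100 samples
```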
Why Record Ultrasonics?
As is widely recognized, most of us can't hear much above 18 kHz, but that does not mean that there isn't anything up there that we need to record – and here's another reason for higher sampling rates. Plenty of acoustic instruments produce usable output up to around the 30 kHz mark – something that would be picked up in some form by a decent 30 in/s half-inch analog recording. A string section, for example, could well produce some significant ultrasonic energy.
Arguably, the ultrasonic content of all those instruments blends together to produce audible beat frequencies which contribute to the overall timbre of the sound. If you record your string section at a distance with a stereo pair, for example, all those interactions will have taken place in the air before your microphones ever capture the sound. You can record such a signal with 44.1 kHz sampling and never worry about losing anything – as long as your filters are of good quality and you have enough bits.
If, however, you record a string section with a couple of 48-track digital machines, with a mic on each instrument feeding its own track so that you can mix it all later, your close-mic technique does not pick up any interactions. The only time they can happen is when you mix – by which time the ultrasonic content has all been knocked off by your 48 kHz multitrack recorders, so they never happen at all. It would thus seem that high sampling rates allow the flexibility of using different mic techniques with better results.
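A toy simulation in Python makes the point. The quadratic term is just an assumption standing in for whatever nonlinear mixing actually happens in the air (or the ear), and the "instruments" are reduced to two hypothetical ultrasonic partials, so treat it as a sketch of the argument rather than a model of a real session:

```python
import numpy as np

fs = 192_000                          # internal "acoustic" rate for the simulation
t = np.arange(fs) / fs                # one second
f1, f2 = 29_000, 31_000               # two hypothetical ultrasonic partials

def bandlimit(x, cutoff, fs):
    """Crude brick-wall low-pass via FFT, standing in for an anti-alias filter."""
    X = np.fft.rfft(x)
    X[np.fft.rfftfreq(len(x), 1 / fs) > cutoff] = 0
    return np.fft.irfft(X, len(x))

def level_at(x, freq, fs):
    """Spectrum magnitude at a single frequency (rough, for illustration only)."""
    return np.abs(np.fft.rfft(x))[int(round(freq * len(x) / fs))]

nonlinear = lambda x: x + 0.1 * x**2  # toy stand-in for mixing in the air or ear

# Distant pair: the tones interact in the air first, then the 44.1 kHz
# converter band-limits whatever the microphones picked up.
in_air = nonlinear(np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t))
distant_pair = bandlimit(in_air, 22_050, fs)

# Close mics: each track is band-limited by a 48 kHz multitrack on the way in,
# so the ultrasonics are already gone by the time anything can interact at mixdown.
track1 = bandlimit(np.sin(2 * np.pi * f1 * t), 24_000, fs)
track2 = bandlimit(np.sin(2 * np.pi * f2 * t), 24_000, fs)
mixed_later = nonlinear(track1 + track2)

print(level_at(distant_pair, f2 - f1, fs))   # a clear 2 kHz difference tone
print(level_at(mixed_later, f2 - f1, fs))    # essentially nothing at 2 kHz
```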
Pick A Number
Having established that higher sampling rates are a good idea – or at least a fact of modern life – there is a question as to what the sample rate should actually be in a studio environment. On the face of it, 96 kHz takes care of capturing any audio that might ever happen, and 24 bits offer quite enough quantization steps. Is that enough?
Yes, in theory – more than enough. But there are some potential problems, real or imaginary, with a production environment that has no better resolution than the consumer distribution format, and the emerging DVD-Audio standard offers not just 24-bit, 96 kHz sampling: it even goes beyond that to support 192 kHz sampling in stereo.
[On the face of it this is quite absurd. Do we need to capture “audio” signals at up to 96 kHz? Obviously not – such signals don't exist. However, some recent research suggests that the human brain can discern a difference in a sound's arrival time between the two ears of better than 15 microseconds – not much more than the roughly 10 µs gap between samples at 96 kHz sampling – and some people can even discern a 5 µs difference! So while super-high sample rates are probably unnecessary for frequency response, they may be justified for stereo and surround imaging accuracy. However, it should be noted that many authorities dispute this conclusion.]
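For reference, the gap between successive samples at the common rates is easy to work out; comparing it with the timing figures quoted above is the only point of this snippet:

```python
# Time between samples, in microseconds, at the usual sample rates.
for fs in (44_100, 48_000, 96_000, 192_000):
    print(f"{fs:>7} Hz: {1e6 / fs:5.1f} microseconds between samples")
```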
Think of higher sample rates and longer word lengths as a kind of “headroom.” We need higher resolution in the studio than consumers get, so we can start with a higher level of quality in case some gets lost on the way – which might well happen.
And what happens when you modify a digital signal in the digital domain, say by EQing it or fading it out? You create more bits – more data. You ought to have spare bits so you have room to work. You can always lose resolution, but you can't easily get it back again.
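To make the "more bits" point concrete, here is a toy sketch. The numbers are made up, and bare rounding stands in for the proper dithered requantization a real console or DAW would use:

```python
sample = 12_347                  # a value that fits comfortably in 16 bits
gain = 0.8                       # a gentle fade

processed = sample * gain        # ~9877.6 - the exact result needs fractional bits
back_to_16 = round(processed)    # 9878    - forcing it back into 16 bits discards data

# Keeping a longer word (or floating point) on the mix bus defers that loss
# until the final dither-and-truncate stage, which is the whole point of
# having spare bits to work with.
print(processed, back_to_16, processed - back_to_16)
```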