Wednesday, June 13, 2007

Bits is Bits, Right?

"Never explain, never apologize." But in this month's "As We See It," I intend to do both. First, the apology:

To all those who ordered our Concert double-CD set and had to wait a long time to receive it: we're sorry about the delay. This project was described in the November 1994 issue of Stereophile; at the time of writing, I didn't imagine that we would have problems after the digital masters had been edited and sent off to the CD-pressing plant. Little did I know!

When I transferred the original analog tapes to digital, I used the Manley 20-bit ADC to feed our Sonic Solutions hard-disk editor, which can store up to 24-bit words. I did all the editing and master assembly at 20-bit resolution, then experimented with the various redithering/noise-shaping algorithms offered by the Meridian 618 processor to preserve as much as possible of the 20-bit quality when the word length was reduced to 16 bits on the CD master.

When I sent the master Concert CD-Rs off to the CD plant, therefore, I requested that the mastering engineer who was going to add the track'n'timing PQ subcodes do nothing to my data. In particular, I requested that he not remove DC offset (I had hand-trimmed the ADC for every transfer), not re-equalize, not change levels, etc. Most important, I requested that he not go to analog for any reason. The carefully prepared 16-bit words on my master were the ones I wanted to appear on the commercial CD.

Incidentally, an A/B comparison between the 20-bit data and the same data truncated to 16 bits—the four Least Significant Bits (LSBs) are simply chopped off—reveals the difference to be not at all subtle. If not quite, in the words of Ivan Berger of Audio, such a "gross" difference that my mother would immediately notice it, truncation nevertheless seems to be relatively easily identifiable—in short, like "traditional," "fatiguing" CD sound.

Interestingly, the fact that my original had been analog tape with a noise floor higher than that of the 16-bit system wasn't a factor in these comparisons. Although many observers hold that analog noise ahead of the A/D converter effectively dithers the quantization, this is not necessarily true. The noise is treated as part of the original signal, and is encoded as accurately or as inaccurately as the ADC allows. (The correct use of dither at the LSB level is a different issue.)
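The audible penalty of truncation is easy to demonstrate numerically. The following short Python sketch is a simplified illustration only (it is not the Meridian 618's noise-shaping algorithm): it reduces a low-level 441Hz tone from 20 bits to 16 bits both ways, and shows that the truncation error repeats exactly with every cycle of the signal (ie, distortion), while the dithered error does not.

    # Simplified illustration (not the Meridian 618's algorithm): truncating 20-bit
    # words to 16 bits vs. requantizing with TPDF dither.
    import numpy as np

    fs = 44100
    t = np.arange(fs) / fs
    x20 = np.round(0.01 * np.sin(2 * np.pi * 441 * t) * (2**19 - 1)).astype(np.int32)

    truncated = (x20 >> 4) << 4                      # the four LSBs simply chopped off

    lsb16 = 16                                       # one 16-bit LSB, in 20-bit units
    tpdf = (np.random.uniform(-0.5, 0.5, fs) + np.random.uniform(-0.5, 0.5, fs)) * lsb16
    dithered = (np.round((x20 + tpdf) / lsb16) * lsb16).astype(np.int32)

    for name, y in (("truncated", truncated), ("dithered", dithered)):
        err = (y - x20).reshape(441, 100)            # one row per cycle of the tone
        print(name, "error repeats every cycle:", bool(np.all(err == err[0])))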

When I received test CD-Rs from the mastering engineer, who awaited my approval of them before starting the plant's presses rolling, I was horrified to find that the CD-Rs sounded different from my masters. In fact, before I even did any comparisons, I noticed that the CD-Rs sounded like the truncated samples of the Chopin Waltz I had used in my listening tests. I asked others at Stereophile to compare one of the CD-Rs with one I'd made when I did the original reduction to 16-bit word lengths. They heard the same difference. I then compared the pressing plant's CD-R against a DAT I had copied from the 16-bit master. Surprisingly, there was less of an audible difference, probably because my DAT recorder has high measured levels of datastream jitter; but the DAT still sounded fundamentally like what I'd intended the sound to be—in Linn terminology, it "played tunes"; the production CD-Rs didn't.

To make a long story short: I uploaded the production CD-R data into the Sonic Solutions and tried to null it against my original 16-bit data, which had been archived to 8mm Exabyte tape. As described by Bob Katz last December (Vol.17 No.12, pp.81-83), you feed the outputs of the two pairs of stereo digital tracks to the Sonic's digital mixer, then slide one pair of tracks backward and forward in time with respect to the other until you get sample synchronization, then flip that pair's polarity. If the data are the same, you get total—and I mean total—cancellation/silence. If not, the nature of the residue tells you where differences have been introduced. (I'm currently developing this technique as a much more meaningful version of the Hafler/Carver/Walker nulling test for amplifiers.)
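For the curious, the nulling procedure is simple enough to sketch in a few lines of Python. The outline below is schematic only (the array names are hypothetical stand-ins for the Sonic Solutions tracks), but it captures the essentials: slide one track against the other, subtract, and inspect the residue.

    # Schematic outline of the digital null test; array names are hypothetical.
    import numpy as np

    def null_test(reference, candidate, max_offset=1000):
        """Slide candidate against reference, subtract (= polarity flip + mix),
        and return the offset giving the smallest peak residue (0 = total cancellation)."""
        best_offset, best_residue = None, None
        for offset in range(-max_offset, max_offset + 1):
            if offset >= 0:
                a, b = reference[offset:], candidate[:len(candidate) - offset]
            else:
                a, b = reference[:offset], candidate[-offset:]
            n = min(len(a), len(b))
            residue = int(np.max(np.abs(a[:n].astype(np.int64) - b[:n])))
            if best_residue is None or residue < best_residue:
                best_offset, best_residue = offset, residue
        return best_offset, best_residue

    # offset, residue = null_test(archive_left, production_left)
    # residue == 0 means bit-identical data; anything else shows where they differ.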

Both my CD-R master and the DAT clone I'd made of it nulled totally against the 16-bit archive data, revealing that I had not made an error in the master preparation. However, not only could I not null the production data against the archive, the production master was longer by one video frame (1/30 of a second) every 20 minutes of program. This appears to show that the mastering engineer either a) used a sample-rate converter, or b) converted my carefully prepared data to analog, then reconverted it to digital using a 16-bit ADC with a sample clock running very slightly slower than mine (by just 1.2Hz!). What was incontrovertible was that all my careful work to preserve as much as possible of the 20-bit original's quality had been for naught.
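That clock-offset figure is simple arithmetic: one extra video frame gained over 20 minutes of program, scaled to the 44.1kHz sample rate.

    # One extra video frame per 20 minutes, expressed as a sample-clock offset.
    frame = 1 / 30                      # seconds
    program = 20 * 60                   # seconds
    print(44100 * frame / program)      # ~1.2Hz difference between the two sample clocks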

I had already accepted that this would delay our customers receiving their discs; there was only one thing I could do. (Please note that we do not process credit-card orders or cash checks until the day we actually put product in the mail.) I prepared new 16-bit masters, sent them off this time to Digital Brothers in Costa Mesa, CA to have the PQ subcode data inserted, and instructed the pressing plant again to do nothing to the data. This time, when I received the test CDs, they nulled perfectly against the archived masters!

How common among CD-mastering engineers is the practice of changing the data when there's no reason for them to do so? Linn Products' Brian Drummond mentioned on CompuServe's CEAUDIO forum that, in the early days of CD, Linn Records sent off a digital master tape to the mastering plant and got back a CD with different numbers on it. "We did a bit-by-bit comparison against the master—they weren't the same."

A conversation I had with Reference Recordings' Keith Johnson at last November's AES Convention (see the report in this issue's "Industry Update") revealed that he initially had the same problem with his HDCD®-encoded recordings. In these, the additional high-resolution data are encrypted in the LSBs in a proprietary manner. Any DSP, even level-changing, will change the words and therefore destroy the encoding.

Keith related to me that the first test disc cut from his HDCD masters not only sounded wrong, it wouldn't illuminate the HDCD LED on his prototype decoder. In fact, it was no longer an HDCD recording. It turned out that that particular mastering engineer routinely converted every digital master to analog, then back to digital! Other mastering engineers never switch off the default input DSP options on their hard-disk editors, not realizing that, with 16-bit data, this will reintroduce quantizing noise. No wonder so many CDs sound bad!

As well as indicating the dangers of entrusting your work to the hands of others with different agendas, this story is interesting in that all the changes made to my data were at such a low level—30dB or more below the analog tape hiss—that proponents of "bits is bits" would argue that, whatever the mastering engineer had done, the differences introduced should have been inaudible. Yet what had alerted me to the fact that the data had been changed was a significant change in the quality of the sound—a degradation that I heard even without having the originals on hand for an A/B comparison!

Those who continually point out in Stereophile's "Letters" column that the differences in sound quality discussed by our writers are imaginary and probably due to the placebo effect should note that I had huge emotional and financial investments in wanting the production CD-Rs to sound the same as the originals. If I were to hear any difference, it would both cost Stereophile quite a lot of money to have the project remastered and delay our customers receiving their Concert CDs. In fact, it took time to work through the cognitive dissonance to recognize that I was hearing a difference when I expected—and wanted to hear—none.

"Bits is bits"? In theory, yes. In practice? Yeah, right!

http://www.stereophile.com/asweseeit/372/

The Analog Compact Disc

Back in the days when vinyl was our only source of high-quality recorded music, the first question when playing a new record was often, "How good is the pressing?" LPs varied so much in their mastering and pressing qualities that buying a new one was often a crapshoot.

With the advent of the digital compact disc, however, few of us bother to think about variations in CD quality. We assume that if the CD spins and makes sound, the disc must have been manufactured perfectly.

In reality, compact discs vary greatly in manufacturing quality. The process of creating the tiny pit and land structures that represent the music, then transferring those structures to an inexpensive mass-produced product, is fraught with potential problems. The result is a wide variation in the technical—and sometimes musical—performance of our CDs.

In this article, we'll look at how the CD works, how CDs are made, and what can go wrong in the disc-manufacturing process. In addition, I'll report on the disc quality of a sampling of CDs made at different pressing plants around the world.

This technical evaluation of CD quality was made possible by a unique CD analyzer obtained by Stereophile (see Sidebar) that reveals CD data-error types and rates, and allows an examination of the critical signals coming off a CD. Finally, I'll show you how to identify the factory where a disc was made, and debunk some common myths about data errors on CDs (footnote 1).

How the Compact Disc works
A compact disc is a piece of polycarbonate (a type of plastic) on which a spiral track has been impressed. This spiral track is a series of indentations ("pits") separated by flat areas ("land"). This alternating pit-and-land structure can be seen in fig.1, a scanning electron microscope photograph of a CD surface. The white line at the top of the photograph provides the scale; the line is 10µm (ten micrometers, or microns) long. To put the extraordinarily small size of the pits into perspective, a human hair is about 75µm in diameter. By the scale of this photograph, a human hair would be a foot and a half thick (footnote 2).

Fig.1 Scanning electron microscope photograph of a CD surface. The white line at the top is 10µm in length. A human hair has a diameter of about 75µm. (Photo by Alvin Jennings.)

Digital audio data are encoded in the spiral track of pit and land, waiting to be recovered by your CD transport. The transport's laser beam is focused on the spinning disc, which is coated with a thin, reflective metal layer, almost always aluminum (gold and brass are also occasionally used). The disc's metal coating reflects the beam back to a photodetector, a device that converts light into an electrical signal.

When the laser beam is reflected from the land, the beam is returned to the photodetector at virtually full strength. The laser beam is significantly reduced in intensity when it reflects from a pit bottom because the pit depth is one-quarter the wavelength of the playback laser beam. The portion of the beam reflected from the pit bottom is therefore shifted in phase by 180 degrees (one-quarter wavelength going down, then another one-quarter wavelength going back up) in relation to the beam portion reflected from the land. A 180-degree phase shift is half a wavelength—equivalent to a polarity reversal. When the two out-of-phase parts of the beam combine to strike the photodetector, they cancel. It's like wiring your loudspeakers out of phase and hearing less bass: when one woofer moves forward, the other moves backward, and the waves cancel each other.
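A toy calculation shows why the quarter-wave depth matters. The Python fragment below is an idealization (it assumes the spot straddles pit and land equally, with no losses), summing the two reflected portions as phasors.

    # Idealized phasor sum of the two reflected beam portions (equal split, no losses).
    import numpy as np

    def reflected_intensity(extra_path_in_wavelengths):
        phase = 2 * np.pi * extra_path_in_wavelengths
        return abs(0.5 + 0.5 * np.exp(1j * phase)) ** 2   # land portion + pit-bottom portion

    print(reflected_intensity(0.0))   # all land: full intensity (1.0)
    print(reflected_intensity(0.5))   # quarter-wave pit depth, half-wave round trip: 0.0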

Rather than represent one condition by binary "zero" and the other by binary "one," it was felt better to represent a reflection from both land and pit bottom as a binary "zero," with binary "one" corresponding to the change in beam intensity when the beam is reflected from a pit-to-land or land-to-pit transition. In short, land or a pit bottom is a binary 0; transitions are binary 1s.

The photodetector therefore outputs a varying voltage in response to the pit pattern—a voltage that contains all the binary 1s and 0s encoded on the disc.

An encoding scheme called "Eight to Fourteen Modulation" (EFM) formats the data to be recorded on the disc according to certain rules. EFM coding creates a bit stream in which binary 1s are separated by a minimum of two 0s and a maximum of ten 0s. The shortest pit or land length therefore represents the binary data "1001," and the longest pit or land length represents the binary data "100000000001." EFM coding creates a specific pattern of ones and zeros that results in nine discrete pit or land lengths on the disc. You can see the discrete nature of the pit and land lengths in the photograph shown in fig.1.
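The run-length rule is easy to check. The fragment below illustrates only the run-length constraint, not the actual eight-to-fourteen lookup table: it counts the channel bits between transitions.

    # Run-length rule only; the full EFM 8-to-14 lookup table is omitted.
    def run_lengths(channel_bits):
        """Lengths of the runs between successive channel 1s, ie, pit/land lengths in T."""
        ones = [i for i, b in enumerate(channel_bits) if b == "1"]
        return [b - a for a, b in zip(ones, ones[1:])]

    print(run_lengths("1001"))            # [3]  -> the shortest pit or land (3T)
    print(run_lengths("100000000001"))    # [11] -> the longest pit or land (11T)
    print(list(range(3, 12)))             # the nine allowed lengths, 3T through 11T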

http://www.stereophile.com/features/827/

Digital Sound

Everybody, including myself, was astonished to find that it was impossible to distinguish between my own voice, and Mr. Edison's re-creation of it.—Anna Case, Metropolitan Opera Soprano, 1915

In the July 1990 "As We See It" (Vol.13 No.7, p.5), I examined the conflict between those who believe existing measurements can reveal and quantify every audible aspect of a component's behavior, and those who consider the listening experience a far better indicator of a component's performance than the numbers generated by "objective" measurements. Implicit in the objectivist position is the assumption that phenomena affecting a device's audible characteristics are well understood: any mysteries have long since been crushed by the juggernaut of scientific method. If people then hear differences that "science" cannot measure or quantify, those differences exist only in people's minds and have no basis in reality. Consequently, observers' listening impressions are virtually excluded from consideration as merely "subjective," unworthy of acceptance by audio science. This belief structure is at the very core of the audio engineering establishment, and is the guiding force behind their research efforts (footnote 1).

The subjectivists believe that the ear is far superior to test instruments in resolving differences. It is also axiomatic that vast areas of audio reproduction, far from being fully researched and understood, are instead considerably more complex than the simple scientific models used to describe them.

A good example of this is the well-publicized subject of CD treatments. In the October issue (Vol.13 No.10, p.5) I described my experiences with cryogenically frozen CDs, as well as CDs pressed from the same stamper but made from different molding materials. I am particularly fascinated by CD treatments not for what they do, but for what they represent. The fact that cryogenically freezing a CD results in an easily audible change in sound points to the uncomfortable (for the engineering establishment) conclusion that everything is not as simple and well-understood as is thought.

A result of this dichotomy is that academic researchers—the very people in whose hands lie the tools and knowledge to discover the physical causes of these phenomena—are the least likely to listen critically and the most likely to dismiss the audiophile's claims as nothing more than voodoo. Consequently, their research direction is dictated by improving measured performance rather than increasing subjective performance; the latter is far more meaningful when the goal of research is to better communicate the musical experience.

Those who make their livings from digital audio (like mastering engineers) have long complained about sonic anomalies and perceptual differences where no differences should theoretically exist. The academic audio community, as well as manufacturers of professional digital audio equipment, maintain that these differences—unsupported by theory and unmeasurable—are products of the listeners' imaginations.

This thinking was exemplified by events during a two-day Sony seminar on digital mastering technology I attended a few years ago. The designers of the Sony PCM-1630, DAE-1100 digital editor, and other digital mastering equipment were present. The seminar was attended by mastering engineers who work with, and listen to, this equipment daily.

One of the mastering engineers expressed his concern over the audible degradation that occurs when making digital-to-digital tape copies, and the sonic differences introduced by the digital editor, especially when using the editor's level adjustment (footnote 2). These comments set off an outspoken flurry of concurrence among the assembled mastering engineers. The Sony designers argued vehemently that no differences were possible, and regarded the collective perception with some amusement. This exchange is a microcosm of the conflict between those who listen and those who measure.

Such is the background of this essay's subject, a paper by Dr. Roger Lagadec entitled "New Frontiers in Digital Audio" presented at the most recent Audio Engineering Society Convention in Los Angeles. I believe this paper will one day be considered a turning point in digital audio's evolution. Copernican in scope, it is likely to radically change the direction of and thinking in audio engineering. Lagadec's thesis is of utmost importance to the audiophile, both because of its promise of greatly improved digital audio, and for its validation of a fundamental audiophile philosophy: the importance of critical listening in evaluating audio technology over the belief that existing measurements can reveal all differences. Furthermore, and perhaps most significant, the paper was written by a man considered by many to be the world's foremost thinker in digital audio, whose ideas carry enormous influence in the audio engineering community.

As a pioneer in digital audio since 1973, Dr. Lagadec has conducted fundamental research into digital signal processing, was one of the developers of the Digital Audio Stationary Head (DASH) format while at Studer, and has offered broad, conceptual insights into the nature of digital audio. He holds a Ph.D. in Technical Sciences in the field of Digital Signal Processing, and has been actively involved in setting digital audio standards within the AES, of which he was named a Fellow in 1983. He is now responsible for all professional digital products at Sony. It is difficult to overstate Dr. Lagadec's credentials or his ability to influence digital-audio thinking.

"New Frontiers in Digital Audio" is bold in concept, brilliant in its simplicity, and technically incontrovertible. The paper identifies two areas of digital audio considered fully understood—digital-domain gain adjustment and dither—and reveals fundamental concepts about these areas that had not previously been considered (footnote 3). Moreover, the paper correlates these new discoveries with the perceptions of trained listeners whose comments were once considered heresy. Significantly, Dr. Lagadec's thesis extends beyond digital gain adjustment and dither: these two relatively simple issues are paradigms for the broader and more complex conflict between measurement and human musical perception. In this analysis, I will avoid most of the paper's technical details and focus instead on the broader issues raised (footnote 4).

Dr. Lagadec challenges the conventional wisdom that requantizing a digital audio signal with a digital fader produces only a change in level accompanied by a slight noise increase. "The imprecise, but by no means uncertain, answer of experienced users has sometimes been that—with critical signals—the texture of the new signal, its fine structure, possibly its precise spatial definition, will be affected: the signal will (sometimes) have changed in a way uncorrelated to level change and noise level, in spite of the extreme simplicity of the digital signal processing it underwent....The rest of this chapter cannot have the ambition of proving that such vague (but genuine) comments are true in an absolute sense. Rather, it will try to make the point that, based on a straightforward analysis, it is implausible that well-trained personnel would not detect differences beyond noise and signal level." (emphasis in original)

This, in itself, is a remarkably bold position for Dr. Lagadec to adopt. To acknowledge that previously unidentified phenomena affect the subjective perception of digitally processed music is indeed a milestone on the road to improving digital audio. Furthermore, the thesis doesn't summarily reject the listening experience as an important contributor to understanding these phenomena. The audio engineering establishment typically rejects listening because one's perceptions cannot be proven in a scientifically acceptable manner and are therefore deemed meaningless. It is also unusual for a man of science to use a lexicon associated more with audiophiles than scientists ("the texture of the new signal, its fine structure, possibly its precise spatial definition"). The audiophile, however, would have described these perceptions in blunter terms: textures hard rather than liquid, loss of inner instrumental detail, and a collapsed soundstage.

Dr. Lagadec supports his thesis with a very simple analysis of what happens when changing level in the digital domain and the nature of the attendant requantization error. This function is considered perhaps the simplest and best-understood type of digital signal processing. However, he has discovered a previously unknown form of error created by this simple processing: the digital gain control's transfer function (the relationship between its input and output signals) varies according to the amount of gain or attenuation. The nature of the transfer function's non-linearity (imprecision), introduced by changing gain in the digital domain, is determined by the gain pair ratios; ie, the relative beginning signal level and the signal level after gain reduction. I won't go into the details here: the phenomenon is explained and documented fully in the paper.
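Without reproducing Dr. Lagadec's analysis, a crude numerical experiment hints at the effect: the error left behind after a digital gain change and requantization has a structure that depends on the particular gain value chosen. (This sketch is my own illustration, not the paper's method.)

    # Crude illustration only (not the paper's analysis): the requantization error
    # pattern after a digital gain change depends on the gain value itself.
    import numpy as np

    x = np.random.randint(-2**15, 2**15, 100000)          # 16-bit samples
    for gain in (0.5, 0.501, 0.7943):                     # -6.02dB, a nearby ratio, -2dB
        ideal = x * gain                                  # infinite-precision result
        requantized = np.round(ideal)                     # back to integer sample values
        err = requantized - ideal                         # error introduced by requantizing
        print(gain, "distinct error values:", len(np.unique(np.round(err, 6))))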

This discovery is extraordinary for two reasons. First, it vindicates those who have long maintained, after critical listening, that digital faders affect sound quality (footnote 5). Second and more important, it reveals that even the simplest aspects of digital audio that are thought to be well understood are, in fact, not well understood. Dr. Lagadec has worked for the past 7 of his 17 years in digital audio in the area of digital level control, yet recognized and began researching this phenomenon during only the past year. Again, I quote Dr. Lagadec's paper: "It is remarkable that such a simple system, eminently amenable to the methods used in non-linear dynamics, has not—to the author's limited knowledge—been widely publicized yet. If an element of surprise can come from analyzing such simple systems, more instructive surprises may be in store when more complex ones are scrutinized."

With this last sentence, Dr. Lagadec implies that a digital Pandora's Box may be opened by closer analysis of other aspects of digital audio. Like the Pandora's Box of Greek mythology that contained the world's troubles, this digital Pandora's Box may reveal other problems in digital audio that no one knew existed. An understanding of these cracks in conventional digital audio theory will go a long way toward correlating listeners' subjective impressions with objective fact.

The paper then examines dither, another area which, like digital gain adjustment, is considered a closed subject because it is well understood. Dr. Lagadec presents a hypothesis which states that dither should be optimized based on the ear's short-term perception of quantization noise, rather than the current mathematically based long-term analysis: "Needless to say, the 'optimal' dither types in the long-term statistical sense which have been proposed by Vanderkoy [sic] and Lipshitz (footnote 6) are a very valid first approach. As they are, however, independent of any practical detector model, it is not unfair to expect further improvements in perceived performance from dither models optimized in a different, less mathematically rigorous, and more perceptually oriented way."

With that analysis, Dr. Lagadec again proposes that an existing precept, thought to be immutable, is in fact far from a settled question. Moreover, one can infer that future research should be based on improving perceptual qualities rather than conforming to mathematical theories. This is reflected in the phrases "independent of any practical detector model" (my interpretation: "without regard for human hearing"), and "dither models optimized in a different, less mathematically rigorous, and more perceptually oriented way." (emphasis added)

Significantly, this suggests that the criterion for what is considered optimum dither should be based on human hearing rather than on purely mathematical ideas or measurements that have little relation to auditory perception (footnote 7). This represents a remarkable shift in thinking away from the scientific dictum that measurements and theory are more reliable and important than human perception in determining "what is good" in music reproduction. The human perceptual element in audio engineering has long been disregarded because it cannot be quantified. The ability to measure and quantify an entity is the criterion by which science judges that entity's reality. The scientific mind tends to mistrust anything that cannot be represented or communicated by linear symbols. These symbols that describe reality, obtained by measurement and calculation, assume a greater importance than, or are even mistaken for, the actual reality they try to represent. It is thus momentous that one of the world's foremost audio scientists has called for accepting musical perception directly rather than in the abstract, linear terms of representational thinking.

Dr. Lagadec points to some future directions in digital audio, including "a much greater word length" than the current 16-bit system, and bandwidth much wider than 20kHz. This additional bandwidth would be "kept open for, say, low-level harmonics, harmonics due to non-linear processing, and out-of-band noise shaping." An advantage he notes of having bandwidth beyond 20kHz "would be the freedom to disregard the arguments as to whether there is perceptible sound beyond 20kHz." However, he notes that "For economic reasons, it is evident that hardware capable of such parameters at an acceptable cost is not for this decade."

This concept of today's digital audio being in its infancy and subject to radical changes in fundamental precepts is in sharp contrast to the prevailing view among most academics that the 20kHz bandwidth is adequate, and that properly dithered 16-bit representation provides sufficient resolution and dynamic range. Indeed, the idea that today's digital audio parameters are perfectly satisfactory for music was expressed by Dr. Stanley Lipshitz parenthetically in his "Tutorial on Phase," given at the convention. He somewhat derisively scolded critics of digital audio for speciously (in his view) blaming today's digital audio's fundamental parameters (he specifically mentioned sampling rate and word length) as the cause of its inferiority (in the critics' view) to analog.

Dr. Lagadec then says that since the previously unknown aspects of digital audio he has discovered correlate to what critical listeners have been saying for years, perhaps other claims of audible differences should not be dismissed so cavalierly by the audio engineering establishment, despite the lack of scientific proof of such differences. I was astonished by the paper's last paragraphs (quoted below), in which Dr. Lagadec expands his thesis by bringing up the subject of audible differences between cables. Claims of differences between cables have long been a bugaboo of audio engineers. Further, he contends that if no measurable differences exist between cables, yet critical listeners report such differences, perhaps our understanding of human hearing acuity is suspect, rather than the rationality of those who hear differences.

The paper concludes by calling for the world's audio scientists and researchers to vigorously pursue these new challenges and to make room for, rather than exclude, the role of listening in advancing audio science. Dr. Lagadec writes:

"The industry is full of lore as to the superior sound quality of some cables, connectors, electronic devices, and the like. Assuming, as scientists presumably should, that things can only sound different if they cause signals to become different, and using the technology available today to ascertain whether differences do exist, and if so what they consist of, we may hope to achieve reproducible improvements; to deepen our understanding of sound quality; and to separate unfounded legends from justifiable improvements.

"Conversely, if we were to discover that, when, say, different cables are used, the signals look the same beyond the resolution of today's best converters, but still sound reproducibly different, then we would indeed still have much to learn about human audibility.

"The advanced tools available today—the recorders, computer software, workstations, DSP chips and boards, monitor systems, A/D and D/A conversion systems, instrumentation—and which are within reach of any university might be put to use, scientifically, aggressively, to find out how we hear, and how we might improve what we hear. Every generation since Edison's days has said that its sound recordings were almost better than the original, and at the very least indistinguishable from reality. Ours will hardly be an exception, neither in hubris and hype, nor in the disappointment. Yet we have tools for generating and manipulating signals, moving them in space and time, which few of our predecessors dreamed of. The tools deserve to be used, and our engineers deserve to be guided, by scientists who will advance the state of the art ahead of the state of the industry."

Although the tenets put forth in "New Frontiers in Digital Audio" are hardly new to audiophiles, the paper is revolutionary to the mindset of the audio engineering community. What makes this paper such a significant and extraordinary event is the credibility and influence of its author. You may be certain that Dr. Lagadec does not speak lightly or from a shaky platform. Consequently, the ideas expressed in the paper will be given serious consideration by those accustomed to attacking the very same ideas when espoused by those of us without Ph.Ds.

As I considered the paper's ramifications, I couldn't help thinking about the people who for years have reported sonic anomalies in digital audio, only to be met with skepticism and ridicule. This is especially true of Doug Sax, who was one of the first and most outspoken critics of digital. During the past eight years, he has reported his listening experiences to an indifferent world. Despite his pre-eminence in the field of record mastering, he is regarded as a pariah by the audio engineering community for his views, views which I believe have been taken a step toward scientific fact by "New Frontiers in Digital Audio."

If "genius" is defined as the arriving at conclusions ten years before the rest of the world reaches those same conclusions, then both Roger Lagadec and Doug Sax, disparate as their approaches are, certainly qualify. The fact that Dr. Lagadec's paper may cause the audio engineering establishment to take seriously the listening impressions of people like Doug Sax is small consolation to music lovers who must listen to inferior CDs made during the period between when the problems were first reported (1982) and when the problems' existence were proved in a scientifically acceptable method (1990).

"New Frontiers in Digital Audio" holds out the hope that one day digital audio may exceed analog's performance in all respects. It is my sincerest hope that our successors regard today's pronouncements of digital audio's quality with the same combination of humor and incredulity with which we view Anna Case's assessment of Mr. Edison's machine.

http://www.stereophile.com/asweseeit/1290awsi/

The Jitter Game

With those lines from Richard II, Shakespeare unwittingly described a phenomenon in digital audio called "word clock jitter" and its detrimental effect on digitally reproduced music. "Clock jitter" refers to timing errors in analog/digital and digital/analog converters—errors that significantly degrade the musical quality of digital audio.

Clock jitter is a serious and underestimated source of sonic degradation in digital audio. Only recently has jitter begun to get the attention it deserves, both by high-end designers and audio academics. One reason jitter has been overlooked is the exceedingly difficult task of measuring such tiny time variations—on the order of tens of trillionths of a second. Consequently, there has previously been little hard information on how much jitter is actually present in high-end D/A converters. This is true despite the "jitter wars" between manufacturers who claim extraordinarily low jitter levels in their products. Another reason jitter has been ignored is the mistaken belief by some that if the ones and zeros that represent the music are correct, then digital audio must work perfectly. Getting the ones and zeros correct is only part of the equation.

Stereophile has obtained a unique instrument that allows us to measure jitter in CD players and digital processors. Not only can we quantify how much jitter is afflicting a particular D/A converter, we can look at something far more musically relevant: the jitter's frequency. Moreover, an analysis of jitter and what causes it goes a long way toward explaining the audible differences between CD transports, digital processors, and, particularly, the type of interface between transport and processor.

This article presents a basic primer on word clock jitter, explains how it affects the musical performance of digital processors, and reports the results of an investigation into the jitter performances of 11 high-end digital processors and one CD player. In addition, we are able—for the first time—to measure significant differences in jitter levels and spectra between different types of CD transport/digital processor interfaces.

We have found a general correlation between a digital processor's jitter performance and certain aspects of its musical presentation. The jitter measurements presented in this article were made on processors with whose sound I was familiar; in preparation for their reviews, each had been auditioned at matched levels for at least three weeks in my reference playback system. Because the reviews of these processors have already been published, it's possible to compare the musical impressions reported to the processors' jitter performance. Although these jitter measurements are far from the last word in quantifying a digital processor's musical performance, there is nevertheless a trend that suggests a correlation between listening and measurement.

This article will also attempt to dispel the popular notion that "bits is bits." This belief holds that if the ones and zeros in a digital audio system are the same, the sound will be the same. Proponents of this position like to draw the analogy of putting money in the bank: "your money," though merely a digital representation on magnetic tape, remains inviolate (you hope). There's a problem with this argument, however: unlike the bank's digital record on magnetic tape, digital audio data is useful only after it is converted to analog. And here is where the variability occurs. Presenting the correct ones and zeros to the DAC is only half the battle; those ones and zeros must be converted to analog with incredibly precise timing to avoid sonic degradation.

As we shall see, converting digitally represented music into analog—a process somewhat akin to turning ground beef back into steak—is far more complex and exacting than had been realized.

Sampling
To understand how even small amounts of clock jitter can have a large effect on the analog output signal, a brief tutorial on digital audio sampling is helpful.

Sampling is the process of converting a continuous event into a series of discrete events. In an analog-to-digital (A/D) converter, the continuously varying voltage that represents the analog waveform is "looked at" (sampled) at precise time intervals. In the case of the Compact Disc's 44.1kHz sampling rate, the A/D converter samples the analog waveform 44,100 times per second. For each sample, a number is assigned that represents the amplitude of the analog waveform at the sample time. This number, expressed in binary form (as ones and zeros) and typically 16 bits long, is called a "word." The process of converting the analog signal's voltage into a value represented by a binary word is called "quantization," the effectively infinite range of values allowable in an analog system being reduced to a limited number of discrete amplitudes. Any analog value falling between two binary values is represented by the nearest one.

Sampling and quantization are the foundation of digital audio; sampling preserves the time information (as long as the sampling frequency is more than twice the highest frequency present in the analog signal) and quantization preserves the amplitude information (with a fundamental error equal to half the amplitude difference between two adjacent binary values). We won't worry about quantization here—it's the sampling process we need to understand.
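To make those two operations concrete, here is a small numerical sketch at CD parameters (44.1kHz, 16 bits); the quantization error never exceeds half a step, exactly as stated above.

    # Sampling and 16-bit quantization at CD parameters.
    import numpy as np

    fs = 44100
    t = np.arange(441) / fs                              # ten milliseconds of sample times
    analog = np.sin(2 * np.pi * 1000 * t)                # the "continuous" 1kHz waveform

    words = np.round(analog * 32767).astype(np.int16)    # 16-bit quantization words
    step = 1 / 32767                                     # amplitude difference between adjacent values
    error = analog - words / 32767

    print(words[:4])                                     # the first few 16-bit words
    print(np.max(np.abs(error)) <= step / 2)             # True: error is at most half a step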

The series of discrete samples generated by the A/D converter can be converted back into a continuously varying signal with a D/A converter (DAC). A DAC takes a digital word and outputs a voltage equivalent to that word, exactly the opposite function of the A/D converter (ADC). All that is required for perfect conversion (in the time domain) is that the samples be input to the DAC in the same order they were taken, and with the same timing reference. In theory, this sounds easy—just provide a stable 44.1kHz clock to the A/D converter and a stable 44.1kHz clock to the D/A converter. Voilà!—perfect digital audio.

Clock jitter
Unfortunately, it isn't that easy in practice. If the samples don't generate an analog waveform with the identical timing with which they were taken, distortion will result. These timing errors between samples are caused by variations in the clock signal that controls when the DAC converts each digital word to an analog voltage.

Let's take a closer look at how the DAC decides when to convert the digital samples to analog. In fig.1, the binary number at the left is the quantization word that represents the analog waveform's amplitude when first sampled. The bigger the number, the higher the amplitude. (This is only conceptually true—in practice the data are in twos-complement form, which uses the most significant bit or MSB at the start of the word as a sign bit, a "1" meaning that the amplitude is negative.)

Fig.1 The word-clock signal triggers the DAC to output an analog voltage equivalent to the input digital word.

The squarewave at the top of fig.1 is the "word clock," the timing signal that tells the DAC when to convert the quantization word to an analog voltage. Assuming the original sampling frequency was 44.1kHz, the word clock's frequency will also be 44.1kHz (or some multiple of 44.1kHz if the processor uses an oversampling digital filter). On the word clock's leading edge, the next sample (quantization word) is loaded into the DAC. On the word clock's falling edge, the DAC converts that quantization word to an analog voltage. This process happens 44,100 times per second (without oversampling). If the digital processor has an 8x-oversampling digital filter, the word-clock frequency will be eight times 44,100, or 352.8kHz.

It is here at the word clock that timing variations affect the analog output signal. Specifically, clock jitter is any time variation between the clock's trailing edges. Fig.2 shows a perfect clock and a jittered clock (exaggerated for clarity) (footnote 1).

Fig.2 Word-clock jitter consists either of a random variation in the pulse timing or a variation which itself has a periodic component.

Now, look what happens if the samples are reconstructed by a DAC whose word clock is jittered (fig.3). The sample amplitudes—the ones and zeros—are correct, but they're in the wrong place! The right amplitude at the wrong time is the wrong amplitude. A time variation in the word clock produces an amplitude variation in the output, causing the waveform to change shape. A change in shape of a waveform is the very definition of distortion. Remember, the word clock tells the DAC when to convert the audio sample to an analog voltage; any variations in its accuracy will produce an analog-like variability in the final output signal—the music.

Fig.3 Analog waveform is constructed correctly with a jitter-free word clock (top); word-clock jitter results in a distortion of the analog waveform's shape (exaggerated for clarity).
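How big is the resulting amplitude error? To a first approximation it is the waveform's slope multiplied by the timing error. A rough, assumption-laden estimate for a full-scale 10kHz sinewave and 2ns of clock error:

    # Rough estimate: amplitude error ~= slope x timing error.
    import math

    f = 10e3                                   # signal frequency, Hz
    jitter = 2e-9                              # timing error, seconds
    error = 2 * math.pi * f * jitter           # worst-case error, relative to full scale
    print(error)                               # ~1.3e-4 of full scale
    print(20 * math.log10(error), "dB")        # ~-78dB: well above the 16-bit noise floor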

There's more. Clock jitter can raise the noise floor of a digital converter, reducing resolution, and can introduce spurious artifacts. If the jitter has a random distribution (called "white jitter" because of its similarity to white noise), the noise floor will rise. If, however, the word clock is jittered at a specific frequency (ie, periodic jitter), artifacts will appear in the analog output as sidebands on either side of the audio signal frequency being converted to analog. It is these periodic artifacts that are the most sonically detrimental; they bear no harmonic relationship to the music and may be responsible for the hardness and glare often heard from digital audio.

These principles were described in JA's computer simulations of the effects of different types and amounts of jitter in Vol.13 No.12; I've included three of his plots here. Fig.4 is a spectral analysis of a simulated DAC output when reproducing a full-scale, 10kHz sinewave with a jitter-free clock. Fig.5 is the same measurement, but with two nanoseconds (2ns or 2 billionths of a second) of white jitter added—note the higher noise floor. Fig.6 shows the effect of 2ns of jitter with a frequency of 1kHz. The last plot reveals the presence of discrete frequency sidebands on either side of the test signal caused by jitter of a specific frequency. The amplitude of these artifacts is a function of the input signal level and frequency; the higher the signal level and frequency, the higher the sideband amplitude in the analog output signal (footnote 2).

Fig.4 Audio-band spectrum of jitter-free 16-bit digital data representing a 10kHz sinewave at 0dBFS (linear frequency scale).

Fig.5 Audio-band spectrum of 16-bit digital data representing a 10kHz sinewave at 0dBFS afflicted with 2ns p-p of white-noise jitter (linear frequency scale). Note rise of approximately 12dB in the noise floor compared with fig.4, which represents a significant 2-bit loss of signal resolution.

Fig.6 Audio-band spectrum of 16-bit digital data representing a 10kHz sinewave at 0dBFS afflicted with 2ns p-p of sinusoidal jitter at 1kHz (linear frequency scale). Note addition of sidebands at 9kHz and 11kHz compared with fig.4, though the noise floor remains at the 16-bit level.
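Readers with a computer handy can reproduce the flavor of these plots. The sketch below is not JA's simulation, just a rough equivalent: a 10kHz tone sampled with a perfect clock, with 2ns p-p of white jitter, and with 2ns p-p of 1kHz sinusoidal jitter. In the last case, energy appears in the bins near 9kHz and 11kHz.

    # Rough re-creation of the conditions behind figs.4-6 (not JA's exact method).
    import numpy as np

    fs, f, n = 44100, 10000, 65536
    ideal_t = np.arange(n) / fs

    def spectrum_db(sample_times):
        x = np.sin(2 * np.pi * f * sample_times)           # amplitudes read at jittered instants
        mag = np.abs(np.fft.rfft(x * np.hanning(n))) / n
        return 20 * np.log10(mag + 1e-12)

    white = ideal_t + np.random.uniform(-1e-9, 1e-9, n)              # 2ns p-p random jitter
    periodic = ideal_t + 1e-9 * np.sin(2 * np.pi * 1000 * ideal_t)   # 2ns p-p 1kHz jitter

    freqs = np.fft.rfftfreq(n, 1 / fs)
    bins = [np.argmin(np.abs(freqs - sb)) for sb in (9000, 11000)]
    for name, times in (("clean", ideal_t), ("white", white), ("1kHz", periodic)):
        s = spectrum_db(times)
        print(name, "level near 9k/11kHz (dB):", np.round(s[bins], 1))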

How much jitter is audible? In theory, a 16-bit converter must have less than 100 picoseconds (ps) of clock jitter if the signal/noise ratio isn't to be compromised. (There are 1000ps in a nanosecond; 1000ns in a microsecond; 1000µs in a millisecond; and 1000ms in a second.) Twenty-bit conversion requires much greater accuracy, on the order of 8ps. 100ps is one-tenth of a billionth of a second (1/10^10s), about the same amount of time it takes light to travel an inch. Moreover, this maximum allowable figure of 100ps assumes the jitter is random (white) in character, without a specific frequency, which would be sonically less benign. Clearly, extraordinary precision is required for accurate conversion (see Sidebar).
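Those budget figures can be sanity-checked with the usual rule of thumb: the worst-case error (signal slope times jitter) must stay below half a least-significant bit for a full-scale 20kHz sinewave. Under that assumption:

    # Sanity check of the jitter budgets, assuming a full-scale 20kHz sinewave.
    import math

    def max_jitter(bits, f=20000):
        lsb = 2 / (2 ** bits)                 # quantization step for a +/-1 full-scale signal
        return (lsb / 2) / (2 * math.pi * f)  # seconds

    print(max_jitter(16) * 1e12, "ps")        # ~121ps -> the ~100ps figure for 16 bits
    print(max_jitter(20) * 1e12, "ps")        # ~7.6ps -> the ~8ps figure for 20 bits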

Where does clock jitter originate? The primary source is the interface between a CD transport and a digital processor. The S/PDIF (Sony/Philips Digital Interface Format) signal that connects the two has the master clock signal embedded in it (it is more accurate to say the audio data are embedded in the clock). The digital processor recovers this clock signal at the input receiver chip (usually the Yamaha YM3623B, Philips SAA7274, or the new Crystal CS8412).

The typical method of separating the clock from the data and creating a new clock with a phase-locked loop (PLL) produces lots of jitter. In a standard implementation, the Yamaha chip produces a clock with 3-5 nanoseconds of jitter, about 30 to 50 times the 100ps requirement for accurate 16-bit conversion (the new Crystal CS8412 input receiver in its "C" incarnation reportedly has 150ps of clock jitter). Even if the clock is recovered with low jitter, just about everything inside a digital processor introduces clock jitter: noise from digital circuitry, processing by integrated circuits—even the inductance and capacitance of a printed circuit board trace will lead to jitter.

It's important to note that the only point where jitter matters is at the DAC's word-clock input. A clock that is recovered perfectly and degraded before it gets to the DAC is no better than a high-jitter recovery circuit that is protected from additional jitter on its way to the DAC. Conversely, a highly jittered clock can be cleaned up just before the DAC with no penalty (footnote 3).

Logic induced modulation (LIM)
There are two other significant sources of jitter in D/A converters. The first mechanism was recently discovered by Ed Meitner of Museatex and Robert Gendron, formerly a DAC designer at Analog Devices and now at Museatex. This jitter-inducing phenomenon, called Logic Induced Modulation (LIM), was discovered only after Meitner and Gendron invented a measurement system that revealed its existence. This measurement tool, called the LIM Detector, reveals not only how much clock jitter is present in a digital processor, but also displays its spectral distribution when connected to a spectrum analyzer or FFT machine. The jitter's spectral content—and whether or not it is random or composed of discrete frequencies—is much more important sonically than the overall amount of jitter. Two digital processors could each claim, say, 350ps of jitter, but the processor whose word clock was jittered at a specific frequency would likely suffer from a greater amount of sonic degradation than the other processor which had the same RMS level of random jitter. More on this later.

It's worth looking at Logic Induced Modulation in detail; the phenomenon is fascinating:

LIM is a mechanism by which the digital code representing an audio signal modulates (jitters) the clock signal. If a digital processor is driven by the code representing a 1kHz sinewave, the clock will be jittered at a frequency of 1kHz. Put in 10kHz, and jitter with a frequency of 10kHz will appear on the clock. Remember, jitter with a specific frequency is much more sonically pernicious than random-frequency jitter.

Here's how LIM is generated. In an integrated circuit (IC), there are many thousands of transistors running off the same +5V (usually) power-supply rail. When an IC is processing the code representing a 1kHz sinewave, for example, those thousands of transistors are turning on and off in a pattern repeating 1000 times per second. The current demand of all those transistors turning on together modulates the power-supply rail at the frequency of the audio signal. Just as the lights in a house dim momentarily when the refrigerator motor turns on, the +5V power-supply rail droops under sudden current demand from the chip's transistors. The analog audio signal thus appears on the IC's power-supply rail.

Now, the transition between a logic "0" and a logic "1" occurs at the leading edge of a squarewave. The precise point along the leading edge at which the circuit decides that a "1" is present is determined by the power-supply voltage reference. If that voltage fluctuates, the precise time along the leading edge at which the circuit recognizes a "1" will also fluctuate—in perfect synchronization with the power-supply voltage modulation. This uncertainty in the timing of the logic transitions induces jitter on the clock—at the same frequency as the audio signal the IC is processing. Put in the code representing 1kHz and the IC's power supply will be modulated at 1kHz, which in turn causes jitter on the clock at 1kHz. According to Meitner, it is possible to AC-couple a high-gain amplifier to the digital power-supply rail and hear the music the processor is decoding.
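As a toy model of that mechanism (the droop and slew-rate numbers below are my own assumptions, not Meitner's measurements): if a data-correlated droop on the supply rail shifts the voltage at which a clock edge is detected, each edge time shifts by the droop divided by the edge's slew rate.

    # Toy model of LIM; the droop and slew-rate figures are assumptions, not measurements.
    import numpy as np

    t = np.arange(44100) / 44100                         # one nominal clock edge per sample, 1 second
    rail_droop = 0.05 * np.sin(2 * np.pi * 1000 * t)     # assumed 50mV supply modulation at 1kHz
    slew_rate = 1e9                                      # assumed clock-edge slew rate, volts/second

    edge_shift = rail_droop / slew_rate                  # each edge detected slightly early or late
    print((edge_shift.max() - edge_shift.min()) * 1e12,  # ~100ps peak-to-peak
          "ps p-p of jitter, at the frequency of the audio signal")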

This astonishing phenomenon was discovered quite by accident after Meitner and Gendron designed the device to display the jitter's spectral distribution. When they put in a 1kHz digital signal, the jitter had a frequency of 1kHz with its associated harmonics. (footnote 4, footnote 5)

There is another mechanism by which clock jitter correlated with the audio signal is created. This phenomenon, described by Chris Dunn and Dr. Malcolm Hawksford in their paper presented at the Fall 1992 Audio Engineering Society convention (and alluded to in the Meitner/Gendron paper), occurs in the AES/EBU or S/PDIF interface between a transport and a digital processor. Specifically, they showed that when the interface is band-limited, clock jitter with the same frequency as the audio signal being transmitted is produced in the recovered clock at the digital processor. Although this phenomenon produces the same type of signal-correlated jitter as LIM, it is a completely different mechanism. A more complete discussion of Dunn's and Hawksford's significant paper [reprinted in Stereophile in March 1996 (Vol.19 No.3) as "Bits is Bits?"—Ed.] can be found in my AES convention report in next month's "Industry Update." (footnote 6)

Measuring clock jitter
Museatex has made the LIM Detector available to anyone who wants to buy one. Stereophile jumped at the chance (see Sidebar).

First, you have to open the processor's chassis and find the DAC's word-clock pin with a conventional oscilloscope and probe. The probe hooked up to the word-clock signal is then connected to the input of the LIM Detector (LIMD), and the LIMD is tuned to that word clock using preset frequency settings. To look at the spectrum of any processor's word clock (up to 20kHz), we fed the LIMD's analog output to our Audio Precision System One Dual Domain to create FFT-derived spectral analysis plots. A one-third-octave analyzer can also be used, though this gives less frequency resolution, of course. If the output of the LIMD is connected to an RMS-reading voltmeter, the overall jitter level can be read as an AC voltage. Knowing how many millivolts are equivalent to how many picoseconds of jitter—the LIMD output voltage also depends on the sampling frequency—allows the jitter to be easily calculated.

The measurements of digital-processor clock jitter included in this article thus include the processor's overall jitter level expressed in picoseconds and a plot of the jitter's spectral distribution. The latter is scaled according to the RMS level—0dB is equivalent to 226.7ns of jitter—so that spectra for different processors can be readily compared.

Before getting to the measurement results, a quick description of how the LIMD works is useful. Like all brilliant inventions, the technique is simple and obvious—after you've been told how it works.

A jittered clock can be considered as a constant carrier signal which has been frequency-modulated (FM). The jitter components can therefore be separated from the clock by an FM demodulator—just like those found in all FM tuners. In the LIMD, once it has been correctly tuned to the word-clock frequency, an FM demodulator removes the clock signal, leaving only the audio-band jitter components—which can be measured as a voltage or output for spectral analysis.
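Conceptually (this is not the LIMD's circuit, just the idea expressed in code): because a jittered clock is a frequency-modulated carrier, the jitter waveform can be recovered from the deviation of each clock period from its nominal value.

    # Conceptual sketch only, not the LIMD circuit: recover the jitter waveform
    # from the deviations of successive clock periods.
    import numpy as np

    fs = 44100
    nominal = np.arange(44100) / fs                           # ideal edge times for one second
    jitter = 0.5e-9 * np.sin(2 * np.pi * 1000 * nominal)      # 1ns p-p sinusoidal jitter at 1kHz
    edges = nominal + jitter                                  # the jittered clock edges

    periods = np.diff(edges)                                  # instantaneous clock periods
    recovered = np.cumsum(periods - 1 / fs)                   # re-integrated deviation = jitter
    print(np.max(np.abs(recovered - jitter[1:])))             # essentially zero: waveform recovered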

I have no doubt that many manufacturers of the digital processors tested for this article will question the test methodology and results. Some claim extraordinarily low jitter in their products—claims that were not confirmed by my measurements. This disparity can arise because there is no standard method of measuring jitter. When I'm told by a manufacturer that his product has "less than 70ps of jitter," my first question is, "How do you measure 70ps?" The response is often less than adequate: "We calculate it mathematically" is a common reply. Moreover, some jitter measurements attempt to measure jitter indirectly—as a function of the rise in the noise floor, for example—rather than looking directly at the word clock.

At any rate, if the absolute levels presented in this article are in error, the relative differences between processors will still be correct. If anyone can demonstrate a better method of measuring jitter, I'm all ears.

http://www.stereophile.com/reference/193jitter/

CD: Jitter, Errors & Magic

The promise of "perfect sound forever," successfully foisted on an unwitting public by the Compact Disc's promoters, at first seemed to put an end to the audiophile's inexorable need to tweak a playback system's front end at the point of information retrieval. Several factors contributed to the demise of tweaking during the period when CD players began replacing turntables as the primary front-end signal source. First, the binary nature (ones and zeros) of digital audio would apparently preclude variations in playback sound quality due to imperfections in the recording medium. Second, if CD's sound was indeed "perfect," how could digital tweaking improve on perfection? Finally, CD players and discs presented an enigma to audiophiles accustomed to the more easily understood concept of a stylus wiggling in a phonograph groove. These conditions created a climate in which it was assumed that nothing in the optical and mechanical systems of a CD player could affect digital playback's musicality.

Recently, however, there has been a veritable explosion of interest in all manner of CD tweaks, opening a digital Pandora's box. An avalanche of CD tweak products (and the audiophile's embrace of them) has suddenly appeared in the past few months, Monster Cable's, AudioQuest's, and Euphonic Technology's CD Soundrings notwithstanding. Most of these tweaks would appear to border on voodoo, with no basis in scientific fact. Green marking pens, an automobile interior protectant, and an "optical impedance matching" fluid are just some of the products touted as producing musical nirvana. The popular media has even picked up on this phenomenon, sparked by Sam Tellig's Audio Anarchist column in Vol.13 No.2 describing the sonic benefits of applying Armor All, the automobile treatment, to a CD's surface. Print articles have appeared in the Los Angeles Times, Ice Magazine, and on television stations MTV, VH-1, and CNN, all reporting, with varying degrees of incredulity, the CD tweaking phenomenon.

The intensity of my interest in the subject was heightened by a product called "CD Stoplight," marketed by AudioPrism. CD Stoplight is a green paint applied to the outside edge of a CD (not the disc surface, but the 1.2mm disc thickness) that reportedly improves sound quality. I could not in my wildest imagination see how green paint on the disc edge could change, for better or worse, a CD's sound. However, trusting my ears as the definitive test, I compared treated to untreated discs and was flabbergasted. Soundstage depth increased, mids and highs were smoother with less grain, and the presentation became more musically involving.

Other listeners, to a person, have had similar impressions. Since I am somewhat familiar with the mechanisms by which data are retrieved from a CD (I worked in CD mastering for three years before joining Stereophile), this was perplexing: I could think of no plausible explanation for a difference in sonic quality. As we shall see, the light reflected from a CD striking the photo-detector contains all the information encoded on the disc (footnote 1). Even if CD Stoplight could somehow affect the light striking the photo-detector, how could this change make the soundstage deeper? I was simultaneously disturbed and encouraged by this experience. Disturbed because it illustrates our fundamental lack of understanding of digital audio's mysteries, and encouraged by the promise that identification of previously unexplored phenomena could improve digital audio to the point where today's digital audio era will be regarded as the stone age.

These events prompted me to conduct a scientific examination of several CD "sonic cure-all" devices and treatments. I wanted to find an objective, measurable phenomenon that explains the undeniable musical differences heard by many listeners where, at least according to established digital audio theory, no differences should exist. For this inquiry, I measured several digital-domain performance criteria on untreated CDs, and then on the same CDs treated with various CD tweaks. The parameters measured include data error rates, ability to correct (rather than conceal) data errors, and jitter.

The six CD treatments and devices chosen for this experiment include three that allegedly affect optical phenomena and three that ostensibly affect the CD player's mechanical performance. The three optical treatments tested are CD Stoplight (the green paint), Finyl (a liquid applied to the disc surface that, according to its promoters, provides "optical impedance matching"), and Armor All. The mechanical devices include CD Soundrings, The Mod Squad's CD Damper disc, and the Arcici LaserBase, a vibration-absorbing CD-player platform. I also measured playback signal jitter in a mid-priced CD player and the $4000 Esoteric P2 transport (regarded as having superb sonics). However, this is not intended as a survey of the musical benefits of these devices and treatments. In addition, I looked at the variation in quality of discs made at various CD manufacturing facilities around the world.

Another purpose of the article is to dispel some common misconceptions about CD error correction and its effect on sonic quality. If one believes the promoters of some of these CD treatments, errors are the single biggest source of sonic degradation in digital audio. In reality, errors are the least of CD's problems. However, this has not prevented marketeers from exploiting the audiophile's errorphobia in an attempt to sell products.

For example, Digital Systems and Solutions, Inc., manufacturer of Finyl, claim in their white paper that error concealment "results in a serious degrading of playback fidelity." They also state that errors can get through undetected, leading to a litany of sonic horrors including: "poor articulation of bass and mid-bass notes, attenuation of dynamics and smearing of transients, increased noise with loss of inner detail and intertransient silence, reduced midrange presence that diminishes clarity and transparency, loss of image specificity and focus, reduction of the apparent width and depth of soundstage—virtually eliminating the possibility of holophonic [sic] imagery, decreased resolution of the low level detail that is so necessary to the recovery of hall ambience, altered instrumental and vocal timbres that lack coherence or cohesiveness, obscuring of vocal textures and expression, instrumental lines and musical themes are more difficult to sort out, complex rhythms and tempos are less easily followed, the music will not be as emotionally involving and satisfying an experience as might have otherwise been possible, subtle breath effects on brass or wind instruments are more difficult to discern as are nuances of fingering and bowing on string instruments." This list, they concede, "is not claimed to be complete."

Technical background
Encoding and data retrieval: Before getting into the measurement results, let's arm ourselves with a little technical background on how the CD works.

A CD's surface is covered by a single spiral track of alternating "pit" and "land" formations. These structures, which encode binary data, are created during the laser mastering process. The CD master disc is a glass substrate coated with a very thin layer of photosensitive material. The glass master is rotated on a turntable while exposed to a laser beam that is modulated (turned on and off) by the digital data we wish to record on the disc. This creates a spiral of exposed and unexposed areas of the disc. When the master is later bathed in a chemical developing solution, areas of the photosensitive material exposed to the recording laser beam are etched away, creating a pit. Unexposed areas are unaffected by the developing solution and are called lands. These formations, which are among the smallest manufactured structures, are transferred through the manufacturing process to mass-produced discs. Fig.1 is a scanning electron micrograph of a CD surface. Note that a human hair is about the width of 50 tracks.

Fig.1 Scanning electron micrograph of a CD surface.

The playback laser beam in the CD player is focused on these tiny pits and held on track by a servo system as the disc rotates. This beam is reflected from the disc to a photo-detector, a device that converts light into voltage. To distinguish between pit and land areas, the pit depth is one-quarter the wavelength of the playback laser beam. When laser light strikes a pit, a portion of the beam is reflected from the surrounding land, while some light is reflected from the pit bottom. Since the portion of light reflected from the pit bottom must travel a longer distance (1/4 wavelength down plus 1/4 wavelength back up), this portion of the beam is delayed by half a wavelength in relation to the beam reflected by the land. When these two beams combine, phase cancellation occurs, resulting in decreased output from the photo-detector. This variable-intensity beam thus contains all the information encoded on the disc.
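
To put rough numbers on that quarter-wave argument, here is a small back-of-the-envelope sketch in Python. The figures used (a nominal 780nm laser and a refractive index of about 1.55 for the polycarbonate substrate) are typical published values assumed for illustration, not measurements from this article.

    # Rough numbers behind the quarter-wave pit-depth argument. The laser
    # wavelength and refractive index are typical published values, assumed
    # here for illustration only.
    laser_wavelength_air_nm = 780.0      # nominal CD laser wavelength in air
    n_polycarbonate = 1.55               # approximate refractive index of the substrate

    wavelength_in_disc = laser_wavelength_air_nm / n_polycarbonate   # ~503nm
    quarter_wave_pit_depth = wavelength_in_disc / 4                  # ~126nm

    # Light reflected from a pit bottom travels an extra quarter wave down and
    # a quarter wave back up, so it returns half a wavelength out of phase
    # with light reflected from the surrounding land.
    extra_path_nm = 2 * quarter_wave_pit_depth
    phase_shift_degrees = 360.0 * extra_path_nm / wavelength_in_disc

    print(round(wavelength_in_disc), round(quarter_wave_pit_depth), round(phase_shift_degrees))
    # roughly 503nm, 126nm, and a 180-degree phase shift: destructive interference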

Now that we understand how the playback beam/photo-detector can distinguish between pit and land, let's look at how these distinctions represent digital audio data. One may intuitively think that it would be logical for a pit to represent binary one and a land to represent binary zero, or vice versa. This method would certainly work, but a much more sophisticated scheme has been devised that is fundamental to the CD. It is called Eight-to-Fourteen Modulation, or EFM.

This encoding system elegantly solves a variety of data-retrieval functions. In EFM encoding, pit and land do not represent binary data directly. Instead, transitions from pit-to-land or land-to-pit represent binary one, while all other surfaces (land or pit bottom) represent binary zero. EFM encoding takes symbols of 8 bits and converts them into unique 14-bit words, creating a pattern in which binary ones are separated by a minimum of two zeros and a maximum of 10 zeros. The bit stream is thus given a specific pattern of ones and zeros that result in nine discrete pit or land lengths on the disc. The shortest pit or land length encodes three bits, while the longest encodes 11 bits. The blocks of 14 bits are linked by three "merging bits," resulting in an encoding ratio of 17:8. At first glance, it may seem odd that EFM encoding, in more than doubling the number of bits to be stored, can actually increase data density. But just this occurs: Storage density is increased by 25% over unmodulated encoding.
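
The run-length rule is easy to express in code. The sketch below simply checks that successive ones in a channel-bit pattern are separated by at least two and at most ten zeros; it is illustrative only, and is not the actual 8-to-14 lookup table used by CD encoders.

    # Illustrative check of the EFM run-length rule described above: between
    # any two successive ones there must be at least two and at most ten zeros.
    def satisfies_efm_run_length(bits):
        ones = [i for i, b in enumerate(bits) if b == "1"]
        for a, b in zip(ones, ones[1:]):
            zeros_between = b - a - 1
            if zeros_between < 2 or zeros_between > 10:
                return False
        return True

    print(satisfies_efm_run_length("1001"))           # True: the shortest (3T) spacing
    print(satisfies_efm_run_length("100000000001"))   # True: the longest (11T) spacing
    print(satisfies_efm_run_length("101"))            # False: only one zero between ones
    print(satisfies_efm_run_length("1000000000001"))  # False: eleven zeros between ones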

EFM has other inherent advantages. By inserting zeros between successive ones, the bandwidth of the signal reflected from the disc is decreased. The data rate from a CD is 4.3218 million bits per second (footnote 2), but the EFM signal has a bandwidth of only 720kHz. In addition, the EFM signal serves as a clock that, among other functions, controls the player's rotational servo.

The signal reflected from the disc is composed of nine discrete frequencies, corresponding to the nine discrete pit or land lengths (footnote 3). The highest-frequency component, called "I3," is produced by the shortest pit or land length and has a frequency of 720kHz. This represents binary data 100. The lowest-frequency component, called "I11," is produced by the longest pit or land length and has a frequency of 196kHz. This represents binary data 10000000000. The signal reflected from the disc, produced by EFM encoding, is often called the HF (high frequency) signal. The varying periods of the sinewaves correspond to the periods of time required to read the various pit lengths.
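
Those figures follow directly from the disc's frame structure. The sketch below derives them; the frame constants (six stereo samples and 588 channel bits per frame) are standard Red Book values assumed for this sketch, not quoted from this article.

    # Deriving the channel-bit rate and the nine HF frequencies. The frame
    # constants are standard Red Book values, assumed for this sketch.
    SAMPLE_RATE = 44100              # audio samples per second, per channel
    SAMPLES_PER_FRAME = 6            # stereo samples carried by one frame
    CHANNEL_BITS_PER_FRAME = 588

    frames_per_second = SAMPLE_RATE / SAMPLES_PER_FRAME            # 7350
    channel_bit_rate = frames_per_second * CHANNEL_BITS_PER_FRAME  # 4.3218 million/s
    T = 1.0 / channel_bit_rate                                     # one channel-bit period

    # A pit (or land) of length nT is half a cycle of the HF signal, so its
    # frequency is 1/(2nT).
    for n in range(3, 12):
        print("I%d: %.1f kHz" % (n, 1.0 / (2 * n * T) / 1000.0))
    # I3 comes out near 720kHz, I11 near 196kHz.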

At first impression, the HF signal appears to be an analog signal, not one that carries digital data. However, the zero crossings of the waveforms contain the digital information encoded on the disc. Fig.2 shows the relationship between binary data, pit structure, and the recovered HF signal.

Fig.2 Relationship between binary data, pit structure, and the HF signal. (Reproduced from Principles of Digital Audio, Second Edition (1989), by Kenneth C. Pohlmann, with the permission of the publisher, Howard W. Sams & Company.)

HF signal quality is a direct function of pit shape, which in turn is affected by many factors during the CD manufacturing process. There is a direct correlation between error rates and pit shape. Poorly shaped pits result in a low-amplitude HF signal with poorly defined lines. Figs.3 and 4 show an excellent HF signal and a poor HF signal respectively.

Fig.3 A clean HF signal results from well-spaced pits.

Fig.4 A poor-quality HF signal.

CD data errors: Any digital storage medium is prone to data errors, and the CD is no exception. An error occurs when a binary one is mistakenly read as a binary zero (or vice versa), or when the data flow is momentarily interrupted. The latter, more common in CDs, is caused by manufacturing defects, surface scratches, and dirt or other foreign particles on the disc. Fortunately, the CD format incorporates extremely powerful error detection and correction codes that can completely correct a burst error of up to 4000 successive bits. The reconstructed data are identical to what was missing. This is called error correction. If the data loss exceeds the player's ability to correctly replace missing data, the player makes a best-guess estimate of the missing data and inserts this approximation into the data stream. This is called error concealment, or interpolation.

It is important to make the distinction between correction and concealment: correction is perfect and inaudible, while concealment has the potential for a momentary sonic degradation where the interpolation occurs.
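
The sketch below illustrates that difference in the crudest possible way: a concealed sample is an educated guess, not the original data. The linear interpolation used here is a simplification; actual players use their own concealment strategies.

    # A deliberately simple picture of concealment by interpolation. Real CD
    # players use more sophisticated strategies; the point is only that the
    # concealed value is a guess, whereas corrected data are exact.
    def conceal(samples, bad_index):
        repaired = list(samples)
        repaired[bad_index] = (samples[bad_index - 1] + samples[bad_index + 1]) / 2
        return repaired

    original = [0, 1000, 1900, 2600, 3000]
    print(conceal(original, 2))   # [0, 1000, 1800.0, 2600, 3000]
    # The concealed value (1800) is close to the original (1900) but not
    # identical to it; full error correction would have restored 1900 exactly.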

A good general indication of disc quality (and the claimed error-reduction effects of some CD tweaks) is the Block Error Rate, or BLER. BLER is the number of blocks per second that contain errant data, before error correction. The raw data stream from a CD (called "channel bits") contains 7350 blocks per second, with a maximum allowable BLER (as specified by Philips) of 220. A disc with a BLER of 100 thus has 100 blocks out of 7350 with errant or missing data. In these experiments, Block Error Rate is the primary indicator of a particular tweak's effect on error-rate performance.
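
To put those numbers in proportion, here is a trivial calculation using the figures just quoted:

    # Putting the BLER figures in proportion: 7350 blocks are read per second,
    # and the Red Book allows a BLER of up to 220.
    BLOCKS_PER_SECOND = 7350
    for bler in (220, 100):
        print("BLER %d: %.1f%% of blocks contain raw errors" % (bler, 100.0 * bler / BLOCKS_PER_SECOND))
    # Even at the spec limit, only about 3% of blocks arrive with any raw
    # errors at all, and virtually all of those are corrected perfectly.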

In addition to measuring the effects of CD tweaks on BLER, I explored their potential to reduce interpolations. To do this, I used the Pierre Verany test CD that has intentional dropouts in the spiral track. The disc has a sequence of tracks with increasingly long periods of missing data.

First, I found the track that was just above the threshold of producing an uncorrectable error (called an "E32 error") as analyzed by the Design Science CD Analyzer (see Sidebar). The track was played repeatedly to confirm that the result was consistent, so that any subsequent change could not be ascribed to chance. Then the same track was played and analyzed again, this time after the addition of a CD treatment or device. This twofold approach—measuring a tweak's effect on both BLER and interpolations—would seem to cover the gamut of error-reduction potential.

There are two general misconceptions about CD errors and sound quality: 1) errors are the primary source of sonic degradation; and 2) if there are no uncorrectable errors, there can be no difference in sound.

The first misconception is largely due to the marketing programs of CD-accessory manufacturers who claim their products reduce error rates. Many of the devices tested claim to improve sound quality by reducing the amount of error concealment performed by the CD player. In fact, interpolations (error concealment) rarely occur. In the unlikely event that concealment is performed, it will be momentary and thus have no effect on the overall sound. At worst, a transient tick or glitch would be audible.

To better understand the nature of data errors, a look at CD Read-Only Memory (CD-ROM) is useful. A CD-ROM is manufactured just like an audio CD, but contains computer data (text, graphics, application software, etc.) instead of music. The data retrieved from a CD-ROM must be absolutely accurate to the bit level, after error correction. If even a single wrong bit gets past the error correction, the entire program could crash. The errant bit may be within instructions for the host computer's microprocessor, causing the whole application to come to an instant halt, making the disc useless.

To prevent this, a quality-control procedure is routinely used at the mastering and pressing facility to assure 100% error-free performance. Samples of the finished CD-ROM are compared, bit for bit, to the original source data. For high-reliability applications, each replicated disc undergoes this process. This rigorous testing reveals much about the error-correction ability of the CD's Cross Interleaved Reed-Solomon encoding (CIRC). Throughout dozens of hours of this verification procedure, I cannot remember even a single instance of one wrong bit getting through.

It could be argued that CD-ROM has additional error-correction ability not found on CD audio discs. This is true, but the additional layer of error correction is almost never invoked. Furthermore, in all the hours of error-rate measuring for this project, I never encountered an E32 error, the first and most sensitive indication of an interpolation (except on the Pierre Verany disc, which has intentional errors). In fact, I saw only one E22 error, the last stage of correction before concealment. In retesting the disc, the E22 error disappeared, indicating it was probably due to a piece of dirt on the disc. Finally, the rarity of uncorrectable errors is underscored by the warning system in the Design Science CD Analyzer: the system beeps and changes the computer's display color to red to alert the operator if even an E22 error (fully corrected) is detected.

http://www.stereophile.com/reference/590jitter/


The Absolute Sound of What?

One of the things that distinguishes a dedicated audiophile from Joe Q. Public is that he has some notion of what audio fidelity is all about.

The typical buyer of a "steeryo" is seeking nothing more than pleasant or exciting sounds, and is easily satisfied because he has no greater expectation of audio than this. The audiophile, however, is aware that reproduced sound can resemble (more or less) real, live sound, and he is driven in a continual search for that ultimate truth ("fidelity to the original") even while realizing, intellectually at least, that it is unattainable.

Because he understands what the word "reproduction" means, the audiophile thinks in terms of a relationship to an original sound. This original is, of course, the sound of live music, and the touchstone for its reproduction is accuracy. Unfortunately, though, we don't really compare the reproduction with the real thing—because we can't. Only a recording engineer can saunter back and forth between the real thing (which takes place in a studio or hall) and the reproduction of it (in the control room with its monitor system). We audiophiles must be content to compare the reproduction with what we remember to be the sound of live music. Even the amateur recordist must carry the memory of that original sound home with his tapes in order to evaluate them.

And that memory may not serve us that well. Few of us have learned to listen with enough attention and skill to be able to break live sound down into its components and to observe what each sounds like. Most of us remember only an overall impression—the gestalt of the thing. And many of us must admit, to ourselves at least, that we have not heard live music for years or, worse, never at all. For the vast majority of audiophiles then, the reference standard is not the absolute sound of live music, but an imagined ideal—a mental picture of how we remember its having sounded or how we would like it to sound. At this point, accuracy becomes a dubious criterion because of the vagueness of the original to which we compare the copy. System evaluation becomes a (simple?) matter of "it's good if it sounds good."

The problem with this is that one man's good is another man's distortion. Different people listen to and assign different orders of importance to different aspects of reproduced sound. Thus, while two very picky listeners may agree that a system has good bass, good highs, and a colored middle range, they will disagree as to how good the system is if one happens to be critical of highs and lows while the other is critical of the middle range.

In short, we really don't have any way of reliably assessing the accuracy of reproduced sound. Even a recording engineer cannot be confident of the sound of his own recording, because what he hears in the control room depends on his monitoring equipment, which is no more—and is often less—accurate than a home system. (Many pros do not, in fact, aim for realism at all, but for what they call a "commercial sound"—one that will sell. Thus a recording may not even have the potential for sounding realistic.)

All this does not, however, discourage audiophiles in their search for the Holy Grail of musical accuracy. There are a couple of approaches from which to select. The casual audiophile, who has more interest in music than the ultimate in fi, will usually choose a record label whose releases he favors for their musical values, and will tailor his system to sound best with most of that label's recordings. Discs from other labels may sound good on this system too, but it will be a matter of luck, and bear little relationship to accuracy.

Perfectionist audiophiles, on the other hand, usually aim for maximum accuracy in the playback system itself. The idea here is that, if the system accurately reproduces what is on the recording, the best recordings will yield the most natural sound. (This philosophy has the added benefit of rewarding those record manufacturers who strive hardest for realism.)

This seems like an elegantly simple solution, but there's a flaw. In order to ascertain the accuracy of a disc's reproduction, we must have an original to compare it to. But we can't compare it to the sound that was fed into the master-tape recorder, because that sound was gone forever when the recording session ended. The closest we can get to that original signal is the one that comes from the recorder when the tape is played back. That, after all, is the signal that was used to cut the disc, and if the disc sounds the same as the tape, then we know our record-playing system (the arm, cartridge, and preamp) is accurate. Right? Not necessarily—the record cutting and pressing chain was optimized by comparing its results with the original tape, but probably with a completely different phono system from the one you use at home.

Before approving a new release, a record producer is sent a test pressing of it (footnote 1), which he then plays through his reference system and compares with what he hears directly from the master tape. If they don't sound alike, he tells the cutting engineer to make appropriate equalization corrections for the final release cut, or to simply re-cut the disc with the same equalization.

Wouldn't this ensure that his disc sounds like the original tape? Not quite, because it is more than likely that his phono system and preamp have significant colorations, which will make the disc sound different from the way it "actually" sounds. Why, then, should our perfectionist record producer trust his playback system? Because he carefully chose it to make his records sound as much as possible like his tapes!

We've all heard of Catch 22, but in case you're unsure of its meaning, it is about circularity—in reasoning, causality, and Ultimate Truth. Circularity exists when A is a function of B, while B is determined by A. A popular example of circularity is the chicken-or-the-egg question. Then there's the apocryphal "Timbuktu Paradox," which relates the story of the retired sea captain who fires a cannon every day at the precise moment the town hall clock says 12 noon, while the town-hall custodian checks his clock every day by the sound of the 12 o'clock cannon.

What in fact does a record sound like? Think for a moment before answering. It has no sound at all. Hold one up to your ear, and what do you hear? Nothing, of course. To hear what's "on" a recording, you have to reproduce it through a phono system. And what does that phono system really sound like? It sounds like the record with various things added or subtracted. And the music goes 'round and 'round...

There really isn't any way of knowing precisely what is the sound of a record or its player. This is one reason why, in this age of high technology, audio continues to be such a cabalistic field. Where knowledge fails, mysticism moves in.

But just because we cannot make absolute assessments of disc-reproduction accuracy doesn't mean we should abandon the accuracy criterion altogether, any more than we should all stop trying to be good people just because we can't be perfect. There is, in fact, a way we can get reasonably close to the ultimate truth about an analog disc and its player, and that way, believe it or not, is through the Compact Disc.

The CD has all along been touted as an absolutely accurate recording/playback medium, a claim that has no doubt come to embarrass the manufacturers who so promoted it. Even the mass-circulation hi-fi magazines have been reporting that some players sound better than others, and that the best are getting better as time goes on. But another question that has assumed growing importance is just how good the Compact Disc system actually is, because the answer will determine how far the CD can go towards meeting the needs of the audiophile who cares about accuracy (footnote 2).

A number of audiophile-oriented record manufacturers have been claiming that the CD sound is "virtually indistinguishable from" the sound of the original master tapes. Even allowing for a certain amount of hyperbole (footnote 3), this would seem to indicate that a CD may offer us the most direct path back to the sound of the original master recording. But how much does a CD sound like its master?

To my knowledge, the only investigation of this was done a couple of years ago by England's Hi-Fi News & Record Review. Those listening tests involved direct comparisons between the sound of some Decca CDs and their digital master tapes. The test results were not felt to be entirely conclusive. While there was agreement that the CDs sounded pretty much like the original masters, there was some disagreement as to how important were the minor differences noted.

HFN/RR's experiment is already outdated anyway. Since that time, the audio quality of the best CD players has improved dramatically, while many professional recorders have stayed the same. And the conditions of HFN/RR's tests were not quite the same as an analog disc/tape comparison, because spurious electronics were introduced into the "original" signal: the digital recorder's playback circuitry.

When mastering from analog, the original signal—that is, the signal feeding the cutting system—is already in analog form and can be auditioned directly. But in CD mastering, the original is in digital form, and stays that way up until the time the disc is played in your home. In order to compare the original (digital) with the playback (analog), D/A conversion must occur at the output of the recording deck. And there's the catch. That D/A converter and audio section was not present in the chain that delivered the original signal to the CD. In other words, when we make a CD/master-tape comparison, the "original" sound is being processed by electronics which are different from those used for the CD playback, and the former may not be as good as the latter.

Professional recording equipment is notorious for having less than perfectionist-quality audio circuitry and parts. That's why every recording studio that aims for the best sound customizes its tape decks. Some consumer CD players, such as the Meridian and Mission units, probably produce a better sound from CD digital than do the decks used to master those CDs. So it is more than likely that, if HFN/RR were to repeat those tests today, the CD sound would emerge as the clearcut winner, and would actually be judged better than the "original tape."

Under the circumstances, though, it is likely that such comparisons between the master and the consumer product are more reliable for digital recordings than for analog ones, because there are no mechanical transducers involved. Bad electronics can do some nasty things to digital sound, but they tend to have relatively little effect on the spectral balance of the sound—the balance between bass and treble, and the absolute high-end content. Thus, while we may still quibble over other aspects of CD sound, there is little doubt but that what we hear from a CD is much closer in spectral balance to the master tape than what we hear from an analog reproduction of the same recording.

This is why I adopted CD as my "standard" for judging most aspects of the sound of analog signal sources. Where CDs contrast consistently with what I hear from analog, I assume (on faith, you might say) that the CD sound is closer to the truth in spectral balance and low-frequency quality. If that CD sound is not "good," I do not assume that the better analog sound is "right." Instead, I adjust the other components in my system—the loudspeakers, in particular—until the sound I hear from CD in the listening room is as close as possible to what I remember of live music. This then becomes my standard for evaluating analog sources. The sound I get from analog is of a very high standard, and it has very similar spectral balance to digital sources.

The CD is still not what I consider to be anything like an "absolute" standard, but I do believe it is the closest approach to such an absolute that we're likely to find. It's certainly better than wondering whether the lovely sounds I get from some analog discs are the result of almost-perfect everythings in the chain, or merely a fortuitous mating between gross system colorations all the way from the microphones to the loudspeakers.

http://www.stereophile.com/asweseeit/363/index1.html


48 vs. 96 kHz

We can probably hear the difference between 48 and 96 kHz sampling in a quiet, modern studio, but it is difficult to say whether record buyers can. And are 24/96 converters real anyway? Yes, but they are difficult to do well.

You may be able to get 24 or so bits to wiggle 96,000 times per second, but that doesn't mean that the data itself carries any additional real information. Clock jitter is more difficult to deal with, for example, and noise levels in the analog stages – more than the digital circuitry – reduce the actual achievable dynamic range to well below the theoretical 144 dB. But switch a converter between 44.1 and 88.2 kHz sampling and see what you think. However, converters with higher sampling rates can still sound better. Let's see why.
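
Before that, a quick aside: the "theoretical 144 dB" figure is just the familiar rule of thumb that each bit of word length buys roughly 6 dB of dynamic range (6.02N + 1.76 dB for an ideal converter and a full-scale sinewave). A short sketch:

    # Where the "theoretical 144 dB" comes from: roughly 6 dB per bit.
    for bits in (16, 20, 24):
        print("%d-bit: ~%d dB (rule of thumb), %.1f dB (ideal)" % (bits, 6 * bits, 6.02 * bits + 1.76))
    # 24-bit works out to about 144-146 dB, far beyond what the analog stages
    # around the converter can actually deliver.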

Filters
The usual rationale given for the need for higher sample rates involves anti-imaging and anti-aliasing filters. These are required to ensure that no audio information above the Nyquist limit (half the sample rate) passes through the system. You need at least two digital samples per analog cycle to capture a waveform accurately, and if you have only one (for example, because the waveform is at a higher frequency than the Nyquist limit), then you will sample a value, but it will be meaningless.

A digital sample captured by an A/D converter is like a single frame of action captured by a movie camera. The faster the samples are taken – the higher the sample rate – the more information is acquired. In film, a higher rate makes the action more smooth and fluid – although above a certain speed we can’t really notice the difference. In digital audio, the faster the sample rate, the better the frequency response – although above a certain point, we probably can’t hear the difference.

Imagine, using the film analogy, that we make a movie of a flashing light (an independent art movie, evidently). We make the light flash on and off faster and faster as time goes by. At some point it will be flashing at half the film speed – say the film is running at 24 frames per second and the light is flashing 12 times per second. If we look at the individual film frames at this point we will see that one frame shows the light on, and the next shows the light off. The next frame, it's on again. This is fine. But now imagine the light has sped up so that it flashes 24 times per second. Now, each time a frame is shot, the light is on. Or maybe off. Ooops – if we look at a succession of frames we will see that the light seems to be on or off all the time, depending on what part of the cycle the light was in when the shot was taken – which is not what the light is really doing.

Evidently, the film record is meaningless.

If you try sampling a waveform whose frequency is the same as the sample rate, the waveform behaves as if its frequency were 0 Hz! In more general terms, if you sample frequencies higher than the Nyquist limit, they behave like a mirror image of the frequencies below the limit. So if you're sampling at 44.1 kHz, a 30 kHz tone produces the same samples as one at 14.1 kHz!
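
The mirror-image behavior is easy to demonstrate numerically. The short sketch below (using NumPy) shows that, sampled at 44.1 kHz, a 30 kHz cosine produces exactly the same sample values as a 14.1 kHz cosine:

    # Aliasing in action: at a 44.1 kHz sample rate, a 30 kHz tone is
    # indistinguishable from one at 44.1 - 30 = 14.1 kHz.
    import numpy as np

    fs = 44100.0
    t = np.arange(64) / fs                     # a handful of sample instants

    tone_30k = np.cos(2 * np.pi * 30000 * t)
    tone_14k1 = np.cos(2 * np.pi * 14100 * t)

    print(np.allclose(tone_30k, tone_14k1))    # True: identical samples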

This is obviously undesirable, and as a result, digital systems from time immemorial (well, the last 30 years, anyway) have included anti-imaging and anti-aliasing filters to stop this very problem.

Unfortunately, these filters have traditionally sounded horrible. They need to pass all frequencies up to as close to the Nyquist limit as possible, but not a bit more. This makes their rolloffs very steep, and if implemented in the analog domain, as early ones were, they introduced enormous phase shifts into the audio – 1000 degrees out at 10 kHz was not uncommon. No wonder, then, that the top end sounded clangy and harsh and people said digital would never catch on. Improvements in filter design, specifically ones that used more gentle slopes, significantly improved the imaging and, consequently, the sound of digital audio.

A more complete solution would be to sample at a higher rate, so the Nyquist limit would be well out of the way, and thus the filters could be smoother and operate way above anything audible. Sounds good in theory, and this may be where the original push for higher sample rates came from. We would probably have used higher sample rates earlier, had that been feasible outside the laboratory.
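
A quick way to see how much the filtering problem relaxes is to estimate the length of FIR filter needed for a given job at each rate. The sketch below uses SciPy's Kaiser-window design estimate; the particular passband (20 kHz) and stopband attenuation (100 dB) are my own illustrative choices, not figures from this article.

    # Estimating how long an anti-alias FIR filter must be for a 20 kHz
    # passband with ~100 dB of stopband rejection, at two sample rates. The
    # passband and attenuation figures are illustrative assumptions.
    from scipy.signal import kaiserord

    def estimated_taps(sample_rate, passband_hz=20000, attenuation_db=100):
        nyquist = sample_rate / 2.0
        transition_width = (nyquist - passband_hz) / nyquist   # normalized: 1 = Nyquist
        numtaps, _beta = kaiserord(attenuation_db, transition_width)
        return numtaps

    for fs in (44100, 96000):
        print(fs, "Hz sampling:", estimated_taps(fs), "taps (roughly)")
    # At 44.1 kHz the filter must squeeze its rolloff between 20 kHz and
    # 22.05 kHz, so it needs to be long and steep; at 96 kHz it can roll off
    # gently between 20 kHz and 48 kHz with a fraction of the taps.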

Today, however, such filters are implemented digitally, and phase errors aren't such a problem. In addition, the type of converters used today multiply the effective sample rate internally, so that the apparent Nyquist limit is much higher than half the "real" sample rate. As a result of this "oversampling" technique, the filter rolloffs and frequencies can be kept well out of the way of the audio. So there's no need for higher sample rates. After all, there's nothing higher than around 20 kHz to record. Is there...?

Why Record Ultrasonics?
As is widely recognized, most of us can't hear much above 18 kHz, but that does not mean that there isn't anything up there that we need to record – and here's another reason for higher sampling rates. Plenty of acoustic instruments produce usable output up to around the 30 kHz mark – something that would be picked up in some form by a decent 30 in/s half-inch analog recording. A string section, for example, could well produce some significant ultrasonic energy.

Arguably, the ultrasonic content of all those instruments blends together to produce audible beat frequencies which contribute to the overall timbre of the sound. If you record your string section at a distance with a stereo pair, for example, all those interactions will have taken place in the air before your microphones ever capture the sound. You can record such a signal with 44.1 kHz sampling and never worry about losing anything – as long as your filters are of good quality and you have enough bits.

If, however, you recorded a string section with a couple of 48-track digital machines, with a mic on each instrument feeding its own track so that you can mix it all later, your close-mic technique does not pick up any interactions. The only place they can happen is in the mix – by which time the ultrasonic content has already been stripped off by your 48 kHz multitrack recorders, so those interactions never occur. It would thus seem that high sampling rates allow the flexibility of using different mic techniques with better results.
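
For what it's worth, the difference-tone mechanism this argument relies on is easy to show numerically: pass two ultrasonic tones through any nonlinearity and a component appears at their difference frequency. Whether this matters musically is, of course, exactly what is in dispute; the sketch below (using NumPy) only shows the arithmetic.

    # Two ultrasonic tones (25 kHz and 26 kHz) passed through a mild
    # second-order nonlinearity produce an audible 1 kHz difference tone.
    import numpy as np

    fs = 192000
    t = np.arange(fs) / fs                          # one second of signal
    ultrasonic = np.sin(2 * np.pi * 25000 * t) + np.sin(2 * np.pi * 26000 * t)

    nonlinear = ultrasonic + 0.1 * ultrasonic**2    # a gentle nonlinearity

    spectrum = np.abs(np.fft.rfft(nonlinear)) / len(t)
    freqs = np.fft.rfftfreq(len(t), 1.0 / fs)
    print("level at 1 kHz:", round(spectrum[np.argmin(np.abs(freqs - 1000))], 3))
    # The 1 kHz product only appears if the two tones meet somewhere nonlinear:
    # in the air, in a microphone, or in the ear. Close-mic them to separate
    # tracks and band-limit each to 24 kHz, and it can never form.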

Pick A Number
Having established that higher sampling rates are a good idea – or at least a fact of modern life – there is a question as to what the sample rate should actually be in the studio environment. On the face of it, 96kHz takes care of capturing any audio that might ever happen, and 24 bits offer quite enough quantization steps. Is that enough?

Yes, in theory – more than enough. But there are some potential problems, real or imaginary, with having a production environment that has no better resolution than the consumer distribution format, and the emerging DVD-Audio standard offers not just 24-bit, 96kHz sampling: it even goes beyond that to support 192 kHz sampling in stereo.

[On the face of it this is quite absurd. Do we need to capture "audio" signals at up to 96 kHz? Obviously not – such signals don't exist. However, some recent research suggests that the human brain can discern a difference in a sound's arrival time between the two ears of better than 15 microseconds – around the time between samples at 96 kHz sampling – and some people can even discern a 5µs difference! So while super-high sample rates are probably unnecessary for frequency response, they may be justified for stereo and surround imaging accuracy. However, it should be noted that many authorities dispute this conclusion.]
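
The arithmetic behind that bracketed claim is simple enough to check, taking the quoted interaural figures at face value:

    # Sample period at common rates, compared with the 15 microsecond (and,
    # for some listeners, 5 microsecond) interaural figures quoted above. The
    # claim itself is disputed; this only shows the arithmetic.
    for rate in (44100, 48000, 96000, 192000):
        print("%6d Hz: %.1f microseconds between samples" % (rate, 1e6 / rate))
    # 96 kHz gives about 10.4 us between samples, inside the 15 us figure;
    # 192 kHz gives about 5.2 us, approaching the 5 us figure.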

Think of higher sample rates and longer word lengths as a kind of "headroom." We need higher resolution in the studio than consumers do, so we can start with a higher level of quality in case some gets lost on the way – which might well happen.

And what happens when you modify a digital signal in the digital domain, say by EQing it, or fading it out? You create more bits – more data. You ought to have spare bits so you have room to work. You can always lose resolution, but you can't easily get it back again.
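
A two-line example shows why. The values below are arbitrary; the point is that even a simple gain change creates information below the original least significant bit, and truncating back to the original word length throws it away.

    # Even a simple gain change creates data below the original LSB.
    sample_a = 12344                  # two 16-bit PCM sample values
    sample_b = 12345
    gain = 0.75                       # a fade/EQ-style gain, exactly representable

    print(sample_a * gain)            # 9258.0  -- still lands on the integer grid
    print(sample_b * gain)            # 9258.75 -- now needs extra (fractional) bits
    print(int(sample_b * gain))       # 9258    -- truncation simply discards the .75
    # Working at a longer word length (or in floating point) keeps that extra
    # information around until the final reduction to the delivery format,
    # where it can be dithered down rather than just chopped off.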

http://www.digitalprosound.com/Htm/SoapBox/soap2_Apogee.htm