Last month we examined the CD and the basics of PCM audio. This month we continue our discussion with Florian Cossy and Thierry Heeb of Anagram Technologies.
Jeff Fritz: You've explained the physical structure of the CD as a digital storage device, and PCM as a method of encoding and reading audio content on the CD. Can you give us an idea of the physical differences between the CD and the DVD as storage devices? I think many audiophiles mistakenly lump the storage device together with the encoding method, because both DVD-Audio and SACD use the DVD as their physical carrier.
Thierry Heeb: Both CD and DVD are optical discs for data storage. The maximum amount of data that can be stored on a CD is 650MB. "B" stands for byte, which is a group of eight bits (the byte is the natural unit in the computer world). The "M" stands for mega, the prefix for 10^6, just as kilo (often written "k") stands for 10^3 and comes from the ancient Greek "chilioi," meaning 1000. But remember that computers are binary beings. As such, they very easily manipulate numbers that are powers of 2. That’s why in the computer (and generally in the digital) world, a KB (or kilobyte) is not 1000 bytes but 1024 bytes (because 1024 = 2^10). In the same way, a MB is in fact 1024 kilobytes, or 1024 x 1024 bytes.
On the other hand, a DVD can store up to about 4.37GB (one GB = 1024MB!). This is about seven times more than a CD. DVDs can also be multi-layered and/or multi-sided. In this case, capacity can grow to more than 17GB (double sided, double layered). Basically the DVD is just much more finely pitched than the CD. There are of course other differences between the two, like the wavelength of the laser used to pick up the data. As the data features are much smaller on a DVD, the wavelength of the laser must be shorter, or else it wouldn’t see the smaller bumps or holes on the surface of the disc.
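The "about seven times" figure can be checked with a few lines of arithmetic. This is only a sketch using the binary units defined above and the two capacities quoted in the text; the variable names are illustrative.

```python
# Binary units as defined in the text: 1 KB = 1024 bytes, and so on.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

cd_bytes = 650 * MB            # CD capacity quoted in the text
dvd_layer_bytes = 4.37 * GB    # single-layer DVD capacity quoted in the text

ratio = dvd_layer_bytes / cd_bytes
print(f"CD:        {cd_bytes:,} bytes")
print(f"DVD layer: {dvd_layer_bytes:,.0f} bytes")
print(f"DVD/CD capacity ratio: {ratio:.2f}")
```

The ratio comes out a little under 7, which is where the rounded "seven times" comes from.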
One very important thing to remember is that CD and DVD are simply data-storage devices. Their physical structure is the same, whatever the data you are writing to them. As an example, you can take a standard CD-R and either write computer data (files) to it or you can write audio to it (CD audio). While the physical device is the same, the way data is formatted (i.e., organized on the disc) will differ. The same applies to DVD. There are DVD-ROMs, DVD-A, DVD-V, and SACD (more on this later)! All these discs share the same physical properties (at manufacturing).
Then, what changes is how the data is formatted on the disc. For instance, for DVD-V you will have a large video data section and a small audio section; on DVD-A on the other hand, more space is allocated to audio and only a small amount of space is allocated for still images.
Then there is SACD. SACD uses the DVD disc as its physical carrier. What differs is, again, how the data is organized on the disc and what these data represent. The big difference is that audio is stored in DSD format, not in PCM as on DVD-V or DVD-A. (For more information on DSD please refer to the last question.)
To illustrate all these concepts, an analogy with a book is a very good starting point.
Imagine a publisher producing books in two languages but using the same layout for both (e.g., table of contents, prologue, chapters, epilogue).
Books --> DVDs
Book format (physical) --> Disc (physical)
Book layout (independent of language and content) --> File system (how to access the information)
Book subject --> Audio, ROM, or Video disc
Book chapters --> Tracks
Book language --> Data encoding (PCM for DVD-A and DVD-V, DSD for SACD)
This analogy illustrates that even though SACD and DVD share the same physical medium and file system, they are not written in the same language. Therefore, a reader who knows only one language cannot understand the other! So the design of a universal player depends not on the data pickup mechanism but on software able to understand the data present on the disc.
JF: From a mathematical standpoint, how does the extra storage capacity on DVD equate to the theoretical maximum resolution?
TH: A normal DVD can hold about 4.37GB per layer (DVD discs with up to 17GB of capacity do exist). This is about seven times more than a CD. Thus, the extra storage capacity can hold much more music than a CD. The following table shows how much music can be stored on one layer of the DVD-A (the last column refers to the additional use of MLP in order to compress audio in a lossless way).
Bit Depth | Sampling Frequency | Channels | Uncompressed | With MLP Compression
24-Bit | 96kHz | 5.1 | N/A * | 100 Minutes
24-Bit | 96kHz | 6 | N/A * | 86 Minutes
24-Bit | 96kHz | 2 | 150 Minutes | 240 Minutes
24-Bit | 192kHz | 2 | 75 Minutes | 120 Minutes
16-Bit | 44.1kHz | 2 | 420 Minutes | 720 Minutes
16-Bit | 44.1kHz | 1 | 840 Minutes | 1500 Minutes
As one can see, this is considerably more than CD. Now the maximum resolution available on DVD is not really related to the DVD itself but rather to the data format used. Theoretically you could imagine for instance storing 32- or even 64-bit audio on the DVD. The same applies to CD. Imagine you are using your CD not as an audio device but as a CD-ROM. In this case nothing prevents you from storing 64-bit audio on your CD. The only price you’d have to pay for this is reduced duration of the stored music.
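The trade-off between resolution and duration is simple arithmetic: playing time is capacity divided by the uncompressed data rate. The sketch below is an illustration only. The function name `pcm_minutes` is made up for this example, and the calculation ignores file-system overhead and any non-audio content, so it will not reproduce the figures in the table exactly.

```python
def pcm_minutes(capacity_bytes, bits, sample_rate_hz, channels):
    """Playing time in minutes of uncompressed PCM that fits in capacity_bytes."""
    bytes_per_second = bits / 8 * sample_rate_hz * channels
    return capacity_bytes / bytes_per_second / 60

DVD_LAYER = 4.37 * 1024**3   # one DVD layer, capacity quoted in the text
CD = 650 * 1024**2           # CD used as a CD-ROM, capacity quoted in the text

formats = [(24, 96_000, 2), (24, 192_000, 2), (16, 44_100, 2), (16, 44_100, 1)]
for bits, rate, channels in formats:
    minutes = pcm_minutes(DVD_LAYER, bits, rate, channels)
    print(f"{bits}-bit {rate / 1000:g}kHz x{channels}: {minutes:.0f} min on one DVD layer")

# The hypothetical 64-bit stereo audio at 44.1kHz on a CD-ROM:
print(f"64-bit 44.1kHz x2 on a CD: {pcm_minutes(CD, 64, 44_100, 2):.1f} min")
```

Doubling the word length or the sampling frequency halves the playing time, which is the "price" mentioned above.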
JF: As you mentioned, DVD-Audio uses Meridian Lossless Packing (MLP) for audio encoding. Can you give us a synopsis of the process?
TH: First of all, it is important to understand the difference between a lossy and a lossless coder. A lossy coder is a coder like MP3, AC-3, or even DTS. If you compare an original bitstream with the output of the encode/decode process, you will notice that the two bitstreams are not bit-for-bit equal. From an information-theory point of view, there has been a loss of information (the two streams are not identical). These coders rely on psychoacoustic principles that allow one to remove information from the datastream. The information removed is the information that is least noticeable to our hearing system. Techniques like time-domain masking and others are used to reduce the number of significant frequency-domain components, and thus the data rate. So all these lossy coders remove some information in the frequency domain, which means only the reduced data set is stored on the disc. At playback, the decoder reconstructs a time-domain model of the signal. Without a doubt, this reconstructed signal will not be identical to the original in the sense of information theory, but from a human-hearing point of view the two should at least be very close. Good lossy codecs can achieve data-reduction rates of 1:10 or more (AC-3, MP3) with not much noticeable loss of detail, thanks to our brain, which helps reconstruct what seems to be missing!
On the other hand you have lossless coders. Lossless coders are coders where the input and the output of the encode/decode process are strictly equal. One good example of a lossless coder is WinZip! We can illustrate the difference between a lossy and a lossless coder by an analogy to the computer. Imagine you have a very high-resolution picture on screen in bitmap format. Now imagine that you pass it through WinZip and save it to your disk. Clearly there will be some reduction when comparing the original file size and the output of WinZip. Data has been reduced. If you now unzip your file and reload the image, they will be equal, pixel-by-pixel, bit-by-bit. Now imagine that you compress your image using a low-resolution JPEG and save it to your disk. If you now reopen this latter file and compare it to the original image, there will be some loss of detail. Now if you sit back from the screen far enough, the original and the JPEG image will look the same; it’s only in the details that they change. So there you have a good example of a lossless (WinZip) and a lossy coder (low-resolution JPEG).
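The lossless/lossy distinction can be demonstrated in a few lines. This is a minimal sketch, not WinZip or JPEG themselves: it uses Python's standard `zlib` module (DEFLATE, the same family of compression used in ZIP archives) as the lossless coder, and a crude "throw away the four least significant bits" step as a stand-in for a lossy coder. The test data is a synthetic pattern, not a real image.

```python
import zlib

# Synthetic stand-in for image data: a repeating ramp of byte values.
original = bytes(range(256)) * 64

# Lossless: compress, decompress, compare bit by bit.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
lossless_ok = restored == original

# Lossy stand-in: discard the 4 least significant bits of every byte.
lossy = bytes(b & 0xF0 for b in original)
lossy_equal = lossy == original
max_error = max(abs(a - b) for a, b in zip(original, lossy))

print(f"original size: {len(original)} bytes, compressed: {len(packed)} bytes")
print(f"lossless round trip identical: {lossless_ok}")
print(f"lossy version identical: {lossy_equal}, largest per-byte error: {max_error}")
```

The lossless round trip returns exactly the input, while the lossy version is close (small per-byte error) but no longer identical.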
The theory of lossless coders is again related to information theory. It has been shown that the maximum compression ratio of a lossless coder is 1:2 (in theory). This means that, from a theoretical point of view, it is not possible to build a lossless coder that is guaranteed to compress data by a factor larger than 2. This does not mean that there are no datastreams for which the compression ratio would be better than 1:2, but only that you cannot guarantee compression greater than 1:2 on arbitrary data. Going into the theory of operation of MLP would go far beyond the scope of this article, but the MLP process relies on the following techniques:
* Bit Shifting - avoids wasting bits for unused dynamic range
* Matrixing - puts the audio common to multiple channels into one channel
* Prediction Filters - predict the next audio sample based on the previous audio
* FIFO Buffer - smooths the instantaneous data rate
* Entropy Coding - compresses the final data as tightly as possible
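To give a feel for why prediction helps a lossless coder, here is a toy sketch of the prediction step alone. It is not MLP's actual filter design: it assumes the simplest possible predictor ("the next sample equals the previous one") and a synthetic 440Hz sine wave sampled at 96kHz. The function names `encode` and `decode` are made up for this illustration.

```python
import math

def encode(samples):
    """Replace each sample by its difference from the prediction (the previous sample)."""
    residuals = []
    previous = 0
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals

def decode(residuals):
    """Rebuild the samples exactly by adding each residual to the prediction."""
    samples = []
    previous = 0
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples

# Synthetic test signal: 440Hz sine, 96kHz sampling, integer samples.
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / 96000)) for n in range(1000)]
residuals = encode(samples)

peak_sample = max(abs(s) for s in samples)
peak_residual = max(abs(r) for r in residuals)
print(f"peak sample value:   {peak_sample}")
print(f"peak residual value: {peak_residual}")
print(f"exact reconstruction: {decode(residuals) == samples}")
```

The residuals are much smaller than the samples, so they need fewer bits, yet the decoder recovers the original exactly. A real coder would follow this with entropy coding of the residuals.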
JF: Same question except SACD and its use of Sony's DSD process.
TH: This is quite a tough question! First of all, SACD information is not really stored in DSD format, but in DST (this is not a typo!). DST is a lossless coder (see the information on MLP above) that reduces the amount of data to be stored on and read from the disc. DST reduces the data rate enough to allow six channels of high-resolution audio to be transmitted. As DST is a lossless coder, there should theoretically be no difference introduced by the DSD-DST-DSD coding/decoding process.
DSD is just another way to represent audio data when compared to PCM. Unlike PCM, which acts on quite big data words (16 to 24 bits), but at low sampling rates (up to 192kHz), DSD uses the smallest data word possible (1 bit) but at a very high sampling rate (2.8224MHz = 2822.4kHz). That is, a DSD signal can be seen as a sequence of 0s and 1s. The amplitude of the signal is then proportional to the density of 1s in a given time frame. This is why DSD is sometimes referred to as pulse density modulation (the 1s being the pulses).
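The numbers in this comparison, and the idea of pulse density, can be sketched as follows. The data rates use only figures quoted in this series (1 bit at 2.8224MHz for DSD, 16 bits at 44.1kHz for CD). The function `density_to_amplitude` is a made-up helper, and its mapping of the density of 1s onto a -1 to +1 amplitude scale is an assumption chosen for illustration.

```python
DSD_RATE_HZ = 2_822_400   # 1-bit samples per second, per channel
CD_RATE_HZ = 44_100       # 16-bit samples per second, per channel

dsd_bits_per_second = DSD_RATE_HZ * 1
cd_bits_per_second = CD_RATE_HZ * 16
oversampling = DSD_RATE_HZ // CD_RATE_HZ

print(f"DSD: {dsd_bits_per_second:,} bit/s per channel")
print(f"CD:  {cd_bits_per_second:,} bit/s per channel")
print(f"DSD sampling rate is {oversampling} times the CD sampling rate")

def density_to_amplitude(bits):
    """Map the density of 1s in a frame onto an amplitude between -1 and +1 (assumed scale)."""
    return 2 * sum(bits) / len(bits) - 1

print(density_to_amplitude([1, 0, 1, 0]))   # equal numbers of 1s and 0s
print(density_to_amplitude([1, 1, 1, 0]))   # mostly 1s
```

A frame with as many 1s as 0s sits at mid-scale, and the amplitude rises as the 1s become denser.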
The question is this: How do I get from an analog signal to a DSD sequence of 0s and 1s? The answer relies heavily on something many audiophiles regard as a nightmare: feedback. Yes, DSD is a technology that needs a tremendous amount of feedback to work properly. The feedback is provided by Delta-Sigma modulation (something which can be shown to be equivalent to noise shaping).
[Figure: Delta-Sigma modulator (simplified)]
The Delta-Sigma modulator works as follows: The difference between the input signal and the feedback is integrated multiple times, and the sign of the result of this integration is the output value of the 1-bit quantizer (for instance, a positive value would be coded with 1 and a negative value would be coded with 0). The integrators move the error out of the lower part of the spectrum, which provides the noise shaping. Noise shaping can be described as moving noise from one part of the spectrum to another. In the DSD application, noise that would be induced by the rough 1-bit quantizer is moved out of the lower part of the spectrum to the higher part.
In the accompanying spectrum plot, the blue line represents the spectrum of the original signal. When the signal is passed through the DSD modulator, the resulting spectrum is that of the original signal (blue) plus the noise introduced by the Delta-Sigma modulator (red). One can see that DSD introduces a lot of high-frequency noise. This high-frequency noise must be removed, which implies the use of an analog low-pass filter (green), usually tuned to around 50kHz to 60kHz. If not properly designed, this filter can introduce phase rotations in the audio band.
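The modulator loop described above can be sketched in code. This is a deliberately simplified, first-order version with a single integrator, whereas the modulators described in the text integrate multiple times. The function name `delta_sigma` and the -1/+1 feedback levels are assumptions made for this illustration.

```python
def delta_sigma(samples):
    """First-order Delta-Sigma modulator: integrate (input - feedback), output the sign."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback             # integrate the difference
        bit = 1 if integrator >= 0 else 0      # 1-bit quantizer: sign of the integrator
        feedback = 1.0 if bit else -1.0        # feed the quantized output back
        bits.append(bit)
    return bits

# A constant input of 0.5 on a -1..+1 scale.
bits = delta_sigma([0.5] * 4000)
density = sum(bits) / len(bits)
print(f"first 16 output bits: {bits[:16]}")
print(f"density of 1s: {density:.3f}")
print(f"amplitude recovered from density: {2 * density - 1:.3f}")
```

The feedback forces the average of the 1-bit output to follow the input, so the density of 1s encodes the amplitude. Averaging (low-pass filtering) the bitstream recovers the signal.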
Just a few remarks to conclude: If we have to use a low-pass filter at around 50kHz, the claimed 100kHz bandwidth of SACD will never be reproduced at the output. SACD uses a lot of feedback, so audiophiles who are allergic to this should refrain from using SACD, or change their view about feedback!
Concluding, conclusions...
As I conclude this series of articles, I’m struck by a few relevant and interesting details regarding the SACD and DVD-A formats. This relates, in no small part, to the way both formats were marketed and the effect that marketing had on audiophiles as they filtered into either camp (causing some contentious debates). Bluntly stated, there are some striking similarities between DVD-A and SACD. The most obvious fact is that both use the Digital Versatile Disc (DVD) as their storage device. That’s right, a DVD-A disc and an SACD are both DVDs.
Considering how clearly superior the DVD is to the CD from a storage-capacity perspective, it’s easy to see why the DVD itself is the chosen carrier for whatever audio format is favored by the record labels or artists. In fact, the DVD is strikingly flexible: Dolby Digital, DTS, DVD-Video, PCM audio, MLP, DSD, and MP3 -- they’re all stored, or can be, on DVD. So whether DVD-Audio or SACD wins out (I’m increasingly of the opinion that both will survive to some extent), it can be concluded that the DVD as a storage device will reign for a long time. Who knows -- another format might prove superior to both DVD-A and SACD, and still be stored on the DVD!
We can also conclude that in some form, the DVD will replace the CD eventually. This doesn’t necessarily mean SACD or DVD-Audio per se; the increased storage capacity alone makes the move sensible. Picture the same content found on your CDs, along with some extras like concert videos and interviews, lining the shelves of every music store. Some of them might just contain good ol' two-channel audio, but many will have Dolby Digital 5.1, DTS 5.1, SACD, or DVD-Audio. Universal players for the car will exist to play back all of these, and auto-based video capability will grow by leaps and bounds. Quite easy to imagine, huh? I guess the one thing we can hang our hat on is the "versatile" aspect of DVD. They got that part right.
I’ve said many times that I don’t much care whether SACD or DVD-A "wins." What I do care about is that we have high-resolution multichannel music, and that means a way to get it into our homes. Whatever happens, that technology is here in several forms today. The revolution is clearly underway.
http://www.soundstage.com/surrounded/surrounded200305.htm