Audio bit depth

For other uses of "8-bit music", see chiptune.

An analogue signal (in red) encoded to 4-bit PCM digital samples (in blue); the bit depth is four, so each sample's amplitude is one of 16 possible values.

In digital audio using pulse-code modulation (PCM), bit depth is the number of bits of information in each sample, and it directly corresponds to the resolution of each sample. Examples of bit depth include Compact Disc Digital Audio, which uses 16 bits per sample, and DVD-Audio and Blu-ray Disc which can support up to 24 bits per sample.

In basic implementations, variations in bit depth primarily affect the noise level from quantization error—thus the signal-to-noise ratio (SNR) and dynamic range. However, techniques such as dithering, noise shaping and oversampling mitigate these effects without changing the bit depth. Bit depth also affects bit rate and file size.

Bit depth is only meaningful in reference to a PCM digital signal. Non-PCM formats, such as lossy compression formats, do not have associated bit depths. For example, in MP3, quantization is performed on PCM samples that have been transformed into the frequency domain.

Binary resolution

A PCM signal is a sequence of digital audio samples containing the data providing the necessary information to reconstruct the original analog signal. Each sample represents the amplitude of the signal at a specific point in time, and the samples are uniformly spaced in time. The amplitude is the only information explicitly stored in the sample, and it is typically stored as either an integer or a floating point number, encoded as a binary number with a fixed number of digits: the sample's bit depth.

The resolution of binary integers increases exponentially as the word length increases. Adding one bit doubles the resolution, adding two quadruples it and so on. The number of possible values that can be represented by an integer bit depth can be calculated by using 2ⁿ, where n is the bit depth.^[1] Thus, a 16-bit system has a resolution of 65,536 (2¹⁶) possible values.

PCM audio data is typically stored as signed numbers in two's complement format.^[2]

Floating point

Many audio file formats and digital audio workstations (DAWs) now support PCM formats with samples represented by floating point numbers.^[3]^[4]^[5]^[6] Both the WAV file format and the AIFF file format support floating point representations.^[7]^[8]

Unlike integers, whose bit pattern is a single series of bits, a floating point number is instead composed of separate fields whose mathematical relation forms a number. The most common standard is IEEE floating point which is composed of three bit patterns: a sign bit which represents whether the number is positive or negative, an exponent and a mantissa which is raised by the exponent. The mantissa is expressed as a binary fraction in IEEE base-two floating point formats.^[9]

Quantization

The bit depth limits the signal-to-noise ratio (SNR) of the reconstructed signal to a maximum level determined by quantization error. The bit depth has no impact on the frequency response, which is constrained by the sample rate.

Quantization noise is a model of quantization error introduced by the sampling process during analog-to-digital conversion (ADC). It is a rounding error between the analog input voltage to the ADC and the output digitized value. The noise is nonlinear and signal-dependent.

An 8-bit binary number (149 in decimal), with the LSB highlighted

In an ideal ADC, where the quantization error is uniformly distributed between $\scriptstyle{\pm \frac{1}{2}}$ least significant bit (LSB) and where the signal has a uniform distribution covering all quantization levels, the signal-to-quantization-noise ratio (SQNR) can be calculated from

\mathrm {SQNR} =20\log _{10}(2^{Q})\approx 6.02\cdot Q\ \mathrm {dB} \,\!

where Q is the number of quantization bits and the result is measured in decibels (dB).^[10]^[11]

24-bit digital audio has a theoretical maximum SNR of 144 dB, compared to 96 dB for 16-bit; however, as of 2007 digital audio converter technology is limited to a SNR of about 123 dB^[12]^[13]^[14] (21-bit ENOB) because of real-world limitations in integrated circuit design. Still, this approximately matches the performance of the human auditory system.^[15]^[16] (While 32-bit converters exist, they are purely for marketing purposes and provide no practical benefit over 24-bit converters; the extra bits are either zero or encode only noise.)^[17]^[18]

Signal-to-noise ratio and resolution of bit depths
# bits	SNR	Possible integer values (per sample)	Base-ten signed range (per sample)
4	24.08 dB	16	−8 to +7
8	48.16 dB	256	−128 to +127
11	66.22 dB	2048	−1024 to +1023
16	96.33 dB	65,536	−32,768 to +32,767
20	120.41 dB	1,048,576	−524,288 to +524,287
24	144.49 dB	16,777,216	−8,388,608 to +8,388,607
32	192.66 dB	4,294,967,296	−2,147,483,648 to +2,147,483,647
48	288.99 dB	281,474,976,710,656	−140,737,488,355,328 to +140,737,488,355,327
64	385.32 dB	18,446,744,073,709,551,616	−9,223,372,036,854,775,808 to +9,223,372,036,854,775,807

The resolution of floating point samples is less straightforward than integer samples, but the benefit comes in the increased accuracy of low values. In floating point representation, the space between any two adjacent values is of the same proportion as the space between any other two adjacent values, whereas in an integer representation, the space between adjacent values gets larger in proportion to low-level signals. This greatly increases the SNR because the accuracy of a high-level signal will be the same as the accuracy of an identical signal at a lower level.^[19]

The trade-off between floating point and integers is that the space between large floating point values is greater than the space between large integer values of the same bit depth. Rounding a large floating point number results in a greater error than rounding a small floating point number whereas rounding an integer number will always result in the same level of error. In other words, integers have round-off that is uniform, always rounding the LSB to 0 or 1, and floating point has SNR that is uniform, the quantization noise level is always of a certain proportion to the signal level.^[19] A floating point noise floor will rise as the signal rises and fall as the signal falls, resulting in audible variance if the bit depth is low enough.^[20]

Audio processing

Most processing operations on digital audio involve requantization of samples, and thus introduce additional rounding error analogous to the original quantization error introduced during analog to digital conversion. To prevent rounding error larger than the implicit error during ADC, calculations during processing must be performed at higher precisions than the input samples.^[21]

Digital signal processing (DSP) operations can be performed in either fixed point or floating point precision. In either case, the precision of each operation is determined by the precision of the hardware operations used to perform each step of the processing and not the resolution of the input data. For example, on x86 processors, floating point operations are performed at 32- or 64-bit precision and fixed point operations at 16-, 32- or 64-bit resolution. Consequently, all processing performed on Intel-based hardware will be performed at 16-, 32- or 64-bit integer precision, or 32- or 64-bit floating point precision regardless of the source format. However, if memory is at a premium, software may still choose to output lower resolution 16- or 24-bit audio after higher precision processing.

Fixed point digital signal processors often support unusual word sizes and precisions in order to support specific signal resolutions. For example, the Motorola 56000 DSP chip uses 24-bit word sizes, 24-bit multipliers and 56-bit accumulators to perform multiply-accumulate operations on two 24-bit samples without overflow or rounding.^[22] On devices that do not support large accumulators, fixed point operations may be implicitly rounded, reducing precision to below that of the input samples.

Errors compound through multiple stages of DSP at a rate that depends on the operations being performed. For uncorrelated processing steps on audio data without a DC offset, errors are assumed to be random with zero mean. Under this assumption, the standard deviation of the distribution represents the error signal, and quantization error scales with the square root of the number of operations.^[23] High levels of precision are necessary for algorithms that involve repeated processing, such as convolution.^[21] High levels of precision are also necessary in recursive algorithms, such as infinite impulse response (IIR) filters.^[24] In the particular case of IIR filters, rounding error can degrade frequency response and cause instability.^[21]

Dither

Main article: Dither

The noise introduced by quantization error, including rounding errors and loss of precision introduced during audio processing, can be mitigated by adding a small amount of random noise, called dither, to the signal before quantizing. Dithering eliminates the granularity of quantization error, giving very low distortion, but at the expense of a slightly raised noise floor. Measured using ITU-R 468 noise weighting, this is about 66 dB below alignment level, or 84 dB below digital full scale, which is somewhat lower than the microphone noise level on most recordings, and hence of no consequence in 16-bit audio (see programme level for more on this).

24-bit audio does not require dithering, as the noise level of the digital converter is always louder than the required level of any dither that might be applied. 24-bit audio could theoretically encode 144 dB of dynamic range, but based on manufacturer's datasheets no ADCs exist that can provide higher than ~125 dB.^[25]

Dither can also be used to increase the effective dynamic range. The perceived dynamic range of 16-bit audio can be 120 dB or more with noise-shaped dither, taking advantage of the frequency response of the human ear.^[26]^[27]

Dynamic range

Dynamic range is the difference between the largest and smallest signal a system can record or reproduce. Without dither, the dynamic range correlates to the quantization noise floor. For example, 16-bit integer resolution allows for a dynamic range of about 96 dB.

Using higher bit depths during studio recording accommodates greater dynamic range. If the signal's dynamic range is lower than that allowed by the bit depth, the recording has headroom, and the higher the bit depth, the more headroom that's available. This reduces the risk of clipping without encountering quantization errors at low volumes.

With the proper application of dither, digital systems can reproduce signals with levels lower than their resolution would normally allow, extending the effective dynamic range beyond the limit imposed by the resolution.^[28]

The use of techniques such as oversampling and noise shaping can further extend the dynamic range of sampled audio by moving quantization error out of the frequency band of interest.

Oversampling

Main article: Oversampling

Oversampling is an alternative method to increase the dynamic range of PCM audio without changing the number of bits per sample.^[29] In oversampling, audio samples are acquired at a multiple of the desired sample rate. Because quantization error is assumed to be uniformly distributed with frequency, much of the quantization error is shifted to ultrasonic frequencies, and can be removed by the digital to analog converter during playback.

For an increase equivalent to n additional bits of resolution, a signal must be oversampled by

\mathrm{number\ of\ samples} = (2^n)^2 = 2^{2n}.

For example, a 14-bit ADC can produce 16-bit 48 kHz audio if operated at 16× oversampling, or 768 kHz. Oversampled PCM therefore exchanges fewer bits per sample for more samples in order to obtain the same resolution.

Dynamic range can also be enhanced with oversampling at signal reconstruction, absent oversampling at the source. Consider 16× oversampling at reconstruction. Each sample at reconstruction would be unique in that for each of the original sample points sixteen are inserted, all having been calculated by the digital signal processor (FIR digital filter) as time interpolation. This is not linear interpolation. The mechanism of lowered noise floor is as previously discussed, that is, quantization noise power has not been reduced, but the noise spectrum has been spread over 16× the audio bandwidth.

Historical note—The compact disc standard was developed by a collaboration between Sony and Phillips. The first Sony consumer unit featured a 16-bit DAC; the first Phillips units dual 14-bit DACs. This caused confusion in the marketplace and even in professional circles. Years after, one of the electronic engineering trade journals mistakenly made a historical note of the 14-bit DACs in the Phillips unit as allowing 84 dB SNR, as the writer was either unaware that the specifications of the unit indicated 4× oversampling or unaware of the implication. It was correctly noted that Phillips had no OEM sourced 16-bit DAC at the time, but the writer was not cognizant of the power of digital signal processing to increase the audio SNR to 90 dB.^[30]

Noise shaping

Main article: Noise shaping

Oversampling a signal results in equal quantization noise per unit of bandwidth at all frequencies and a dynamic range that improves with only the square root of the oversampling ratio. Noise shaping is a technique that adds additional noise at higher frequencies which cancels out some error at lower frequencies, resulting in a larger increase in dynamic range when oversampling. For nth-order noise shaping, the dynamic range of an oversampled signal is improved by an additional 6n dB relative to oversampling without noise shaping.^[31] For example, for a 20 kHz analog audio sampled at 4× oversampling with second order noise shaping, the dynamic range is increased by 30 dB. Therefore, a 16-bit signal sampled at 176 kHz would have equal resolution as a 21-bit signal sampled at 44.1 kHz without noise shaping.

Noise shaping is commonly implemented with delta-sigma modulation. Using delta-sigma modulation, Super Audio CD obtains 120 dB SNR at audio frequencies using 1-bit audio with 64× oversampling.

Applications

Bit depth is a fundamental property of digital audio implementations and there are a variety of situations where it is a measurement.

Example applications and bits per sample
Application	Description	Audio format(s)
CD-DA (Red Book)^[32]	Digital media	16-bit LPCM
DVD-Audio^[33]	Digital media	16-, 20- and 24-bit LPCM^{[note 1]}
Super Audio CD^[34]	Digital media	1-bit Direct Stream Digital (PDM)
Blu-ray Disc audio^[35]	Digital media	16-, 20- and 24-bit LPCM and others^{[note 2]}
DV audio^[36]	Digital media	12-bit compressed PCM and 16-bit uncompressed PCM
ITU-T Recommendation G.711^[37]	Compression standard for telephony	8-bit PCM with companding^{[note 3]}
NICAM-1, NICAM-2 and NICAM-3^[38]	Compression standards for broadcasting	10-, 11- and 10-bit PCM respectively, with companding^{[note 4]}
Ardour^[39]	DAW by Paul Davis and The Ardour Community	"All sample data is maintained internally in 32 bit floating point format..."
Pro Tools 11^[40]	DAW by Avid Technology	16- and 24-bit or 32-bit floating point sessions and 64-bit floating point mixing
Logic Pro X^[41]	DAW by Apple Inc.	16- and 24-bit projects
Live 9^[6]	DAW by Ableton	32-bit floating point mixing
Reason 7^[42]	DAW by Propellerhead Software	16-, 20- and 24-bit I/O, 32-bit floating point arithmetic and 64-bit summing
Reaper 5	DAW by Cockos Inc.	8-bit PCM, 16-bit PCM, 24-bit PCM, 32-bit PCM, 32-bit FP, 64-bit FP, 4-bit IMA ADPCM & 2-bit cADPCM rendering; 8-bit int, 16-bit int, 24-bit int, 39-bit int, 32-bit float, and 64-bit float mixing
GarageBand '11 (version 6)^[43]	DAW by Apple Inc.	16-bit default with 24-bit real instrument recording
Audacity^[44]	Open source audio editor	16- and 24-bit LPCM and 32-bit floating point

Bit rate and file size

Bit depth affects bit rate and file size. Bit rate refers to the amount of data, specifically bits, transmitted or received per second.

Notes

↑ DVD-Audio also supports optional Meridian Lossless Packing, a lossless compression scheme.
↑ Blu-ray supports a variety of non-LPCM formats but all conform to some combination of 16, 20 or 24 bits per sample.
↑ ITU-T specifies the A-law and μ-law companding algorithms, compressing down from 13 and 14 bits respectively.
↑ NICAM systems 1, 2 and 3 compress down from 13, 14 and 14 bits respectively.

References

↑ Thompson, Dan (2005). Understanding Audio. Berklee Press. ISBN 978-0-634-00959-4.
↑ Smith, Julius (2007). "Pulse Code Modulation (PCM)". Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications, Second Edition, online book. Retrieved 22 October 2012.
↑ Campbell, Robert (2013). Pro Tools 10 Advanced Music Production Techniques, pg. 247. Cengage Learning. Retrieved 12 August 2013.
↑ Wherry, Mark (March 2012). "Avid Pro Tools 10". Sound On Sound. Retrieved 10 August 2013.
↑ Price, Simon (October 2005). "Reason Mixing Masterclass". Sound On Sound. Retrieved 10 August 2013.
1 2 "Ableton Reference Manual Version 9, 15. Mixing". Ableton. 2013. Retrieved 26 August 2013.
↑ Kabal, Peter (3 January 2011). "Audio File Format Specifications, WAVE Specifications". McGill University. Retrieved 10 August 2013.
↑ Kabal, Peter (3 January 2011). "Audio File Format Specifications, AIFF / AIFF-C Specifications". McGill University. Retrieved 10 August 2013.
↑ Smith, Steven (1997–98). "The Scientist and Engineer's Guide to Digital Signal Processing, Chapter 4 – DSP Software / Floating Point (Real Numbers)". www.dspguide.com. Retrieved 10 August 2013.
↑ See Signal-to-noise ratio#Fixed point
↑ Kester, Walt (2007). "Taking the Mystery out of the Infamous Formula, "SNR = 6.02N + 1.76dB," and Why You Should Care" (PDF). Analog Devices. Retrieved 26 July 2011.
↑ "PCM4222". Retrieved 21 April 2011. Dynamic Range (–60dB input, A-weighted): 124dB typical Dynamic Range (–60dB input, 20kHz Bandwidth): 122dB typical
↑ "WM8741 : High Performance Stereo DAC". Cirrus Logic. Retrieved 2016-12-02. 128dB SNR (‘A’-weighted mono @ 48kHz) 123dB SNR (non-weighted stereo @ 48kHz)
↑ Nwavguy (2011-09-06). "NwAvGuy: Noise & Dynamic Range". NwAvGuy. Retrieved 2016-12-02. 24 bit DACs often only manage approximately 16 bit performance and the very best reach 21 bit (ENOB) performance.
↑ D. R. Campbell. "Aspects of Human Hearing" (PDF). Archived from the original (PDF) on 21 August 2011. Retrieved 21 April 2011. The dynamic range of human hearing is [approximately] 120 dB
↑ "Sensitivity of Human Ear". Archived from the original on 4 June 2011. Retrieved 21 April 2011. The practical dynamic range could be said to be from the threshold of hearing to the threshold of pain [130 dB]
↑ "The great audio myth: why you don't need that 32-bit DAC". Android Authority. Retrieved 2016-12-02. So your 32-bit DAC is actually only ever going to be able to output at most 21-bits of useful data and the other bits will be masked by circuit noise.
↑ "32-bit capable DACs". hydrogenaud.io. Retrieved 2016-12-02. all the '32 bit capable' DAC chips existent today have actual resolution less than 24 bit.
1 2 Smith, Steven (1997–98). "The Scientist and Engineer's Guide to Digital Signal Processing, Chapter 28 – Digital Signal Processors / Fixed versus Floating Point". www.dspguide.com. Retrieved 10 August 2013.
↑ Moorer, James (September 1999). "48-Bit Integer Processing Beats 32-Bit Floating-Point for Professional Audio Applications" (PDF). www.jamminpower.com. Retrieved 12 August 2013.
1 2 3 Tomarakos, John. "THE RELATIONSHIP OF DATA WORD SIZE TO DYNAMIC RANGE AND SIGNAL QUALITY IN DIGITAL AUDIO PROCESSING APPLICATIONS". www.analog.com. Analog Systems. Retrieved 16 August 2013.
↑ "DSP56001A" (PDF). Freescale. Retrieved 15 August 2013.
↑ Smith, Steven (1997–98). "The Scientist and Engineer's Guide to Digital Signal Processing, Chapter 4 - DSP Software / Number Precision". Retrieved 19 August 2013.
↑ Carletta, Joan (2003). "Determining Appropriate Precisions for Signals in Fixed-Point IIR Filters". DAC. CiteSeerX 10.1.1.92.1266.
↑ Choosing a high-performance audio ADC – "I went looking for the best dynamic range audio ADC I could find" and highest are 123 dB dynamic range
↑ Montgomery, Chris (25 March 2012). "24/192 Music Downloads ...and why they make no sense". xiph.org. Retrieved 26 May 2013. With use of shaped dither, which moves quantization noise energy into frequencies where it's harder to hear, the effective dynamic range of 16 bit audio reaches 120dB in practice, more than fifteen times deeper than the 96dB claim. 120dB is greater than the difference between a mosquito somewhere in the same room and a jackhammer a foot away.... or the difference between a deserted 'soundproof' room and a sound loud enough to cause hearing damage in seconds. 16 bits is enough to store all we can hear, and will be enough forever.
↑ Stuart, J. Robert (1997). "Coding High Quality Digital Audio" (PDF). Meridian Audio Ltd. Retrieved 2016-02-25. One of the great discoveries in PCM was that, by adding a small random noise (that we call dither) the truncation effect can disappear. Even more important was the realisation that there is a right sort of random noise to add, and that when the right dither is used, the resolution of the digital system becomes infinite.
↑ "Dithering in Analog-to-Digital Conversion" (PDF). e2v Semiconductors. 2007. Retrieved 26 July 2011.
↑ Kester, Walt. "Oversampling Interpolating DACs" (PDF). Analog Devices. Retrieved 19 August 2013.
↑ http://www.hifiengine.com/manual_library/philips/cd100.shtml
↑ "B.1 First and Second-Order Noise Shaping Loops". Retrieved 19 August 2013.
↑ "Sweetwater Knowledge Base, Masterlink: What is a "Red Book" CD?". www.sweetwater.com. Sweetwater. 27 April 2007. Retrieved 25 August 2013.
↑ "Understanding DVD-Audio" (PDF). Sonic Solutions. Archived from the original (PDF) on 4 March 2012. Retrieved 25 August 2013.
↑ Shapiro, L. (2 July 2001). "Surround Sound, Page 10". ExtremeTech. Retrieved 26 August 2013.
↑ "White paper Blu-ray Disc Format, 2.B Audio Visual Application Format Specifications for BD-ROM Version 2.4" (PDF). Blu-ray Disc Association. April 2010. Retrieved 25 August 2013.
↑ Puhovski, Nenad (April 2000). "DV – A SUCCESS STORY". www.stanford.edu. Archived from the original on 27 October 2004. Retrieved 26 August 2013.
↑ "G.711 : Pulse code modulation (PCM) of voice frequencies" (PDF). International Telecommunications Union. Retrieved 25 August 2013.
↑ "DIGITAL SOUND SIGNALS: tests to compare the performance of five companding systems for high-quality sound signals" (PDF). BBC Research Department. August 1978. Archived from the original (PDF) on 8 November 2012. Retrieved 26 August 2013.
↑ "Ardour Key Features". Ardour Community. 2014. Retrieved 8 April 2014.
↑ "Pro Tools Documentation, Pro Tools Reference Guide" (ZIP/PDF). Avid. 2013. Retrieved 26 August 2013.
↑ "Logic Pro X: User Guide" (PDF). Apple. January 2010. Retrieved 26 August 2013.
↑ "Reason 7 Operation Manual" (PDF). Propellerhead Software. 2013. Retrieved 26 August 2013.
↑ "GarageBand '11: Set the audio resolution". Apple. 13 March 2012. Retrieved 26 August 2013.
↑ "Audacity: Features". wiki.audacityteam.com. Audacity development team. Retrieved 13 September 2014.

Ken C. Pohlmann (15 February 2000). Principles of Digital Audio (4th ed.). McGraw-Hill Professional. ISBN 978-0-07-134819-5.

This article is issued from Wikipedia - version of the 12/2/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.