A way to decode lossy files to >16 bits per sample [Delivered in build 333]

allan · October 13, 2017, 10:55pm

@Nick_Stamp: With lossy audio compression formats, there isn’t really an inherent bit depth associated with the content of a compressed AAC or MP3 file. You can think of these files as a mathematical description of what the compression algorithm considered important information (overly simplified: “Here’s a bunch of sine waves that sound a lot like the original source when you stack them on top of each other”). Most decoders calculate the decoded waveform in a high-precision intermediate format and then, if asked for 16-bit samples as the final output, quantize to 16 bits and apply dither. Regardless of the sample size of the original source, the undithered 24- or 32-bit output of a decoder will be closer to the waveform described in the compressed data than the quantized and dithered 16-bit output. If the original source was 16 bits per sample (i.e., ripped from a CD), it will have been quantized and dithered twice* (once during mastering, a second time during decoding) before it reaches your ears. With the advent of initiatives like Mastered for iTunes, it’s now more common to be able to buy compressed music that was generated from undithered 24-bit sources. If your decoder can generate 24+ bit output, you can also play it back without it ever having been bounced down to 16 bits per sample.

*or even three times, if it happens to also go through a software mixer set to output 16 bits per sample.
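
To put a rough number on the "closer to the waveform" point, here is a minimal Python sketch. It is not taken from any real decoder: the sine wave is just a stand-in for a decoder's high-precision intermediate output, and `quantize` and `rms_error` are hypothetical helper names. It rounds the same signal to 16 bits with TPDF dither and to 24 bits without dither, then compares the RMS error against the unquantized signal.

```python
import math
import random

def quantize(x, bits, dither=False, rng=random):
    """Round a float sample in [-1.0, 1.0) to a signed `bits`-bit integer,
    then rescale it back to float so it can be compared with the input."""
    scale = 2 ** (bits - 1)
    v = x * scale
    if dither:
        # TPDF dither: difference of two uniform values, spanning +/- 1 LSB.
        v += rng.random() - rng.random()
    q = max(-scale, min(scale - 1, round(v)))
    return q / scale

def rms_error(samples, bits, dither=False, rng=random):
    """RMS difference between the quantized samples and the originals."""
    err = [quantize(s, bits, dither, rng) - s for s in samples]
    return math.sqrt(sum(e * e for e in err) / len(err))

# Assumption: one second of a 1 kHz sine at 44.1 kHz stands in for the
# decoder's high-precision output.
decoded = [0.5 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]

rng = random.Random(0)  # fixed seed so the dithered run is repeatable
err16 = rms_error(decoded, 16, dither=True, rng=rng)
err24 = rms_error(decoded, 24, dither=False)
print(f"16-bit dithered RMS error:   {err16:.3e}")
print(f"24-bit undithered RMS error: {err24:.3e}")
```

The 24-bit error should come out a few hundred times smaller, which is the whole argument for asking the decoder for more than 16 bits when the playback chain can handle it.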

Well, the last time I had to write code that decoded MP3s (15+ years ago… yeesh), I remember using libmad to decode them to 24-bit PCM. I just played around with the “madplay” command-line tool (available in the Ubuntu apt repository), and it appears to produce unpadded 24-bit output when asked:

madplay -b 24 input.mp3 -o madout.wav
MPEG Audio Decoder 0.15.2 (beta) - Copyright (C) 2000-2004 Robert Leslie et al.
          Title: On Hold
         Artist: The xx
      Orchestra: The xx
          Album: I See You
          Track: 8
           Year: 2017
8582 frames decoded (0:03:44.1), +0.2 dB peak amplitude, 68 clipped samples

flac madout.wav

flac 1.3.1, Copyright (C) 2000-2009  Josh Coalson, 2011-2014  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

madout.wav: WARNING: legacy WAVE file has format type 1 but bits-per-sample=24
madout.wav: wrote 43215207 bytes, ratio=0.728

flac -ac madout.flac |grep wasted
<... snip ...>
    subframe=1      wasted_bits=0   type=LPC        order=8 qlp_coeff_precision=15  quantization_level=13   residual_type=RICE      partition_order=0
    subframe=0      wasted_bits=0   type=LPC        order=8 qlp_coeff_precision=15  quantization_level=13   residual_type=RICE      partition_order=1
    subframe=1      wasted_bits=0   type=LPC        order=8 qlp_coeff_precision=15  quantization_level=13   residual_type=RICE      partition_order=0
    subframe=0      wasted_bits=0   type=LPC        order=8 qlp_coeff_precision=15  quantization_level=13   residual_type=RICE      partition_order=1
<... etc etc etc ...>

Using flac is kind of a roundabout way to detect zero-padding, but I don’t know of any other command-line tools that do it. Similar experimenting with ffmpeg (and looking at the MPEG audio decoder source) suggests that its MP3 decoder is fixed at 16 bits per sample, regardless of the requested output format. The ffmpeg AAC decoder, however, decodes internally to “fltp” 32-bit floating point and will happily output true 24/32-bit WAV:

/usr/bin/ffmpeg -i input.m4a -acodec pcm_s24le aacout.wav
ffmpeg version 2.8.11-0ubuntu0.16.04.1 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 20160609
  configuration: --prefix=/usr --extra-version=0ubuntu0.16.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
  libavutil      54. 31.100 / 54. 31.100
  libavcodec     56. 60.100 / 56. 60.100
  libavformat    56. 40.101 / 56. 40.101
  libavdevice    56.  4.100 / 56.  4.100
  libavfilter     5. 40.101 /  5. 40.101
  libavresample   2.  1.  0 /  2.  1.  0
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  2.101 /  1.  2.101
  libpostproc    53.  3.100 / 53.  3.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x10e3400] stream 0, timescale not set
[mjpeg @ 0x10e6a80] Changeing bps to 8
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.m4a':
  Metadata:
  <... snip ...>
  Duration: 00:05:19.14, start: 0.000000, bitrate: 278 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 263 kb/s (default)
    Metadata:
      creation_time   : 1999-05-13 10:58:17
    Stream #0:1: Video: mjpeg, yuvj444p(pc, bt470bg/unknown/unknown), 572x600 [SAR 300:300 DAR 143:150], 90k tbr, 90k tbn, 90k tbc
Output #0, wav, to 'aacout.wav':
  Metadata:
    <... snip ...>
    ISFT            : Lavf56.40.101
    Stream #0:0(eng): Audio: pcm_s24le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s32, 2116 kb/s (default)
    Metadata:
      creation_time   : 1999-05-13 10:58:17
      encoder         : Lavc56.60.100 pcm_s24le
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s24le (native))
Press [q] to stop, [?] for help
size=   82452kB time=00:05:19.13 bitrate=2116.5kbits/s
video:0kB audio:82452kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000294%

flac aacout.wav

flac 1.3.1, Copyright (C) 2000-2009  Josh Coalson, 2011-2014  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

aacout.wav: WARNING: skipping unknown chunk 'LIST' (use --keep-foreign-metadata to keep)
aacout.wav: wrote 63672261 bytes, ratio=0.754

flac -ac aacout.flac |grep wasted

<...>
        subframe=0      wasted_bits=0   type=LPC        order=8 qlp_coeff_precision=15  quantization_level=11   residual_type=RICE      partition_order=3
        subframe=1      wasted_bits=0   type=LPC        order=8 qlp_coeff_precision=15  quantization_level=12   residual_type=RICE      partition_order=3
        subframe=0      wasted_bits=0   type=LPC        order=8 qlp_coeff_precision=15  quantization_level=11   residual_type=RICE      partition_order=3
        subframe=1      wasted_bits=0   type=LPC        order=8 qlp_coeff_precision=15  quantization_level=11   residual_type=RICE      partition_order=3
        subframe=0      wasted_bits=0   type=FIXED      order=2 residual_type=RICE      partition_order=0
        subframe=1      wasted_bits=0   type=LPC        order=7 qlp_coeff_precision=15  quantization_level=11   residual_type=RICE      partition_order=0
<...>
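
The zero-padding check done above with flac's wasted_bits field can also be approximated directly. The following is a rough Python sketch using only the standard library `wave` module; `wasted_low_bits` is a hypothetical helper name, and it assumes a plain PCM WAV file (format type 1, as in both outputs above) with samples at least 16 bits wide. It ORs every sample together and counts the trailing zero bits, so a 16-bit signal padded out to 24 bits should report 8.

```python
import wave

def wasted_low_bits(path, max_frames=1 << 20):
    """Return how many low-order bits are zero in every sample read from a
    PCM WAV file. Only the first `max_frames` frames are inspected."""
    with wave.open(path, "rb") as w:
        width = w.getsampwidth()  # bytes per sample, e.g. 3 for 24-bit
        data = w.readframes(min(w.getnframes(), max_frames))

    combined = 0
    for i in range(0, len(data) - width + 1, width):
        # WAV PCM is little-endian. Reading as unsigned is fine here because
        # the low-order bits are the same in two's complement.
        combined |= int.from_bytes(data[i:i + width], "little")

    if combined == 0:
        return width * 8  # digital silence: no bit is ever set
    # Index of the lowest set bit equals the number of trailing zero bits.
    return (combined & -combined).bit_length() - 1

# Example usage, with the file names from the transcripts above:
#   print(wasted_low_bits("madout.wav"))
#   print(wasted_low_bits("aacout.wav"))
```

Unlike flac, which reports wasted bits per subframe, this gives a single figure for the portion of the file it reads, so it is a coarser check.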