An Introduction to Computer Audio

Will Styler


How do acousticians say hello?


Today’s Plan


Sound is compression and rarefaction in a medium


Time-shifted sound is a novelty


Analog Recording


The Phonograph


Playback from Phonographs


These recordings are ephemeral and bad

‘The Lost Chord’ by Arthur Sullivan (1888)


There’s an inherent tradeoff



Electric Recording fixes this!


Microphones


Dynamic Microphones


Condenser Microphones


Now you have sound as a voltage on an electrical line


Speakers


There are many types of speakers, some are different!


Any Questions so far?


So, that’s how we capture sound


But then everything changed


Computer Audio


Computers don’t do waves

010001110010101000100101101010101010


Sound is analog, computers are digital


Quantization
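To make quantization concrete, here’s a minimal Python sketch that snaps amplitudes onto 2**bits evenly spaced levels. It assumes the amplitudes are already scaled to the range -1.0 to 1.0; the function name `quantize` and the sample values are made up for illustration, and real converters differ in details such as whether zero is exactly representable.

```python
def quantize(x: float, bits: int) -> float:
    """Snap an amplitude in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    # Illustrative only: assumes amplitudes are already scaled to [-1.0, 1.0].
    levels = 2 ** bits
    step = 2.0 / (levels - 1)            # distance between neighbouring levels
    x = max(-1.0, min(1.0, x))           # anything outside the range gets clamped
    index = round((x + 1.0) / step)      # number of the closest level, counting up from -1.0
    return -1.0 + index * step


samples = [0.0, 0.3, 0.71, -0.52, 0.99]
for bits in (2, 3, 8):
    stored = [quantize(s, bits) for s in samples]
    worst = max(abs(a - b) for a, b in zip(samples, stored))
    print(f"{bits}-bit: {[round(v, 3) for v in stored]}  worst error {worst:.3f}")
```

Fewer bits means fewer levels to choose from, so the worst-case rounding error grows.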


Analog-to-digital conversion
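Put sampling and quantization together and you have the core of analog-to-digital conversion. This sketch fakes the analog side with a `math.sin` call (an assumption: a real converter is measuring a voltage) and stores each sample as a 16-bit signed integer, as WAV files typically do. `sample_and_quantize` is a hypothetical helper name.

```python
import math

def sample_and_quantize(freq_hz, sample_rate, duration_s, bits=16):
    """Sample a sine tone every 1/sample_rate seconds and store each value as a signed integer."""
    n_samples = round(sample_rate * duration_s)
    max_int = 2 ** (bits - 1) - 1                        # 32767 for 16-bit audio
    digital = []
    for n in range(n_samples):
        t = n / sample_rate                              # when the n-th sample is taken
        analog = math.sin(2 * math.pi * freq_hz * t)     # stand-in for the continuous voltage, -1 to 1
        digital.append(round(analog * max_int))          # nearest integer step
    return digital


tone = sample_and_quantize(440, 44100, 0.01)   # 10 ms of a 440 Hz tone
print(len(tone), tone[:6])
```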


How often do we sample?


Sampling Rate


Sampling Rate (low rate)


Sampling Rate (lower rate)


Sampling Rate


Bad sampling makes for bad waves


Good sampling rates capture the necessary set of frequencies


Higher frequencies need higher sampling rates
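One way to see this: count how many samples land inside each cycle of a tone. The helper below is a hypothetical one-liner, and the loop assumes a 44,100 Hz sampling rate.

```python
def samples_per_cycle(freq_hz, sample_rate):
    """How many samples land inside one cycle of a tone at freq_hz."""
    return sample_rate / freq_hz


for f in (100, 1000, 5000, 10000, 20000):
    print(f"{f:>6} Hz tone at 44,100 Hz: {samples_per_cycle(f, 44100):6.2f} samples per cycle")
```

At 22,050 Hz you’re down to two samples per cycle, which is exactly the limit the Nyquist Theorem describes.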


Nyquist Theorem

The highest frequency captured by a sampled signal is one half of the sampling rate
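The Nyquist frequency is just the sampling rate divided by two. Anything above it doesn’t simply vanish: it ‘folds back’ and shows up as a lower frequency (aliasing), which is why converters low-pass filter the signal before sampling. A minimal sketch for a pure tone, with hypothetical helper names and an arbitrary 8,000 Hz rate:

```python
import math

def nyquist(sample_rate):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate / 2


def apparent_frequency(freq_hz, sample_rate):
    """Frequency a pure tone shows up at after sampling."""
    f = freq_hz % sample_rate                                 # f and f + k*sample_rate give identical samples
    return f if f <= sample_rate / 2 else sample_rate - f    # above Nyquist, it folds back down


sr = 8000
print(nyquist(sr))                                            # 4000.0
print(apparent_frequency(3000, sr), apparent_frequency(5000, sr))

# Sampled at 8 kHz, a 5 kHz tone and a 3 kHz tone give the same samples (with the sign flipped).
same = all(
    math.isclose(math.sin(2 * math.pi * 5000 * n / sr),
                 -math.sin(2 * math.pi * 3000 * n / sr), abs_tol=1e-9)
    for n in range(32)
)
print(same)
```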


Sampling Rates (Shpongle - ‘Nothing is something worth doing’)

44,100 Hz

22,050 Hz

11,025 Hz

6000 Hz


Sampling Rates (Shpongle - ‘Nothing is something worth doing’)

44,100 Hz

6000 Hz

3000 Hz

1500 Hz

800 Hz


Different media use different sampling rates


Your sampling rate should be at least 44,100 Hz


Clipping
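Clipping happens when the signal is bigger than the largest value the converter can store, so the peaks get flattened. A minimal sketch, assuming amplitudes are scaled so that full scale is ±1.0, with a hypothetical `record` function standing in for the input gain:

```python
import math

def record(signal, gain):
    """Amplify a signal, then clip it to the [-1.0, 1.0] range the converter can store."""
    return [max(-1.0, min(1.0, gain * s)) for s in signal]


tone = [math.sin(2 * math.pi * n / 100) for n in range(100)]   # one cycle, 100 samples
for gain in (0.8, 1.5, 3.0):
    clipped = record(tone, gain)
    pinned = sum(1 for s in clipped if abs(s) == 1.0)
    print(f"gain {gain}: {pinned} of {len(clipped)} samples stuck at full scale")
```

Once a sample is pinned at full scale, the real peak value is gone for good, which is why you set levels before recording rather than fixing them afterwards.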





Clipping introduces noise into FFTs
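Why the FFT gets noisy: flattening the peaks turns a pure tone into something closer to a square wave, and that adds energy at harmonics that were never in the original sound. This sketch uses a slow, plain DFT (the same numbers an FFT computes) so it needs only the standard library; the signal length, tone frequency, and clipping level are arbitrary choices for illustration.

```python
import cmath
import math

def dft_magnitudes(x):
    """Plain discrete Fourier transform (what an FFT computes, only slower): |X[k]| for k = 0..N/2."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]


N = 64
clean = [0.9 * math.sin(2 * math.pi * 4 * n / N) for n in range(N)]   # a pure tone in bin 4
clipped = [max(-0.5, min(0.5, s)) for s in clean]                      # the same tone, clipped

for name, signal in (("clean", clean), ("clipped", clipped)):
    mags = dft_magnitudes(signal)
    print(name, "bins with energy:", [k for k, m in enumerate(mags) if m > 1.0])
```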


Clipping is also dangerous for audio equipment


Adjust your levels while recording!


… but what are we storing for amplitude at each point, anyway?


Bit Depth
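Bit depth sets how many amplitude levels each sample can take: 2 to the power of the number of bits. A common rule of thumb is about 6 dB of dynamic range per bit. A quick sketch of that arithmetic (the function name is just for illustration):

```python
import math

def bit_depth_summary(bits):
    levels = 2 ** bits                           # how many distinct amplitude values exist
    dynamic_range_db = 20 * math.log10(levels)   # full scale vs. one quantization step, in dB
    return levels, dynamic_range_db


for bits in (8, 16, 24):
    levels, dr = bit_depth_summary(bits)
    print(f"{bits}-bit: {levels:,} levels, roughly {dr:.1f} dB of dynamic range")
```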


Your bit depth will likely be 16-bit


So, we sample, at a reasonable sampling rate and bit depth


This all means that the ‘vinyl captures more detail’ people are provably wrong


This is what your ‘sound card’ or ‘USB capture box’ does


Capturing the samples into a file gives you uncompressed sound files!
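Python’s standard-library `wave` module is enough to write samples out as an uncompressed WAV file. A minimal sketch, assuming a mono 440 Hz test tone at 44,100 Hz and 16 bits; the file name and the `write_tone_wav` helper are hypothetical:

```python
import math
import os
import struct
import tempfile
import wave

def write_tone_wav(path, freq_hz=440.0, sample_rate=44100, duration_s=1.0):
    """Write a mono, 16-bit PCM WAV file containing a sine tone at half of full scale."""
    n_samples = round(sample_rate * duration_s)
    frames = b"".join(
        struct.pack("<h", round(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate)))
        for n in range(n_samples)
    )                                    # "<h" = little-endian signed 16-bit integer
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)              # mono
        wav.setsampwidth(2)              # 2 bytes per sample, i.e. 16-bit
        wav.setframerate(sample_rate)    # samples per second
        wav.writeframes(frames)


path = os.path.join(tempfile.gettempdir(), "tone.wav")   # hypothetical output location
write_tone_wav(path)
with wave.open(path, "rb") as wav:
    print(wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes())
```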


You should save your data files as WAV when possible


… but what if you need your files to take up less space?
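How much space are we talking about? Uncompressed size is just sampling rate × bytes per sample × channels × seconds. A quick sketch, assuming CD-style settings (44,100 Hz, 16-bit, stereo) unless you pass others; the helper name is hypothetical:

```python
def pcm_bytes(seconds, sample_rate=44100, bit_depth=16, channels=2):
    """Bytes needed for uncompressed PCM audio (ignoring the small file header)."""
    return seconds * sample_rate * (bit_depth // 8) * channels


minute = pcm_bytes(60)
print(f"1 minute, 44.1 kHz, 16-bit stereo: {minute:,} bytes (about {minute / 1e6:.1f} MB)")
print(f"1 hour of mono audio at the same settings: about {pcm_bytes(3600, channels=1) / 1e6:.0f} MB")
```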


Audio Codecs


Codecs encode and decode signals


Codecs aren’t quite the same as audio formats


There are many ways to store and stream audio


Lossless Compression
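Lossless codecs shrink files by storing the same information more cleverly, then rebuilding the exact original samples on playback. FLAC, for example, predicts each sample from the ones before it and stores the (usually small) prediction errors compactly. Here’s a toy version of that idea, assuming the simplest possible predictor (‘same as the last sample’). It is not FLAC’s actual algorithm, and it skips the step that packs the small numbers into fewer bits; the helper names are hypothetical.

```python
import math

def encode_residuals(samples):
    """Predict each sample from the previous one and keep only the prediction errors."""
    if not samples:
        return []
    return [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]


def decode_residuals(residuals):
    """Exactly undo encode_residuals by adding the errors back up."""
    samples = []
    for r in residuals:
        samples.append(r if not samples else samples[-1] + r)
    return samples


pcm = [round(10000 * math.sin(2 * math.pi * 220 * n / 44100)) for n in range(1000)]
residuals = encode_residuals(pcm)
print(decode_residuals(residuals) == pcm)            # the round trip is exact
print(max(abs(s) for s in pcm), max(abs(r) for r in residuals[1:]))
```

The residuals are much smaller numbers than the samples themselves, and smaller numbers can be stored in fewer bits without losing anything.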


You should save your data files as WAV when possible!


Lossy File Formats


Lossless vs. Lossy Compression


Lossy codecs are everywhere


Lossy Compression throws away information strategically
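A toy illustration of throwing information away strategically: move to the frequency domain, keep only the strongest components, and rebuild the wave. Real codecs like MP3 are far more sophisticated (they use a psychoacoustic model to guess what listeners won’t notice), so treat this as a sketch of the idea, not of any actual codec. The helper names and the test signal are made up.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]


def toy_lossy(x, keep):
    """Keep only the `keep` strongest frequency components and throw the rest away."""
    X = dft(x)
    strongest = set(sorted(range(len(X)), key=lambda k: abs(X[k]), reverse=True)[:keep])
    return idft([X[k] if k in strongest else 0 for k in range(len(X))])


N = 32
loud = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]            # the part we keep
quiet = [0.05 * math.sin(2 * math.pi * 10 * n / N) for n in range(N)]   # a faint extra component
mix = [a + b for a, b in zip(loud, quiet)]

for keep in (2, 4):
    out = toy_lossy(mix, keep)
    print(keep, "components kept, max error:", round(max(abs(a - b) for a, b in zip(mix, out)), 4))
```

With only two components kept, the faint tone is simply gone, and nothing in the stored data can bring it back.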


It’s a lot like image compression!









Here’s what it looks like when you save it back to a lossless format



You can choose how much to compress the sounds!


Sound Compression (Again, Shpongle ‘Nothing is something worth doing’)

Uncompressed WAV

320kbps mp3

192kbps mp3

128kbps mp3


Sound Compression (Again, Shpongle ‘Nothing is something worth doing’)

Uncompressed WAV

64kbps mp3

48kbps mp3

32kbps mp3

8kbps mp3


Original from https://www.youtube.com/watch?v=wBnevSbdb7g
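For a sense of scale, here’s the arithmetic comparing those mp3 bitrates to uncompressed audio. It assumes the WAV is 44,100 Hz, 16-bit stereo (CD-style settings; the clips’ actual settings aren’t stated here), and the helper name is hypothetical.

```python
def pcm_kbps(sample_rate=44100, bit_depth=16, channels=2):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000


wav_kbps = pcm_kbps()   # 1411.2 kbps for 44.1 kHz, 16-bit stereo
for mp3_kbps in (320, 192, 128, 64, 48, 32, 8):
    print(f"{mp3_kbps:>3} kbps mp3 is about {wav_kbps / mp3_kbps:.1f}x smaller than the WAV")
```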


Lossy compression of audio throws away data!


Lossy compression makes decisions!


An aside: FILE compression is lossless
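You can check this yourself with Python’s built-in `zlib` module (DEFLATE, the same family of compression commonly used in zip files): compress some raw PCM bytes, decompress them, and you get back exactly the same bytes. The test tone is an arbitrary stand-in for real audio. Don’t expect big savings on real recordings, though; general-purpose compressors usually do worse on audio than a dedicated lossless codec like FLAC.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as raw 16-bit little-endian PCM bytes (what sits inside a WAV).
samples = [round(8000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(44100)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

packed = zlib.compress(pcm, 9)      # DEFLATE at the highest compression level
restored = zlib.decompress(packed)

print(len(pcm), "bytes before,", len(packed), "bytes after")
print(restored == pcm)              # every byte comes back
```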


‘Noise Reduction’


The World is Noisy


‘Noise Reduction’ Algorithms
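To see why ‘noise reduction’ always involves a decision, here’s about the simplest algorithm there is: a noise gate that silences any stretch quieter than a threshold. This is a minimal time-domain sketch with made-up parameters and a synthetic recording; many real tools work in the frequency domain and are more sophisticated.

```python
import math

def noise_gate(samples, frame_len, threshold):
    """Zero out every frame whose RMS level is below `threshold`; leave louder frames alone."""
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        out.extend(frame if rms >= threshold else [0.0] * len(frame))
    return out


hiss = [0.01 * (-1) ** n for n in range(800)]                                      # stand-in for background noise
speech = [0.0] * 400 + [0.5 * math.sin(2 * math.pi * n / 50) for n in range(400)]  # silence, then "speech"
recording = [h + s for h, s in zip(hiss, speech)]

cleaned = noise_gate(recording, frame_len=100, threshold=0.05)
print(all(s == 0.0 for s in cleaned[:400]))      # hiss-only stretch is gone
print(cleaned[400:] == recording[400:])          # the hiss under the speech is still there
```

Notice that the noise underneath the speech survives, and that any speech quieter than the threshold would have been deleted along with the hiss.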


Get a local recording alongside videoconferencing


Key takeaways


Friends don’t let friends use lossy codecs in science


So, how do we put sound into ML models?


Well, much like the rest of us!


There are more problems


Putting in the waveform itself was historically a poor choice
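Two quick numbers show the problem. First, one second of audio at 44,100 Hz is already 44,100 input values. Second, waveforms that sound the same can look very different numerically: flipping the polarity of a tone (multiplying every sample by -1) is inaudible when the tone is heard on its own, but every sample changes. A minimal sketch with an arbitrary 200 Hz test tone:

```python
import math

sr = 44100
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]   # one second of a 200 Hz tone
flipped = [-s for s in tone]                                       # same tone, polarity inverted

print(len(tone), "numbers for one second of audio")
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(tone, flipped)))
print("Euclidean distance between two versions that sound the same:", round(distance, 1))
```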


Why not linguistically useful features?
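Features like duration and intensity are easy to compute and easy to interpret, which is a big part of their appeal. A minimal sketch with hypothetical helper names. Intensity here is RMS level in dB relative to an RMS of 1.0, not calibrated dB SPL, and the ‘vowel’ is a synthetic tone standing in for real speech.

```python
import math

def duration_s(samples, sample_rate):
    return len(samples) / sample_rate


def intensity_db(samples):
    """RMS level in dB relative to an RMS of 1.0 (so quieter sounds are more negative)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


sr = 44100
vowel = [0.3 * math.sin(2 * math.pi * 150 * n / sr) for n in range(round(0.12 * sr))]  # made-up 120 ms "vowel"
print(f"duration {duration_s(vowel, sr) * 1000:.0f} ms, intensity {intensity_db(vowel):.1f} dB")
```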


Benefits of linguistically useful features


Downsides of linguistically useful features


For research, linguistically useful features are great


We don’t need transparent or minimal features


Let’s get that algorithm a Matrix
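The first step toward that matrix is usually chopping the signal into short, overlapping frames, one row per time step. A minimal sketch, assuming 25 ms frames every 10 ms (common choices in speech processing, not fixed rules) and a hypothetical `frame_signal` helper:

```python
def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Slice a signal into overlapping frames: one row per time step, one column per sample."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)       # how far each frame moves forward
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]


one_second = [0.0] * 44100                # placeholder signal, only the shape matters here
matrix = frame_signal(one_second, 44100)
print(len(matrix), "rows x", len(matrix[0]), "columns")
```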


Mel-Frequency Cepstral Coefficients (MFCCs)


We’re not going deep here


MFCCs


MFCC Process
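The usual MFCC recipe goes roughly: split the signal into frames, take a power spectrum of each frame, pool it through filters spaced on the mel scale, take the log, then apply a discrete cosine transform (DCT). The mel-scale part is a simple formula. This sketch uses the common 2595 × log10(1 + f/700) variant (other variants exist); the helper names and the 0 to 8,000 Hz range are assumptions for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Hz to mels, using the common 2595 * log10(1 + f/700) formula."""
    return 2595 * math.log10(1 + f_hz / 700)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700 * (10 ** (mel / 2595) - 1)


def mel_band_centers(n_bands, f_min=0.0, f_max=8000.0):
    """Centre frequencies (Hz) for n_bands filters spaced evenly in mels between f_min and f_max."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)          # +1 leaves room for the outer filter edges
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


print([round(f) for f in mel_band_centers(10)])   # narrow spacing at low frequencies, wide at high
```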


MFCC Input


MFCC Output
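The last step of that recipe turns each frame’s log filterbank energies into cepstral coefficients with a DCT. A minimal, unnormalised DCT-II sketch on made-up energies for a single frame (the values, the 8 bands, and the choice of 4 coefficients are all arbitrary):

```python
import math

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II of x, keeping the first n_coeffs coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(n_coeffs)]


# Made-up mel filterbank energies for ONE frame (8 bands), logged as in the usual MFCC recipe.
log_energies = [math.log(e) for e in (2.0, 5.0, 9.0, 7.0, 4.0, 3.0, 2.5, 1.0)]
mfcc_frame = dct_ii(log_energies, 4)      # 4 coefficients for this frame
print([round(c, 3) for c in mfcc_frame])
```

Do this for every frame and stack the results, and you get one set of coefficients per frame. In practice you’d use a library; for example, `librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)` returns an array shaped (n_mfcc, number of frames), though its scaling won’t match this unnormalised sketch exactly.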


So, the sound becomes a matrix of features



Now let’s try computer audio on our own!