LIGN 168 - Linear Predictive Coding

# Linear Predictive Coding

### Will Styler - LIGN 168

---

### Today's Plan

- Source/Filter Review

- Understanding LPC

- Where is LPC useful?

- What are the dangers of LPC?

---

### Review: Source Filter Theory

- **Source**: The harmonics output by the larynx

---

- **Filter**: The resonance properties of the rest of the vocal tract
	- These can be poles (adding power) or zeroes (removing power)

- The speech signal can be thought of as the result of imposing the filter on the source
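
A toy sketch of "imposing the filter on the source", assuming NumPy. Everything here is made up for illustration: the helper names `impulse_train` and `resonator`, the 100 Hz source, and the single 500 Hz resonance. A real larynx and vocal tract are far messier than this.

```python
import numpy as np

def impulse_train(f0, sr, dur):
    """Toy 'source': one impulse per cycle, giving harmonics at multiples of f0."""
    n_samples = int(sr * dur)
    source = np.zeros(n_samples)
    period = int(round(sr / f0))
    source[::period] = 1.0
    return source

def resonator(x, freq, bandwidth, sr):
    """Toy 'filter': a single two-pole resonance (one formant-like peak)."""
    r = np.exp(-np.pi * bandwidth / sr)
    theta = 2 * np.pi * freq / sr
    b1, b2 = 2 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += b1 * y[n - 1]
        if n >= 2:
            y[n] += b2 * y[n - 2]
    return y

sr = 16000
source = impulse_train(f0=100, sr=sr, dur=0.5)             # harmonics every 100 Hz
speech = resonator(source, freq=500, bandwidth=80, sr=sr)  # filter imposed on the source

# The strongest harmonic in the output should sit near the resonance.
spectrum = np.abs(np.fft.rfft(speech))
freqs = np.fft.rfftfreq(len(speech), d=1 / sr)
peak_hz = freqs[np.argmax(spectrum)]
```

The source decides *where* the harmonics are, and the filter decides *which ones are loud*.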

---

### Thus, we need to understand two states to model speech!

- What is the source doing? 
	- (i.e. What is f0 and that whole situation?)

- What is the filter doing? 
	- (i.e. What are the poles and zeroes like?)

- **These two things are independent of one another**

---

### Understanding f0 and the Source

- What is f0?

- What are the harmonics' (relative) amplitudes?

- Is the source signal noisy?  Irregular?

- This is a question of pitch tracking and source modeling

- *We'll talk about this next time!*

---

### Understanding the filter

- Where are the main poles which are filtering the source?

- Where are the main zeroes which are filtering the source?

- How are these poles and zeroes changing over time?

- We use LPC for this!

---

### The filter has a large effect on a signal's *spectral envelope*

---

### If we can estimate that envelope, without the effect of the source, we have the filter!

- To do that, we need...

---

## Linear Predictive Coding (LPC)

---

### LPC is a tool for analyzing the spectral properties of a signal

- It aims to estimate the filter **only**, and track changes to that filter over the duration of a signal

- This was developed specifically for speech encoding, although it's useful elsewhere

- It's *linear* in that it uses linear equations to model the speech signal

- It's *predictive* in that it looks at past moments to *predict* the present state

- It can be used to *(en)code* and *(de)code* speech signals
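
To make "linear" and "predictive" concrete, here's a minimal sketch assuming NumPy. It uses a pure sine wave (not speech!), where each sample is exactly a weighted sum of the two samples before it. The weights below come from a trig identity, not from LPC itself, and the 200 Hz tone is just an illustrative choice.

```python
import numpy as np

sr = 16000
freq = 200.0
n = np.arange(400)
s = np.sin(2 * np.pi * freq * n / sr)

# For a sinusoid: s[n] = 2*cos(w)*s[n-1] - s[n-2]
w = 2 * np.pi * freq / sr
a1, a2 = 2 * np.cos(w), -1.0

# Predict every sample from the two samples before it (a linear equation).
predicted = a1 * s[1:-1] + a2 * s[:-2]
error = s[2:] - predicted
max_error = np.max(np.abs(error))   # should be essentially zero
```

Real speech is never this predictable, so LPC picks the weights that make the error as small as possible.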

---

### There's a lot of underlying math, but we're going to focus on intuitions

- Luckily, the math is very easy to find out there on the internet

- ... and boy do electrical engineers love making this opaque and equationy

---

### LPC has a few main steps to 'encode' a sound

- Step 1: Division into frames

- Step 2: Auto-Correlation Computation

- Step 3: Coefficient Calculation

---

### Step 1: Division into frames ('framing')

- We're going to take the sound and slice it into a series of overlapping frames

- Usually this window is ~20-30ms
	- This is much longer than the short (~5ms) windows used for wideband spectrograms
	- The same **time-frequency tradeoff** applies here

- We'll apply a windowing function (e.g. a Hamming window) to taper each frame's edges and smooth the transitions between frames

- This gives us a series of *frames* that we'll evaluate step by step

- We can assume/hope/pray that the vocal tract state is relatively steady in each 30 ms bucket
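
A minimal framing sketch, assuming NumPy. The function name `frame_signal` and the 10 ms hop between frames are assumptions for illustration. The slides only specify the ~20-30ms frame length and the Hamming window.

```python
import numpy as np

def frame_signal(signal, sr, frame_ms=25.0, hop_ms=10.0):
    """Slice a signal into overlapping, Hamming-windowed frames.

    Returns an array of shape (n_frames, frame_len).
    """
    frame_len = int(round(sr * frame_ms / 1000))
    hop_len = int(round(sr * hop_ms / 1000))
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)   # tapers the edges of every frame
    return np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])

sr = 16000
one_second = np.ones(sr)             # stand-in for a real recording
frames = frame_signal(one_second, sr)
```

With a hop shorter than the frame, neighboring frames overlap, so nothing falls into the tapered-off edges unanalyzed.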

---

### Note: Framing will be a regular step all quarter

- It is very common to do framing as a part of nearly any speech processing pipeline

- The window function doesn't tend to vary too much

- Timescales sometimes do, but it's usually around 20-30ms

---

### A nice visualization of frames

---

### Step 2: Autocorrelation

- Autocorrelated things are predictable on the basis of their immediate past

- Speech is heavily *autocorrelated*

- Prior chunks of the signal look a lot like the subsequent chunks

- Tongues don't tend to teleport

---

### An /i/ vowel

---

### Finding autocorrelation is a brute-force process

- Take the correlation between the frame and an undelayed copy of itself (a lag of zero)
	- This will be a perfect correlation

- Now, take the correlation between the frame and the frame delayed by a certain **lag**
	- This will usually be a lower number

- Now, continue trying longer lags and watch the autocorrelation change
	- Remember that a given frame will have more than one cycle (period) in it
	- When the cycles align, autocorrelation will spike!
	- We get a function from this
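
The brute-force loop can be sketched like this, assuming NumPy. The name `autocorrelation` is hypothetical, and "correlation" here means the dot product of the overlapping samples divided by the frame's energy, which is one common convention rather than the only one.

```python
import numpy as np

def autocorrelation(frame, max_lag):
    """Correlate a frame with delayed copies of itself, one lag at a time."""
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    acf = np.zeros(max_lag + 1)
    for lag in range(max_lag + 1):
        # Line up the frame with a copy delayed by `lag` samples.
        acf[lag] = np.dot(frame[:len(frame) - lag], frame[lag:]) / energy
    return acf

sr = 16000
n = np.arange(400)                         # a 25 ms frame
frame = np.sin(2 * np.pi * 200 * n / sr)   # toy signal with an 80-sample period
acf = autocorrelation(frame, max_lag=200)
# acf[0] is the perfect correlation, acf[40] (half a period) dips below zero,
# and acf[80] (one full period) spikes back up.
```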

---

### An /i/ vowel

---

### An /n/

---

### This outputs an 'Autocorrelation Function'

- "Over the range of possible lags, here's how the autocorrelation changes"

- This will have multiple spikes (with the biggest, aside from the one at zero lag, at one period)

- *The timing and degree of these spikes actually tells us about the overall spectral shape*
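
A sketch of reading the spikes off an autocorrelation function, assuming NumPy. The toy frame (three harmonics of 125 Hz) and the 75-400 Hz search range are assumptions for illustration, not values from this course.

```python
import numpy as np

sr = 16000
f0 = 125.0                       # toy source: period of 128 samples
frame_len = 480                  # 30 ms at 16 kHz
n = np.arange(frame_len)
frame = (np.sin(2 * np.pi * f0 * n / sr)
         + 0.50 * np.sin(2 * np.pi * 2 * f0 * n / sr)
         + 0.25 * np.sin(2 * np.pi * 3 * f0 * n / sr))

# np.correlate in 'full' mode covers negative and positive lags;
# keep lags 0..N-1 and normalize so the zero-lag value is 1.
full = np.correlate(frame, frame, mode="full")
acf = full[frame_len - 1:] / full[frame_len - 1]

# Skip the trivial spike at lag 0 by searching a plausible pitch range.
min_lag = sr // 400
max_lag = sr // 75
peak_lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
estimated_f0 = sr / peak_lag
```

The biggest spike past lag zero should land at (or within a sample or two of) one period, and the smaller wiggles between spikes reflect the other harmonics.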

---

### Step 3: Coefficient Generation

- **Black Box Alert!** This step requires math which we are not discussing 
	- Google 'Levinson-Durbin Algorithm for solving the Yule-Walker equations'

- The goal here is to create a set of *coefficients* which describe the filter's *spectral envelope*
	- Together, these coefficients describe the shape of the envelope (LPC is an all-pole model, so zeroes are only approximated by the poles)

- This is an optimization and modeling process!
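
For the curious, a peek inside the black box: a minimal sketch of the Levinson-Durbin recursion, assuming NumPy. The function name `levinson_durbin`, the sign convention (coefficients of the error filter, with a leading 1), and the toy noise frame are all assumptions for illustration. Real toolkits may order or sign the coefficients differently.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations from autocorrelation values r[0..order].

    Returns (a, err): a[0] == 1 and a[1:] are error-filter coefficients,
    so the prediction of s[n] is -sum(a[k] * s[n-k]); err is the
    remaining prediction-error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # How badly does the current model miss at lag i?
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                  # error shrinks at each order
    return a, err

# Toy frame: filtered noise, Hamming-windowed (a stand-in for speech).
rng = np.random.default_rng(0)
frame = np.convolve(rng.standard_normal(400), [1.0, 0.9, 0.5])[:400]
frame = frame * np.hamming(400)

order = 4
r = np.array([np.dot(frame[:len(frame) - lag], frame[lag:])
              for lag in range(order + 1)])
a, err = levinson_durbin(r, order)
```

The recursion builds the model one order at a time, which is much cheaper than solving the whole system of equations from scratch.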

---

### Poles and Zeroes of the Spectral Envelope

---

### The Key Insight: Generate filters, treat the source as error, and minimize the error!

- Sure, there's source information in the autocorrelation function, but a good filter will minimize its importance when *predicting* the signal

- We're modeling the stuff that changes less often (e.g. the tongue, formants, etc.) and just letting the source do its own little thing

- We solve these equations to find *the filter that minimizes the contributions of the source!*

- The 'LPC model' is a set of 10-20 coefficients describing the filter in detail
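
A toy sketch of "treat the source as error", assuming NumPy. The signal is synthetic (an impulse train through one 500 Hz resonance), the helpers `fit_predictor` and `prediction_error` are hypothetical names, windowing is skipped for brevity, and the order is 2 only because this toy filter has two poles. Real speech needs the 10-20 coefficients mentioned above.

```python
import numpy as np

sr = 16000
period = 160                         # toy 100 Hz source: one impulse per cycle
source = np.zeros(4000)
source[::period] = 1.0

# Toy filter: a single two-pole resonance near 500 Hz, applied sample by sample.
r_pole = np.exp(-np.pi * 80 / sr)
theta = 2 * np.pi * 500 / sr
true_a = np.array([2 * r_pole * np.cos(theta), -r_pole ** 2])
signal = np.zeros(len(source))
for n in range(len(source)):
    signal[n] = source[n]
    for k in (1, 2):
        if n >= k:
            signal[n] += true_a[k - 1] * signal[n - k]

frame = signal[1600:2080]            # one 30 ms frame from the middle

def fit_predictor(frame, order):
    """Find weights minimizing the error of s[n] ~ sum(w[k] * s[n-k])."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - lag], frame[lag:]) for lag in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def prediction_error(frame, weights):
    """Whatever the filter model fails to predict (ideally, the source)."""
    predicted = np.zeros(len(frame))
    for k, w in enumerate(weights, start=1):
        predicted[k:] += w * frame[:-k]
    return frame - predicted

weights = fit_predictor(frame, order=2)
residual = prediction_error(frame, weights)
error_ratio = np.sum(residual ** 2) / np.sum(frame ** 2)
```

In this toy case the filter model accounts for nearly all of the frame's energy, and what's left over in `residual` should look a lot like the impulse-like source.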

---