# Resynthesis ### Will Styler - LIGN 168 --- ### First, one quick concept to learn --- ### The Weak form of the Navier-Stokes Equations - Useful for describing fluid movement (as in the vocal tract) -
--- ### Lol jk - This is an apology - ... and room for questions from last time --- ### Now back to the actual course material --- ### Now we've got the Source and Filter - Pitch-finding techniques give us the frequency and characteristics of the source - LPC gives us the filter which is applied to the source - Both are matrices of numbers, which are easy to modify - **What can we do with that information?** --- ### Resynthesis - Turning speech into a computational numerical representation, and then back to speech again - We have Speech-to-Text, Text-to-Speech - This is Speech-to-Speech! --- ### Today's Plan - Modifying Pitch and Speed - Modifying Pitch and/or Speed - Source-Filter Resynthesis - The Dumbest and Most Exotic Voice Modification --- ## Modifying Pitch and Speed --- ### It's very easy to modify pitch *and* speed - Play the samples back at a different rate! - Decrease the speed of playing samples to slow the sound down - "Treat a 44,100Hz recording like it was 22,050" - Playing back 44,100 samples will take two seconds, instead of one - Increase the speed of playing samples to speed sound up - "Treat a 44,100Hz recording like it was 88,200" - Playing back 44,100 samples will take 0.5 seconds, instead of one - *This is different from resampling, as we're not adding or removing any points!* --- ### Increasing Playback Speed Base (00:08)
0.9x (00:07)
0.75x (00:06)
0.5x (00:04)
0.25x (00:02)
0.1x (00:00:81)
--- ### Decreasing Playback Speed Base (00:08)
1.1x (00:09)
1.5x (00:12)
2x (00:16)
3x (00:24)
4x (00:32)
--- ### When you speed up playback, pitches go up - The period changes when you play cycles back faster, so we perceive a higher pitch - This affects *all* frequencies, not just voiced portions - This is useful for making ents and chipmunks - Also [the best non-cover cover of 'Jolene'](https://www.youtube.com/watch?v=CMrfM711vXI) - *When most people play with pitch or duration, they'd rather be...* --- ## Modifying Pitch *OR* Speed --- ### Can we change duration independent of frequency, and vice versa? - This is a much harder to do! - Luckily, there's an algorithm for that! --- ### Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) - An algorithm which permits you to modify the duration or pitch of a voice, *independently of one another* - First described by [Moulines and Charpentier in 1990](https://www.sciencedirect.com/science/article/abs/pii/016763939090021Z) - There is a whole family of related algorithms for doing this, each with nuances, but we're talking generally about the concept, and simplifying some - This can be used to modify pitch, or duration! - **Again, we're focusing on intuitions over math, and doing some black-boxing** --- ### The PSOLA Process for Changing Pitch - Find the f0 of the signal and identify periods - Segment the sound into 'grains' with a *filter*, so that the edges smooth to zero - Overlap the grains more tightly to increase pitch, or more loosely to reduce pitch - Duplicate or remove grains to keep constant duration - Collapse the overlapped grains down into one signal --- ### Finding Periods
--- ### Filtering, Overlapping
--- ### Duplicating Cycles
--- ### Adding the grains together
--- ### Thus, we can change f0, without changing the duration - Overlap the grains more tightly to increase pitch - Overlap the grains more loosely to decrease pitch - Add or remove grains to keep duration constant - Then add it all up to the final sound! --- ### Increasing f0 using PSOLA 1x
1.2x
2x
3x
4x
5x
--- ### Decreasing f0 using PSOLA 1x
0.9x
0.75x
0.65x
--- ### We can see the process in the resulting waveforms (Unmodified)
--- ### We can see the process in the resulting waveforms (f0 x5)
--- ### We can modify *just* duration, too! - Find the f0 of the signal and identify periods - Segment the sound into 'grains' with a *filter*, so that the edges smooth to zero - Keep the grain-spacing the same (thus preserving f0) - Duplicate or remove grains to increase the duration - Collapse the overlapped grains down into one signal --- ### Decreasing Duration using PSOLA Base (00:08)
0.75x (00:06)
0.5x (00:04)
0.25x (00:02)
0.1x (00:00:81)
--- ### Increasing Duration using PSOLA Base (00:08)
1.1x (00:09)
1.5x (00:12)
2x (00:16)
3x (00:24)
4x (00:32)
15x (2:00)
--- ### This preserves interpretability! 4x PSOLA (00:32)
4x Playback (00:32)
0.5x PSOLA (00:04)
0.5x Playback (00:04)
--- ### You can manipulate Duration and Pitch at once "Make this sentence 1.5x slower, while raising the pitch 1.5x" Base Pitch and Duration
1.5x Pitch and Duration
--- ### There is a *lot* of complexity to do PSOLA right - You need a good, reliable pitch track - Creak or aperiodicity make this hard - All the problems from last time apply - PSOLA for non-periodic sounds is tough - You can grab repeated-feeling grains of a fricative and do your best - You need to filter precisely so that they add smoothly - You need to make sure the added segments are *in the correct phase* --- ### There is a *lot* of complexity to do PSOLA right (continued) - You need to make sure that *individual harmonics still align* - You need to choose which grains to add or delete - You need even fancier approaches where there's more than one f0 (e.g. two voices or instruments) --- ### PSOLA is not perfect - There will be artifacting, increasing with the amount of change - It will miss cycles and leave some bits sounding 'wrong' - There are limits as to the amount of (good-sounding) possible change - **But on the whole, it works pretty damned well!** - ... and it's used all the time under a different name --- ### Autotune - An algorithm developed by Andy Hildebrand and marketed in 1997 - Currently sold by Antares Audio Technology - Uses an approach similar to PSOLA to resynthesize voices - Automatically 'corrects' pitch to hit a certain target - Can be used to 'fix' [imprecise vocals](https://en.wikipedia.org/wiki/Paris_(Paris_Hilton_album)) - Can also be used to artistic ends (e.g. [T-Pain](https://www.youtube.com/watch?v=Lt2wjJlP2N4) or Cher's [Believe](https://www.youtube.com/watch?v=nZXRV4MezEw)) - Can also be used to [Autotune the News](https://www.youtube.com/watch?v=3eooXNd0heM) --- ### Cher's 'Believe' - Released in 1998 (one year after Autotune) - Autotune was briefly known as 'the Cher Effect' -
--- ### Cool! We can change the source! - ... but what about the filter? --- ## Source-Filter Resynthesis --- ### PSOLA modifies the signal as a whole - "We're going to change f0 or duration, but try not to muck up the rest of the spectral information" - Harmonic matching and careful blending mean it's usually pretty successful - *But PSOLA doesn't make any effort to disentangle the source and filter!* - *... and it sure can't change the filter independently of the source* --- ### Can we resynthesize in a way that preserves that independence? - **Yes, with 'Source Filter Resynthesis'** - Also called 'LPC Resynthesis' in the literature - It is very simple, conceptually - Model the Source and Filter, Modify either, and then Recombine them - It uses only concepts you already understand! --- ### Modeling the Source and Filter - Numerically model the source - What's the pitch like? - Where is there voicing? - **This is voicing and pitch detection!** - Numerically model the filter - What's the rest of the spectral envelope like? - How does it change over time? - **This is LPC analysis!** --- ### Now you have models for the source and filter - You can modify the source independently - "Let's increase f0 by 20%!" - Then you can synthesize a source - Voiceless bits can be modeled as white noise - You can also modify the filter independently - "Let's move the first formant up by 100Hz" - This gives you a new filter - Then you apply the filter directly to the source - This is basic math! - **Now you have a sound signal reflecting the modification!** --- ### LPC Resynthesis can be used to modify the pitch! **DO NOT USE LPC RESYNTHESIS TO MODIFY PITCH** Unmodified
1x f0 LPC
1.5x f0 PSOLA
1.5x f0 LPC
0.65x f0 PSOLA
0.65x f0 LPC
--- ### LPC Resynthesis can change the nature of the filter! - This is how phoneticians modify formants when creating stimuli! - Separate the source from the filter via Inverse Filtering - Modify the LPC, and then recombine! - You can change formant frequency, or bandwidth! - My Praat Handbook explains the process in 8.15 --- ### Here's a nice continuum Date
Debt
11 Steps
--- ### Date-Debt Continuum
--- ### This is amazing! - It's the very best way to change the quality of vowels and otherwise - It allows you to test hypotheses about filter shapes - It gives you incredible freedom to modify voices - There's just one problem... --- ### Source Filter Resynthesis Sucks - When you have a problem, and you decide to use LPC resynthesis, you have two problems - It's common not to be able to 'make the change' - If you don't do the inverse-filter properly, some of the filter sticks around - Noise and artifacts are everywhere - You need to resample to do it properly - You can modify part of the sound (e.g. the bottom 3500 Hz) and recombine with the rest - You need to double check every stimulus to make sure it's doing what you think it is - **Come talk to me if you ever start down this dark road** --- ### LPC Resynthesis shows every error in the chain! - All the error in pitch estimation - All the error in LPC estimation - Any error in the math or changes to the relevant formants - This requires the errors to be modeled to sound clean! - We can't do this for modifying the sounds - LPC Resynthesis for voice compression models the error explicitly --- ### LPC Resynthesis struggles to sound 'normal' and 'human' - You'll often get 'robotic' voices who don't sound like the person they were - Bad modeling can create clicks and whistles - It sounds a lot like a bad cell phone call from the early 2000s - We'll talk more about why soon! - ... but it's great at making somebody sound non-human! - There it's called... --- ### Musical Vocoding - You don't have to use a human source with the extracted filter - You can use any sound you'd like as the 'source' - Including a synthesizer! --- ### Daft Punk
--- ### Daft Punk led the way in Vocoder Use Harder, Better, Faster, Stronger (from 'Discovery')
Doin' it Right (from 'Random Access Memories')
--- ### Vocoders are common and awesome - They're different from 'talk boxes' - Those use a tube to direct an instrument's output into a person's physical mouth - They're often based on LPC synthesis! - ... and they're the place where LPC synthesis is most fun! --- ## The Dumbest and Most Exotic Voice Modification --- ### Jocelyn Pook's 'Masked Ball' - Written for a pivotal scene in Stanley Kubrick's 'Eyes Wide Shut' (1999) - This movie is VERY Not-Safe-For-Work, and very good -
-
- Actually just a Romanian Orthodox Priest singing, but reversed - Zisa Domnului catre ucenicii sai...Porunca noua dau voua - And God told to his apprentices...I gave you a command. --- ### Reversal! - You can get neat effects by just reversing audio - Lots of musicians play with this - Jocelyn Pook is great at this, see also 'Yellow Fever Psalm' - Also called 'Backmasking' when you hide reversed messages - [Mick Gordon, the Composer of DOOM (2016) is a gigantic troll](https://www.youtube.com/watch?v=U4FNBMZsqrY) - The relevant part starts around 40 minutes in, but the whole thing is worth watching --- ### Wrapping up - Modifying pitch *and* duration is really easy - PSOLA is really valuable for modifying pitch *and/or* duration - PSOLA is really complicated - Source-Filter Resynthesis allows you to modify just the filter of a word - Source-Filter Resynthesis sucks for everybody but Daft Punk - !stceffe taen yllaer sevig hceeps gnisreveR --- ### Next time - How can we modify speech signals to clean them up? ---
Thank you!