# Speech Perception

### Pomona Phonetics (LING 104) - Will Styler
---

## Today's Plan

- What is speech perception?
- Speech perception's two biggest problems
- Some unexpected phenomena
- Unanswered questions in speech perception

---

# What is speech?

---

### Speech is absolutely insane

- It's a series of fluid and overlapping gestures
- It's amazingly complex

---

### ... but we can pretend and think about speech as a sequence of vowels and consonants

- Individual speech sounds
- A series of articulatory targets...
- Not the same ones as your writing system
- Which is broadcast to the world acoustically
- Producing the articulations produces characteristic* patterns of sound

---

### So, what do these vibrations look like?

---
---

### Different speech gestures create different sounds

- Each element of a speech articulation has acoustical consequences
- Combinations of different frequencies at different amplitudes over time
- Today, we'll focus on a few details rather than general speech acoustics

---
---

### How do humans pick them up?

---

### "Oh, he's talking about that thing up there!"
---

# What is Speech Perception?

---

### Well, what *is* speech perception? Thoughts?

---

## Speech Perception

- The process of turning acoustic speech signals back into linguistic content
- At its core, it can be seen as a decoding or inversion problem
- "I have a signal which reflects a series of gestures. How do I determine those gestures?"

---

### What do we want from a theory of speech perception?

- We want it to explain how we go from the sound we hear to phones or words
- We want it to handle the dynamic nature of sounds
- We want it to be flexible for different talkers and languages
- We want it to explain our accuracy at this *super* hard task

---

### Let's design a perfect signaling system

- We're sending messages consisting of sequences of 1, 2, 3, 4, 5

---

### Lots of people think this is how speech perception works

- ... but this ignores the two biggest problems we have in speech perception

---

## The two biggest problems in Speech Perception

---

### Speech perception is made more complicated by two *massive* problems

- Segmentation
- Variation

---

### Segmentation

- Sounds are not "beads on a string"
- Coarticulation creates overlap across gestures
- Even discrete sounds can have unclear acoustic boundaries

---

### Pandas /pændəz/
---

### Sparrows /spɛɹowz/

---

### Owls /awlz/

---

### Cues are often found on adjacent sounds

---

### Aside: This is why speech recognition has historically used diphones
---

### Segmentation is a problem

- ... although it can be made less of a problem with some theories!
- More later

---

### Invariance and Inversion

---

### Speaker Vowel Space Variation

* Different speakers produce different resonances, even for the “same” vowels
* Vocal tracts can be shorter, longer, wider...

---

### Here's the weird part!

- Different speakers have different formants, even for the “same” vowels!
* Every person has a different set of basic vowel formant positions
* This is called the speaker’s “vowel space”

---

### Speaker Average Formants
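A minimal sketch of how per-speaker average formants could be computed, assuming each vowel token has already been measured for F1 and F2. The speaker labels, the `TOKENS` list, and every formant value in it are invented for illustration, and `speaker_average_formants` is a hypothetical helper name. The point is only that two speakers' "same" vowels can land in different places.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (speaker, vowel, F1 in Hz, F2 in Hz) measurements.
# These numbers are invented for illustration, not real data.
TOKENS = [
    ("A", "i", 280, 2250),
    ("A", "i", 300, 2310),
    ("A", "ɑ", 720, 1100),
    ("A", "ɑ", 740, 1080),
    ("B", "i", 320, 2750),
    ("B", "i", 340, 2810),
    ("B", "ɑ", 840, 1230),
    ("B", "ɑ", 860, 1210),
]


def speaker_average_formants(tokens):
    """Return {(speaker, vowel): (mean F1, mean F2)} over all tokens."""
    grouped = defaultdict(list)
    for speaker, vowel, f1, f2 in tokens:
        grouped[(speaker, vowel)].append((f1, f2))
    return {
        key: (mean(f1 for f1, _ in values), mean(f2 for _, f2 in values))
        for key, values in grouped.items()
    }


averages = speaker_average_formants(TOKENS)
for (speaker, vowel), (f1, f2) in sorted(averages.items()):
    print(f"speaker {speaker} /{vowel}/: F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```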
---

### Moment-to-moment Vowel Variation

* Even the same speaker will have variation from moment to moment
* We often move our tongues differently, changing the vowel's quality
* For many, many reasons
* This leads to constant and massive changes in vowel production

---

### Speaker Average Formants
---

### Individual Token Formants

---

### Individual Token Formants

---
---

### It's not only vowels!

- Differences in the realization of tones
- Differences in the acoustics of nasality (cf. [Styler 2017](https://wstyler.ucsd.edu/files/styler2017_jasa_onacousticalnatureofnasality.pdf))
- Differences in fricative acoustics

---

### Speaker Normalization

- A process which allows us to adjust to these differences
- A very much unsolved problem

---

### We'll talk more about this in my talk this afternoon

---

### So, the biggest issues are segmentation and variation

- ... but even if those were solved, speech perception is very weird

---

## Stupid Speech Perception Tricks

---

## Trick #1: Gradient Vowels, Categorical Perception

* When we're familiar with the categories in a language, that affects our perception strongly

---

### Date vs. Debt

---

# Date
---

# Debt

---

## ?

---

## ??

---

## ???

---

### Let's do an experiment!

---

## ????
---

### Gradient Perception

* We use our knowledge of the categories to make strong decisions about which sounds are which
* ... but they're not always the same decisions as your neighbors!

---

### Trick #2: Coda Recovery

---
bad
ban
---
bomb
bob
---
bob
bomb
---
duck
dunk
---
bob
bomb
---
---

**We pay attention to tiny details!**

---
bend
/bɛnd/
* **...but there's more to it than the symbols show us!**

---

### Coarticulation

When we start preparing for the next sound *before it even begins*

* In the word "bend", we start nasal airflow before the nasal /n/, *during the vowel*

---
bend
/bɛnd/
/bɛ̃nd/
--- ### You use coarticulation to hear missing sounds!
---
---

### How we hear nasality was the topic of [my doctoral dissertation](http://wstyler.ucsd.edu/files/styler_dissertation_final.pdf)

- Yes, that's a clickable link to a PDF

---

Speaking of recovering huge amounts of information...

---

### Trick #3: Multi-modal perception

---

### The McGurk Effect (Part 1)
---

### The McGurk Effect (Part 2)

---

### They're the same video!

- 😇

---

### Spoken Language is multi-modal!

- The distinction between visual and auditory modalities isn't as cut and dried as many think

---

### This gets even weirder

---

### Gick and Derrick 2009: "Aero-tactile integration in speech perception"
Nature. 2009 November 26; 462(7272): 502–504. doi:10.1038/nature08572.

---

> Syllables heard simultaneously with cutaneous air puffs were more likely to be heard as aspirated (for example, causing participants to mishear ‘b’ as ‘p’). These results demonstrate that perceivers integrate event-relevant tactile information in auditory perception in much the same way as they do visual information. (from Gick and Derrick 2009: "Aero-tactile integration in speech perception")

---

### So, it's complicated

- Continuous input feels categorical
- Coarticulation is useful for perception
- We don't just use acoustics to understand speech
- ... but it works!

---

# The Big Open Questions

---

### We have more questions than answers in speech perception

- But here are a few big ones

---

### How does speaker normalization work?

- Is this a discrete process that 'runs' before the rest of perception?
- Is it a general process, or speaker-specific?
- Why can dogs, finches, chinchillas, new babies, and computers all do this, if speech is so uniquely human?
- Does it even have to happen at all?
- Or are we saved by a different approach to speech?

---

### How are we perceiving speech?

- Are we listening directly to the acoustics?
    - "General Auditory" theory
- Are we trying to figure out 'what they did' in terms of gestures using our own speech system?
    - Motor Theory
- Are we building a mental model of the mechanism that must have produced that sound?
    - 'Direct Realism' or Direct Perception

---

### Gestural theories have some benefits

- Coarticulation is a part of the model, and a feature rather than a bug
- Speaker Normalization is about physics, not human variation
- Storage gets real weird, though, and it's a bit odd for creatures without tongues
- Speaking of creatures without tongues...

---

### Can computers do speech as well as humans?

- More on this this afternoon!

---

### What are we storing in our mind?

- Abstractions (e.g. /nɔɪz/)
- 'Averages' from many productions
- Individual productions of words
- As sound? As gestures?

---

### Is speech special?

- Do we process speech in the same way that we process every other acoustic stimulus?
- Do we process speech using special language-specific or tongue/motor-localized processes?

---

### Whether or not speech perception is special, it's amazing

- Despite major issues with segmentation and variation
- It's wildly complicated and not just limited to sound
- There remain many fundamental unanswered questions
- **Yet it still works so well that we don't even think twice about it!**

---

### In Summary...

---

# Speech perception is Magic

- ... and that makes you all wizards

---
---
Thank you!
Questions?