--- # Vowel Perception is magic
### LING 3100 - Will Styler --- ### So, vowel perception is kind of my thing * My project for this class * My project for the grad version * My MA Thesis * My Prelim * My Dissertation ---
--- ### Why study vowel perception? * Because all languages use vowels, and they work similarly everywhere * Because vowel perception is interestingly complicated * Because it shows us interesting things about sociolinguistic distinctions and the mind * Because it’s a great test case for how the brain deals with variation * ***Because vowels are awesome*** --- ### (Yeah, I said it. Take that, Consonants) --- What kind of vowels are we talking about? ---
---
--- ### Review: What is a vowel? * A vowel is voicing passing through (and resonating in) an unobstructed vocal tract! * If we change the position of the tongue, we change the resonances ---
--- ### Review: What is a vowel? A vowel is voicing passing through (and resonating in) an unobstructed vocal tract! If we change the position of the tongue, we change the resonances * Different resonances *filter* the sound differently and determine the vowel quality * **Different tongue shapes create different resonances, and different vowels!** ---
--- ### What do vowels sound like? * We talk about vowel quality in terms of "formants" * These are bands of the spectrum where the energy is strongest * The frequencies of these formants are our primary cues ---
---
--- ### Vowel formants * F1 and F2 are generally considered to be the most important * F3 is good for rounding and rhoticity --- ### Formants alone can be enough for some perception! --- # F1
--- # F2
--- # F3
--- ## All together now! --- ### What's being said?
--- ### Here's the original
--- ### Listen again!
---
--- ### So, vowels are basically formant patterns ---
Different American English vowels, as spoken by a male speaker
--- ### ... and vowel formants map to articulation! ---
--- ### The IPA chart is acoustic! ---
---
---
--- So... * We listen for formants * We figure out their frequencies * Then we know which vowel we’re hearing. * ### What’s the problem? --- # Language is crazy * (In Linguistics, that's *always* the problem) --- ### Why is vowel perception hard? * Vowel differences are gradient * Dialect and language variation is everywhere * Speakers vary from person to person. *A lot!* * They also vary from moment-to-moment --- ## Perceptual Gradience --- ### Perceptual Gradience * You can make an infinite number of tongue shapes causing an infinite number of vowels. * There's no "alveolar ridge" to give a steady target * Phonemes have perceptual boundaries * Vowels with formant values in between phonemes are rough * Not all listeners agree on where the boundaries are between, say, /e/ and /ɛ/ --- ### Date vs. Debt --- # Date
--- # Debt
--- ## ?
--- ## ??
--- ## ???
--- ### Let's do an experiment! --- ## ????
--- The first and last sounds have formants like the typical English /eɪ/ and /ɛ/vowels
--- ... but in the middle, we're not really sure what's going on
--- ## Language plays a major role in categorization! --- ### Language as a perceptual factor * The vowel inventory in a language has a strong effect on the perception of vowels * If you have lots of vowels, each one gets less acoustic elbow room --- ### Spanish
--- ### English
--- ### Swedish
--- ## Speaker Variation! --- ### Speaker Vowel Space Variation * Different speakers produce different resonances, even for the “same” vowels * Vocal tracts can be shorter, longer, wider... ---
--- ### Speaker Vowel Space Variation Different speakers produce different resonances, even for the “same” vowels * Speaker can have colds or allergies, can have more nasal voices... * Sociolinguistic factors galore * Every person has a different set of basic vowel formant positions * This is called the speaker’s “vowel space” ---
---
--- ### Moment-to-moment Vowel Variation * Even the same speaker will have variation from moment to moment * Sometimes we misarticulate, accidentally making the wrong vowel quality * Or we talk with food in our mouths, producing different resonances * Or sometimes, we’re just plain lazy * This leads to constant and massive changes in vowel production ---
---
---
---
--- ### Every person you've ever talked with has had different vowel formant patterns * ... and yet, we understand each other, somehow --- ## See, I told you: Magic
--- ### How do we accomplish this magic? --- ### Some people try to put the issue aside ---
---
--- ### ... but how do we manage perceptually? --- ### Dealing with vowel variability! * We stack the deck in our favor using the phonology of the language * We use non-formant-related cues such as vowel length * We attend to context * We adjust to individual speakers (or vocal tracts) through Speaker Normalization * Then, if all else fails, we pretend that we understood, and hope for the best --- ## Dirty Phonological Tricks --- ### Vowel Inventories are designed for perceptibility * Vowels are spread through the mouth --- ### Spanish
--- ### English
--- ### Swedish
--- ### Vowel Inventories are designed for perceptibility Vowels are spread through the mouth * Languages try to maintain perceptual contrast (to keep things as perceptually unambiguous as possible) * /i, e, a, o, u/ more common than /i, y, e, œ, ɛ/ * Contrasts that are tough to hear go away! * Rounding is used to distinguish vowels which might otherwise be confusable * /i, u/ not /y, u/ --- ### Vowel Length helps too! * English tense vowels (/i, e, o, æ, ɔ, ɑ/) are longer than lax vowels (/ɪ, ʊ, ʌ, ɛ/)
Data from Rositske 1939
--- ## Context helps! --- ### The Role of Context * Context helps us to understand words even if the phonemes are acoustically ambiguous * Easier to understand “Hello” in its normal conversational context * If you’re not expecting a word, you’ll have to fight harder to understand it. * “Hi, John! Partial Nephrectomy!” * “Ohh, Invasive Adenocarcinoma arising in tubulovillious adenoma” * Nobody runs into rooms and shouts "bat!" --- ## Speaker Normalization --- ### Speaker Normalization * Every speaker you meet has acoustically different vowels * We are able to adjust very quickly, and have little trouble with later understanding * The process by which we adjust is called “Speaker Normalization” * This process isn’t entirely understood * That's a *massive* understatement --- ### History of Normalization * Differences in absolute vowel qualities were noted very early on * Two Competing Theories in the 40’s and 50’s: * Peterson: We identify vowels based on their absolute formant frequencies * Joos: We identify vowels based on their relative formant structures * If Joos is right, then prior context aids in normalization * Ladefoged and Broadbent set out to test that idea in “Information conveyed by vowels” in 1957 --- ### *Information Conveyed by Vowels* * Ladefoged and Broadbent 1957 * Six versions of an introductory sentence were synthesized, each with different formant structures * Four test words were synthesized as well * Listeners heard different combinations of test words and sentences * *If vowel perception is about absolute frequencies, the prior sentence shouldn't matter!* ---
---
--- # 1957! --- They had to paint what they wanted on glass
--- Then feed it into an analog sound synthesizer
--- ### The results weren't too pretty --- ### Stimulus #4
--- ### Stimulus #5
--- ### Stimulus #6
--- ... but it worked! --- ### Different contexts led to different perception!
--- ### Ladefoged and Broadbent: Conclusions > “The linguistic information conveyed by a vowel is largely dependent on the relations between the frequencies of its formants and the formants of other vowels occurring in the same auditory context” * This set the stage for future work in normalization! --- ## So, uh, how's that work going? --- We've got two main theories! --- ### Speaker-intrinsic vowel space normalization * Normalization is a process that “happens” * You meet somebody, you create a model of their vowel space, and you move on * These models of speaker vowels are maintained in memory * One model per person, and a new model each time! --- ### Speaker-extrinsic vowel space normalization * We store information from *every vowel we hear*! * Normalization is then just bulk comparison and probability * Vowel identities are probabilistically determined * One might start with an “English” vowels model * Then, you build a per-speaker exemplar cloud * Both your per-speaker and overall models change ---
--- ### We don't know which is more accurate! --- ### What do we know about normalization? * It’s not just about the point vowels (/i, a, u/) as Joos suggested (Verbrugge et. al. 1976) * Context influences Normalization (as in Ladefoged and Broadbent) * Knowledge about the speaker (gender, sociolinguistic data) influences normalization (Strand 2000) * Recent context might be more important than older context (Ciocca, Wong, et al. 2006) * The normalization process shows up in reaction time during vowel identification tasks (Haggard and Summerfield 1977) --- ### What else do we know about normalization? * Breath sounds don’t provide good information for normalization, and F0 isn’t a critical factor (Walen & Sheffert 1997) * More context seems helpful, but only to a certain point (Kakehi 1992) * We have to normalize to consonants too *Some evidence that vowel formants are used to normalize /s/ vs. /ʃ/ * Infants Can normalize to vowels (Kuhl 1979) * So can dogs (Baru 1975) and Zebra Finches (Ohms et al 2009) --- These finches are a *major* problem. ---
---
---
--- ### Wrapping up * Formants (F1 & F2) are (still) the primary means of identifying vowels * Vowel perception is complicated by the enormous variation between speakers and tokens * Our vowel judgements are affected by the language we speak and by context * The phonology helps make things perceptually easier * Vowel charts, although well-intentioned, are dirty, dirty abstractions * Vowel perception is basically magic ---
---
Thank you!
http://savethevowels.org/talks/vowelperception.html --- # References Baru, A. V. (1975). Discrimination of synthesized vowels /a/ and /i/ with varying parameters (f0, intensity, duration, # of formants) in dog. In G. Fant, & M. A. A. Tatham (Eds.), Auditory Analysis and perception of speech. New York: Academic Press. Ciocca, V., Wong, N. K. Y., Leung, W. H. Y., & Chu, P. C. Y. (2006). Extrinsic context affects perceptual normalization of lexical tone. The Journal of the Acoustical Society of America, Vol. 119, No. 3, 1712-1726. Joos, M. (1948). Acoustic Phonetics - Supplement to Language. Baltimore: Linguistic Society of America. Ladefoged, P., & Broadbent, D. E. (1957). Information Conveyed by Vowels. The Journal of the Acoustical Society of America, Volume 29, Number 1, 98-104. Ohms et al. Zebra finches exhibit speaker-independent phonetic perception of human speech. Proceedings of the The Royal Society of Biological Sciences (2009) Rositzke, H. A. (1939). Vowel-Length in General American Speech. Language, Vol. 15, No. 2, 99-109. Verbrugge, R. R., Strange, W., Shankweiler, D. P., & Edman, T. R. (1976). What information enables a listener to map a talker's vowel space? Journal of the Acoustical Society of America, Vol. 60, No. 1, 198-212. Whalen, D. H., & Sheffert, S. M. (1997). Normalization of Vowels by Breath Sounds. In K. Johnson, & J. W. Mullenix (Eds.), Talker Variability in Speech Processing (pp. 133-143). San Diego, CA: Academic Press Ltd. ---