Vowel Perception and Speaker Normalization

Will Styler


Vowel Perception is Basically Magic




Review: What is a vowel?




A vowel is voicing passing through (and resonating in) an unobstructed vocal tract!

If we change the position of the tongue, we change the resonances



What do vowels sound like?




Vowel formants


Formants alone can be enough for some perception!


Sine Wave Speech



Sample Sine Wave Speech

(Audio samples: F1, F2, F3, combined, and original)
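Sine wave speech throws away almost everything except the formants: each formant is replaced by a single sinusoid that follows that formant's frequency over time. Here's a minimal pure-Python sketch of the idea, under simplifying assumptions. The formant tracks are straight-line glides with hypothetical values (not measurements from the samples above), and all sinusoids get equal amplitude, whereas real sine wave speech is built from formant frequencies and amplitudes measured in a recording. The helper names `glide` and `sine_wave_speech` are made up for this illustration.

```python
import math

def glide(start_hz, end_hz, n_samples):
    """Straight-line frequency track: a hypothetical stand-in for a measured formant track."""
    if n_samples < 2:
        return [float(start_hz)] * n_samples
    step = (end_hz - start_hz) / (n_samples - 1)
    return [start_hz + step * i for i in range(n_samples)]

def sine_wave_speech(formant_tracks, sample_rate=16000, amplitudes=None):
    """Replace each formant track with one sinusoid and sum them.

    formant_tracks: list of tracks, each a list of frequencies in Hz,
    one value per output sample; all tracks must have the same length.
    amplitudes: optional gain per track (defaults to equal gains summing to 1).
    """
    n_tracks = len(formant_tracks)
    n_samples = len(formant_tracks[0])
    if any(len(track) != n_samples for track in formant_tracks):
        raise ValueError("all formant tracks must have the same length")
    if amplitudes is None:
        amplitudes = [1.0 / n_tracks] * n_tracks
    phases = [0.0] * n_tracks
    samples = []
    for i in range(n_samples):
        value = 0.0
        for k, track in enumerate(formant_tracks):
            # Accumulate phase so each sinusoid can glide smoothly without clicks
            phases[k] += 2 * math.pi * track[i] / sample_rate
            value += amplitudes[k] * math.sin(phases[k])
        samples.append(value)
    return samples

# Hypothetical tracks: 0.25 s of a vowel whose F1 rises while F2 falls
SAMPLE_RATE = 16000
N = SAMPLE_RATE // 4
tracks = [glide(500, 700, N), glide(1800, 1200, N), glide(2500, 2500, N)]
signal = sine_wave_speech(tracks, sample_rate=SAMPLE_RATE)
```

Passing a single track (e.g. `[tracks[0]]`) gives a one-sinusoid version, analogous to the F1, F2, and F3 samples. The output is a list of floats between -1 and 1; to listen to it, you could scale it to 16-bit integers and write it out with Python's standard `wave` module.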


There’s more to vowels than formants


So, we think about vowels in terms of formants


Different American English vowels, as spoken by a male speaker


Vowel formants are reflections of articulations


The IPA vowel chart is (roughly) acoustic!






So…


Language is crazy


Why is vowel perception hard?


Perceptual Gradience




Date vs. Debt

(Audio samples: Date, Debt, ?, ??, and ???)


Let’s do an experiment!


The first and last sounds have formants like the typical English /eɪ/ and /ɛ/ vowels


… but in the middle, we’re not really sure what’s going on
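One common way to build a continuum like this is to step the formant values in equal increments from one endpoint vowel to the other. The slides don't say exactly how these stimuli were made, so treat this as a sketch under that assumption. The endpoint (F1, F2) values are hypothetical round numbers chosen for illustration, and treating /eɪ/ as one static (F1, F2) pair ignores its offglide.

```python
def formant_continuum(start, end, n_steps):
    """Return n_steps formant tuples evenly spaced from start to end (inclusive).

    start, end: tuples of formant frequencies in Hz, e.g. (F1, F2).
    """
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    steps = []
    for i in range(n_steps):
        t = i / (n_steps - 1)  # 0.0 at the start vowel, 1.0 at the end vowel
        steps.append(tuple(s + t * (e - s) for s, e in zip(start, end)))
    return steps

# Hypothetical steady-state (F1, F2) endpoints in Hz, for illustration only
DATE_EI = (450.0, 2100.0)
DEBT_EH = (600.0, 1800.0)

for f1, f2 in formant_continuum(DATE_EI, DEBT_EH, 5):
    print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```

The middle steps are the ambiguous ones: their formants sit between the two English categories rather than near either one.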


Phonemic Inventory plays a major role in categorization!


Language as a perceptual factor


Spanish


English


Swedish


Speaker Variation!





Speaker Vowel Space Variation

Different speakers produce different resonances, even for the “same” vowels
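One major (though not the only) reason speakers differ is vocal tract length. A common textbook approximation treats a neutral, schwa-like vocal tract as a uniform tube closed at the glottis and open at the lips, with resonances at odd multiples of the quarter-wavelength frequency: F_n = (2n - 1) c / (4L). The sketch below assumes c ≈ 35,000 cm/s and uses 17.5 cm and 14.5 cm as illustrative tract lengths, not measurements of any particular speakers.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, humid vocal tract air

def uniform_tube_formants(tract_length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end (glottis) and open at the other (lips).

    F_n = (2n - 1) * c / (4 * L): odd multiples of the quarter-wavelength resonance.
    """
    return [(2 * n - 1) * c / (4 * tract_length_cm) for n in range(1, n_formants + 1)]

for length_cm in (17.5, 14.5):
    formants = uniform_tube_formants(length_cm)
    print(length_cm, [round(f) for f in formants])
```

Under these assumptions the 17.5 cm tube resonates at 500, 1500, and 2500 Hz, while the 14.5 cm tube gives roughly 603, 1810, and 3017 Hz. Every resonance moves up by the same proportion, so the "same" articulation lands at different absolute formant frequencies.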




Some people even acknowledge this




But it’s even worse than it seems…


Moment-to-moment Vowel Variation






Every person you’ve ever talked with has had different vowel formant patterns


See, I told you: Magic


How do we accomplish this perceptual magic?


Dealing with vowel variability


Dirty Phonological Tricks


Vowel Inventories are designed for perceptibility


Spanish


English


Swedish



Vowels are spread out across the mouth


Vowel length can help too!

Data from Rositzke 1939


Context helps!


The Role of Context


Speaker Normalization



### History of Normalization
* Differences in absolute vowel qualities were noted very early on
* Two competing theories in the 1940s and 1950s:
  - Peterson: We identify vowels based on their absolute formant frequencies
  - Joos: We identify vowels based on their relative formant structures
* If Joos is right, then prior context from the same speaker should aid normalization
* Ladefoged and Broadbent set out to test that idea in “Information Conveyed by Vowels” (1957)

Information Conveyed by Vowels


1957!


They had to paint what they wanted on glass

Then feed it into an analog sound synthesizer

The results weren’t too pretty

(Audio samples: Stimulus #4, Stimulus #5, and Stimulus #6)


… but it worked!


Different contexts led to different perceptions of the same test word!


Ladefoged and Broadbent: Conclusions

“The linguistic information conveyed by a vowel is largely dependent on the relations between the frequencies of its formants and the formants of other vowels occurring in the same auditory context”
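The relational idea can be made concrete with a toy model: judge a test vowel not by its absolute F1, but by where that F1 falls within the range of F1 values heard in the surrounding context. The min-max rule, the `bit`/`bet` boundary, and all the numbers below are hypothetical illustrations; they are not Ladefoged and Broadbent's stimuli, procedure, or data.

```python
def relative_position(test_value, context_values):
    """Where test_value falls on a 0-1 scale spanned by the context's min and max.

    Values outside the context range land below 0 or above 1.
    """
    low, high = min(context_values), max(context_values)
    if high == low:
        raise ValueError("context needs at least two distinct values")
    return (test_value - low) / (high - low)

def label_vowel(test_f1, context_f1s, boundary=0.5):
    """Hypothetical two-way decision: relatively low F1 -> 'bit', relatively high F1 -> 'bet'."""
    return "bit" if relative_position(test_f1, context_f1s) < boundary else "bet"

TEST_F1 = 450.0  # one fixed test vowel (Hz, hypothetical)
low_f1_context = [250.0, 300.0, 380.0, 420.0]   # hypothetical context with low F1 values
high_f1_context = [400.0, 480.0, 600.0, 700.0]  # the same context with F1 shifted upward

print(label_vowel(TEST_F1, low_f1_context))   # bet
print(label_vowel(TEST_F1, high_f1_context))  # bit
```

The test vowel is identical in both calls. After a low-F1 context it counts as relatively high in F1 (more open, so "bet"); after a high-F1 context it counts as relatively low (so "bit").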


So, uh, how’s that work going?


We’ve got two main theories!


Speaker-intrinsic vowel space normalization


Speaker-extrinsic vowel space normalization



We don’t know which is more accurate!


What do we know about normalization?


What else do we know about normalization?


These finches are a major problem.





This suggests that normalization may be a more general cognitive process

“OK, OK, we get it. Nothing’s real. Everybody varies. Speech study is impossible. Let’s change to syntax.”


How do we cope as researchers?


Mathematical Normalization

“Various algorithms have already been proposed for this purpose. The criterion for their degree of success might be that they should maximally reduce the variance within each group of vowels presumed to represent the same target when spoken by different speakers, while maintaining the separation between such groups of vowels presumed to represent different targets.” (Disner 1980)
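Disner's criterion can be turned into a number in several ways. One simple option, assumed here rather than taken from Disner's paper, is the ratio of the average within-vowel variance to the variance among the vowel means, computed on tokens pooled across speakers: a good normalization should push this ratio down. The function name `within_between_ratio` and the two-speaker F1 values are hypothetical.

```python
from statistics import fmean, pvariance

def within_between_ratio(tokens):
    """Within-vowel scatter divided by between-vowel scatter for one formant.

    tokens: list of (vowel_label, formant_hz) pairs pooled across speakers.
    Lower is better: same-vowel tokens cluster tightly, different vowels stay apart.
    """
    groups = {}
    for label, value in tokens:
        groups.setdefault(label, []).append(value)
    if len(groups) < 2:
        raise ValueError("need at least two vowel categories")
    # Average spread of tokens around their own vowel's mean
    within = fmean([pvariance(values) for values in groups.values()])
    # Spread of the vowel means around each other
    between = pvariance([fmean(values) for values in groups.values()])
    return within / between

tokens = [
    ("i", 300.0), ("a", 700.0),  # hypothetical speaker with lower F1 values
    ("i", 400.0), ("a", 900.0),  # hypothetical speaker with higher F1 values
]
score = within_between_ratio(tokens)
print(f"within/between ratio: {score:.3f}")
```

To compare procedures, you would normalize each speaker's tokens first (with Lobanov's method, say), then recompute the ratio on the pooled result.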


Lobanov (1971) Normalization
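Lobanov normalization z-scores each formant within each speaker: subtract that speaker's mean for the formant, then divide by that speaker's standard deviation, F_n^N = (F_n - mean_n) / sd_n. The sketch below assumes tokens arrive as dicts with hypothetical keys `speaker`, `vowel`, `F1`, and `F2`, and uses the sample standard deviation; implementations differ on that detail. In practice, the per-speaker means should come from a balanced set of each speaker's vowels.

```python
from statistics import fmean, stdev

FORMANTS = ("F1", "F2")

def lobanov(tokens):
    """Return copies of tokens with each formant z-scored within its speaker."""
    by_speaker = {}
    for tok in tokens:
        by_speaker.setdefault(tok["speaker"], []).append(tok)

    # Per-speaker, per-formant mean and sample standard deviation
    stats = {
        speaker: {
            f: (fmean([t[f] for t in toks]), stdev([t[f] for t in toks]))
            for f in FORMANTS
        }
        for speaker, toks in by_speaker.items()
    }

    normalized = []
    for tok in tokens:
        new_tok = {"speaker": tok["speaker"], "vowel": tok["vowel"]}
        for f in FORMANTS:
            mean, sd = stats[tok["speaker"]][f]
            new_tok[f] = (tok[f] - mean) / sd
        normalized.append(new_tok)
    return normalized

tokens = [
    # Hypothetical speaker A
    {"speaker": "A", "vowel": "i", "F1": 300.0, "F2": 2300.0},
    {"speaker": "A", "vowel": "a", "F1": 700.0, "F2": 1200.0},
    {"speaker": "A", "vowel": "u", "F1": 320.0, "F2": 900.0},
    # Hypothetical speaker B: every formant is exactly 1.2 x speaker A's
    {"speaker": "B", "vowel": "i", "F1": 360.0, "F2": 2760.0},
    {"speaker": "B", "vowel": "a", "F1": 840.0, "F2": 1440.0},
    {"speaker": "B", "vowel": "u", "F1": 384.0, "F2": 1080.0},
]
for tok in lobanov(tokens):
    print(tok["speaker"], tok["vowel"], round(tok["F1"], 2), round(tok["F2"], 2))
```

Because speaker B is an exact rescaling of speaker A, both speakers' vowels land on the same normalized values. Real speakers don't differ by a single scale factor, which is one reason normalization never works this cleanly on real data.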


Danger!!


Vowel Normalization is imperfect

### Wrapping up
* Formants (F1 & F2) are the primary means of identifying vowels
* Vowel charts, although well-intentioned, are dirty, dirty abstractions
* Vowel perception is complicated by the enormous variation between speakers and tokens
* Phonology, Context, and Secondary Cues help to make things perceptually easier
* There’s not a strong consensus on how exactly we normalize across speakers
  - Or how zebra finches do
* Vowel perception is basically magic

Thank you!

http://savethevowels.org/talks/vowelperception_advanced.html


References

Baru, A. V. (1975). Discrimination of synthesized vowels /a/ and /i/ with varying parameters (f0, intensity, duration, # of formants) in dog. In G. Fant & M. A. A. Tatham (Eds.), Auditory Analysis and Perception of Speech. New York: Academic Press.

Charlton, B. D., Ellis, W. A. H., Brumm, J., Nilsson, K., & Fitch, W. T. (2012). Female koalas prefer bellows in which lower formants indicate larger males. Animal Behaviour, 84(6), 1565-1571.

Ciocca, V., Wong, N. K. Y., Leung, W. H. Y., & Chu, P. C. Y. (2006). Extrinsic context affects perceptual normalization of lexical tone. The Journal of the Acoustical Society of America, 119(3), 1712-1726.

Disner, S. F. (1980). Evaluation of vowel normalization procedures. The Journal of the Acoustical Society of America, 67(1), 253-261.

Joos, M. (1948). Acoustic Phonetics. Supplement to Language. Baltimore: Linguistic Society of America.

Ladefoged, P., & Broadbent, D. E. (1957). Information Conveyed by Vowels. The Journal of the Acoustical Society of America, 29(1), 98-104.

Lobanov, B. (1971). Classification of Russian Vowels Spoken by Different Speakers. The Journal of the Acoustical Society of America, 49(2B), 606-608.

Ohms, V. R., et al. (2009). Zebra finches exhibit speaker-independent phonetic perception of human speech. Proceedings of the Royal Society B: Biological Sciences.

Rositzke, H. A. (1939). Vowel-Length in General American Speech. Language, 15(2), 99-109.

Verbrugge, R. R., Strange, W., Shankweiler, D. P., & Edman, T. R. (1976). What information enables a listener to map a talker's vowel space? The Journal of the Acoustical Society of America, 60(1), 198-212.

Whalen, D. H., & Sheffert, S. M. (1997). Normalization of Vowels by Breath Sounds. In K. Johnson & J. W. Mullennix (Eds.), Talker Variability in Speech Processing (pp. 133-143). San Diego, CA: Academic Press.