---

### Acknowledgements

* My Advisor, Rebecca
* My Committee
  * Luciana Marques, Georgia Zellou, and Story Kiser
* The rest of the CU Linguistics Community
* The Illocutionary Force

---

### More Acknowledgements

* My family
* Jessica
* :)
* Vowels

---

# On the Acoustical and Perceptual Features of Vowel Nasality

### Will Styler

---
---

### Vowel Nasality

Opening the Velopharyngeal Port during vowel production to allow nasal airflow

---

### Vowel Nasality

plays an important role in many languages!

---

### Coarticulatory Nasality in English
‘Pats’
[pæts]
‘Pants’
[pæ̃nts]
--- ### Contrastive Nasality in Lakota
‘seed’
[su]
‘braid’
[sũ]
* (Nasality is also contrastive in French, Hindi, Bengali, and lots more!)

---

Listeners clearly can make judgements about nasality in individual vowels*

* (cf. Lahiri and Marslen-Wilson 1991, Beddor and Krakow 1999, Beddor 2013, Kingston and Macmillan 1995, Macmillan et al. 1999, the existence of French, Hindi, Lakota...)
* ... **but linguists don't understand what *features* of the signal allow them to do so!**

---

### That's where I come in!
---

### Two Goals

* 1) Figure out what acoustical features are associated with nasality in English and French
* 2) Figure out which ones humans are actually *using* to hear nasality.

---

## The Overall Plan

* Collect data and measure possible features
* **Experiment 1** - What features are statistically linked to nasality?
* **Experiment 2** - What features are *useful* for identifying nasal vowels?
  * (These two experiments combine to tell us which features look most promising)
* **Experiment 3** - What features are humans using to perceive nasality?
* **Experiment 4** - Does machine learning show a similar perceptual pattern?

---

# Data Collection!

---

### Data Collection

* I recorded 12 English and 8 French speakers making words with oral and nasal(ized) vowels
  * For English, I recorded CVC/CVN/NVC/NVN minimal pairs
  * For French, I recorded nasal/oral vowel minimal pairs
  * 4,778 vowels total
* Find features that *could* indicate nasality, and measure them!
  * All measurement was done automatically by a Praat script
* Toss the measurements into R for analysis

---

### Feature Selection
---
--- ### Let's talk about a few features more specifically --- ### Vowel Formant Frequency/Bandwidth
--- ### Vowel Formant Frequency/Bandwidth
--- ### A1-P0
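A1-P0 compares the amplitude of the harmonic nearest F1 (A1) with that of the low-frequency nasal peak (P0, near H1/H2; cf. Chen 1997). The actual measurements were made by a Praat script; purely as an illustration (the windowing and peak-picking here are my own simplifying assumptions), the computation can be sketched like this:

```python
import numpy as np

def a1_p0(signal, sr, f0, f1, n_fft=8192):
    """Estimate A1-P0 (in dB) for one vowel frame.

    A1 = amplitude of the harmonic closest to F1; P0 = amplitude of the
    low-frequency nasal peak, approximated here as the larger of H1/H2.
    A simplified sketch, not the dissertation's actual Praat script.
    """
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal)), n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1 / sr)

    def harmonic_amp(freq):
        # Take the spectral peak within half a harmonic of the target frequency
        band = spec[(freqs >= freq - f0 / 2) & (freqs <= freq + f0 / 2)]
        return 20 * np.log10(band.max() + 1e-12)

    a1 = harmonic_amp(round(f1 / f0) * f0)            # harmonic nearest F1
    p0 = max(harmonic_amp(f0), harmonic_amp(2 * f0))  # H1 or H2
    return a1 - p0
```

Nasalization strengthens P0 relative to A1, so A1-P0 drops as vowels become more nasal.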
--- ### P0 Prominence
--- ### Vowel Duration
--- ### Spectral Tilt
---

# Experiment 1: Statistical Analysis!

---

### The Idea

* *"If a feature doesn't meaningfully differ between oral and nasal vowels, humans won't use it."*
* **Let's test which features are different in oral and nasal vowels!**

---

### Experiment 1: Plan

* 1) Run a bunch of Linear Mixed-Effects Regressions, one per feature, for English and French
  * This will show the *statistical* link between the features and nasality
  * Control for the effect of repetition, timepoint, speaker, and word
* 2) See which features showed significant changes between oral and nasal(ized) vowels
* 3) Compare the magnitude of the oral-to-nasal change for each feature
  * Larger changes are probably more useful

---

### The Findings

* Only 19 of the 29 features showed a significant link with nasality in each language
  * ... and not the same 19 in both!
* Of those, only some showed large oral-to-nasal changes

---

### The *Most Promising* Features

* **Formant Bandwidth** was really strong in both languages
* **Formant Frequency** showed weaker (but still meaningful) differences
* **A1-P0** performed well in both languages
* **P0Prominence** worked well in both languages
* **Duration** showed major changes in both languages
  * (English nasalized vowels appear shorter, French nasal vowels appear longer)
* **Spectral Tilt** showed strong changes in French, less so in English

---

### Experiment 1 Wrap-up

* We now know which features are linked with nasality *across the entire dataset*
* ... and which ones show the largest oral-to-nasal changes
  * A1-P0, Duration, Spectral Tilt, Formant Bandwidth/Frequency, and P0Prominence

---

These tests show *overall trends* across several thousand words

* **But speech perception involves classifying *each individual vowel!***

---

How do we know if these features help us spot nasality *in any given vowel*?

---

## Ask a Computer!
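---

The by-feature models above were fit in R for the dissertation; as a rough sketch of the same idea (a fixed effect of nasality plus a random intercept per speaker), here is a statsmodels version on synthetic data — the column names and effect sizes are made up for illustration, and the real models also controlled for repetition, timepoint, and word:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
speakers = [f"s{i}" for i in range(8)]
df = pd.DataFrame({
    "nasality": np.tile(["oral", "nasal"], n // 2),
    "speaker": np.repeat(speakers, n // 8),
})
# Fake A1-P0 values: nasal vowels ~5 dB lower, plus per-speaker offsets and noise
offsets = dict(zip(speakers, rng.normal(0, 1.5, len(speakers))))
df["a1p0"] = (np.where(df["nasality"] == "nasal", -5.0, 0.0)
              + df["speaker"].map(offsets)
              + rng.normal(0, 2.0, n))

# Fixed effect of nasality, random intercept per speaker
fit = smf.mixedlm("a1p0 ~ nasality", df, groups=df["speaker"]).fit()
print(round(fit.params["nasality[T.oral]"], 2))  # oral-minus-nasal difference, ~5 dB
```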
---

# Experiment 2: Machine Learning!

---

### The Idea

Speech perception is just classifying sounds based on acoustical features

* **Computers can do that too!**
* Give the feature information to a classifier and ask for oral vs. nasal judgements
* Greater accuracy means a feature or grouping is more useful!

---

### Basic Machine Classification

* "Find the patterns in this training data, then use them to predict which group this new datapoint belongs to!"
  * "Based on the words around it, what verb sense is being used?"
  * "Is this handwritten symbol a '1'? A '2'? A '3'?"
  * **"Does this set of measurements indicate an oral vowel, or a nasal vowel?"**

---

### Machines have some advantages over humans!

* They live in my apartment!
* They don't have *any* context.
* Their decisions are easier to quantify.
* They'll tell you *how* they made the decision they did.

---

### Experiment 2: Plan

* 1) Give features to Machine Learning algorithms one at a time
  * The features which give the best accuracy should be the most useful
* 2) Give them *all the features at once*, then ask the algorithms which features are most useful.
* 3) Find the best group of features
  * Find the balance between "few features" and "good accuracy"
  * Test *those* features with expensive humans (Experiment 3!)

---

### My Algorithms of Choice

* RandomForests
  * Make a bunch of decision trees, and combine their votes!
* Support Vector Machines
  * Find the mathematical separation that optimally groups classes!
* RandomForests are really transparent, SVMs are really accurate.
  * All analyses will use both!

---

## Single-feature tests

---

### Single-Feature testing

* Are any features good enough *on their own* to allow nasal perception?
* 116 models, one per feature per algorithm per language
* Each model outputs accuracy, which we can compare!

---

### Single-feature findings!
* Duration is suspiciously useful
  * 79.7% accuracy with RF, only 59.2% with SVMs in English
* F1's Bandwidth wins for English
  * 67.6% SVM accuracy
* Spectral Tilt wins for French
  * 76.8% SVM accuracy
* A1-P0 gets second place for both
  * 64.7% in English SVMs, 75.7% in French.
* *None of the features are good enough on their own!*

---

### Which features are most useful *in a combined model*?

---

## Evaluating Feature Importance

---

### RandomForest Importance

RandomForests can calculate *which features were most useful* for classification!
| Rank | English        | French         |
|------|----------------|----------------|
| 1    | F1's Bandwidth | Spectral Tilt  |
| 2    | A1-P0          | A1-P0          |
| 3    | Duration       | F1's Bandwidth |
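---

Both halves of this machinery — single-feature accuracy, then importance from a combined model — can be sketched with scikit-learn. The three synthetic "features" below are stand-ins I invented for illustration, not the real measurements:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 800
y = rng.integers(0, 2, n)  # 0 = oral, 1 = nasal
X = np.column_stack([
    y * -5.0 + rng.normal(0, 2.0, n),  # strong cue (F1-bandwidth-like)
    y * -2.0 + rng.normal(0, 2.0, n),  # weaker cue (A1-P0-like)
    rng.normal(0, 1.0, n),             # uninformative feature
])

# Single-feature accuracy, as in the 116-model comparison
for i, name in enumerate(["strong", "weak", "noise"]):
    acc = cross_val_score(SVC(), X[:, [i]], y, cv=5).mean()
    print(name, round(acc, 2))

# Importance ranking from a forest trained on all features at once
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(rf.feature_importances_.argsort()[::-1])  # indices, most useful first
```

The strong cue classifies well alone and tops the importance ranking; the noise feature does neither.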
---

So, we know which features are useful and important

* **What's the best group to test?**

---

## Multi-feature Models

---

### Multi-feature modeling

* Tested 10 *a priori* feature groupings
  * There are 20,030,007 other possible groupings of 10 features out of 29.
* Compare accuracy *in light of the number of features*
  * The winning model gets the best performance from the fewest features

---

### Multi-feature Results

* SVMs with all features worked best (29 features)
  * 84.7% accuracy for English, 93.7% in French
* Formant Bandwidth, Formant Frequency, Spectral Tilt, A1-P0, and Duration were the best subgroup (9 features)
  * 82.2% for English, 91.7% for French
* **We only lose 2-3% accuracy when we reduce our feature set by 68%!**
  * That's a promising grouping!

---

### Overall Machine Learning Results

* **Formant Bandwidth** was the best feature for English, strong in French
* **Spectral Tilt** was the most useful feature in French, less so in English
* **A1-P0** performed well in both languages
* **P0Prominence** was not useful for classification
* **Formant Frequency** was useful too!
* **Duration** was *really* useful in both languages
  * ... but this could be because it lends itself particularly well to classification

---

So, we've got 5 features which allow high accuracy

* ## Let's see if humans use them!

---

# Experiment 3: Human Perception

---

### The Idea

* English listeners can use vowel nasality to identify missing nasal consonants!
  * ba_ could be "bad" or "ban"
* **Let's add or remove features from vowels to see what indicates "nasality"!**
  * If adding or removing a feature changes perception, or makes listeners react more slowly, it's important!
* Manipulate features independently, or together, in both /ɑ/ and /æ/

---

### The Plan

* 1) Create nasal vowels where each nasal feature is *reduced*.
  * Listeners might think they're oral!
* 2) Create oral vowels where each nasal feature is *added*.
  * Listeners might think they're nasal!
* 3) Create control stimuli which are modified and then un-modified (the change is applied, then reversed)
  * This will reveal any artifacts of the modification process itself
* 4) Give them to listeners, then analyze Accuracy and Reaction Time!

---

### The Modifications

* Simulate the oral-to-nasal change in A1-P0 (or vice versa)
  * Lower A1-P0 by 5.3 dB in oral vowels, raise it by 5.3 dB in nasal ones
* Simulate the oral-to-nasal change in duration (or vice versa)
* Simulate the oral-to-nasal change in spectral tilt (or vice versa)
* Change the formant structure
  * Change F1 and F3 bandwidth to match the oral and nasal norms
  * Simulate the *overall* oral-to-nasal change in F1's frequency at the same time
* Modify *all four features at once!* ("Allmod")

---

### The Experiment

* Data from 42 normal-hearing native English speakers from the LING Subject Pool

---
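A quick note on the decibel manipulations: a ±5.3 dB change in A1-P0 corresponds to scaling the relevant spectral amplitudes by 10^(±5.3/20). A back-of-the-envelope check (not the actual stimulus-creation code):

```python
def db_to_gain(db):
    """Convert a change in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)

print(round(db_to_gain(5.3), 2))   # 1.84: raising A1-P0 by 5.3 dB
print(round(db_to_gain(-5.3), 2))  # 0.54: lowering A1-P0 by 5.3 dB
```

---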
bad
ban
---
bomb
bob
---
bad
mad
---

(397 more times!)

---

### The Analysis

* Use the accuracy and reaction time data from this experiment.
* If listeners call originally nasal vowels "oral" (or vice versa), we'll call the response **inaccurate**.
  * Reduced accuracy means we've affected the perception of nasality!
* **Increased RT** means we've made classification more difficult.
* Check the data using Linear Mixed-Effects Regressions

---

## Feature Addition (oral-made-nasal) Findings

---
* *Modifying formants (or all features together) resulted in more confusion!*
  * People called oral vowels "nasal" more often with modified formants
  * The All-Modified stimuli showed a statistically similar pattern.

---
* *Modifying formants (or all features together) resulted in slower reaction times!*
  * People were slower to call vowels "oral" or "nasal" with modified formants

---

### Addition Summary

* Perception was affected by modifying formant structure, or by modifying all features.
  * Post-hoc tests show that "All" and "Formant" modification were not significantly different
* **Only modifying formant frequency and bandwidth had an effect on perception!**

---

## Feature Reduction (Nasal-made-Oral) Findings

---
* Confusion wasn't affected by modification!
  * We never changed "nasal" to "oral" by modifying features

---
* *Modifying formants (or all features) resulted in slower reaction times!*
  * People were slower to call vowels "oral" or "nasal" with modified formants

---

### Removal Summary

* *None of the experimental modifications* affected confusion
  * Nothing I did made a nasal vowel "oral"
* Modifying formants (or all features) resulted in slower responses
  * Post-hoc tests show that "All" and "Formant" modification did not meaningfully differ
* **Formant changes slowed listeners down, but didn't change classification!**

---

### Experiment 3 Summary

* Only **formant modification** had a significant effect on perception
  * Formant modification caused listeners to respond more slowly
  * Formant modification made oral vowels sound "nasal"
* F1's bandwidth is probably the cue
  * It worked best in ML, had the best statistical link, and it makes sense acoustically
  * Hawkins and Stevens (1985) also point in that direction
* Formant modification **wasn't enough** to make nasal vowels sound "oral"

---

(We'll talk more about that asymmetry later!)

---

So, we can answer our primary research question!

* ### Formant structure is the main cue to nasality in English!

---
---

So, the machine learning models predicted F1's bandwidth as the most useful feature...

* ### How similar *are* the SVMs and the humans?

---

# Experiment 4: Humans vs. Machines
---

### The Idea

* *Let's give the computer the same experimental task as the humans, using the same altered stimuli, and see how they compare!*

---

### The Plan

* 1) Train SVMs on different datasets
  * NoNVN - Trained on English without NVNs (like the stimuli)
  * EnAll - Trained on *all* the English data
  * EnFrAll - Trained on English *and* French
* 2) Test those SVMs on the experimental stimuli (classifying "oral" or "nasal")
* 3) Compare the by-condition results to the humans

---

### Experimental Stimuli by Condition
---
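The train-on-corpus, test-on-stimuli setup can be sketched as follows. The data here are synthetic stand-ins I generated for illustration; the real models trained on the measured English/French vowels and classified the actual experimental stimuli:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)

def fake_vowels(n):
    """Two informative features with invented oral/nasal differences."""
    y = rng.integers(0, 2, n)  # 0 = oral, 1 = nasal
    X = np.column_stack([y * -5.0 + rng.normal(0, 2.0, n),
                         y * 3.0 + rng.normal(0, 2.0, n)])
    return X, y

X_train, y_train = fake_vowels(1000)  # stands in for the corpus data
X_stim, y_stim = fake_vowels(100)     # stands in for the experimental stimuli

svm = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
print(round((svm.predict(X_stim) == y_stim).mean(), 2))  # by-stimulus accuracy
```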
---

### Experiment 4 Summary

* Humans and machines *did* show similar patterns
  * Modifications that were difficult for humans were difficult for SVMs
* The generic English model (EnAll) showed the most similarity
  * Adding in French training data was a **bad** idea
* Perceptual testing with machine learning isn't crazy
  * Humans still win.

---

### Hooray!
---

## Coming full circle

**Experiment 1** - What features are statistically linked to nasality?

**Experiment 2** - What features are *useful* for identifying nasal vowels?

**Experiment 3** - What features are humans using to perceive nasality?

**Experiment 4** - Do computers show a similar perceptual pattern?

---

## Discussion

---

### We've got some great new information about nasality

* We know more about measuring nasality
  * There's no "magic feature", but A1-P0 isn't bad
  * We should also try F1's Bandwidth
  * We know which features *just don't work*.
* We know more about cross-linguistic differences in nasal acoustics

---

### Machine Learning is a good tool in phonetic research

* We can accurately classify nasality using acoustics alone
  * The best features are general, rather than nasality-specific
* SVM classification showed similarity to human perception!
  * Modeling humans using machines isn't crazy!

---

### Formants are the main cue to nasality perception in English

* Modifying formants was the *only* modification which affected perception
* ... but it's probably not the *only* cue for vowel nasality

---

### Reducing Formant Bandwidth doesn't make nasal vowels "oral"

* Listeners slow down, but they don't reclassify when we change bandwidth
  * There was still something "nasal" about the vowels
* This actually makes sense, because...

---

### Nasal vowels are produced with different *oral* articulations

* The oral differences between oral and nasal vowels are *not* arbitrary
  * cf. Carignan et al. (2015), Carignan (2014), Carignan et al. (2011), and Shosted et al. (2012)
* We only made formant changes associated with *all vowels*
  * Vowel-specific changes in formants were ignored
* If nasal vowels are *orally* different, we wouldn't confuse listeners by removing "*nasality*"
  * At worst, they hear a "nasal vowel" without nasality!

---

### Independent Nasal Vowels make sense!
* Contrast enhancement using a secondary feature is common
  * Duration and Vowel quality, Nasality and Pharyngealization (Zellou 2012), and more
* Nasal vowel systems are often very different from the oral vowel systems
  * Centralization and quality shifts are well known
* Nasal systems often change independently of oral systems diachronically
* So, nasality is *part of* the difference, but it's not the only difference!

---

### This isn't the final word on nasality perception

* The English results used college-aged speakers and listeners
  * The process may look different for pathological hearing or speech.
* We only tested two vowels here, and we've got plenty more.
* French perception experiments need to be done!
  * There are still lots of languages out there in the world.

---

### Conclusions

* Our current measurements of nasality aren't bad
  * Although F1's Bandwidth is a great new one.
* Machines *can* accurately classify nasality
  * ... and simulate human perception!
* Formant bandwidth is the best nasality cue we've got
  * ... at least for English
  * ... but other aspects of the vowel articulation are important too!

---

Most importantly...

---

### There's more to vowel nasality than nasal airflow!

---

## Thank you!

---

# Questions?

---

### References

Carignan, C. (2014). An acoustic and articulatory examination of the oral in nasal: The oral articulations of French nasal vowels are not arbitrary. Journal of Phonetics, 46:23–33.

Carignan, C., Shosted, R., Shih, C., and Rong, P. (2011). Compensatory articulation in American English nasalized vowels. Journal of Phonetics, 39(4):668–682.

Carignan, C., Shosted, R. K., Fu, M., Liang, Z.-P., and Sutton, B. P. (2015). A real-time MRI investigation of the role of lingual and pharyngeal articulation in the production of the nasal vowel system of French. Journal of Phonetics, 50:34–51.

---

### References Continued

Chen, M. Y. (1997). Acoustic correlates of English and French nasalized vowels. The Journal of the Acoustical Society of America, 102(4):2350–2370.

Hawkins, S. and Stevens, K. N. (1985). Acoustic and perceptual correlates of the non-nasal–nasal distinction for vowels. The Journal of the Acoustical Society of America, 77(4):1560–1575.

Shosted, R., Carignan, C., and Rong, P. (2012). Managing the distinctiveness of phonemic nasal vowels: Articulatory evidence from Hindi. The Journal of the Acoustical Society of America, 131(1):455–465.

Zellou, G. (2012). Similarity and Enhancement: Nasality from Moroccan Arabic Pharyngeals and Nasals. PhD thesis, University of Colorado at Boulder.

---

### Other Tables

---
---
---
---
--- ### Control vs. Experimental Stimuli
--- ### Control Stimuli by Condition
---