---

### Acknowledgements

* My Advisor, Rebecca
* My Committee
  * Luciana Marques, Georgia Zellou, and Story Kiser
* The rest of the CU Linguistics Community
* The Illocutionary Force

---

### More Acknowledgements

* My family
* Jessica
* :)
* Vowels

---

# On the Acoustical and Perceptual Features of Vowel Nasality

### Will Styler

---
---

### Vowel Nasality

Opening the Velopharyngeal Port during vowel production to allow nasal airflow

---

### Vowel Nasality

plays an important role in many languages!

---

### Coarticulatory Nasality in English
‘Pats’
[pæts]
‘Pants’
[pæ̃nts]
--- ### Contrastive Nasality in Lakota
‘seed’
[su]
‘braid’
[sũ]
* (Nasality is also contrastive in French, Hindi, Bengali, and lots more!)

---

Listeners clearly can make judgements about nasality in individual vowels*

* (cf. Lahiri and Marslen-Wilson 1991, Beddor and Krakow 1999, Beddor 2013, Kingston and Macmillan 1995, Macmillan et al. 1999, the existence of French, Hindi, Lakota...)
* ... **but linguists don't understand what *features* of the signal allow them to do so!**

---

### That's where I come in!
---

### Two Goals

* 1) Figure out what acoustical features are associated with nasality in English and French
* 2) Figure out which ones humans are actually *using* to hear nasality.

---

## The Overall Plan

* Collect data and measure possible features
* **Experiment 1** - What features are statistically linked to nasality?
* **Experiment 2** - What features are *useful* for identifying nasal vowels?
  * (These two experiments combine to tell us which features look most promising)
* **Experiment 3** - What features are humans using to perceive nasality?
* **Experiment 4** - Does machine learning show a similar perceptual pattern?

---

# Data Collection!

---

### Data Collection

* I recorded 12 English and 8 French speakers making words with oral and nasal(ized) vowels
  * For English, I recorded CVC/CVN/NVC/NVN minimal pairs
  * For French, I recorded nasal/oral vowel minimal pairs
  * 4,778 vowels total
* Find features that *could* indicate nasality, and measure them!
  * All measurement was done automatically by a Praat script
* Toss the measurements into R for analysis

---

### Feature Selection
---
--- ### Let's talk about a few features more specifically --- ### Vowel Formant Frequency/Bandwidth
--- ### Vowel Formant Frequency/Bandwidth
--- ### A1-P0
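A1-P0 compares the amplitude of the harmonic nearest F1 (A1) with that of the low-frequency nasal peak (P0, near H1/H2; cf. Chen 1997). The actual measurements were made by a Praat script; purely as an illustration (the windowing and peak-picking here are my own simplifying assumptions), the computation can be sketched like this:

```python
import numpy as np

def a1_p0(signal, sr, f0, f1, n_fft=8192):
    """Estimate A1-P0 (in dB) for one vowel frame.

    A1 = amplitude of the harmonic closest to F1; P0 = amplitude of the
    low-frequency nasal peak, approximated here as the larger of H1/H2.
    A simplified sketch, not the dissertation's actual Praat script.
    """
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal)), n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1 / sr)

    def harmonic_amp(freq):
        # Take the spectral peak within half a harmonic of the target frequency
        band = spec[(freqs >= freq - f0 / 2) & (freqs <= freq + f0 / 2)]
        return 20 * np.log10(band.max() + 1e-12)

    a1 = harmonic_amp(round(f1 / f0) * f0)            # harmonic nearest F1
    p0 = max(harmonic_amp(f0), harmonic_amp(2 * f0))  # H1 or H2
    return a1 - p0
```

Nasalization strengthens P0 relative to A1, so A1-P0 drops as vowels become more nasal.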
--- ### P0 Prominence
--- ### Vowel Duration
--- ### Spectral Tilt
---

# Experiment 1: Statistical Analysis!

---

### The Idea

* *"If a feature doesn't meaningfully differ between oral and nasal vowels, humans won't use it."*
* **Let's test which features are different in oral and nasal vowels!**

---

### Experiment 1: Plan

* 1) Run a bunch of Linear Mixed-Effects Regressions, one per feature, for English and French
  * This will show the *statistical* link between the features and nasality
  * Control for the effect of repetition, timepoint, speaker, and word
* 2) See which features showed significant changes between oral and nasal(ized) vowels
* 3) Compare the magnitude of the oral-to-nasal change for each feature
  * Larger changes are probably more useful

---

### The Findings

* Only 19 of the 29 features showed a significant link with nasality in each language
  * ... and not the same 19 in both!
* Of those, only some showed large oral-to-nasal changes

---

### The *Most Promising* Features

* **Formant Bandwidth** was really strong in both languages
* **Formant Frequency** showed weaker (but still meaningful) differences
* **A1-P0** performed well in both languages
* **P0Prominence** worked well in both languages
* **Duration** showed major changes in both languages
  * (English nasalized vowels appear shorter, French nasal vowels appear longer)
* **Spectral Tilt** showed strong changes in French, less so in English

---

### Experiment 1 Wrap-up

* We now know which features are linked with nasality *across the entire dataset*
* ... and which ones show the largest oral-to-nasal changes
  * A1-P0, Duration, Spectral Tilt, Formant Bandwidth/Frequency, and P0Prominence

---

These tests show *overall trends* across several thousand words

* **But speech perception involves classifying *each individual vowel!***

---

How do we know if these features help us spot nasality *in any given vowel*?

---

## Ask a Computer!
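---

The by-feature models above were fit in R for the dissertation; as a rough sketch of the same idea (a fixed effect of nasality plus a random intercept per speaker), here is a statsmodels version on synthetic data — the column names and effect sizes are made up for illustration, and the real models also controlled for repetition, timepoint, and word:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
speakers = [f"s{i}" for i in range(8)]
df = pd.DataFrame({
    "nasality": np.tile(["oral", "nasal"], n // 2),
    "speaker": np.repeat(speakers, n // 8),
})
# Fake A1-P0 values: nasal vowels ~5 dB lower, plus per-speaker offsets and noise
offsets = dict(zip(speakers, rng.normal(0, 1.5, len(speakers))))
df["a1p0"] = (np.where(df["nasality"] == "nasal", -5.0, 0.0)
              + df["speaker"].map(offsets)
              + rng.normal(0, 2.0, n))

# Fixed effect of nasality, random intercept per speaker
fit = smf.mixedlm("a1p0 ~ nasality", df, groups=df["speaker"]).fit()
print(round(fit.params["nasality[T.oral]"], 2))  # oral-minus-nasal difference, ~5 dB
```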
---

# Experiment 2: Machine Learning!

---

### The Idea

Speech perception is just classifying sounds based on acoustical features

* **Computers can do that too!**
* Give the feature information to a classifier and ask for oral vs. nasal judgements
* Greater accuracy means a feature or grouping is more useful!

---

### Basic Machine Classification

* "Find the patterns in this training data, then use them to predict which group this new datapoint belongs to!"
  * "Based on the words around it, what verb sense is being used?"
  * "Is this handwritten symbol a '1'? A '2'? A '3'?"
  * **"Does this set of measurements indicate an oral vowel, or a nasal vowel?"**

---

### Machines have some advantages over humans!

* They live in my apartment!
* They don't have *any* context.
* Their decisions are easier to quantify.
* They'll tell you *how* they made the decision they did.

---

### Experiment 2: Plan

* 1) Give features to Machine Learning algorithms one at a time
  * The features which give the best accuracy should be the most useful
* 2) Give them *all the features at once*, then ask the algorithms which features are most useful.
* 3) Find the best group of features
  * Find the balance between "few features" and "good accuracy"
  * Test *those* features with expensive humans (Experiment 3!)

---

### My Algorithms of Choice

* RandomForests
  * Make a bunch of decision trees, and combine their votes!
* Support Vector Machines
  * Find the mathematical separation that optimally groups classes!
* RandomForests are really transparent, SVMs are really accurate.
  * All analyses will use both!

---

## Single-feature tests

---

### Single-Feature testing

* Are any features good enough *on their own* to allow nasal perception?
* 116 models, one per feature per algorithm per language
* Each model outputs accuracy, which we can compare!

---

### Single-feature findings!
* Duration is suspiciously useful
  * 79.7% accuracy with RF, only 59.2% with SVMs in English
* F1's Bandwidth wins for English
  * 67.6% SVM accuracy
* Spectral Tilt wins for French
  * 76.8% SVM accuracy
* A1-P0 gets second place for both
  * 64.7% in English SVMs, 75.7% in French.
* *None of the features are good enough on their own!*

---

### Which features are most useful *in a combined model*?

---

## Evaluating Feature Importance

---

### RandomForest Importance

RandomForests can calculate *which features were most useful* for classification!
| Rank | English        | French         |
|------|----------------|----------------|
| 1    | F1's Bandwidth | Spectral Tilt  |
| 2    | A1-P0          | A1-P0          |
| 3    | Duration       | F1's Bandwidth |
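---

Both halves of this machinery — single-feature accuracy, then importance from a combined model — can be sketched with scikit-learn. The three synthetic "features" below are stand-ins I invented for illustration, not the real measurements:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 800
y = rng.integers(0, 2, n)  # 0 = oral, 1 = nasal
X = np.column_stack([
    y * -5.0 + rng.normal(0, 2.0, n),  # strong cue (F1-bandwidth-like)
    y * -2.0 + rng.normal(0, 2.0, n),  # weaker cue (A1-P0-like)
    rng.normal(0, 1.0, n),             # uninformative feature
])

# Single-feature accuracy, as in the 116-model comparison
for i, name in enumerate(["strong", "weak", "noise"]):
    acc = cross_val_score(SVC(), X[:, [i]], y, cv=5).mean()
    print(name, round(acc, 2))

# Importance ranking from a forest trained on all features at once
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(rf.feature_importances_.argsort()[::-1])  # indices, most useful first
```

The strong cue classifies well alone and tops the importance ranking; the noise feature does neither.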
---

So, we know which features are useful and important

* **What's the best group to test?**

---

## Multi-feature Models

---

### Multi-feature modeling

* Tested 10 *a priori* feature groupings
  * There are 20,030,007 other possible groupings of 10 features out of 29.
* Compare accuracy *in light of the number of features*
  * The winning model gets the best performance from the fewest features

---

### Multi-feature Results

* SVMs with all features worked best (29 features)
  * 84.7% accuracy for English, 93.7% in French
* Formant Bandwidth, Formant Frequency, Spectral Tilt, A1-P0, and Duration were the best subgroup (9 features)
  * 82.2% for English, 91.7% for French
* **We only lose 2-3% accuracy when we reduce our feature set by 68%!**
  * That's a promising grouping!

---

### Overall Machine Learning Results

* **Formant Bandwidth** was the best feature for English, strong in French
* **Spectral Tilt** was the most useful feature in French, less so in English
* **A1-P0** performed well in both languages
* **P0Prominence** was not useful for classification
* **Formant Frequency** was useful too!
* **Duration** was *really* useful in both languages
  * ... but this could be because it lends itself particularly well to classification

---

So, we've got 5 features which allow high accuracy

* ## Let's see if humans use them!

---

# Experiment 3: Human Perception

---

### The Idea

* English listeners can use vowel nasality to identify missing nasal consonants!
  * ba_ could be "bad" or "ban"
* **Let's add or remove features from vowels to see what indicates "nasality"!**
  * If adding or removing a feature changes perception, or makes listeners react more slowly, it's important!
* Manipulate features independently, or together, in both /ɑ/ and /æ/

---

### The Plan

* 1) Create nasal vowels where each nasal feature is *reduced*.
  * Listeners might think they're oral!
* 2) Create oral vowels where each nasal feature is *added*.
  * Listeners might think they're nasal!
* 3) Create control stimuli which are modified and then un-modified (the change is applied, then reversed)
  * This will reveal any artifacts of the modification process itself
* 4) Give them to listeners, then analyze Accuracy and Reaction Time!

---

### The Modifications

* Simulate the oral-to-nasal change in A1-P0 (or vice versa)
  * Lower A1-P0 by 5.3 dB in oral vowels, raise it by 5.3 dB in nasal ones
* Simulate the oral-to-nasal change in duration (or vice versa)
* Simulate the oral-to-nasal change in spectral tilt (or vice versa)
* Change the formant structure
  * Change F1 and F3 bandwidth to match the oral and nasal norms
  * Simulate the *overall* oral-to-nasal change in F1's frequency at the same time
* Modify *all four features at once!* ("Allmod")

---

### The Experiment

* Data from 42 normal-hearing native English speakers from the LING Subject Pool

---
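A quick note on the decibel manipulations: a ±5.3 dB change in A1-P0 corresponds to scaling the relevant spectral amplitudes by 10^(±5.3/20). A back-of-the-envelope check (not the actual stimulus-creation code):

```python
def db_to_gain(db):
    """Convert a change in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)

print(round(db_to_gain(5.3), 2))   # 1.84: raising A1-P0 by 5.3 dB
print(round(db_to_gain(-5.3), 2))  # 0.54: lowering A1-P0 by 5.3 dB
```

---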
bad
ban
---
bomb
bob
---
bad
mad
---

(397 more times!)

---

### The Analysis

* Use the accuracy and reaction time data from this experiment.
* If listeners call originally nasal vowels "oral" (or vice versa), we'll call the response **inaccurate**.
  * Reduced accuracy means we've affected the perception of nasality!
* **Increased RT** means we've made classification more difficult.
* Check the data using Linear Mixed-Effects Regressions

---

## Feature Addition (oral-made-nasal) Findings

---
* *Modifying formants (or all features together) resulted in more confusion!*
  * People called oral vowels "nasal" more often with modified formants
  * The All-Modified stimuli showed a statistically similar pattern.

---
* *Modifying formants (or all features together) resulted in slower reaction times!*
  * People were slower to call vowels "oral" or "nasal" with modified formants

---

### Addition Summary

* Perception was affected by modifying formant structure, or by modifying all features.
  * Post-hoc tests show that "All" and "Formant" modification were not significantly different
* **Only modifying formant frequency and bandwidth had an effect on perception!**

---

## Feature Reduction (Nasal-made-Oral) Findings

---
* Confusion wasn't affected by modification!
  * We never changed "nasal" to "oral" by modifying features

---
* *Modifying formants (or all features) resulted in slower reaction times!*
  * People were slower to call vowels "oral" or "nasal" with modified formants

---

### Removal Summary

* *None of the experimental modifications* affected confusion
  * Nothing I did made a nasal vowel "oral"
* Modifying formants (or all features) resulted in slower responses
  * Post-hoc tests show that "All" and "Formant" modification did not meaningfully differ
* **Formant changes slowed listeners down, but didn't change classification!**

---

### Experiment 3 Summary

* Only **formant modification** had a significant effect on perception
  * Formant modification caused listeners to respond more slowly
  * Formant modification made oral vowels sound "nasal"
* F1's bandwidth is probably the cue
  * It worked best in ML, had the best statistical link, and it makes sense acoustically
  * Hawkins and Stevens (1985) also point in that direction
* Formant modification **wasn't enough** to make nasal vowels sound "oral"

---

(We'll talk more about that asymmetry later!)

---

So, we can answer our primary research question!

* ### Formant structure is the main cue to nasality in English!

---
---

So, the machine learning models predicted F1's bandwidth as the most useful feature...

* ### How similar *are* the SVMs and the humans?

---

# Experiment 4: Humans vs. Machines
---

### The Idea

* *Let's give the computer the same experimental task as the humans, using the same altered stimuli, and see how they compare!*

---

### The Plan

* 1) Train SVMs on different datasets
  * NoNVN - Trained on English without NVNs (like the stimuli)
  * EnAll - Trained on *all* the English data
  * EnFrAll - Trained on English *and* French
* 2) Test those SVMs on the experimental stimuli (classifying "oral" or "nasal")
* 3) Compare the by-condition results to the humans

---

### Experimental Stimuli by Condition
---
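The train-on-corpus, test-on-stimuli setup can be sketched as follows. The data here are synthetic stand-ins I generated for illustration; the real models trained on the measured English/French vowels and classified the actual experimental stimuli:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)

def fake_vowels(n):
    """Two informative features with invented oral/nasal differences."""
    y = rng.integers(0, 2, n)  # 0 = oral, 1 = nasal
    X = np.column_stack([y * -5.0 + rng.normal(0, 2.0, n),
                         y * 3.0 + rng.normal(0, 2.0, n)])
    return X, y

X_train, y_train = fake_vowels(1000)  # stands in for the corpus data
X_stim, y_stim = fake_vowels(100)     # stands in for the experimental stimuli

svm = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
print(round((svm.predict(X_stim) == y_stim).mean(), 2))  # by-stimulus accuracy
```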
---

### Experiment 4 Summary

* Humans and machines *did* show similar patterns
  * Modifications that were difficult for humans were difficult for SVMs
* The generic English model (EnAll) showed the most similarity
  * Adding in French training data was a **bad** idea
* Perceptual testing with machine learning isn't crazy
  * Humans still win.

---

### Hooray!
---

## Coming full circle

**Experiment 1** - What features are statistically linked to nasality?

**Experiment 2** - What features are *useful* for identifying nasal vowels?

**Experiment 3** - What features are humans using to perceive nasality?

**Experiment 4** - Do computers show a similar perceptual pattern?

---

## Discussion

---

### We've got some great new information about nasality

* We know more about measuring nasality
  * There's no "magic feature", but A1-P0 isn't bad
  * We should also try F1's Bandwidth
  * We know which features *just don't work*.
* We know more about cross-linguistic differences in nasal acoustics

---

### Machine Learning is a good tool in phonetic research

* We can accurately classify nasality using acoustics alone
  * The best features are general, rather than nasality-specific
* SVM classification showed similarity to human perception!
  * Modeling humans using machines isn't crazy!

---

### Formants are the main cue to nasality perception in English

* Modifying formants was the *only* modification which affected perception
* ... but it's probably not the *only* cue for vowel nasality

---

### Reducing Formant Bandwidth doesn't make nasal vowels "oral"

* Listeners slow down, but they don't reclassify when we change bandwidth
  * There was still something "nasal" about the vowels
* This actually makes sense, because...

---

### Nasal vowels are produced with different *oral* articulations

* The oral differences between oral and nasal vowels are *not* arbitrary
  * cf. Carignan et al. (2015), Carignan (2014), Carignan et al. (2011), and Shosted et al. (2012)
* We only made formant changes associated with *all vowels*
  * Vowel-specific changes in formants were ignored
* If nasal vowels are *orally* different, we wouldn't confuse listeners by removing "*nasality*"
  * At worst, they hear a "nasal vowel" without nasality!

---

### Independent Nasal Vowels make sense!
* Contrast enhancement using a secondary feature is common
  * Duration and Vowel quality, Nasality and Pharyngealization (Zellou 2012), and more
* Nasal vowel systems are often very different from the oral vowel systems
  * Centralization and quality shifts are well known
* Nasal systems often change independently of oral systems diachronically
* So, nasality is *part of* the difference, but it's not the only difference!

---

### This isn't the final word on nasality perception

* The English results used college-aged speakers and listeners
  * The process may look different for pathological hearing or speech.
* We only tested two vowels here, and we've got plenty more.
* French perception experiments need to be done!
  * There are still lots of languages out there in the world.

---

### Conclusions

* Our current measurements of nasality aren't bad
  * Although F1's Bandwidth is a great new one.
* Machines *can* accurately classify nasality
  * ... and simulate human perception!
* Formant bandwidth is the best nasality cue we've got
  * ... at least for English
  * ... but other aspects of the vowel articulation are important too!

---

Most importantly...

---

### There's more to vowel nasality than nasal airflow!

---

## Thank you!

---

# Questions?

---

### References

Carignan, C. (2014). An acoustic and articulatory examination of the oral in nasal: The oral articulations of French nasal vowels are not arbitrary. Journal of Phonetics, 46:23–33.

Carignan, C., Shosted, R., Shih, C., and Rong, P. (2011). Compensatory articulation in American English nasalized vowels. Journal of Phonetics, 39(4):668–682.

Carignan, C., Shosted, R. K., Fu, M., Liang, Z.-P., and Sutton, B. P. (2015). A real-time MRI investigation of the role of lingual and pharyngeal articulation in the production of the nasal vowel system of French. Journal of Phonetics, 50:34–51.

---

### References Continued

Chen, M. Y. (1997). Acoustic correlates of English and French nasalized vowels. The Journal of the Acoustical Society of America, 102(4):2350–2370.

Hawkins, S. and Stevens, K. N. (1985). Acoustic and perceptual correlates of the non-nasal–nasal distinction for vowels. The Journal of the Acoustical Society of America, 77(4):1560–1575.

Shosted, R., Carignan, C., and Rong, P. (2012). Managing the distinctiveness of phonemic nasal vowels: Articulatory evidence from Hindi. The Journal of the Acoustical Society of America, 131(1):455–465.

Zellou, G. (2012). Similarity and Enhancement: Nasality from Moroccan Arabic Pharyngeals and Nasals. PhD thesis, University of Colorado at Boulder.

---

### Other Tables

---
---
---
---
--- ### Control vs. Experimental Stimuli
--- ### Control Stimuli by Condition
---