# Linguistic Problems with Statistical Solutions

Will Styler
--- ### Today's Plan - What is Linguistics, and why? - The state of statistics in linguistics - Phonetics and Coarticulation - Complexity from complex data types - Complexity from complex questions - Why is this a problem for our field? - Why should statisticians and linguists team up more often? --- # What is Linguistics, and why? --- ### Linguistics is the study of Language - What is this thing I'm doing right now with my flapping bits of meat around in my head and you then understanding my thoughts? - How can we describe what languages are doing? - How can we understand the differences and similarities among them? - What does language tell us about cognition and culture? --- ### Linguists study languages to understand Language - Many linguists speak lots of languages, but some don't! - We're interested in the whole enterprise, and study it scientifically --- ### We break Linguistics into subfields - "How does talking and understanding speech work?" - Phonetics - "How do units of sound or gesture change when we combine them?" - Phonology - "How do we build words?" - Morphology - "How do we combine words into sentences?" - Syntax - "How do we understand meaning in language, both generally and in context?" - Semantics and Pragmatics - "How does this less-well-known language work?" - Lg. Documentation - ... and many more! --- ### Linguistics is an increasingly experimental discipline - Some folks still work in armchairs - ... or in the homes and worlds of language experts - Theory is now often supported by quantitative or experimental data - Especially where the patterns are small, variable, or difficult to ferret out --- ### Almost every type of linguistic research has data to analyze - Text data (e.g. large corpora) - Survey data (e.g. responses, free text) - Experimental data (e.g. eye tracking, reaction time, accuracy) - Neural data (e.g. EEG, fMRI, PET, MEG) - Imaging data (e.g. video, ultrasound) - Spatial data (e.g. 
GIS info, 3D spatial movement tracking) --- # Statistics in Linguistics --- ## The State of Linguistic Statistics --- ### Most linguists take some basic statistics classes - "Statistics for Psychology Graduate Students" - This is often the minimum requirement - Increasingly more sophisticated classes are available - "Probabilistic Methods in Linguistics" (an intro to Bayesian stats in our department) - "Analyzing time series data using Generalized Additive Models" at the Linguistic Institute --- ### There are dedicated resources for statistics for Linguists > [Baayen, R. H. (2008). Analyzing Linguistic Data: A practical introduction to statistics using R. Cambridge University Press.](https://www.cambridge.org/us/academic/subjects/languages-linguistics/grammar-and-syntax/analyzing-linguistic-data-practical-introduction-statistics-using-r) > [Winter, Bodo (2020). Statistics for Linguists: An Introduction Using R. Routledge.](https://www.routledge.com/Statistics-for-Linguists-An-Introduction-Using-R/Winter/p/book/9781138056091) - ... alongside an increasing corpus of tutorials from statistically focused linguists --- ### There are complex analyses occurring in our field - Some specializations (e.g. neurolinguistics) require advanced models to function - Some linguists are statistical thought-leaders and have strong expertise - Bodo Winter, Harald Baayen, Jacolien van Rij, Martijn Wieling, and more - Some statisticians moonlight in linguistics (to varying degrees of success) - "I'm a physicist so I understand how language works..." 
--- ### But the average linguist is still using relatively unsophisticated models - The vast majority of linguistic work in these core fields is still supported by more basic methods - T-Tests and Chi-Square are being phased out in publication - ANOVA and basic linear models are probably still the mode --- ### There's lots of recent movement towards Linear Mixed Effects Regression - Most experiments have some decidedly random random factors - Speaker language background differences - Differences in vocal tract size - Individual word differences (e.g. 'went' vs. 'wend') - Usually implemented using ``lmer`` in R - Reviewers are starting to demand mixed models where relevant - ... but mixed models are right at the edge of many linguists' understanding - This has led to a saying... --- > "Giving Linear Mixed Models to Linguists is like giving shotguns to toddlers" --- ### ... but linguists are needing more and more statistical complexity - Larger and larger text corpora are allowing (and forcing) *massive* analyses - Interdisciplinary work often inherits the toolchains of related methods - New experimental methods require new technology to process their data - More nuanced questions require more nuanced examinations --- ### We're going to look at those last two - Complex data requiring complex analysis - Nuanced questions requiring nuanced analysis - We're going to examine both in the context of linguistic phonetics --- # Coarticulation in Phonetics --- ### I'm a phonetician - My focus is on understanding exactly what's happening in the mouth when we talk - "What are you doing inside your body to produce this word?" - "How are listeners able to parse or reconstruct that to understand that you've produced this word?" - ... 
and we're going to focus on some phonetic questions today --- ### Studying gestures - Speech can be defined as a sequence of gestures of the tongue, lips, larynx and other speech articulators - Gestures of the tongue and mouth are the smallest units of spoken language - Gestures are likely the object of human speech perception - *Both of the claims above could cause a fistfight at a conference, but let's hold them as true for this talk.* --- ### Gestures aren't cleanly separable - We write letters one after the other, but letters are lies - The lines between gestures tend to blur - Speech sounds are **not** beads on a string - We often begin moving our articulators towards the next gesture before we've finished the current one - ... and the last sound can often have an influence on the current one - This overlap is called **coarticulation** - A nice example: 'car key' --- ### Coarticulation makes speaking easier - "Car key" is changing the articulation of one sound to better 'match' the next - We will often start to articulate the /l/ in words like 'bulk' before we've finished the vowel - Air starts flowing out the nose in words like 'bend' before we actually make the /n/ sound where it's supposed to --- ### Coarticulation is helpful for perception too - It provides redundancy in signaling speech contrasts - It provides information about upcoming sounds *before they arrive* - It can help to reconstruct 'missing' sounds --- ### Phonetics has a big problem - So we want to learn more about the gestures we're making, and how they overlap - We want to see exactly which gestures are happening inside your head and when - ... 
but your head is frustratingly opaque --- ### Phonetics has examined gestures acoustically for a long time - First by ear training, now using DSP and frequency-domain analysis - We've often focused on finding quantifiable acoustic measures which covary with the articulatory properties under study - "This measure represents the height of the tongue in the mouth" - Other methods of measuring articulator motion and position do exist - Imaging of tongue motion and position is ideal! --- ### ... but when we look inside the head, we find... --- # Complexity from Complex Data --- ### Ultrasound Imaging - Pulse high-frequency sound waves into the body - Measure the patterns in which they return to image internal structure - The resulting data are black and white image frames showing areas of high and low reflection --- ### Ultrasound Data Acquisition
--- ### Sample Speech Ultrasound file
--- ### Ultrasound in Speech - Captures the motion of the tongue in (generally) two dimensions - 3D Ultrasound exists, but is still rare in Linguistics - Offers 60+ frames per second time resolution - Ideal for tracking the *relative location* and *contour* of the tongue --- ### Ultrasound 'Splining' - The machine outputs a series of images (or grayscale matrices) at a fixed sampling rate - We transform images into lists of ordered points representing the tongue shape and location - This is done by the researcher and team directly - ... or using [neural networks](https://arxiv.org/abs/1907.10210) ---
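### A toy 'splining' sketch

As an illustration only (not the field's actual splining tools or the neural networks linked above), here's a naive ridge-picker in Python that turns one grayscale frame into an ordered point list, assuming the bright tongue surface is simply the brightest pixel in each column:

```python
import numpy as np

def spline_frame(frame, threshold=0.5):
    """Extract a crude tongue contour from one grayscale ultrasound
    frame (rows x cols, brightness in [0, 1]) by taking, per column,
    the brightest row -- a stand-in for real splining pipelines."""
    points = []
    for x in range(frame.shape[1]):
        column = frame[:, x]
        y = int(np.argmax(column))
        if column[y] >= threshold:  # skip columns with no bright ridge
            points.append((x, y))
    return points

# Toy 8x5 'frame' with a bright ridge along row 3
frame = np.zeros((8, 5))
frame[3, :] = 0.9
contour = spline_frame(frame)  # [(0, 3), (1, 3), (2, 3), (3, 3), (4, 3)]
```

Real frames are far noisier, so practical pipelines smooth across columns and frames rather than trusting a single argmax.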
---
--- ### Technical Notes - There are some approaches which use PCA on whole-frame images to isolate meaningful components and skip this process (cf. [Faytak et al. 2020](https://www.journal-labphon.org/article/id/6281/)) - There are many problems with normalizing position and orientation between speakers and words which are not discussed here but which are Fun™ --- ### These splined data give us details about articulation - What is the average/min/max height of the tongue? - "Is the vowel in 'beet' generally higher than the vowel in 'bit'?" - What's the front-back distribution of the tongue? - "Is the vowel in 'boot' really as far back as in 'boat' for Californians?" - How do tongue contours differ between sounds? - "Do we shape the tongue differently for 'buck' and 'bulk'?" - How do tongue contours change during sounds? - "At what point does the tongue start moving towards the /l/ gesture in 'bulk'?" --- ### Getting front-back-high-low distribution is relatively easy
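A minimal sketch of why: assuming each spline arrives as an (N, 2) array of (front-back, height) coordinates, the summary measures above reduce to simple descriptive statistics (the function name and coordinate conventions here are made up for illustration):

```python
import numpy as np

def tongue_summary(spline):
    """Summarize one splined tongue contour, given as an (N, 2) array
    of (front-back, height) points in the machine's coordinate space."""
    xs, ys = spline[:, 0], spline[:, 1]
    return {
        "min_height": float(ys.min()),
        "max_height": float(ys.max()),
        "mean_height": float(ys.mean()),
        "mean_backness": float(xs.mean()),  # front-back center of mass
    }

# Hypothetical three-point contour
spline = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
stats = tongue_summary(spline)  # {'min_height': 1.0, 'max_height': 3.0, ...}
```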
--- ### Does the tongue shape differ for 'buck' vs. 'bulk'?
--- ### Comparing Contours is difficult (for us) - Usually done using Smoothing Spline ANOVA in Linguistics - Occasionally mixed models with B-Splines, Generalized Additive Models (GAM), and Growth Curves
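The full SS-ANOVA machinery isn't sketched here, but the basic move — smooth each condition's points, then ask where the fitted curves diverge — can be illustrated with ordinary smoothing splines on toy contours (a real analysis would add interval estimates and per-speaker random effects):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)  # normalized front-to-back position

# Made-up contours: 'bulk' raised toward the back relative to 'buck'
buck = np.sin(np.pi * x) + rng.normal(0, 0.05, 50)
bulk = np.sin(np.pi * x) + 0.3 * x + rng.normal(0, 0.05, 50)

fit_buck = UnivariateSpline(x, buck, s=0.1)  # s controls smoothing
fit_bulk = UnivariateSpline(x, bulk, s=0.1)

grid = np.linspace(0, 1, 200)
difference = fit_bulk(grid) - fit_buck(grid)
# Regions where |difference| is large relative to the noise are candidate
# articulatory differences; SS-ANOVA supplies the confidence intervals
```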
--- ### At what point does the tongue start moving towards the /l/ gesture in 'bulk'? - This is a place where speakers vary - We can look at the time course of the vowel+l portion of the word --- ### Some people show some change later on
--- ### Some people show massive change early on
--- ### Some people don't show change at all
--- ### Measuring these changes is very difficult (for us) - Quantifying the degree of change in a 50-point spline which changes contour and position over time - Variably, across speakers - Identifying the *onset* of the contour change in time - Identifying specific types of contour change which are most relevant - Finding 'targeted' vs 'untargeted' change - **There isn't a well-established statistical method for doing this in our field!** --- ### In practice, this line of inquiry wasn't possible - Not because it is impossible, but because of the myriad complexities - ... as well as some interesting linguistic details which we don't have time for --- ### "Wait... hold on..." - "People differ in the amount and timing of change...?" ---
---
--- ### "Why do people differ in their patterns of coarticulation?" --- # Complexity from Complex Questions --- ### Background: Nasal Coarticulation - /n/ is a 'nasal' sound, with airflow from the nose - This is accomplished by lowering the 'velum'
---
bend
/bɛnd/
- **...but there's more to it than the symbols show us!** - In the word "bend", we start nasal airflow before the nasal /n/, *during the vowel* --- ### This is audible and useful to us - Is this 'bob' or 'bomb'?
- **We can use coarticulation to tell more quickly what the upcoming word will be!** --- ### We can measure nasal coarticulation by measuring airflow from the mouth and nose - This is called 'pneumotachography'
--- ### Airflow measurement gives us curves - Oral and nasal flow in mL/sec - Sampled (here) at 50 points through the vowel --- ### The word 'bed' has no nasal airflow
--- ### The word 'bend' is more complicated
--- ### The /b/ has no nasal flow
--- ### The /n/ has lots of nasal flow and little oral flow
--- ### The vowel in the middle shows coarticulation
--- ### Looking at airflow we can see coarticulation directly - Both the *amount* of flow and the *timing* of the flow --- ### Some speakers show only a bit of coarticulation
--- ### Some speakers show only a bit of coarticulation
--- ### Some speakers show moderate coarticulation
--- ### Some speakers show massive coarticulation
--- ### Some speakers show massive coarticulation
--- ### Speakers differ greatly in their *production* of coarticulation - Ranging from 'practically none' to 'it's all nasal' - Inference can be done using splined mixed models, GAMs, and more - Functional data analysis isn't common in Linguistics, but it does happen! --- ### If speakers vary in their production of coarticulation - Do they differ in their *perception* of coarticulation as well? --- ### Measuring the Perception of Coarticulation - Often done using eyetracking - "When does the participant look at the correct image on the screen?" - "Does this person use vowel nasality to choose 'send' over 'said' more quickly?" --- ### Visual World Eyetracking
--- ### Eye Tracking Data - For each trial, 1000 binary points over the course of a second, 'Are they looking at the nasal word?' - 0000000000000001111111111... - Occasionally 00000000000000011111111110000000... - Many, many trials are averaged out to create response curves - "Generally speaking, does this person make a choice earlier in this condition than that one?" --- ### Conditions - "Early Nasalization": Coarticulation begins very early in the vowel - "Late Nasalization": Coarticulation begins later in the vowel - *How early is information about the word made available to listeners?* --- ### Listeners can be compared on the basis of their use of nasality - People who use coarticulation strongly in perception will decide 'send' over 'said' earlier for 'early' nasalization tokens - People who don't use coarticulation in perception will show little distinction between the conditions --- ### Listeners who use coarticulation
--- ### Listeners who use coarticulation
--- ### Listeners who largely ignore coarticulation
--- ### Listeners who largely ignore coarticulation
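--- ### Sketch: from binary looks to response curves

Curves like the ones in these plots come from averaging the trial-level binary series described earlier; a minimal sketch with made-up trials (the 50% threshold-crossing measure is one crude stand-in for the real time-course analyses):

```python
import numpy as np

def response_curve(trials):
    """Average binary look/no-look series (trials x samples) into a
    proportion-of-looks-to-the-nasal-word curve over time."""
    return np.asarray(trials, dtype=float).mean(axis=0)

def crossing_time(curve, level=0.5):
    """First sample where the curve reaches `level`."""
    above = np.flatnonzero(curve >= level)
    return int(above[0]) if above.size else None

# Three made-up 10-sample trials; looks begin at different times
trials = [
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # a look-away at the end
]
curve = response_curve(trials)
onset = crossing_time(curve)  # first sample at or above 50% looks
```

Comparing `crossing_time` across conditions is the toy version of "does this person decide earlier with early nasalization?"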
--- ### So, now we can measure perception of coarticulation - ... and production - This allowed us to ask one very large question... --- ### Is a listener's production of coarticulation related to their perception of coarticulation? - Put differently, do people who coarticulate early, listen for it early? - *Do people who talk unusually expect others to talk the same way?* - This was tested in [Beddor et al. 2018](https://muse.jhu.edu/article/712563) --- ### This is a surprisingly useful question - It gets at the heart of the gesture vs acoustics debate in speech perception - It tells us about the role of our own productions in guiding our learning of a language - It has massive implications for how languages change over time --- ### But it's really, really unpleasant to test - Correlating a functional airflow curve (with massive variation in values) with the overall trend across a large set of logistic time series from eye tracking trials - We have truly random factors we want to get rid of - Variation in frequency and 'lookability' across words - Some speaker factors we want to get rid of - Variation in pre-look processing time, absolute differences in airflow volume - Other speaker factors we want to study - Variation in time-to-look by condition, variation in flow slope and time onset - We're interested in speaker variation, but the experiment was so complex that we could only collect 42 participants - **Yikes** --- ### We needed help - Help came in the form of [Kerby Shedden](https://sph.umich.edu/faculty-profiles/shedden-kerby.html), University of Michigan Department of Statistics
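--- ### Sketch: collapsing airflow curves with PCA

Curves sampled at fixed points can be collapsed into a few per-speaker numbers by running PCA over the samples. A toy sketch with made-up sigmoid nasal-flow curves, using numpy's SVD (in the published analysis it was PC2 that captured timing and degree; in this one-parameter toy, PC1 does the work):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)  # normalized time through the vowel

# Made-up curves for 20 'speakers': nasal flow rises sigmoidally, with
# onset time varying by speaker, plus measurement noise
onsets = rng.uniform(0.2, 0.8, size=20)
curves = np.array([1 / (1 + np.exp(-(t - o) * 20)) for o in onsets])
curves += rng.normal(0, 0.02, curves.shape)

# PCA via SVD of the centered speakers-by-samples matrix
centered = curves - curves.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ Vt.T  # per-speaker principal component scores

# One number per speaker, tracking coarticulation onset time in this toy,
# which can then enter a perception model as a predictor
pc1 = scores[:, 0]
```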
--- ### We ended up collapsing the airflow data using PCA - This gave us a single quantity representing timing and degree of coarticulation ('PC2') which we could insert into a model of perception - The perception model was run using ``MCMCglmm`` in R, with B-splines to model temporal variation --- ### Turns out that people who produce early coarticulation generally listen for early coarticulation
(Adapted from Beddor et al. 2018) --- ### Work is ongoing to continue investigating these issues - The production/perception link is very interesting, and uniformly hard to analyze - ... and there are a million other domains to test it in --- ### These cases illustrate the sorts of complexity which we've found ourselves wandering into - ... and analogous issues exist in *every* subfield of linguistics --- # Why is this a problem for our field? --- ### Increasingly complex data have pulled us into complex territories - We've moved from single-variable correlations into functional data - ... and in many cases, functional data which is itself captured as a time series - New methods are arriving - ... but our questions are generally different enough that existing statistical toolchains don't cleanly apply - Our data keep getting richer and bigger - The burden of 'proof' is rising along with the amount of data available to test it --- ### Increasingly complex questions require increasingly nuanced analyses - We've moved from presence/absence into time-course information - We're now increasingly studying the kinds of variability which conventional models attempt to factor out - Potentially explanatory data is seldom low-dimensional! --- ### Our statistical needs have surpassed our statistical abilities - Grad-level Psych Stats has very little to say about comparing 3D meshes of tongue motion by conditions - This poses a massive pedagogical problem! - Reviewers are generally chosen for knowledge of specific linguistic domains (e.g. coarticulation or French nasality), and have vastly variable statistical backgrounds - "Why not just use an ANOVA here?" is as likely as "How did you settle on the right number of spline coefficients?" - Keeping up with the statistical state-of-the-art is a full-time job, and it's very easy to miss things - ... 
so those of us who try to learn more about complex analyses often remain toddlers with even bigger shotguns --- ### That's why I'm here today - (That and Shuheng's gracious invitation) --- # Linguists and Statisticians should talk more! --- ### Language is uniquely rewarding as an area of research - You are quite literally always using language - Problems are often interpretable in terms of linguistic experience - It offers a diversity of data types, often in the same experiments - Text data, behavioral experiments, sensor output, imaging data, GIS, and more - Linguistic knowledge is helpful for breaking into Natural Language Processing and other language-focused data science - Everything I've talked about today has straightforward applications in speech recognition and text-to-speech --- ### Many linguists are held back by a lack of statistical sophistication - An increasing number of questions have small and hard-to-model effects - "I want to study this, but I don't know how I'd model it" - It's very possible that 'straightforward' techniques in statistics could be revolutionary in our field - Many of us feel limited by our tools more than our questions - Collaborations can be mutually rewarding and beneficial - Linguists learn stats, statisticians learn language --- ### Our field is just realizing this need - There's a growing understanding that we probably shouldn't claim to be masters of two disciplines at once - There is increasing discussion of hiring statisticians in departments and divisions for consulting and collaboration - ... 
and already, statistical savvy is a commonly desired trait for new hires - Statisticians who know even basic elements of language will be increasingly valued in industry and life --- ### Teamwork can make the dream work - Linguistic work is often held back by relatively basic inference approaches - Increased complexity of data, and increased complexity of questions, both leave ample room for collaboration - New methods in statistics likely have testable uses in language - New questions in linguistics may require new methods in statistics - And people collaborating in this world have a very real chance to make a difference in both fields --- ### Let's talk! - Next time you're looking to branch out, remember that we linguists are here - That we've got amazing data - ... and at the very least, you can use your knowledge to help teach a toddler proper statistical safety ---
Thank you!
Questions?