# Linguistic Problems with Statistical Solutions

Will Styler
--- ### Today's Plan - What is Linguistics, and why? - Statistics in linguistics - Phonetics and Coarticulation - A Coarticulatory Measurement Success Story - A Coarticulatory Measurement Failure - Why am I here? --- # What is Linguistics, and why? --- ### Linguistics is the study of Language - What is this thing I'm doing right now, flapping bits of meat around in my head, with you then understanding my thoughts? - How can we describe what languages are doing? - How can we understand the differences and similarities among them? - What does language tell us about cognition and culture? --- ### Linguists study languages to understand Language - Many linguists speak lots of languages, but some don't! - We're interested in the whole enterprise, and study it scientifically --- ### We break Linguistics into subfields - "How do talking and understanding speech work?" - Phonetics - "How do units of sound or gesture change when we combine them?" - Phonology - "How do we build words?" - Morphology - "How do we combine words into sentences?" - Syntax - "How do we understand meaning in language, both generally and in context?" - Semantics and Pragmatics - "How does this less-well-known language work?" - Lg. Documentation - ... and many more! --- ### Linguistics is an increasingly experimental discipline - Some folks still work in armchairs - ... or in the homes and worlds of language experts - Theory is now often supported by quantitative or experimental data - Especially where the patterns are small, variable, or difficult to ferret out --- ### Almost every type of linguistic research has data to analyze - Text data (e.g. large corpora) - Survey data (e.g. responses, free text) - Experimental data (e.g. eye tracking, reaction time, accuracy) - Neural data (e.g. EEG, fMRI, PET, MEG) - Imaging data (e.g. video, ultrasound) - Spatial data (e.g. 
GIS info, 3D spatial movement tracking) --- # Statistics in Linguistics --- ## The State of Linguistic Statistics --- ### Most linguists take some basic statistics classes - "Statistics for Psychology Graduate Students" - This is often the minimum requirement - Increasingly more sophisticated classes are available - "Probabilistic Methods in Linguistics" (an intro to Bayesian stats in our department) - "Analyzing time series data using Generalized Additive Models" at the Linguistic Institute --- ### There are dedicated resources for statistics for Linguists > [Baayen, R. H. (2008). Analyzing Linguistic Data: A practical introduction to statistics using R. Cambridge University Press.](https://www.cambridge.org/us/academic/subjects/languages-linguistics/grammar-and-syntax/analyzing-linguistic-data-practical-introduction-statistics-using-r) > [Winter, Bodo (2020). Statistics for Linguists: An Introduction Using R. Routledge.](https://www.routledge.com/Statistics-for-Linguists-An-Introduction-Using-R/Winter/p/book/9781138056091) - ... alongside an increasing corpus of tutorials from statistically focused linguists --- ### There are complex analyses occurring in our field - Some specializations (e.g. neurolinguistics) require advanced models to function - ... but have often overfit to those models - Some linguists are statistical thought-leaders and have strong expertise - Bodo Winter, Harald Baayen, Jacolien van Rij, Martijn Wieling, and more - Some statistically savvy folks moonlight in linguistics (to varying degrees of success) - "I'm a physicist with a neat model so I now understand how language works..." 
--- ### But the average linguist is still using relatively unsophisticated models - T-Tests and Chi-Square are being phased out in publication - ANOVA and basic linear models are probably still the mode - There's lots of recent movement towards Linear Mixed Effects Regression - Speaker, word, and ordering differences are nicely handled by random effects - Reviewers are starting to demand mixed models where relevant - ... but mixed models are right at the edge of many linguists' understanding - This has led to a saying... --- ## "Giving Linear Mixed Models to Linguists is like giving shotguns to toddlers" --- ### There's a danger in escalating complexity without escalating education - We need more nuanced methods for more nuanced questions - ... but it's hard to find the right path ahead - We need new statistical methods for new experimental methods - ... but we seldom have the time or know-how to develop them --- ### We're going to look at these ideas in practice - ... by examining two case studies from phonetics --- # Coarticulation in Phonetics --- ### I'm a phonetician - My focus is on understanding exactly what's happening in the mouth when we talk - "What are you doing inside your body to produce this word?" - What do the 'speech gestures' of the tongue and mouth look like? - "How are listeners able to parse or reconstruct that to understand that you've produced this word?" - ... and we're going to focus on some phonetic questions today --- ### Speech isn't cleanly separable - We write letters one after the other, but letters are lies - We're more concerned with tongue gestures, which aren't separable - Speech sounds are **not** beads on a string - We often begin moving our articulators towards the next gesture before we've finished the current one - ... 
and the last sound can often have an influence on the current one - This overlap is called **coarticulation** - A nice example: 'car key' --- ### Coarticulation is easier when speaking - "Car key" is changing the articulation of one sound to better 'match' the next - Air starts flowing out the nose in words like 'bend' before we actually make the /n/ sound where it's supposed to - We might raise the tongue tip for the 'L' in 'bulk' earlier, both for efficiency and because... --- ### Coarticulation is helpful for perception too - It provides redundancy in signaling speech contrasts - It provides information about upcoming sounds *before they arrive* - It can help to reconstruct 'missing' sounds --- ### Coarticulation is important - We want to know how people produce it - ... to be able to model speech - We want to know how people perceive it - ... to be able to model listening - So, let's look at two specific instances of it --- # A Coarticulatory Measurement Success Story --- ### Nasal Coarticulation - /n/ is a 'nasal' sound, with airflow from the nose - This is accomplished by lowering the 'velum'
---
bend
- **...but there's more to it than the letters show us!** - In the word "bend", we start nasal airflow before the nasal /n/, *during the vowel* --- ### This is audible and useful to us - Is this 'bob' or 'bomb'?
- **We can use coarticulation to tell what the upcoming word will be more quickly!** --- ### We can measure nasal coarticulation by measuring airflow from the mouth and nose - This is called 'pneumotachography'
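---
### Sketch: time-normalizing an airflow trace

As a rough illustration (not the lab's actual pipeline), time-normalizing one vowel's flow trace to a fixed number of points can be as simple as linear interpolation; the `resample_flow` helper and the toy `nasal` trace below are invented for this sketch:

```python
import numpy as np

def resample_flow(flow, n_points=50):
    """Time-normalize one vowel's airflow trace to n_points samples."""
    t_orig = np.linspace(0.0, 1.0, num=len(flow))   # original sample positions
    t_new = np.linspace(0.0, 1.0, num=n_points)     # evenly spaced targets
    return np.interp(t_new, t_orig, flow)

# Toy nasal-flow trace (mL/sec): flat, then rising late in the vowel
nasal = np.concatenate([np.zeros(80), np.linspace(0.0, 300.0, 40)])
curve = resample_flow(nasal)   # 50 comparable points per token
```

Once every token is on the same 50-point grid, curves can be averaged and compared across words, conditions, and speakers.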
--- ### Airflow measurement gives us curves - Oral and nasal flow in mL/sec - Sampled (here) at 50 points through the vowel --- ### The word 'bed' has no nasal airflow
--- ### The word 'bend' is more complicated
--- ### The /b/ has no nasal flow
--- ### The /n/ has lots of nasal flow and little oral flow
--- ### The vowel in the middle shows coarticulation
--- ### Looking at airflow we can see coarticulation directly - Both the *amount* of flow and the *timing* of the flow --- ### Some speakers show only a bit of coarticulation
--- ### Some speakers show only a bit of coarticulation
--- ### Some speakers show moderate coarticulation
--- ### Some speakers show massive coarticulation
--- ### Some speakers show massive coarticulation
--- ### Speakers differ greatly in their *production* of coarticulation - Ranging from 'practically none' to 'it's all nasal' - Inference can be done using splined mixed models, GAMs, and more - Functional data analysis isn't common in Linguistics, but it does happen! --- ### If speakers vary in their production of coarticulation - Do they differ in their *perception* of coarticulation as well? --- ### Measuring the Perception of Coarticulation - Often done using eyetracking - "When does the participant look at the correct image on the screen?" - "Does this person use vowel nasality to choose 'send' over 'said' more quickly?" --- ### Visual World Eyetracking
--- ### Eye Tracking Data - For each trial, 1000 binary points over the course of a second, 'Are they looking at the nasal word?' - 0000000000000001111111111... - Occasionally 00000000000000011111111110000000... - Many, many trials are averaged together to create response curves - "Generally speaking, does this person make a choice earlier in this condition than that one?" --- ### Conditions - "Early Nasalization": Coarticulation begins very early in the vowel - "Late Nasalization": Coarticulation begins later in the vowel - *How early is information about the word made available to listeners?* --- ### Listeners can be compared on the basis of their use of nasality - People who use coarticulation strongly in perception will decide 'send' over 'said' earlier for 'early' nasalization tokens - People who don't use coarticulation in perception will show little distinction between the conditions --- ### Listeners who use coarticulation
--- ### Listeners who use coarticulation
--- ### Listeners who largely ignore coarticulation
--- ### Listeners who largely ignore coarticulation
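---
### Sketch: from binary looks to response curves

A minimal sketch of the trial-averaging step described above, using simulated data (the trial generator, trial counts, and onset values are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_trials(n_trials, onset_ms):
    """Simulate binary 'looking at the nasal word' samples, one per ms."""
    trials = np.zeros((n_trials, 1000), dtype=int)
    for i in range(n_trials):
        start = onset_ms + rng.integers(-50, 50)  # per-trial jitter
        trials[i, start:] = 1
    return trials

early = make_trials(100, onset_ms=400)  # early-nasalization condition
late = make_trials(100, onset_ms=600)   # late-nasalization condition

# Average over trials: proportion of target looks at each millisecond
p_early = early.mean(axis=0)
p_late = late.mean(axis=0)

# When is the listener looking at the target on most trials?
cross_early = int(np.argmax(p_early >= 0.5))
cross_late = int(np.argmax(p_late >= 0.5))
```

A listener who uses coarticulation should show an earlier crossing in the early-nasalization condition; one who ignores it shows little difference between the two curves.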
--- ### So, now we can measure perception of coarticulation - ... and production - This allowed us to ask one very large question... --- ### Is a listener's production of coarticulation related to their perception of coarticulation? - Put differently, do people who coarticulate early listen for it early? - *Do people who talk unusually expect others to talk the same way?* - This was tested in [Beddor et al. 2018](https://muse.jhu.edu/article/712563) -
Beddor, P.S., Coetzee, A.W., Styler, W., McGowan, K.B., & Boland, J.E. (2018). The time course of individuals’ perception of coarticulatory information is linked to their production: Implications for sound change. _Language_ _94_(4), [doi:10.1353/lan.2018.0071](http://doi.org/10.1353/lan.2018.0071)
--- ### This is a surprisingly useful question - It gets at the heart of the gesture vs acoustics debate in speech perception - It tells us about the role of our own productions in guiding our learning of a language - It has massive implications for how languages change over time --- ### But it's really, really unpleasant to test - Correlating a functional airflow curve (with massive variation in values) with the overall trend across a large set of logistic time series from eye tracking trials - We have truly random factors we want to get rid of - Variation in frequency and 'lookability' across words - Our research question involves speaker variation in time-to-look by condition and variation in flow slope and time onset - ... but we want to control for speaker variation in pre-look processing time and absolute differences in airflow volume - We're interested in speaker variation, but the experiment was so complex that we could only collect 42 participants --- # Yikes - We needed help --- ### Help [Kerby Shedden](https://sph.umich.edu/faculty-profiles/shedden-kerby.html) from the University of Michigan Department of Statistics
--- ### We ended up collapsing the airflow data using PCA - This gave us a single quantity representing timing and degree of coarticulation ('PC2') which we could insert into a model of perception - The perception model was run using `MCMCglmm` in R, with b-splines to model temporal variation --- ### Turns out that people who produce early coarticulation generally listen for early coarticulation
(Adapted from Beddor et al 2018) --- ### Work is ongoing to continue investigating these issues - The perception/production link can be examined in *many* speech phenomena - What if we tested the perception/production link in another domain? - **Let's see if we can correlate tongue movement with perception!** --- # A Coarticulatory Measurement Failure --- ### How does the middle part of 'bulk' work? - For some people, there's the vowel in 'buck', then an L sound, with minimal overlap - For others, the L is strongly **coarticulated** with the vowel - And for many, there's just one steady mixed "ul" vowel - This is an interesting question for speech perception, speech recognition, and text-to-speech - **How can we tell who's doing which of these things, quantitatively?** - ... and can we predict who does what? --- ### Ultrasound Imaging - Pulse high-frequency sound waves into the body - Measure the patterns in which they return to image internal structure - The resulting data are black and white image frames showing areas of high and low reflection --- ### Ultrasound Data Acquisition
--- ### Sample Speech Ultrasound file
--- ### Ultrasound in Speech - Captures the motion of the tongue in (generally) two dimensions - Offers 60+ frames per second time resolution - Ideal for tracking the *relative location* and *contour* of the tongue --- ### Ultrasound 'Splining' - The machine outputs a series of images (or grayscale matrices) at a fixed sampling rate - We transform images into lists of ordered points representing the tongue shape and location - This is done by the researcher and team directly - ... or using [neural networks](https://arxiv.org/abs/1907.10210) - Or full-frame PCA is used to look for changes (cf. [Faytak et al. 2020](https://www.journal-labphon.org/article/id/6281/)) - There are other complexities I'm skipping here! ---
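### Sketch: full-frame PCA on ultrasound frames

A toy version of the full-frame PCA idea mentioned above (the frame stack here is random data, purely illustrative of the shapes involved):

```python
import numpy as np

# Hypothetical stack of grayscale ultrasound frames: (n_frames, height, width)
rng = np.random.default_rng(1)
frames = rng.random((120, 64, 64))

# Flatten each frame to a vector and center across frames
X = frames.reshape(len(frames), -1)
X = X - X.mean(axis=0)

# PCA via SVD: scores track each frame's position along the main
# axes of image variation; rows of Vt reshape to 'eigen-tongue' images
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * s
explained = s**2 / np.sum(s**2)   # proportion of variance per component
```

Plotting the first few score columns over time gives a low-dimensional trace of tongue motion without any hand-splining.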
---
--- ### This splined data gives us details about articulation - How do tongue contours differ between sounds? - "Does the tongue shape differ for 'buck' and 'bulk'?" - How do tongue contours change during sounds? - "At what point does the tongue start moving towards the /l/ gesture in 'bulk'?" - ... or is there no change at all for this speaker? --- ### Does the tongue shape differ for 'buck' vs. 'bulk'?
--- ### Comparing Contours is difficult (for us) - Averages and center of gravity can give some clarity - Usually done using Smoothing Spline ANOVA in Linguistics - Occasionally mixed models with B-Splines, Generalized Additive Models (GAM), and Growth Curves
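---
### Sketch: a point-wise contour comparison

Much simpler than SS-ANOVA, but in the same spirit: compare mean contours point by point with a bootstrap band (all data below are simulated; the shapes, shift, and token counts are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sin(np.linspace(0.0, np.pi, 50)) * 5.0 + 10.0   # idealized tongue shape

# 30 simulated tokens per word, 50 points per splined contour
buck = x + rng.normal(0.0, 0.5, (30, 50))
bulk = x + rng.normal(0.0, 0.5, (30, 50))
bulk[:, 30:] += 2.0   # raised back portion of the tongue for the /l/

# Point-wise mean difference with a bootstrap 95% band
diff = bulk.mean(axis=0) - buck.mean(axis=0)
boots = np.empty((2000, 50))
for b in range(2000):
    boots[b] = (bulk[rng.integers(0, 30, 30)].mean(axis=0)
                - buck[rng.integers(0, 30, 30)].mean(axis=0))
lo, hi = np.percentile(boots, [2.5, 97.5], axis=0)

# Points where the band excludes zero differ reliably between words
differs = (lo > 0) | (hi < 0)
```

This gives a region-by-region answer to "where do the 'buck' and 'bulk' contours differ?", at the cost of ignoring the smoothness that spline-based models exploit.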
--- ### At what point does the tongue start moving towards the /l/ gesture in 'bulk'? - This is a place where speakers vary - We can look at the time course of the vowel+l portion of the word --- ### Some people show some change later
--- ### Some people have massive change early on
--- ### Some people don't show change at all
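---
### Sketch: one crude onset index

There is no established method here, but one rough way to put a number on *when* change begins is frame-to-frame displacement against a pre-movement baseline (the contours, noise level, and threshold below are all invented for this sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n_frames = 60
base = np.sin(np.linspace(0.0, np.pi, 50)) * 5.0

# Simulated token: small measurement noise, then /l/-ward movement
contours = np.tile(base, (n_frames, 1)) + rng.normal(0.0, 0.02, (n_frames, 50))
ramp = np.clip(np.arange(n_frames) - 35, 0, 10) / 10.0
contours[:, 40:] += ramp[:, None] * 5.0   # raising begins around frame 35

# Frame-to-frame RMS displacement of the whole contour
disp = np.sqrt(np.mean(np.diff(contours, axis=0) ** 2, axis=1))

# Crude onset index: first frame where displacement exceeds
# a multiple of the pre-movement baseline
baseline = disp[:20].mean()
onset = int(np.argmax(disp > 3.0 * baseline))
```

An index like this collapses *which* part of the contour moves and *how*, which is exactly the information the harder questions below require.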
--- ### Measuring these changes is very difficult (for us) - Quantifying the degree of change in a 50 point spline which changes contour and position over time - Variably, across speakers - Identifying the *onset* of the contour change in time - Identifying specific types of contour change which are most relevant - Finding 'targeted' vs 'untargeted' change - **There isn't a well-established statistical method for doing this in our field!** --- ### We had hoped to continue down this road - Also to recreate the perception/production study above with these data - **This is a very hard analysis problem, and we weren't up to the task** - *... and we also had trouble finding enough people with changing vowels* --- ### This is a case where our ability to analyze hindered our ability to describe - We're finding more of those all the time! - Which helps answer the question... --- # Why am I here? --- ### Increasingly complex data has pulled us into complex territories - We've moved from single variable correlations into functional data - In many cases, functional data is itself captured in a time series - New methods are arriving for us every day - ... but our questions are generally different enough that existing statistical toolchains don't cleanly apply --- ### Increasingly complex questions require increasingly nuanced analyses - We've moved from presence/absence into time course information - We're now increasingly studying the kinds of variability which conventional models attempt to factor out - Potentially explanatory data is seldom low-dimensional! --- ### Our statistical needs have surpassed our statistical abilities - Grad level Psych Stats has very little to say about comparing 3D meshes of tongue motion by conditions - This poses a massive pedagogical problem! - Reviewers are generally chosen for knowledge of specific linguistic domains, with varying statistical savvy - "Why not just use an ANOVA here?" 
or "How did you settle on the right number of spline coefficients?" - Keeping up with the statistical state-of-the-art is a full-time job - So we statistically-interested linguists are just toddlers with bigger shotguns --- ### That's why I'm here today - (That and Shuheng's gracious invitation) --- ## Linguists and Statisticians should talk more! --- ### Language is uniquely rewarding as an area of research - You are quite literally always using language - Problems are often interpretable in terms of linguistic experience - It offers a diversity of data types, often in the same experiments - Text data, behavioral experiments, sensor output, imaging data, GIS, and more - Linguistic knowledge is helpful for breaking into Natural Language Processing, and other language-focused data science - Everything I've talked about today has straightforward applications in speech recognition and text-to-speech --- ### Collaborations can be very fruitful! - We often have questions with more nuance than the techniques we know - Basic-to-you statistical approaches may enable advanced-for-us questions! - It's very possible that new techniques in statistics could allow revolutionary questions in our field - Often, new analytical approaches provide new avenues for research - Collaborations can be mutually rewarding and mutually beneficial - Linguists learn Stats, Statisticians learn Language --- ### This is a potential growth area in our field - There is increasing discussion of hiring statisticians in departments and divisions for consulting and collaboration - ... 
and already, statistical savvy is a common desired trait for new hires - Statisticians who know even basic elements of language will be increasingly valued in industry and life - Natural Language Processing is Statistics and Machine Learning about language --- ### Teamwork can make the dream work - Linguistic work is often held back by relatively basic inference approaches - Increased complexity of data, and increased complexity of questions, both leave ample room for collaboration - New methods in statistics likely have testable uses in language - New questions in linguistics may require new methods in statistics - And people collaborating in this world have a very real chance to make a difference in both fields --- ### Let's talk! - Next time you're looking to branch out, remember that we linguists are here - That we've got amazing data - ... and that at the very least, you may be able to help keep a toddler safe! ---
Thank you!
Questions?