What’s your dissertation about, anyways?

This was originally posted on my blog, Notes from a Linguistic Mystic, in 2015.

Much of my time these days is going into writing my doctoral dissertation, the big long paper I have to write before they’ll give me the Ph.D. and send me on my way. I’ve had a few people ask me for a concise explanation of what I’m actually doing, one that’s understandable to non-linguists and to readers of all ages, so here it goes:

The goal of my dissertation, simply put, is to figure out how humans can hear the difference between “pat” and “pant” without resorting to magic.

Where’s the magic in nasality?

Say “pat”. Now say “pant”. Say them again, and listen to the vowel in the middle. Even before you start the “n”, there’s something funky and “nasally” about that vowel, and the “n” isn’t really that strong. That “nasal-ish” difference in sound is called vowel nasality, and it happens when you let some air escape through your nose while you’re making a vowel.

In English, although vowel nasality happens all the time (any time there’s an n, m, or the /ŋ/ sound in “ring”), we as listeners don’t really care whether vowels are nasalized. It’s just something that happens naturally. The word “pant”, said without the nasality, is still “pant”, but it’s a lot easier to make it with the nasality. And it turns out that it’s useful for English speakers to have it there, as it makes deciding whether we’ve heard, say, “cloud” or “clown” a bit easier and faster, especially if we didn’t hear the last part of the word.

In French, though, nasality is crucial to the language. The only difference between “beau” (‘beautiful’) and “bon” (‘good’) is whether the /o/ is nasal or not (nasal or “oral”), so French speakers need to listen for it. There, it’s contrastive, meaning that it can make the difference between different words. Nasality is contrastive in lots of other languages, like Hindi, or Lakota, or Brazilian Portuguese.

So, that’s vowel nasality. We know a bunch about it, and it’s useful to a bunch of speakers of a bunch of languages. The problem is, we as linguists don’t actually know what nasality sounds like.

The proof is in the measurement

In phonetics, just like in any other sciencey field, we need to be able to measure something to be able to say intelligent things about it. We want to be able to say things like “Based on this study of how they sound, these vowels are more nasal than these other vowels”, or even just “This vowel is nasal, this one isn’t.”

Being able to detect vowel nasality from sound is also useful for non-linguists. It’s good for speech recognition. French Siri badly needs to do better at understanding French speakers. It’s also good for speech pathology. “Hypernasality” is a problem that some people have, where they’re not able to control the amount of air going through their noses during speech, and many things are nasal that shouldn’t be. At the moment, testing for hypernasality involves strapping air masks to people’s heads, and it would be much nicer to just set down a microphone on the table and measure it that way.

Right now, if I want to measure nasality, I’ve got to use a really complicated measurement looking at the strength of different frequencies in the signal (higher and lower pitches within the overall sound of the voice). This measure, called “A1-P0”, is great in some ways. If I’ve got 3000 vowels to look at, the measure’s good enough to say things like “Yeah, overall, these vowels over here are more nasal than those over there”. But if I look at any single vowel and ask “Hey, is this oral or nasal?”, it’s got something like a 54% chance of getting the answer right.
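For the curious, here’s a rough sketch of the idea behind a measure like A1-P0. It is not the actual measurement code, just an illustration under some loudly stated assumptions: as the measure is usually described in the phonetics literature, A1 is the strength of the harmonic nearest the vowel’s first formant (F1), and P0 is the strength of a low-frequency nasal peak, taken here as the stronger of the first two harmonics. The “vowels” below are made-up sums of sine waves, and the function names (`a1_minus_p0`, `toy_vowel`) and all the numbers are hypothetical.

```python
import math

def harmonic_amplitude_db(samples, sample_rate, freq_hz):
    """Strength (in dB) of one frequency in the signal, via a single-bin DFT."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    magnitude = 2 * math.hypot(re, im) / n
    return 20 * math.log10(max(magnitude, 1e-12))  # floor avoids log(0)

def a1_minus_p0(samples, sample_rate, f0_hz, f1_hz):
    """A1-P0 in dB, assuming the voice pitch (f0) and F1 are already known."""
    top = int((sample_rate / 2) // f0_hz)
    harmonics = [f0_hz * k for k in range(1, top + 1)]
    # A1: the harmonic sitting closest to the first formant.
    a1_freq = min(harmonics, key=lambda h: abs(h - f1_hz))
    a1 = harmonic_amplitude_db(samples, sample_rate, a1_freq)
    # P0 (assumption): the stronger of the first two harmonics.
    p0 = max(harmonic_amplitude_db(samples, sample_rate, h)
             for h in harmonics[:2])
    return a1 - p0

def toy_vowel(harmonic_amps, f0_hz=100, sample_rate=8000, n_samples=800):
    """A made-up 'vowel': sine waves at multiples of f0 (not real speech)."""
    return [sum(amp * math.sin(2 * math.pi * k * f0_hz * i / sample_rate)
                for k, amp in harmonic_amps.items())
            for i in range(n_samples)]

# Invented amplitudes: the "nasal" one has a weaker F1 harmonic (the 7th)
# and a stronger low harmonic (the 2nd).
oral = toy_vowel({1: 0.3, 2: 0.3, 7: 1.0})
nasal = toy_vowel({1: 0.3, 2: 0.8, 7: 0.5})

oral_score = a1_minus_p0(oral, 8000, f0_hz=100, f1_hz=700)
nasal_score = a1_minus_p0(nasal, 8000, f0_hz=100, f1_hz=700)
print(round(oral_score, 1), round(nasal_score, 1))  # prints 10.5 -4.1
```

In this toy setup a lower A1-P0 goes with the “nasal” vowel. Real speech is far messier than two tidy sine-wave sums, which is a big part of why the measure struggles on single vowels.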

But that also points to something awesome: Even though we as linguists aren’t very good at measuring it, humans are REALLY good at hearing nasality. In fact, people are good enough at vowel nasality that languages all over the world have baked it into how they work, and use it every day without any problems. And if people can reliably hear nasality in speech, there must be something to hear, some acoustical feature, which picks out nasality more reliably than 54% of the time. In short, we linguists have clearly missed something.

So, the goal of my dissertation, put less simply, is to figure out what, exactly, humans are listening to when they hear the difference between “pat” and “pant” in English, or “beau” and “bon” in French.

How do you do that?

To figure out what we’re actually listening to when we’re hearing nasality, I’ve got a few steps to take.

In short, I need to find cues. Cues are just things that tip you off for perception. Smoke, heat, and light are all cues to fire. I need to figure out what parts of the speech sound signal are cues to nasality.

First, I’m going to measure a bunch of other acoustical features, different parts of the speech sound signal, that people have said might be a cue for nasality, and see how often they occur in nasal vowels (relative to oral vowels). I’ll also combine some of them, and see if looking at a bunch of features together might give better cues than one thing alone (just like heat alone doesn’t mean fire, but heat and smoke do).

Then, once I’ve got some suspects, some elements of the sound that I think might be useful to humans in noticing nasality, I’ll try and teach a computer to perceive using those features (mostly because computers are much cheaper to experiment on than humans). I’ll give the computer a bunch of data, showing it all the different features I’m thinking about, have it learn from that data, then give it more vowels and ask it to decide whether each vowel is “oral” or “nasal”. By looking at how well the computer did using each individual feature, I’ll be able to narrow down the 30+ features I’m starting with to the ones I know to actually be useful in making decisions about nasality.
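Here’s a minimal sketch of that “teach it, then quiz it, one feature at a time” idea. Everything in it is an assumption for illustration: the post doesn’t say which learning method is used, so a simple cut-off rule stands in for it, and the feature names (`feature_a`, `feature_b`) and every number are invented toy data, not real measurements.

```python
def predicts_nasal(value, cut, nasal_if_above):
    """Apply a one-feature rule: nasal on one side of the cut-off."""
    return (value >= cut) == nasal_if_above

def train_threshold(values, labels):
    """Pick the cut-off and direction that get the most training vowels right."""
    best = None
    for cut in sorted(set(values)):
        for nasal_if_above in (True, False):
            correct = sum(
                predicts_nasal(v, cut, nasal_if_above) == (lab == "nasal")
                for v, lab in zip(values, labels)
            )
            if best is None or correct > best[0]:
                best = (correct, cut, nasal_if_above)
    return best[1], best[2]

def score_each_feature(train, test):
    """Learn a rule per feature on `train`, report its accuracy on `test`."""
    scores = {}
    for name in train[0][0]:
        cut, direction = train_threshold(
            [feats[name] for feats, _ in train], [lab for _, lab in train]
        )
        hits = sum(
            predicts_nasal(feats[name], cut, direction) == (lab == "nasal")
            for feats, lab in test
        )
        scores[name] = hits / len(test)
    return scores

# Invented toy vowels: (hypothetical feature values, true label).
train = [
    ({"feature_a": 1.0, "feature_b": 5.0}, "oral"),
    ({"feature_a": 2.0, "feature_b": 3.0}, "oral"),
    ({"feature_a": 3.0, "feature_b": 7.0}, "oral"),
    ({"feature_a": 6.0, "feature_b": 4.0}, "nasal"),
    ({"feature_a": 7.0, "feature_b": 6.0}, "nasal"),
    ({"feature_a": 8.0, "feature_b": 2.0}, "nasal"),
]
test = [
    ({"feature_a": 1.5, "feature_b": 2.5}, "oral"),
    ({"feature_a": 2.5, "feature_b": 6.5}, "oral"),
    ({"feature_a": 6.5, "feature_b": 3.5}, "nasal"),
    ({"feature_a": 7.5, "feature_b": 5.5}, "nasal"),
]

scores = score_each_feature(train, test)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

The point of keeping the quiz vowels separate from the training vowels is that a feature only counts as useful if the rule learned from it still works on vowels the computer hasn’t seen. Features that score near the bottom of `ranked` are the ones to drop.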

Now, I’ll have to get humans involved. I’ll drag a bunch of English speakers into the lab and make them listen to words and make choices (“Did you hear ‘pat’ or ‘pant’?”). But these won’t be just any words, they’ll be words I’ve messed with in some very important ways.

Some of the words will be nasal words (like “pant”) where I’ve removed the parts of the sound which I think make people say “Aha! Nasal!”. My hope is that people, when those parts of the sound are missing, will think that it wasn’t a nasal word after all. If removing a feature makes people think a word isn’t nasal, we’ll know it was important in the perception process, and that it’s a cue.

On the other side, I’ll have oral words (like “pat”) where I’ve added things that I think are nasal cues. My hope is that I can take “dote”, add some features to the signal, and people will hear that added stuff and say “Oh, that’s ‘don’t’!!”. Those things, we’ll know are really cues, because they’re enough, all by themselves, to make people hear nasality.

By adding and subtracting parts of the sound signal, I’ll figure out what’s necessary for people to hear nasality (what has to be there for people to call a nasal word nasal), and what’s sufficient for people to hear nasality (what’s enough, all on its own, to make people call something nasal). And once I know that, I’ll know what people are actually listening for when they hear nasality.
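The necessary-versus-sufficient logic can be sketched in a few lines. This is only an illustration with assumptions flagged: the listener answers are invented, `judge_cue` is a hypothetical helper, and the 50% cut-off is an arbitrary stand-in for the proper statistics a real experiment would need.

```python
def proportion_nasal(responses):
    """Share of listener answers that were 'nasal'."""
    return sum(r == "nasal" for r in responses) / len(responses)

def judge_cue(removed_from_nasal, added_to_oral, cutoff=0.5):
    """
    removed_from_nasal: answers to nasal words with the cue taken out
    added_to_oral:      answers to oral words with the cue put in
    The cutoff is an arbitrary assumption, not a real statistical test.
    """
    return {
        # Necessary: without the cue, listeners stop hearing "nasal".
        "necessary": proportion_nasal(removed_from_nasal) < cutoff,
        # Sufficient: the cue alone makes listeners hear "nasal".
        "sufficient": proportion_nasal(added_to_oral) > cutoff,
    }

# Invented answers for two hypothetical cues.
cue_x = judge_cue(
    removed_from_nasal=["oral", "oral", "nasal", "oral"],
    added_to_oral=["nasal", "nasal", "oral", "nasal"],
)
cue_y = judge_cue(
    removed_from_nasal=["oral", "oral", "oral", "nasal"],
    added_to_oral=["oral", "oral", "nasal", "oral"],
)
print(cue_x)  # {'necessary': True, 'sufficient': True}
print(cue_y)  # {'necessary': True, 'sufficient': False}
```

In the toy data, the second cue matters when it’s missing but can’t create nasality by itself, which is exactly the kind of difference the two halves of the experiment are meant to pull apart.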

Then it’s a question of seeing if I can use that knowledge to do nasality measurement (asking a computer to look for the same things humans are), and then saying a bit more about how people are actually hearing the difference. Then, I’ve just gotta write the thing up and convince my committee that it was as awesome as I think it is.

That’s it.

So, that’s my dissertation, and that’s what’s eating my time. I’m hoping to defend it next Spring (Spring 2015), and then, ideally, find a job where I can teach new people about the awesome of nasality, phonetics, Linguistics, and language in general.

But in the meantime, if you happen to be in France, India, or North Dakota, and you overhear native speakers discussing their secret to nasality perception, do me a favor and drop me a line.