Scuse me while I mix up voiced and voiceless-unaspirated stops

This was originally posted on my blog, Notes from a Linguistic Mystic, in 2011.

Only yesterday, I briefly mentioned Mondegreens, where a song lyric is misheard as some other homophonous (identical-sounding) phrase (“killed him and laid him on the green” vs. “killed him and Lady Mondegreen”). This gave me cause to mention Jimi Hendrix’ “Purple Haze” and its famous Mondegreen. The original lyric is:

Purple haze all in my brain
Lately things just don’t seem the same
Actin’ funny, but I don’t know why
’Scuse me while I kiss the sky

But many people hear the last line as “’Scuse me while I kiss this guy”, and that misperception actually reveals something very interesting about how English consonants work.

What makes /k/ different from /g/?

Both /k/ and /g/ are what linguists refer to as “stops”: consonants where the airstream out of the mouth is completely obstructed. More specifically, both /k/ and /g/ are “velar” stops, made with the tongue up against the soft palate, or velum. Try making a /k/ as in “cap” and a /g/ as in “gap”, one after the other, and you’ll notice that your tongue isn’t changing position at all when you switch from /k/ to /g/.

The simplistic explanation is that /k/ is a voiceless sound (meaning that our vocal folds/cords aren’t vibrating while we make the closure), and /g/ is a voiced sound, involving glottal vibration during the closure. Unfortunately, like most things in phonetics, it’s not quite that simple or easy.

Voice Onset Time

In reality, stop consonants are classified by their voice onset time, the amount of time that elapses between when the stop is released (when the tongue stops blocking airflow) and when the voicing starts (when the vocal folds start vibrating) for the following vowel. By looking at voice onset time (VOT), we can actually classify consonants in three different ways. (I’ve actually discussed voice onset time before, but now that I’ve already made nicer looking graphics for teaching, it seems worth doing again.)

First, [kʰ]. In English, any voiceless stop that’s at the start of a syllable (so the /k/ in “cap”, but not “pack”) is “aspirated”, meaning that there’s a considerable time gap with a burst of air between the opening of the stop and the start of voicing (it has a positive voice onset time). In the word “cap” /kæp/, we bring our tongue back to the velum to make a closure, we release that closure, and then, around 100 ms (milliseconds) later, we start voicing for the vowel /æ/. Viewed in terms of the acoustical waveform of speech, here’s what aspiration and VOT look like in [kʰa]:

[g], on the other hand, is a voiced stop, where voicing actually starts during the closure. So, the tongue moves up to the velum, the vocal folds begin vibrating, and then, when the stop is released, the vowel begins immediately. The voice onset time is negative, as the voicing started before the release. See yet another waveform diagram below, this time showing /ga/:

There’s a third option. Imagine that you started voicing at the exact moment that you released the stop, as shown below:

Then what you have is [k], what linguists refer to as a “voiceless unaspirated stop”, with a voice onset time of 0 (or close to it).
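To make the three categories concrete, here is a minimal Python sketch that computes VOT from two timestamps and sorts the result into one of the three stop types. The function names and the cutoff values (-20 ms and +30 ms) are my own illustrative assumptions, not measurements from the waveforms in this post; real category boundaries vary by language, speaker, and place of articulation.

```python
# Illustrative cutoffs only (assumptions, not measured values).
SHORT_LAG_MIN_MS = -20.0  # below this, voicing clearly leads the release
SHORT_LAG_MAX_MS = 30.0   # above this, there is a clear aspiration gap


def voice_onset_time_ms(release_ms: float, voicing_onset_ms: float) -> float:
    """VOT = moment voicing starts minus moment the stop is released."""
    return voicing_onset_ms - release_ms


def classify_stop(vot_ms: float) -> str:
    """Sort a VOT value into one of the three stop categories."""
    if vot_ms < SHORT_LAG_MIN_MS:
        return "voiced"                  # negative VOT, like [g]
    if vot_ms <= SHORT_LAG_MAX_MS:
        return "voiceless unaspirated"   # VOT near zero, like [k]
    return "voiceless aspirated"         # long positive VOT, like [kʰ]


if __name__ == "__main__":
    # Hypothetical (release, voicing onset) timestamps in milliseconds.
    tokens = {
        "aspirated-style token": (500.0, 600.0),
        "prevoiced-style token": (500.0, 420.0),
        "short-lag token": (500.0, 505.0),
    }
    for label, (release, onset) in tokens.items():
        vot = voice_onset_time_ms(release, onset)
        print(f"{label}: VOT = {vot:+.0f} ms -> {classify_stop(vot)}")
```

The only real design choice here is treating VOT as a signed number: voicing that starts before the release gives a negative value, and everything else follows from where that value falls.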

So, we have three stop choices: voiced stops, voiceless unaspirated stops, and voiceless aspirated stops, which are all used differently in the different languages of the world. But how does this affect Jimi Hendrix?

English makes stops oddly

Our problems with Jimi Hendrix kissing guys (not that there’s anything wrong with that) come from three fundamental oddities in the way that English produces stops.

First, English only distinguishes between aspirated and voiced stops. “cap” starts with a /k/, which is produced with aspiration, and “gap” starts with /g/. We don’t have a three-way contrast between voiced [g], voiceless unaspirated [k], and voiceless aspirated [kʰ]. Korean, as I’ve mentioned before, has that three-way contrast.

Second, English word-initial (at the start of a word) voiced stops are actually produced as voiceless-unaspirated stops, with a VOT of ~0. This is because we, as English speakers, have really strong aspiration in our voiceless stops, so even if we produce something without much voicing during the closure, listeners will still be able to tell that it’s not aspirated, and so conclude that the speaker must be intending to express voicing. Here’s a waveform of the word “guy”, to prove the point. Note that there’s very little VOT here.

Finally, when following an /s/, English voiceless stops are not aspirated. So, in the word “sky”, we have an unaspirated stop, rather than the normal, aspirated [kʰ] which our writing system would lead us to expect. Here’s a waveform showing the very small VOT in “sky”:

So, in effect, the /g/ in “guy” and the /k/ in “sky” are the same sound! Still don’t believe me? Well, first listen to sky, then listen to guy, then listen to “sky” where I’ve digitally removed the /s/. Your writing system has been lying to you!
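The three oddities above can be written down as a tiny rule table. This is a rough Python sketch under simplifying assumptions: the function name `realize_onset_stop` is hypothetical, it only covers stops at the start of a syllable (optionally after /s/), and it ignores everything else that happens to English stops in other positions.

```python
VOICELESS_STOPS = {"p", "t", "k"}
# Word-initial "voiced" stops come out as their voiceless unaspirated partners.
VOICED_TO_UNASPIRATED = {"b": "p", "d": "t", "g": "k"}


def realize_onset_stop(phoneme: str, after_s: bool = False) -> str:
    """Return a rough phonetic realization of an English onset stop.

    after_s only matters for voiceless stops; it is ignored for voiced ones.
    """
    if phoneme in VOICELESS_STOPS:
        # Aspirated at the start of a syllable, but not when following /s/.
        return phoneme if after_s else phoneme + "ʰ"
    if phoneme in VOICED_TO_UNASPIRATED:
        # Produced with a VOT near zero, i.e. voiceless unaspirated.
        return VOICED_TO_UNASPIRATED[phoneme]
    raise ValueError(f"not a stop this sketch knows about: {phoneme!r}")


if __name__ == "__main__":
    cap = realize_onset_stop("k")                # the /k/ in "cap"
    sky = realize_onset_stop("k", after_s=True)  # the /k/ in "sky"
    guy = realize_onset_stop("g")                # the /g/ in "guy"
    print("cap:", cap, "| sky:", sky, "| guy:", guy)
    print("sky and guy share a stop:", sky == guy)
```

Under these rules the stop in “sky” and the stop in “guy” come out identical, while the stop in “cap” stays distinct, which is exactly the pattern the waveforms show.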

So what does Jimi Hendrix kissing men have to do with Stop Acoustics?

When we look at the acoustics of “guy” and “sky”, it’s very easy to see that the two different perceptions of the lyric (“kiss the sky” and “kiss this guy”) are incredibly similar. When we realize that in English, [k] and [g] are functionally the same thing, the difference between our two choices:

… is seen to be only a question of where you put the /s/, and thus, really, no difference at all.

So, we see not only that sounds in English are not what our writing system makes them out to be, but also that this “error” of perception is both understandable and linguistically fascinating.

So, next time you find yourself listening to Purple Haze, thank Jimi Hendrix for providing one of the best examples of the perceptual troubles which can come from our lack of a voiced/voiceless-unaspirated contrast in the English language. Or, curse me for linguistically corrupting an otherwise good song. Either or, really.