Sound Compression (Again, Shpongle ‘Nothing is something worth doing’)

Uncompressed WAV

320kbps mp3

192kbps mp3

128kbps mp3

Sound Compression (Again, Shpongle ‘Nothing is something worth doing’)

Uncompressed WAV

64kbps mp3

48kbps mp3

32kbps mp3

8kbps mp3

Friends don’t let friends compress important audio!
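
To put the bitrates on the two slides above in perspective, here is a rough sketch of how much smaller each mp3 is than the original. It assumes the uncompressed WAV is CD-quality PCM (44.1 kHz, 16-bit, stereo), which the slides do not state, and the function names are made up for illustration.

```python
# Assumption (not stated in the slides): the WAV is CD-quality PCM.
SAMPLE_RATE_HZ = 44_100
BIT_DEPTH = 16
CHANNELS = 2

# The mp3 bitrates listed on the slides, in kbps.
MP3_BITRATES_KBPS = [320, 192, 128, 64, 48, 32, 8]


def pcm_bitrate_kbps(sample_rate_hz=SAMPLE_RATE_HZ, bit_depth=BIT_DEPTH,
                     channels=CHANNELS):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def compression_table(bitrates_kbps=MP3_BITRATES_KBPS):
    """Map each mp3 bitrate to how many times smaller it is than the WAV."""
    wav_kbps = pcm_bitrate_kbps()
    return {kbps: wav_kbps / kbps for kbps in bitrates_kbps}


if __name__ == "__main__":
    for kbps, ratio in compression_table().items():
        print(f"{kbps:>3} kbps mp3: about {ratio:.1f}x smaller than the WAV")
```

Under that assumption the WAV runs at about 1411 kbps, so the 320kbps mp3 is roughly 4.4 times smaller and the 8kbps mp3 keeps well under 1% of the original bitrate.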

The problem is with speech recognition

Will Styler - LIGN 6

First an important program note

Today’s plan

Detecting speech is hard

Yoshihisa Ishikawa’s one-night stay at a robot-staffed hotel in western Japan wasn’t relaxing. He was roused every few hours during the night by the doll-shaped assistant in his room asking: “Sorry, I couldn’t catch that. Could you repeat your request?” By 6 am, he realized the problem: His heavy snoring was triggering the robot.




Tools for Defeating noise

Not all noises equal

Noise removal is an boring problem



Two words which are spelled differently, with different meanings, and the same sounds

Example homophones

For this slide, I cheated with the keyboard

On the phones present a major problem for speech recognition

Dealing with homophones

This is a problem that might need a IAI

Vocabulary and limited training data

There are many more words than we can train for

Testing Vocabulary

Names are very hard

“… again, this is Melinda Night, calling for a reference check for Eliza colonoscopy”

Methods of coping with vocabulary issues

Speech Variability

Even for a single person, speech berries


Producing speech with an unusually Hi clarity and articulation

Hypo articulation

Producing speech with minimal effort and a minimally distinct gestures

Your training data dictates the kind of speech you can recognize

Pitch Differences

You sound different at 2 AM then in class

Speaker variability

People differ substantially in terms of their speech

Case Study: Val perception

This next section is copied in from another presentation, so no speech-rec errors

Different American English vowels, as spoken by a male speaker

Vowel formants are reflections of articulations

But it’s even worse than it seems…

Moment-to-moment Vowel Variation

Every person you’ve ever talked with has had different vowel formant patterns

How do we accomplish this perceptual magic?

Dealing with vowel variability

Context helps!

The Role of Context

Back to ASR-based lecture-writing. Nooooooo.

ASR systems perform normalization
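
One simple kind of normalization is to rescale each speaker's measurements relative to that speaker's own mean and spread (z-scoring). The sketch below only illustrates the idea and does not describe what any particular ASR system does. The formant values are invented for the example, and `z_normalize` is a hypothetical helper.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Rescale values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Invented F1 values in Hz for three vowels (illustrative, not measurements).
speaker_a = [300.0, 500.0, 700.0]
# A second invented speaker whose values are all 1.2 times larger.
speaker_b = [360.0, 600.0, 840.0]

if __name__ == "__main__":
    print(z_normalize(speaker_a))
    print(z_normalize(speaker_b))
```

The raw numbers for the two speakers differ, but after z-scoring they line up, which is the point of normalizing per speaker before comparing vowels.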

Sometimes it can be avoided

English vowels different duration

Context can be very very helpful

The more you can predict what is being said, the better
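
As a minimal sketch of how prediction from context can help, the example below picks among same-sounding words by asking which one most often follows the previous word. The counts are toy numbers made up for illustration, and `pick_homophone` is a hypothetical helper, not part of any real ASR toolkit.

```python
# Toy bigram counts (invented): how often word 2 follows word 1.
BIGRAM_COUNTS = {
    ("over", "there"): 50,
    ("over", "their"): 1,
    ("lost", "their"): 40,
    ("lost", "there"): 2,
}


def pick_homophone(previous_word, candidates, counts=BIGRAM_COUNTS):
    """Return the candidate seen most often after previous_word.

    Unseen pairs count as zero; ties go to the earliest candidate.
    """
    return max(candidates, key=lambda word: counts.get((previous_word, word), 0))


if __name__ == "__main__":
    same_sounding = ["there", "their", "they're"]
    print(pick_homophone("over", same_sounding))
    print(pick_homophone("lost", same_sounding))
```

The audio is identical for every candidate, so the only thing separating them here is the preceding word. Real systems use much richer language models, but the principle is the same.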

Children are extra awful

Speaker variability is the biggest problem that ASR faces

Every single user sounds different, but expect the same results

This is absolutely amazing, and terrifying

Wrapping up

For next time