Sound Compression (Again, Shpongle ‘Nothing is something worth doing’)

Uncompressed WAV

320kbps mp3

192kbps mp3

128kbps mp3


Sound Compression (Again, Shpongle ‘Nothing is something worth doing’)

Uncompressed WAV

64kbps mp3

48kbps mp3

32kbps mp3

8kbps mp3
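To put the bitrates on these two slides in perspective, here is a rough sketch of how much data each one keeps per minute of audio. It assumes the uncompressed WAV is CD quality (44.1 kHz, 16-bit, stereo), which the slides do not state, and the function names are made up for illustration.

```python
# Assumption: the WAV is CD quality (44.1 kHz, 16-bit, stereo).
# The slides do not say what the source file actually was.
WAV_KBPS = 44_100 * 16 * 2 / 1000  # about 1411.2 kbps


def megabytes_per_minute(kbps: float) -> float:
    """Size of one minute of constant-bitrate audio, in megabytes (10^6 bytes)."""
    bits = kbps * 1000 * 60
    return bits / 8 / 1_000_000


def compression_ratio(kbps: float) -> float:
    """How many times smaller the stream is than the assumed WAV."""
    return WAV_KBPS / kbps


if __name__ == "__main__":
    for kbps in (320, 192, 128, 64, 48, 32, 8):
        size = megabytes_per_minute(kbps)
        ratio = compression_ratio(kbps)
        print(f"{kbps:>4} kbps: {size:.2f} MB/min, {ratio:.1f}x smaller than WAV")
```

Under that assumption, the 8kbps file keeps well under one percent of the original data, which is why it sounds the way it does.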


Friends don’t let friends compress important audio!


The problems with speech recognition

Will Styler - LIGN 6


First an important program note


Today’s plan


Detecting speech is hard


Yoshihisa Ishikawa’s one-night stay at a robot-staffed hotel in western Japan wasn’t relaxing. He was roused every few hours during the night by the doll-shaped assistant in his room asking: “Sorry, I couldn’t catch that. Could you repeat your request?” By 6 am, he realized the problem: His heavy snoring was triggering the robot.

Source



Noise


Noise


Tools for defeating noise


Not all noises are equal


Noise removal is an boring problem


Homophones


Homophones

Two words which are spelled differently, with different meanings, and the same sounds


Example homophones

For this slide, I cheated with the keyboard



Homophones present a major problem for speech recognition


Dealing with homophones


This is a problem that might need AI


Vocabulary and limited training data


There are many more words than we can train for
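One simple way to see this problem is to measure how many words in new speech never appeared in the training data (the out-of-vocabulary rate). The sketch below is only an illustration: the function name and the toy sentences are made up, and real systems tokenize far more carefully than splitting on spaces.

```python
def oov_rate(training_text: str, test_text: str) -> float:
    """Fraction of test tokens that never occur in the training text."""
    known = set(training_text.lower().split())
    test_tokens = test_text.lower().split()
    if not test_tokens:
        return 0.0
    unseen = [token for token in test_tokens if token not in known]
    return len(unseen) / len(test_tokens)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    training = "please call me back today"
    test = "please call melinda back"
    print(f"OOV rate: {oov_rate(training, test):.2f}")
```

A system cannot output a word it has never seen, so every out-of-vocabulary token is a guaranteed error, and names are a common source of them.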


Testing Vocabulary


Names are very hard

“… again, this is Melinda Night, calling for a reference check for Eliza colonoscopy”


Methods of coping with vocabulary issues


Speech Variability


Even for a single person, speech varies


Hyperarticulation

Producing speech with unusually high clarity and articulation


Hypoarticulation

Producing speech with minimal effort and minimally distinct gestures


Your training data dictates the kind of speech you can recognize


Pitch Differences


You sound different at 2 AM than in class


Speaker variability


People differ substantially in terms of their speech


Case study: Vowel perception


This next section is copied in from another presentation, so no speech-rec errors


Different American English vowels, as spoken by a male speaker


Vowel formants are reflections of articulations



But it’s even worse than it seems…


Moment-to-moment Vowel Variation





Every person you’ve ever talked with has had different vowel formant patterns


How do we accomplish this perceptual magic?


Dealing with vowel variability


Context helps!


The Role of Context


Back to ASR-based lecture-writing. Nooooooo.


ASR systems perform normalization
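The slides do not say which normalization method is meant. As one well-known example from phonetics, the sketch below uses Lobanov-style z-scoring: each speaker's formant values are rescaled by that speaker's own mean and standard deviation. The formant numbers are invented for illustration, and a real ASR front end may normalize quite differently.

```python
from statistics import mean, stdev


def lobanov_normalize(formants: list[float]) -> list[float]:
    """Z-score one speaker's values for a single formant (e.g. all their F1s)."""
    mu = mean(formants)
    sd = stdev(formants)
    if sd == 0:
        raise ValueError("formant values must not all be identical")
    return [(f - mu) / sd for f in formants]


if __name__ == "__main__":
    # Hypothetical F1 values in Hz for three vowels; not real measurements.
    speaker_a = [300.0, 500.0, 700.0]
    # A second hypothetical speaker whose formants are all 20% higher.
    speaker_b = [f * 1.2 for f in speaker_a]

    print(lobanov_normalize(speaker_a))
    print(lobanov_normalize(speaker_b))
```

After normalization, the two hypothetical speakers land on essentially the same values even though their raw frequencies differ, which is the effect normalization is after.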



Sometimes it can be avoided

English vowels differ in duration

Context can be very very helpful

The more you can predict what is being said, the better
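As a minimal sketch of using context to predict what was said, the code below counts word pairs (bigrams) in a tiny corpus and uses them to choose between words that sound the same. The corpus, function names, and the bigram approach itself are assumptions for illustration; the slides do not describe a specific model.

```python
from collections import Counter


def train_bigrams(sentences: list[str]) -> Counter:
    """Count adjacent word pairs across a list of sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_homophone(prev_word: str, candidates: list[str], bigrams: Counter) -> str:
    """Choose the candidate seen most often after prev_word (ties: first listed)."""
    return max(candidates, key=lambda word: bigrams[(prev_word.lower(), word)])


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    corpus = [
        "they lost their keys",
        "put it over there",
        "their dog barked",
        "we went there",
    ]
    bigrams = train_bigrams(corpus)
    print(pick_homophone("lost", ["there", "their"], bigrams))
    print(pick_homophone("over", ["their", "there"], bigrams))
```

The acoustics of the candidate words are identical, so only the surrounding words separate them; more training text gives better predictions.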


Children are extra awful


Speaker variability is the biggest problem that ASR faces

Every single user sounds different, but expects the same results

This is absolutely amazing, and terrifying


Wrapping up


For next time