Probability and Language Modeling

Will Styler - LIGN 6

We’ve got corpora now!

Large collections of text
Now we can start to build language models
… but there’s always uncertainty

Today’s Plan

Probability
Surprisal and information
Conditional Probability
Using probability for language modeling
Predictive Text and Swype
Probability and Corpus Size

What is probability?

Probability

The degree of certainty that the value of a variable (or correct answer to a question) is one thing and not another
Humans tend to think about this as ‘certain’, ‘impossible’, ‘likely’, ‘unlikely’
- We express this as a proportion (e.g. 0.46, or ‘46% of the time’)
‘p(event)’ means ‘the probability of event occurring’

Sample Probabilities

p(Heads) from a fair coin: 0.5
p(Heads) from a weighted coin: Not 0.5!
p(6) on a six sided die: 0.16666
p(6) on a twenty sided die: 0.05
p(You winning Powerball): ~0
p(Will wearing gray pants): ~1

Probabilities can be calculated from observation

p(heads) in a coin of uncertain fairness?
p(somebody's wearing a red shirt in class)?
p('yeet') in a corpus?
p('rolex') in a corpus?

Surprisal and Information

Some things in life are surprising

What’s would be a surprise?
What’s would be completely unsurprising?
‘Surprise’ comes when something we judged to be improbable happens
When we’re surprised, we usually gain information about the world

Defining Surprisal and information

Something which is completely certain happening… (P = 1)
- … is completely unsurprising (surprisal = 0)
- … is completely uninformative (information = 0)
Something which happens, despite seeming impossible… (P = 0)
- … is infinitely surprising (surprisal = ∞)
- … is infinitely informative (surprisal = ∞)
Everything else is in the middle (0 < P < 1)

What’s the surprisal of…

‘the’ occurring in an English document
The sun rising tomorrow
Will wearing non-gray pants
Will cancelling the final project and giving everybody A’s
‘mel-frequency cepstral coefficient’ occurring in a document
‘mel-frequency cepstral coefficient’ occurring in a TikTok
Winning Powerball

Now we can quantify how likely a given event is

How surprising it is
… and how informative it is!

We can also estimate our uncertainty

“How likely am I to be surprised here?”
The sun will rise tomorrow
Will will wear gray pants Wednesday
10 heads in a row while flipping a weighted coin
10 heads in a row while flipping a fair coin
The coin landing on edge after being flipped
- (Apparently 1 in 6000 tosses for a nickel!)
The next 20 sided dice roll being 17

Conditional Probability

‘What is the probability of this event, given that this other event occurred?’

p(event|other event) means ‘the probability of an event occurring, given that the other event occurred’

Probabilities are often conditional on other events

What’s p(pun)? What about p(pun|Will)?
What’s p(fire|smoke)? What about p(smoke|fire)?
- This is not (always) symmetrical
What’s p(Will calls in sick)? What’s p(Will calls in sick|he did last class)?
What’s p(heads) on a fair coin? What’s p(heads|prior heads)?
- Probabilities are not always conditional!

Differences in conditional probabilities are information!

Does the change in conditioning event affect the observed probability?
- One event’s probability depends on the other’s!
- If so, there’s an informative relationship!
- Two events have “mutual information” if there’s some relationship
Language modeling is about finding informative relationships between linguistic elements!

Differences in conditional probability let us model language!

p('you'|'how are') vs. p('dogs'|'how are')
p(adjective|'I am') vs. p(noun|'I am')
p(good review | "sucks") vs. p(bad review | "sucks")

Using Probability for language modeling

Knowing probability of any individual word is helpful!

What tasks could be done knowing just the probability of a given word?

Knowing the mutual information of linguistic elements helps to solve problems!

When we do machine learning, we learn a large set of dependent probabilities among linguistic elements!
We’re trying to predict one variable by observing others!

Automatic Speech Recognition

What kinds of probability modeling could help us?
What would we predict? What could we observe?

Text-to-speech

What kinds of probability modeling could help us?
What would we predict? What could we observe?

Spelling correction

What kinds of probability modeling could help us?
What would we predict? What could we observe?

Document classification

“Is this a tweet? Technical report? News article? Sports article? Product Review?”
What kinds of probability modeling could help us?
What would we predict? What could we observe?

Sentiment Analysis

“Does the person talking about this product (e.g.) like it?”
What kinds of probability modeling could help us?
What would we predict? What could we observe?

Studying the racialization of language

What kinds of probability modeling could help us?
What would we predict? What could we observe?

Studying the racialization of language (cont.)

p('the'|article about a white athlete) vs. p('the'|article about a black athlete)?
- We would expect this to be similar
p('wife'|article about a white athlete) vs. p('wife'|article about a black athlete)?
- This is NOT similar, weirdly!
It must be that the use of some words is predictable by athlete race!
- ‘Athlete race is informative for word use!’

The most common use of word probability in our lives…

Predictive text!

Predictive Text and Language Probability

Every word in language is informative about the next

When you read enough sentences, you get a sense of which words follow each other
Imagine somebody’s writing an email…

Hi
How
are
you
doing
today?
I
hope
you
are
penguin
well.

Phrasal information decreases surprisal

Never
gonna
give

Predictive text just formalizes this

“Given the last N words, what is the most likely word?”
- “It’s… on… the… syllabus.”
Some variants will modify guesses based on previous words
- “You’re cute” vs. “Your cat”

Testing the first time and then it is fine but it’s a nice little app but it does have to a lot cheaper than it but it’s fun and it makes it very interesting and it is a great idea for a great time with good people to play for a bit and a great way home fun fun and good way cheaper cheaper and cheaper than a free version for a free app but