Automatic Part-of-Speech Tagging

Will Styler - LIGN 6

Today’s Plan

We’ve talked about parts of speech already

Lexical Categories

… but these are linguistic, human categories

We also gave you ‘tests’ to use

… but a computer can’t use any of these tests

So, we can’t teach computers to do POS tagging in the same way that we teach humans to!

Preparing for POS Tagging

Before we can automate it, we need to do it with humans

Determining the best tagset

For English…

(Table from Jurafsky and Martin ‘Speech and Language Processing’ 3e)

Annotating a corpus for POS tags

On/IN an/DT exceptionally/RB hot/JJ evening/NN early/RB in/IN July/NNP a/DT young/JJ man/NN came/VBD out/RP of/IN the/DT garret/NN in/IN which/WDT he/PRP lodged/VBN and/CC walked/VBD slowly/RB ,/, as/RB though/IN in/IN hesitation/NN ,/, towards/IN a/DT bridge/NN ./.
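
A tagged corpus in this word/TAG format is straightforward to read into a program. The snippet below is a minimal sketch, assuming tokens are separated by whitespace and the tag follows the last slash; the function name parse_tagged is hypothetical, not part of any tagging toolkit.

```python
def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so punctuation tokens
        # such as ',/,' and './.' come out as (',', ',') and ('.', '.').
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


sentence = "On/IN an/DT exceptionally/RB hot/JJ evening/NN ,/, towards/IN a/DT bridge/NN ./."
for word, tag in parse_tagged(sentence):
    print(word, tag)
```

Splitting on the last slash rather than the first is a design choice: it keeps a word that itself contains a slash in one piece.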

All example tagging from today comes from the Stanford Parser

There are many tagged corpora already out there

Once you have a tagset and a corpus, you can use…

Automatic POS Tagging

POS Ambiguity

How much uncertainty there is about the part of speech of a given word

Some words are certain in terms of POS

Some words are only a bit ambiguous in POS

Some words are very ambiguous in POS

Some words have many parts of speech

POS tagging is about resolving this ambiguity
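
One rough way to put a number on POS ambiguity is to count how many distinct tags each word receives in a tagged corpus. The sketch below assumes the corpus is a list of (word, tag) pairs; the toy data and the name tags_per_word are invented for illustration.

```python
from collections import defaultdict

# Toy data, invented for illustration only.
toy_corpus = [
    ("the", "DT"), ("back", "NN"), ("back", "RB"),
    ("back", "VB"), ("back", "JJ"), ("cat", "NN"), ("The", "DT"),
]


def tags_per_word(tagged):
    """Map each lowercased word to the set of tags it was seen with."""
    seen = defaultdict(set)
    for word, tag in tagged:
        seen[word.lower()].add(tag)
    return dict(seen)


ambiguity = tags_per_word(toy_corpus)
for word, tags in sorted(ambiguity.items()):
    print(word, len(tags), sorted(tags))
```

A word with one tag in the set is unambiguous in this corpus; the more tags in the set, the more work the tagger has to do.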

The Stupid Approach: ‘Most Frequent Tag’

Most Frequent Tag Accuracy
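
The most-frequent-tag approach can be sketched in a few lines: count how often each word appears with each tag in training data, then always pick the winner. This is a minimal sketch on invented toy data; the function names and the choice of NN as the fallback for unseen words are assumptions, not something the slides specify.

```python
from collections import Counter, defaultdict


def train_most_frequent(tagged):
    """Learn each word's single most frequent tag from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_most_frequent(words, model, default="NN"):
    """Tag each word with its most frequent tag; unseen words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]


def accuracy(predicted, gold):
    """Fraction of positions where the predicted pair matches the gold pair."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy training and test data, invented for illustration only.
train = [("I", "PRP"), ("saw", "VBD"), ("saw", "VBD"), ("saw", "NN"),
         ("the", "DT"), ("a", "DT"), ("sign", "NN"), ("bought", "VBD")]
gold = [("I", "PRP"), ("bought", "VBD"), ("a", "DT"), ("saw", "NN")]

model = train_most_frequent(train)
predicted = tag_most_frequent([w for w, _ in gold], model)
print(predicted)
print(accuracy(predicted, gold))
```

On this toy test sentence the baseline gets 'saw' wrong, because it always chooses the verb reading regardless of context.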

Slightly more intelligent: Word form features
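
Word form features look at the shape of the word itself, which helps most with words the tagger has never seen. The rules below are hand-written guesses chosen for illustration, not the features used by any particular tagger, and the function name guess_tag_from_form is hypothetical.

```python
def guess_tag_from_form(word):
    """Guess a Penn Treebank tag from spelling alone (illustrative rules only)."""
    if word[:1].isdigit():
        return "CD"    # starts with a digit: probably a number
    if word[:1].isupper():
        return "NNP"   # capitalized: possibly a proper noun
    if word.endswith("ly"):
        return "RB"    # -ly: often an adverb
    if word.endswith("ing"):
        return "VBG"   # -ing: often a gerund or present participle
    if word.endswith("ed"):
        return "VBD"   # -ed: often a past-tense verb
    if word.endswith("s"):
        return "NNS"   # -s: often a plural noun
    return "NN"        # fallback: singular noun


for w in ["slowly", "July", "walked", "cats", "yeet"]:
    print(w, guess_tag_from_form(w))
```

Rules like these are easy to fool (a sentence-initial 'The' looks like a proper noun here), which is one reason to look at the surrounding words as well.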

… but words come in sequences. We should use that!

HMM-based POS Tagging

Hidden Markov Model

A machine learning process which models a series of observations, with the assumption that there’s some ‘hidden’ state which helps to predict the observations

One major assumption of HMMs

HMMs for POS Tagging

How do we use HMMs for POS tagging?

We need to know two types of probabilities

To get observation probabilities…

Observation probability gets at the idea of ‘POS Ambiguity’
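
Observation probabilities can be estimated by counting in a tagged corpus: P(word | tag) is the number of times the tag occurs with that word, divided by the number of times the tag occurs at all. The sketch below assumes simple relative-frequency estimates with no smoothing; the toy sentences and the name emission_probs are invented for illustration.

```python
from collections import Counter, defaultdict


def emission_probs(sentences):
    """Estimate P(word | tag) by relative frequency from tagged sentences."""
    tag_counts = Counter()
    pair_counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            tag_counts[tag] += 1
            pair_counts[tag][word.lower()] += 1
    return {
        tag: {word: n / tag_counts[tag] for word, n in words.items()}
        for tag, words in pair_counts.items()
    }


# Toy corpus, invented for illustration only.
toy_sentences = [
    [("the", "DT"), ("saw", "NN")],
    [("a", "DT"), ("sign", "NN")],
    [("the", "DT"), ("sign", "NN")],
]

emit = emission_probs(toy_sentences)
print(emit["DT"])
print(emit["NN"])
```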

To get transition probabilities…

Transition probabilities get at the idea that syntax involves sequences of word types
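
Transition probabilities can be counted the same way, using only the tag sequences: P(tag | previous tag) is the number of times the pair occurs, divided by the number of times the previous tag occurs with something after it. This is a minimal sketch assuming a first-order (bigram) model and an explicit start-of-sentence symbol; the symbol "<s>", the toy sequences, and the name transition_probs are assumptions for illustration.

```python
from collections import Counter, defaultdict


def transition_probs(tag_sequences, start="<s>"):
    """Estimate P(tag | previous tag) by relative frequency."""
    prev_counts = Counter()
    bigram_counts = defaultdict(Counter)
    for tags in tag_sequences:
        prev = start  # every sentence begins from the start symbol
        for tag in tags:
            prev_counts[prev] += 1
            bigram_counts[prev][tag] += 1
            prev = tag
    return {
        prev: {tag: n / prev_counts[prev] for tag, n in nexts.items()}
        for prev, nexts in bigram_counts.items()
    }


# Toy tag sequences, invented for illustration only.
toy_tags = [
    ["DT", "JJ", "NN"],
    ["DT", "NN"],
    ["PRP", "VBD", "DT", "NN"],
]

trans = transition_probs(toy_tags)
print(trans["DT"])
print(trans["<s>"])
```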

Now we know the probabilities!

We decode the HMM

HMM Decoding: The Basic Idea
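
A standard way to decode an HMM is the Viterbi algorithm, which keeps, for every position and every tag, only the best-scoring path that ends there. The sketch below assumes that algorithm; all probabilities are hand-set toy values invented for illustration, and the small floor value stands in for proper smoothing of unseen events.

```python
def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a toy HMM."""
    # best[i][t]: probability of the best path that ends in tag t at word i
    # back[i][t]: the previous tag on that best path
    best = [{t: start_p.get(t, floor) * emit_p[t].get(words[0], floor) for t in tags}]
    back = [{t: None for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] * trans_p[p].get(t, floor)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score * emit_p[t].get(words[i], floor)
            back[i][t] = prev
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


# Hand-set toy probabilities, invented for illustration only.
tags = ["PRP", "VBD", "DT", "NN"]
start_p = {"PRP": 0.6, "DT": 0.3, "NN": 0.1}
trans_p = {
    "PRP": {"VBD": 0.9, "NN": 0.1},
    "VBD": {"DT": 0.7, "NN": 0.2, "PRP": 0.1},
    "DT": {"NN": 0.95, "VBD": 0.05},
    "NN": {"VBD": 0.5, "NN": 0.3, "DT": 0.2},
}
emit_p = {
    "PRP": {"i": 0.5},
    "VBD": {"saw": 0.3, "bought": 0.3},
    "DT": {"the": 0.6, "a": 0.4},
    "NN": {"saw": 0.2, "sign": 0.4},
}

print(viterbi(["i", "saw", "the", "saw"], tags, start_p, trans_p, emit_p))
# → ['PRP', 'VBD', 'DT', 'NN']
```

Here the two occurrences of 'saw' get different tags because the transition probabilities favor a verb after a pronoun and a noun after a determiner. For sentences of realistic length, multiplying raw probabilities underflows, so implementations usually add log probabilities instead.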

So, we have the most likely set of POS tags

One consequence of HMM-based tagging

the/DT three/CD cute/JJ cats/NNS made/VBN will/MD sit/VB back/RP in/IN awe/NN

How does HMM-based POS tagging perform?

… Why only 97% accuracy?

POS Tagging is hard

Use-mention distinctions

Not every word that appears in a sentence is being ‘used’; some are only being mentioned

‘She said ‘bear’ was her favorite word.’

‘Roger texted me ‘back’’

‘I bought the The Pianist DVD’

Ambiguous Sentences

Some sentences are actually ambiguous in POS tagging

‘Maria was entertaining last night’

‘I saw the official take from the store.’

‘You should ask a Smith.’

‘I hate bridging gaps.’

Rare or Unknown words





‘I yeet when I throw empty cans’


‘That phonetics lab meeting was lit’

‘I’m studying English Lit’

‘They lit the beacon of Amon Din to summon the Rohirrim’


Homonyms are (always) a problem

‘I saw the sign’

‘I saw the sign whenever I need to test the cutting feel of a new blade’

‘I bought a saw’

POS Tagging is crucial

POS Tagging is very helpful

Wrapping up

For Next Time

