Morphology for NLP

Will Styler - LIGN 6


Today’s Plan


What is Morphology?


Morphology


Words are not monolithic in English


Morpheme

The smallest piece of a word which expresses a meaning or function


Many words are composed of many morphemes


There are two kinds of morpheme


Free Morphemes


Bound Morphemes


Bound Morphemes (or ‘Affixes’)


Morphemes can be tricky for NLP



Stemming

‘Let’s just delete characters from words until all the forms of the word have the same form’
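
To make the "delete characters" idea concrete, here is a minimal sketch of a suffix-stripping stemmer. It is a toy, not the Porter stemmer or any library's algorithm: the suffix list, the min_stem cutoff, and the name toy_stem are all assumptions made up for illustration.

```python
# Toy suffix-stripping stemmer: an illustrative sketch, NOT the Porter algorithm.
# The suffix list and the minimum stem length are assumptions for this example.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["walks", "walked", "walking", "running", "ran"]:
        print(w, "->", toy_stem(w))
```

The regular forms of "walk" all collapse to the same string, but "running" comes out as "runn" and "ran" is left untouched. A stemmer only chops characters; it knows nothing about which word the form belongs to.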


Lemmatization

‘Let’s examine this word, look at the many morphological forms it can have, and figure out which lemma it’s a form of.’
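
As a rough sketch of what "figuring out the lemma" can look like, the toy lemmatizer below combines a lookup table for irregular forms with a few suffix rules that are only accepted when the result is a known lemma. The word lists, the rules, and the name toy_lemmatize are all hypothetical; real lemmatizers rely on much larger lexicons and usually on part-of-speech information too.

```python
# Toy lemmatizer: an illustrative sketch with a tiny, made-up lexicon.
KNOWN_LEMMAS = {"run", "walk", "go", "mouse", "city", "cat"}
IRREGULAR = {"ran": "run", "went": "go", "mice": "mouse"}
# (suffix, replacement) pairs, tried in order; assumptions for this example.
RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def toy_lemmatize(word):
    """Return the lemma for a word, or the word itself if none is found."""
    word = word.lower()
    if word in KNOWN_LEMMAS:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in KNOWN_LEMMAS:
                return candidate
            # Undo consonant doubling, e.g. "running" -> "runn" -> "run".
            if (
                len(candidate) >= 2
                and candidate[-1] == candidate[-2]
                and candidate[:-1] in KNOWN_LEMMAS
            ):
                return candidate[:-1]
    return word  # unknown word: leave it unchanged


if __name__ == "__main__":
    for w in ["running", "ran", "cities", "mice", "walked", "glorped"]:
        print(w, "->", toy_lemmatize(w))
```

Because every answer is checked against the lexicon, "running" and "ran" both map to "run", while an unknown form like "glorped" is returned unchanged rather than being chopped into a non-word.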


You’ll want to lemmatize when doing information retrieval


So, those are morphemes


Words


Words are made up of morphemes


Lexical Categories


Lexical Categories


Lexical Categories (Continued)


How do we identify the different parts of speech?


Nouns


Verbs


Adjectives


There are always some exceptions


Adverbs are hard


There are many flavors of adverbs


Prepositions


Pronouns


There are many kinds of pronouns


Determiners


Types of Determiners


Conjunctions


All languages have grammatical categories


We’ll talk about how to automatically detect Part-of-Speech soon


We can group these categories into larger sets


Content and Function words change differently in languages


New function words are pretty rare


New content words are really common


… and existing words can gain new uses


This brings up a major problem for us


Homonym

A word which shares its spoken and written form with another word, but has a different meaning


Polysemy

The fact that one word can have many different meanings


Word Sense

The specific meaning of a word being used in a given situation
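
One simple way to guess which sense is in play is to compare the words around the target with a short description of each sense and pick the sense with the most overlap. The sketch below does this for "bank". It is a simplified, Lesk-style heuristic, and the sense labels, glosses, stopword list, and function names are all invented for illustration.

```python
# Simplified Lesk-style sense picker: an illustrative sketch.
# The sense inventory and glosses below are made up for this example.
SENSES = {
    "bank": {
        "financial institution": "a business that holds money deposits and makes loans",
        "river edge": "the sloping land along the side of a river or stream",
    }
}
STOPWORDS = {"a", "the", "of", "and", "or", "that", "to", "at", "on", "in", "we"}


def content_words(text):
    """Lowercase the text, strip basic punctuation, and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS


def pick_sense(word, sentence):
    """Return the sense label whose gloss shares the most words with the sentence."""
    context = content_words(sentence)
    scores = {
        label: len(context & content_words(gloss))
        for label, gloss in SENSES[word].items()
    }
    # On a tie, max() returns the first sense listed.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(pick_sense("bank", "She went to the bank to deposit money"))
    print(pick_sense("bank", "We sat on the bank of the river"))
```

Note that "deposit" in the sentence does not match "deposits" in the gloss, so only "money" counts toward the first answer. Exact string matching misses related forms of the same word, which is the gap stemming and lemmatization are meant to close.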


Many words have multiple senses


Fit


Speaking of words having different meanings…


Multi-Word Expressions


So, written words can have many different parts of speech



(Nah, it’s OK, we’ve got syntax)


Wrapping up


Thank you!