All my slides are presented using reveal.js. They exist as websites, not as PowerPoint files or PDFs.
To navigate my slides, use the arrow keys, the onscreen controls, or swipe on mobile. To scan through slides, hit ‘Escape’.
To print or make a PDF, open the slides in Google Chrome, add ?print-pdf to the URL after .html, and then print as usual.
(Please note that PDF/printing is not ‘officially supported’ by the instructor, and the results may be wonky in places.)
You can also access a handout version of any slideshow by adding _handout right before .html in the slide URL.
Introductions
What is this course about?
What is a computer and how does it work?
Why don’t computers speak English?
What will we cover?
What will we ignore?
Instructor, Linguist, Gigantic Nerd
Graduate IA, Linguist
Graduate IA, Linguist
Undergraduate IA, Linguist
What’s your year?
Ling majors?
Monolingual/Bilingual/Multilingual?
What languages do we speak?
What kinds of computers do you use?
Check your Canvas
Come to office hours with questions.
The syllabus will change!
We are here to help!
You’ll notice me restating questions afterwards
You might end up with videos from Winter 2021 if I get sick or if Podcast breaks down
Here’s hoping it stays that way
Come to class healthy, masked, and prepared
Please make an effort to talk more loudly when asking questions or answering!
You have no excuse to come to class sick!
Everyone must properly wear face coverings.
Stay current with your COVID-19 testing.
Monitor your symptoms, stay home if you’re sick and report positive cases.
Keep your hands clean, cover your cough and don’t touch your face.
‘Blow off’ the class, or try to lawyer or cajole your way into an A, and you’ll find neither sympathy nor help
Put in the effort for us, and we’ll put in effort for you
If you’re struggling, talk to us ASAP
The CPU (“Processor”)
Memory (L1/L2/L3 Cache, Random Access Memory (RAM))
Stores the values and data that the CPU needs immediate, fast access to
Relatively small amounts of storage (e.g. totals in the tens of gigabytes, or GB)
Slow Storage (“Hard Drive (HD)”, “Solid State Drive (SSD)”, “Disk”)
Stores data that the CPU doesn’t need immediate access to, so slower access is acceptable
Large amounts of storage (Hundreds of GB to many Terabytes (TB))
The I/O (Input/Output)
Inputs (Network, Keyboards/Mice, etc.)
Outputs (Network, Graphics/Monitors, Printers, Audio, etc.)
CPU, Memory, and Storage
Networking (e.g. Ethernet, Wifi, Bluetooth)
Graphics Cards (GPUs) (Connects to a monitor or display)
Audio Cards (Connects to speakers and microphones)
Peripheral Connections (e.g. USB, Thunderbolt)
Specialized chips (e.g. motion sensing, touchscreen controller, security chips (e.g. TPMs))
Most computers have a “motherboard” that holds the CPU and memory and includes USB, audio, networking, and other I/O
Modern phones and some computers have everything on a single board
Your computer
A server in a server farm
Your smartphone or tablet
Your smartwatch
Anything with a processor, memory, and storage
CPU fetches an instruction from memory
CPU figures out what needs to be done with what data
CPU takes the relevant data from the given places in memory and does the desired task
CPU puts the results back into the memory, and fetches the next instruction
Repeat ~100,000,000,000 times per second
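The cycle above can be sketched as a tiny, hypothetical processor. This is a toy model under simplifying assumptions, not how any real CPU is built: the instruction set (`LOAD`, `ADD`, `STORE`, `HALT`), the single “accumulator” register, and the names `Instruction` and `run` are invented for illustration, and the program is kept separate from data memory for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction set for a toy CPU (not a real architecture).
enum class Op { LOAD, ADD, STORE, HALT };

struct Instruction {
    Op op;
    std::size_t address;  // memory cell the instruction reads or writes
};

// Runs `program` against `memory` using one accumulator register.
// Returns how many instructions were executed.
std::size_t run(const std::vector<Instruction>& program, std::vector<int>& memory) {
    int accumulator = 0;
    std::size_t pc = 0;  // program counter: index of the next instruction
    std::size_t executed = 0;
    while (pc < program.size()) {
        const Instruction inst = program[pc];  // 1. fetch the instruction
        ++pc;
        ++executed;
        assert(inst.op == Op::HALT || inst.address < memory.size());
        switch (inst.op) {  // 2. decode: decide what to do, and with which data
            case Op::LOAD:  // 3. execute: read data from memory...
                accumulator = memory[inst.address];
                break;
            case Op::ADD:  // ...or combine it with the accumulator
                accumulator += memory[inst.address];
                break;
            case Op::STORE:  // 4. put the result back into memory
                memory[inst.address] = accumulator;
                break;
            case Op::HALT:
                return executed;
        }
    }  // then loop around and fetch the next instruction
    return executed;
}
```

For example, running `{LOAD 0, ADD 1, STORE 2, HALT}` on memory `{5, 2, 0}` leaves 7 in the third memory cell after four instructions. A real CPU does the same kind of loop, just with far more instructions and in hardware.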
Programs written in a human-readable programming language like Python or C++ are translated (‘compiled’ or ‘interpreted’) into processor instructions
The programs themselves turn these low-level results back into something a human can understand
### C++

```cpp
#include <iostream>

int main() {
    std::cout << "Hello, LIGN 6!";
    return 0;
}
```
### x86 Assembly

```asm
# addition
mov ax, 5      # load number in ax
mov bx, 2      # load number in bx
add ax, bx     # accumulate sum in ax
# subtraction
mov cx, 10
mov dx, 3
sub cx, dx     # accumulate difference in cx
# multiplication - 8-bit source
mov al, 5
mov bl, 10
mul bl         # result in ax
# division - 8-bit source
mov ax, 23     # dividend must fill all of ax (ah = 0)
mov bl, 4
div bl         # quotient in al, remainder in ah
```
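For comparison, here is a sketch of the values those assembly snippets compute, written as ordinary C++ arithmetic rather than register operations. The names `ArithmeticResults` and `compute` are hypothetical; the comments map each field to the register the assembly would leave it in.

```cpp
#include <cassert>

// Hypothetical container for the values the assembly leaves in registers.
struct ArithmeticResults {
    int sum;         // ax after "add ax, bx"
    int difference;  // cx after "sub cx, dx"
    int product;     // ax after "mul bl"
    int quotient;    // al after "div bl"
    int remainder;   // ah after "div bl"
};

ArithmeticResults compute() {
    ArithmeticResults r{};
    r.sum = 5 + 2;
    r.difference = 10 - 3;
    r.product = 5 * 10;
    const int dividend = 23;
    const int divisor = 4;
    // Division by zero is undefined behavior in C++ and raises a fault on x86.
    assert(divisor != 0);
    r.quotient = dividend / divisor;   // integer division drops the fraction
    r.remainder = dividend % divisor;  // what is left over
    return r;
}
```

Calling `compute()` gives a sum of 7, a difference of 7, a product of 50, and 23 ÷ 4 as quotient 5, remainder 3. One line of C++ can hide several machine instructions, which is exactly the compiler’s job.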
Ben Eater’s ‘Comparing C to Machine Language’
https://www.youtube.com/watch?v=yOyaJXpAYZQ
Programming languages are designed for computers, not for humans
Commands are designed to be unambiguously converted into logical or mathematical instructions
Limited set of functions and grammatical features
Cannot be changed “on the fly”
Programming languages are neither fully productive nor creative
There are many things which cannot be expressed in a programming language
Sentences cannot be unambiguously converted into logical and mathematical instructions
Unlimited set of functions and grammatical features
Can be modified “on the fly”
Fully Productive and Creative
Anything can be said using any human language, given sufficient time and vocabulary
“Add 3 to 5, then check to see if the result is bigger than the number of characters in ‘Laptop’”
“The cat is on the mat”
“I’m going to watch The Office tonight”
“That hurt my dignity, and made me very sad”
“I love you”
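Of the sentences above, only the first maps directly onto a program. A minimal C++ rendering of it might look like this (the function name `sum_exceeds_length` is hypothetical); the remaining sentences have no comparably direct translation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "Add 3 to 5, then check to see if the result is bigger than
//  the number of characters in 'Laptop'"
bool sum_exceeds_length() {
    const int result = 3 + 5;                // "Add 3 to 5"
    const std::string word = "Laptop";
    // size() counts bytes, which equals characters for plain ASCII text.
    const std::size_t length = word.size();  // "number of characters in 'Laptop'"
    assert(result >= 0);                     // safe to compare as an unsigned size
    return static_cast<std::size_t>(result) > length;  // "bigger than?"
}
```

Here 8 is compared with 6, so the function returns `true`. Every word of the English instruction becomes an exact operation; there is nothing comparable to write for “I love you”.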
Sound waves are not fundamentally accessible to computers
(Nor are visual inputs for signed language, but we’ll focus on speech)
Data encoded by tongue gestures and carried by sound is hard to convert back
We’re amazingly good at it
Computers lack tongues to produce speech (and to perceive it?)
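One way computers get at sound is to measure (sample) the wave many times per second and store each measurement as a number. The sketch below does this for a synthetic pure tone rather than a real recording, mimicking what an analog-to-digital converter does to a microphone signal; the 440 Hz frequency, the 16,000 samples-per-second rate, and the function name `sample_tone` are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: turns a continuous sine tone into a list of numbers
// by measuring it at evenly spaced moments in time.
std::vector<double> sample_tone(double frequency_hz, double sample_rate_hz,
                                double duration_s) {
    assert(sample_rate_hz > 0.0 && duration_s >= 0.0);
    const double pi = std::acos(-1.0);
    const auto count = static_cast<std::size_t>(sample_rate_hz * duration_s);
    std::vector<double> samples(count);
    for (std::size_t n = 0; n < count; ++n) {
        const double t = static_cast<double>(n) / sample_rate_hz;  // time of sample n, in seconds
        samples[n] = std::sin(2.0 * pi * frequency_hz * t);         // amplitude between -1 and 1
    }
    return samples;
}
```

With these settings, `sample_tone(440.0, 16000.0, 0.5)` turns half a second of tone into 8,000 numbers. A recording of speech is the same kind of list, just with a far messier shape, and getting words back out of it is the hard part.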
“The cat is on the mat”
“I saw a penguin”
“I saw the Penguin”
“The diplomat is a bachelor.”
“I long for her touch.”
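Part of what makes these sentences hard is that a single written word can map to several meanings. A toy lookup table makes the problem concrete. The senses below are hand-written for illustration, and `toy_lexicon` and `senses_of` are hypothetical names, not a real NLP resource.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A hand-written toy lexicon: each word form maps to every sense listed for it.
// Lookups are case-sensitive, so "penguin" and "Penguin" are different entries.
const std::map<std::string, std::vector<std::string>>& toy_lexicon() {
    static const std::map<std::string, std::vector<std::string>> lexicon = {
        {"bachelor", {"unmarried man", "holder of a bachelor's degree"}},
        {"long", {"having great length (adjective)", "to yearn for (verb)"}},
        {"penguin", {"flightless seabird"}},
        {"Penguin", {"Batman villain"}},
    };
    return lexicon;
}

// Returns every listed sense of a word, or an empty list if it is unknown.
std::vector<std::string> senses_of(const std::string& word) {
    assert(!word.empty());
    const auto& lexicon = toy_lexicon();
    const auto it = lexicon.find(word);
    if (it == lexicon.end()) {
        return {};
    }
    return it->second;
}
```

A program that only looks words up gets two candidate meanings for “bachelor” and for “long”, with no way to choose between them. Choosing requires context, which is exactly where the difficulty lies.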
Natural Language Processing (NLP): the field dedicated to the computational processing and analysis of human language, and to computer interaction through it
(NLP also means ‘Neurolinguistic Programming’ in some circles, but it is completely unrelated to computational linguistics)
We’re focusing on Natural Language Processing this quarter!
Virtual Assistants like Siri, Alexa, Google Assistant, or Cortana
They have all of the language problems at once
The most advanced consumer-facing Natural Language Processing around
Other tools like predictive text, dictation, Text-to-Speech
This is a Natural Language Processing course, and virtual assistants will be our main focus
Machine Learning
Speech-to-Text (Automatic Speech Recognition, ASR)
Text-to-Speech (TTS)
Building a Language Model
Natural Language Parsing
Computational Semantics
Computational Pragmatics
How does it work for humans, roughly?
How can we make it work for computers?
What makes doing this really, really hard?
What can we do to break it?
“How do you teach computers to learn?”
“How do computers even work with sound, given that waves aren’t 0 and 1?”
“How do we work with the kinds of tools used in this field?”
“How do we deal with the huge amounts of language data needed to model language?”
“How do we create meaning-annotated data that computers can learn from?”
Speech and Speech Acoustics
Morphology and Syntax
Lexical Semantics
Basic Pragmatics
All of these issues will be present in other languages
We’ll occasionally touch on issues which pertain to other languages
English will provide us with more than enough issues.
We’re also going to side-step machine translation
Signed languages are Language, and merit study
Any one of these topics could fill two graduate-level seminars
Think of this like a tasting menu of really hard problems in language and computing
The joys of a lower division class in the quarter system!
You’ll learn to use some phonetic software
You’ll learn some Unix basics, and we’ll see some Python
We’re going to rely on other people’s code (or mine) to make this class work
We’re focusing here on the problems, not the solutions
We’re going to be the linguists in the room, not the engineers
We’re thinking about this schematically, not in detail
“Hey Lowe it’s canary center med tech support just calling to see how your camp experiences calling if you need thing just let me know my phone number or my extension here is 94 it. Is sorry excuse me 49447 again that’s 49447 have a good weekend bye.”
What are the roots that clutch, what branches grow
Out of this stony rubbish? Son of man,
You cannot say, or guess, for you know only
A heap of broken images, where the sun beats,
And the dead tree gives no shelter, the cricket no relief,
And the dry stone no sound of water. Only
There is shadow under this red rock,
(Come in under the shadow of this red rock),
And I will show you something different from either
Your shadow at morning striding behind you
Or your shadow at evening rising to meet you;
I will show you fear in a handful of dust.
“If you don’t know, you’re in trouble soon.”
“If you’d like me to remember your wife’s name, just tell me ‘Remember my wife’s name is Marge’”
Historically answered “I would guess your cat’s name is Fluffy, or Pickles, or is it Midnight? Whatever it is, I hope that kitty is doing well.”
Now [Alexa shuts down]
“Traffic to work is light, so it should take 10 minutes via Voigt Drive”
This is amazing
We’ll talk about why that is
We’ll talk about the kinds of tools that they use to get things done
… and where progress remains to be made
Read the syllabus carefully
Activity 1 is on Canvas under ‘Discussions’
We’ll talk a bit about machine learning, and how computers can come anywhere near these problems