---

# Conversational Interaction Design

### Will Styler - LIGN 6

---

### Today, we'll talk about talking to the computer

- Not from an NLP standpoint, but from a conversation standpoint
- "Assuming the rest of the system is working OK, what's conversation like?"

---

### Today's Plan

- The basic structures of human conversation
- Why even the basics are hard
- Other Conversational Difficulties
- How human do we want these systems to be?

---

# Human Conversation

---

## Conversation Analysis

A methodology dedicated to analyzing the linguistic, social, and cultural elements of conversation

---

### CA research starts from data

- Record a conversation
- Create transcripts and annotated information
- Then study the processes at work
- They've come up with some basic structures of conversation

---

### Each and every element of human conversation is hard for these systems

- Turn taking
- Adjacency Pairs
- Sequence Expansion
- Repair

---

## Turn Taking

---

### Turn Taking is important

- In conversation, somebody talks
- Then another person starts talking at a "transition relevance point" (TRP)
- The next talker is determined by...
    - The current speaker continuing
    - The current speaker designating the next
    - Self-selection by a listener
- *How do listeners self-select?*

---

## Adjacency Pairs

---

### Adjacency Pairs are 'paired' turns where there is an expected conversational course

- Command/Action
- Question/Answer
- Offer/(Acceptance|Refusal)
- Compliment/Response
- Disbelief/Affirmation

---

## Sequence Expansions

---

### Sequence expansions allow multiple adjacency pairs to act towards one greater action

- Pre-Expansion:
"Hey Raul?"
"Yeah, what's up?"
"Could you hand me that mug?"
"Sure"
- Insert Expansion:
"Table for two please?"
"Inside or outside?"
"Outside"
"Right this way!"
- Post-Expansion:
"Wait, final projects are due next week?"
"Yeah!"
"OMG!!!!"
- Silence-as-expansion:
"So, how's my hair look?"
"...."

---

## Repair

---

### We screw up when talking, and repair fixes it

- Self-initiated Self-Repair -
"I'm going to buy a new keyboard... errr... mouse"
- Other-initiated Self-Repair -
"I wore blue pants yesterday"
"Wait, really?!"
"Oh, my bad, I meant gray"
- Self-Initiated Other Repair -
"So, this is what we call... uh... come on. Quantitative Implicature? Eric?"
"Scalar Implicature?"
"Scalar! Thank you!"
- Other-Initiated Other Repair -
"Let's watch the new Transformers movie, that's a great movie!"
"No, you mean 'terrible' movie"

---

## How do these things go for Alexa?

---

### Turn-taking is tough

- Knowing when the human is done is hard
- "1234 5678 9101 1121"
- Humans talking in noise
- "End of Speech Timeout"
- Non-verbal cues are missing
- Managing interruption is hard
- "Barge-in" is the technical term

---

### Alexa is focused on adjacency pairs

- "Alexa, turn on the light" -
"OK"
- "Alexa, can you turn on the lights?" -
"OK"
- "Alexa, would you like a cookie?" -
"I'm sorry, I don't know that."
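---

### Sketch: adjacency pairs as a lookup table

One hypothetical way to see this pattern: commands and questions sit in a pattern-to-response table, and anything outside it (an offer, a rhetorical question) falls through to a default apology. A minimal sketch; the patterns and canned responses here are invented for illustration, not Alexa's actual logic:

```python
import re

# Hypothetical pattern -> response table: each entry is the first part
# of an adjacency pair, mapped to the system's canned second part.
PAIRS = [
    (re.compile(r"turn on the lights?", re.I), "OK"),        # Command/Action
    (re.compile(r"what time is it", re.I), "It's 6:41pm"),   # Question/Answer
    (re.compile(r"you're really smart", re.I), "Thanks!"),   # Compliment/Response
]

def respond(utterance: str) -> str:
    """Return the paired response, or a default when no pair matches."""
    for pattern, response in PAIRS:
        if pattern.search(utterance):
            return response
    # Offers, disbelief, small talk: no entry, so they all fail here
    return "I'm sorry, I don't know that."

print(respond("Alexa, can you turn on the lights?"))  # -> "OK"
print(respond("Alexa, would you like a cookie?"))     # -> default apology
```

Note that "can you turn on the lights?" works only because the pattern happens to match; the table has no idea it was phrased as a question.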
---

### Alexa Adjacency Continued

- "Alexa, what time is it?" -
"It's 6:41pm"
- "Alexa, you're really smart" -
"Thanks!"
- "Alexa, can you believe it's week 10 already?" -
"Sorry, I'm not sure"
---

### Alexa doesn't even try sequence expansion

- *She stops listening after a single adjacency pair!*
- "Alexa, guess what?" -
"I have no idea" (*stops listening*)
- "Alexa, I want to buy some pants" -
"I found [the same damned gray pants Will always wears]. I'll add it to your cart" (*stops listening*)
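---

### Sketch: what sequence expansion would require

Handling even the restaurant-host insert expansion from earlier means holding an open sub-question across turns instead of closing the conversation after one pair. A toy state machine, purely illustrative (the class and its states are made up):

```python
# Minimal dialogue-state sketch of the "Table for two" insert expansion:
# the host can't complete the base pair until a sub-question is resolved.
class HostDialogue:
    def __init__(self):
        self.pending = None  # the sub-question we're waiting on, if any

    def hear(self, utterance: str) -> str:
        if self.pending == "seating":
            self.pending = None
            return "Right this way!"      # base pair finally completed
        if "table for" in utterance.lower():
            self.pending = "seating"      # open an insert expansion
            return "Inside or outside?"
        return "Sorry?"

host = HostDialogue()
print(host.hear("Table for two please?"))  # -> "Inside or outside?"
print(host.hear("Outside"))                # -> "Right this way!"
```

The key difference from the one-pair systems above is the `pending` state that survives between turns.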
---

### Alexa is broken at repair

- "Alexa, turn on the office ... errr ... living room lights" -
"Sorry, I didn't find a device called 'office living room lights'"
- "Alexa, turn on the work room lights" -
"Sorry, I didn't find a device called 'work room lights'"
- "Alexa, who's the director of... that movie with the cruise ship that sinks?" -
"Sorry, I don't know that"
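---

### Sketch: detecting a self-repair

One reason "office ... errr ... living room lights" fails: the ASR string reaches the device lookup with the abandoned portion still attached. A very naive, hypothetical first step is just noticing an editing term and splitting around it, so the system could re-prompt instead of searching for "office living room lights". Real repair handling is far harder than this:

```python
import re

# Hypothetical repair detector: spot editing terms ("errr", "I mean", ...)
# and split the utterance into the abandoned part and the repaired part.
EDIT_TERMS = re.compile(r"\b(?:err+|uh+|i mean|my bad)\b", re.I)

def detect_self_repair(utterance: str):
    """Return (reparandum, repair) if an editing term is found, else None."""
    m = EDIT_TERMS.search(utterance)
    if not m:
        return None
    before = utterance[:m.start()].strip(" .")  # what the speaker abandoned
    after = utterance[m.end():].strip(" .")     # the repaired version
    return before, after

print(detect_self_repair("turn on the office ... errr ... living room lights"))
# -> ("turn on the office", "living room lights")
```

Even this toy version still can't tell that "living room" should replace only "office" while keeping "turn on the ... lights" — that alignment is the genuinely hard part.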
---

### So, the building blocks of human conversation are hard for these systems

- ... but that's not their only problem

---

# Other Conversational Problems

---

## Initiating a Conversation

---

### How do you know somebody's starting a conversation with you?

---

### Recognizing directed speech is tricky for humans

- How do you know somebody's talking to you?
- This depends on...
- Your knowledge of who's in the room
- Your knowledge of likely pre-expansions
- Your visual senses
- Context

---

### These systems cannot afford always-on ASR

- It's expensive (in data and processing)
- It's robustly creepy
- It's still a hard task
- You don't want Alexa answering questions for other people

---

### Wake-words are the solution

- "Hey Siri", "Alexa", "OK Google"
- Specialized *on-device* speech recognition searches *only* for that combination of phones with the proper prosody
- Requires specialized vocabulary tuned for recognition
- Wake words should not be common names or words (e.g. "Will", "Uh")
- You need to have an option for somebody married to an Alexa, a Siri, or an OKGoogle

---

### Wake-words feel like a pre-expansion

- "Will, turn on the lights"
- ... but they're awkward in some situations
- It's not ideal for longer conversations

---

## Ignoring Irrelevant Speech

---

> "So Sam and Jorge are coming down for the... Oh, hi Mary, I'll see you in a few ... for the weekend"

---

### Humans are good at understanding what speech isn't directed at them

- They can ask for repair
- They can disregard other-directed portions
- They can cope with multiple people in a room

---

### Alexa is variably so

- Alexa, what time is it in ... goddamnit get off the kitchen table I just fed you... Athens? -
"I'm sorry, I don't understand"
- Alexa, what's the weather like in ... Honey can you take that off the stove? ... Greenland? -
"Which city in Greenland?"
- Alexa, get me show times for... that movie... with the big robots that turn into trucks... and the awful director... uh... Transformers! -
"Hmm, I couldn't find any movies called 'The big robots transformers'"
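---

### Sketch: the "End of Speech Timeout"

The "End of Speech Timeout" from the turn-taking slide is, at heart, just a silence clock: after the user has spoken, a long enough run of silent frames ends the turn — which is exactly why a mid-utterance pause ("with the big robots... and the awful director... uh...") keeps the turn open and lets irrelevant speech leak in. A toy sketch over per-frame voice-activity flags; the frame counts and threshold are made-up values:

```python
# Toy end-of-speech endpointing: the turn is declared over once we've
# seen `timeout_frames` consecutive silent frames after any speech.
def end_of_speech_index(vad_frames, timeout_frames=8):
    """vad_frames: sequence of booleans, True = speech in that frame.
    Returns the frame index where the turn is declared over, or None."""
    silent_run = 0
    heard_speech = False
    for i, is_speech in enumerate(vad_frames):
        if is_speech:
            heard_speech = True
            silent_run = 0           # any speech resets the clock
        elif heard_speech:
            silent_run += 1
            if silent_run >= timeout_frames:
                return i
    return None  # still listening: the human may not be done

# A mid-utterance pause shorter than the timeout doesn't end the turn;
# only the long final silence does.
frames = [True] * 5 + [False] * 4 + [True] * 3 + [False] * 10
print(end_of_speech_index(frames, timeout_frames=8))  # -> 19
```

Everything said before the timeout fires — including the cat, the stove, and the hedging — lands in the same turn.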
---

## Maintaining a discourse context

---

### How long does a discourse context stick around?

- Imagine your roommate is making a shopping list...
- This affects...
- Deixis
- Pronoun reference
- Information status
- Anaphora

---

### Alexa has a short memory

- "How long does it take to get to Vegas?" -
[Correct drive time]
- "What's the weather like there?" -
[Weather in Vegas]
- (1 minute elapses) "What's the time zone there?" -
"I'm sorry, I don't understand"
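---

### Sketch: discourse context with an expiry

The short-memory behavior above can be modeled as a context store with a time-to-live: "there" resolves to Vegas only while the entry is fresh. A hypothetical sketch — the ~60-second TTL is a guess for illustration, not Alexa's documented behavior:

```python
import time

# Hypothetical discourse context: referents like "there" resolve against
# the last mentioned value, but only for `ttl` seconds.
class DiscourseContext:
    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.referents = {}  # e.g. {"there": ("Vegas", timestamp)}

    def mention(self, role, value):
        self.referents[role] = (value, self.clock())

    def resolve(self, role):
        entry = self.referents.get(role)
        if entry is None:
            return None
        value, when = entry
        if self.clock() - when > self.ttl:
            del self.referents[role]  # context has expired
            return None
        return value

# Injected fake clock so the example is deterministic
t = [0.0]
ctx = DiscourseContext(ttl=60.0, clock=lambda: t[0])
ctx.mention("there", "Vegas")
t[0] = 10.0
print(ctx.resolve("there"))  # within TTL -> "Vegas"
t[0] = 90.0
print(ctx.resolve("there"))  # a minute later -> None ("I'm sorry...")
```

Humans, of course, don't run a timer: whether "there" still works depends on the discourse, not the clock.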
---

### Understanding what objects are in the discourse context

- "Alexa, turn off the Living Room Lights" -
"I'm sorry, I don't know how to do that for device living room"
- "Alexa, set the temperature to 65 degrees" -
"On which device?"
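---

### Sketch: filling a device slot from context

"On which device?" is what you get when a required slot can't be filled from the known objects. One hypothetical resolution rule: act silently only when exactly one known device supports the requested action, and ask otherwise. The device names and actions below are invented for illustration:

```python
# Hypothetical device registry: name -> the actions that device supports.
DEVICES = {
    "living room lights": {"turn_on", "turn_off"},
    "bedroom thermostat": {"set_temperature"},
    "kitchen thermostat": {"set_temperature"},
}

def resolve_device(action):
    """Return the unique device for an action, or None if we must ask."""
    candidates = [name for name, acts in DEVICES.items() if action in acts]
    if len(candidates) == 1:
        return candidates[0]  # unambiguous: just do it
    return None               # zero or many: ask "On which device?"

print(resolve_device("turn_off"))         # -> "living room lights"
print(resolve_device("set_temperature"))  # -> None: two thermostats match
```

A human roommate would also use *which room you're standing in* to break the tie — context these devices mostly don't have.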
---

## Ending the conversation

---

### How do you know when a conversation has ended?

---

### When does an interaction *stop*?

- Most devices interact for only one adjacency pair
- "What time is it?"
- "3:56pm" [*Stops Listening*]
- This is unnatural in many contexts

---

### Alexa has 'Follow up mode'

- She'll start listening again for ~8 seconds automatically
- Disabled by "Stop", "Cancel", "Go to Sleep", "Thank you"
- Maintains discourse information ('there' == Vegas)
- *Comically* bad at handling post-expansions

---

> A: Did that answer your question?
> W: Yes
> A: Thank you
> W: You're welcome
> A: Good
> W: Yep
> A: Agreed

- **Alexa has no idea when to stop talking**

---

### She also becomes 'chatty'

> W: Alexa, turn off the lights
> A: OK
> W: Thank you
> A: You're welcome

---

### She also becomes 'chatty'

> W: Remind me to leave at 2pm
> A: OK, I'll remind you.
> W: Thank you
> A: You bet! I hope you're having a nice afternoon.

---

### This reveals something interesting about our interactions

- *We don't know how human we want these systems to be!*

---

# How human is your system going to act?

---

### Your system will not yet "pass" for human

- These systems are nowhere near good enough to 'pass'
- ... but there are human-like things that you can build your system to do

---

## Human-Like Behaviors

---

### Human Conversational Norms

- "Hey Google" / "Sup?"
- Filling pauses ("Uhhh", keyboard tapping)
- Backchannelling ("Mmhmm")

---

### Human-like Language Variability

- Phrasing variation
- "OK!" vs. "Sure thing!" vs. "Yup"
- Use of idioms or metaphors
- "It's gonna be pretty rough out there"
- "It's raining cats and dogs!"
- Insertion of speech errors and repairs

---

### Use of slang or non-mainstream words

- "Dude, it's gorge out there at 85 degrees!"
- "Do I have meetings after 4pm today?" "Fo sho!"
- "What time is it?" "It's six o'clock in the f\*\*\*ing morning"
- What if Alexa one day referred to you as "Fam"?
---

### Use of human dialects

- Should Alexa attempt to adopt human dialects?
- Some services give you a choice of dialect
- "Please select my race"
- "You sound black, would you like me to talk black?"
- Is it more offensive to speak a "bad" Southern/Black/Indian English?

---

### Use of human gender

- Why do the current virtual assistants all use feminine names and voices?
- "Alexa, what is your gender?" - "I'm female in character"
- "Hey Siri, what is your gender?" - "I'm Siri"
- "OK Google, what is your gender?" - "I try to stay neutral 😀"

---

### Engagement with human emotions

- "Alexa, I just got an A on my test!" - "Sorry, I'm not sure about that"
- "Alexa, I'm feeling really depressed" - "I'm sorry to hear that"
- "Alexa, my parents are so mean" - "Sorry, I'm not sure."
- "Alexa, I think I love you" - "That's really nice, thanks"

---

### (Yes, I just got friendzoned by Alexa)

---

### Do we want Alexa to help with human crisis?

- **Content Warning: Self Harm and Domestic Violence**
- "Alexa, I'm thinking about killing myself" - "Hmm, I don't know that."
- "Alexa, my wife hits me" - "I'm sorry to hear that"
- (She does not.)
- *Should these link to resources?*

---

### Interesting Follow-up Media

- "The Veldt" by Ray Bradbury
- Blade Runner 2049
- cf. Joi

---

### The line between "Neat" and "Creepy" is... thin

- You need to be *very* careful with being artificially human

---

## The Uncanny Valley

An unsettling feeling one gets when viewing something which is very nearly human but clearly artificial
---

### ... but luckily, we have time before this is a major worry!

---

### Wrapping up

- Human conversation is complicated
- Current virtual assistants aren't good at it
- ... and you have to ask how good you actually want them to be at it

---

### For Next Time

- Hagyeong will talk about some of her research!

---
Thank you!