Research

My research interests all focus, in one way or another, on using computational methods to model and understand the complexity of language and speech, and the amazing process of obtaining usable linguistic information in the face of incredible variation and noise among people, groups, situations, and languages.

Research interests in Phonetics and Speech Perception

As a phonetician, I focus primarily on this complexity in speech and speech perception, using strongly computational methods.

I’m particularly interested in speech perception and speaker normalization, and in the computational modeling and measurement of human speech. Much of my work has focused on the acoustical cues, features, and processes that we use to identify subtle distinctions among sounds and words, even in the face of extreme variability and interference.

For instance, my Doctoral Dissertation focused on the acoustical complexity of one particular phenomenon: vowel nasality. My goal was to use statistics, machine learning, signal processing, and human perceptual experiments to better understand vowel nasality in language by discovering not just what parts of the signal change in oral vs. nasal vowels, but which parts of the signal are actually used by listeners to perceive differences in nasality. To read the full abstract or download a copy, go to my publications page.

I’m also very interested in further exploring the use of machine learning to simulate perceptual processes of speech and hearing, alongside conventional articulatory and perceptual methods. These methods can provide information and generate testable hypotheses about measurement, perceptual cues, and the perceptibility of speech in the face of noise and variation. Given that machine learning shows such considerable promise in addressing both phonetic and phonological factors in speech, I intend to continue with this computational approach to phonetics throughout my research career.

At Michigan, during my Post-Doc, I worked with two different research teams. With Pam Beddor and Andries Coetzee, I worked on an NSF grant exploring the temporal intricacies of speech production and perception, using eye-tracking, ultrasound, airflow, and complex data analysis to characterize patterns within (and across) individual speakers. I also worked with Jelena Krivokapic to bootstrap and help run the newly formed Electromagnetic Articulography (EMA) lab, where we’re using dual Carstens AG501 EMA units to address questions about articulation, prosody, pauses, and gesture with one- and two-speaker studies.

Research interests in Natural Language Processing, Computational Linguistics, and Medical Language

Although I’m a speech geek at heart, the field of natural language processing suffers from similarly intriguing problems: Noisy and variable texts contain surprisingly large amounts of information, which is communicated in non-transparent ways.

I’ve worked on a series of joint projects with the Mayo Clinic and Harvard Medical School involving the automated processing of language in medical records and discourse. As I’ve moved further into this field, I’ve become fascinated by the language used in the health care establishment and with the peculiar constructions and vocabulary that have emerged to fit the needs of this specialized speech community, where shared understanding is everywhere, conciseness is financially required, and clarity is legally required.

My main interest is in the temporal domain, where I have worked on the THYME Project and its follow-up grant, THYME Phase II. In these projects, I developed a schema and a set of guidelines for annotating temporal relations, and supervised a team of annotators tasked with annotating data. I’m delighted to be first author on a paper discussing issues in clinical temporal annotation, titled Temporal Annotation in the Clinical Domain, available from TACL here.

As part of these projects, I’ve also worked extensively with the UMLS in an effort to create an automated question-answering system (under the MiPACQ project), and have worked with causality and coreference as part of the DEFT project.

Finally, I’m the creator and maintainer of the EnronSent email corpus, a cleaned subset of data from the Enron corpus of email, described in University of Colorado Institute of Cognitive Science Technical Report 01-2011.