ICT Workshop on Overlap in Human-Computer Dialogue and Semdial 2011
Thursday, September 29th, 2011
Sorry, this title is a mouthful. I spent the week of September 19th-23rd, 2011 at the offices of the good folks at ICT in Los Angeles, who organized the aforementioned workshop. It was co-located with Semdial 2011 “Los Angelogue”, held there that same week. Many thanks to all the wonderful, busy people who made it happen.
The workshop’s main theme, “overlap”, covers a set of less-studied behaviors in human-human (and human-machine) dialogue. While there’s general agreement that humans do not speak at the same time very often, interesting things happen when they do. The workshop tried to describe and classify these events and tie them into more abstract dialogue-related themes. Notably, back-channels, interruptions (competitive and cooperative), side-talk and turn-taking events were recurring themes.
The workshop consisted of a number of breakout sessions, each of which focused on specific aspects of these issues. Our second-day group, for instance, tried to account for turn-taking behavior, i.e. how dialogue participants decide when to speak next, in terms of avoiding or initiating overlap. As an example, we used the following “pre-linguistic” behavior:
I’m not sure we solved the problem even for a set of dialogue acts consisting of DADA, STOMP and LAUGH, but there was general consensus that the desire to speak next (i.e. having something to say) and the utility of starting to talk should somehow be maximized at any given moment. Whether the two should come from a joint utility function was left open.
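To make the idea a bit more concrete, here is a toy sketch in Python. It is purely my own illustration, not something the group produced: the names `Participant`, `desire`, `overlap_cost` and the threshold are hypothetical, and multiplying the two quantities is just one candidate for the joint utility function that was left open.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    desire: float        # hypothetical: how much this participant has to say, in [0, 1]
    overlap_cost: float  # hypothetical: penalty for talking over someone, in [0, 1]


def utility_of_speaking(p: Participant, floor_is_taken: bool) -> float:
    """Utility of starting to talk right now (assumed form, not from the workshop)."""
    # Starting to talk is only penalized if someone else already holds the floor.
    start_utility = 1.0 - (p.overlap_cost if floor_is_taken else 0.0)
    # Assumption: desire and start utility combine multiplicatively.
    return p.desire * start_utility


def next_speaker(participants, floor_is_taken, threshold=0.5):
    """Return the participant with the highest utility, or None if nobody clears the threshold."""
    best = max(participants, key=lambda p: utility_of_speaking(p, floor_is_taken))
    if utility_of_speaking(best, floor_is_taken) >= threshold:
        return best
    return None
```

With a silent floor, the participant with the most to say wins. Once someone is already talking, a participant with a low overlap cost may still come in, which is one way to read an interruption or a back-channel.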
A recurring question was of what use low-level “overlap” phenomena, especially back-channels like “yeah”, “ok”, “mhm” and “uh” that indicate (non-)understanding, are in modeling higher-level aspects of discourse, such as grounding. “Of little use” may be supported by the observation that speakers do not generally remember producing such short utterances, which suggests they are more or less reflexive. “Of much use”, in turn, is supported by the observation that such phenomena often occur at specific and (semantically/informationally) relevant points in conversation, which suggests they help ground information.
Following the workshop, Semdial offered a broader view of Semantics in Dialogue, spanning psycho- and developmental linguistics, formal semantics and human-machine dialogue research. Keynote speakers Patrick Healey, Jerry Hobbs, David Schlangen and Lenhart Schubert exemplified this range. Jerry Hobbs’ talk in particular contained a number of observations, arrived at by analyzing a three-party meeting-scheduling dialogue, that left me with ideas to try out in my own dialogue system implementation.
First, he observed that such dialogues largely follow an ordered, task-specific breakdown of steps towards a high-level goal (finding a suitable scheduling arrangement). “Violations” of this order occur when relevant partial tasks were not recently ratified, or were important enough to merit revisiting. (“Violation” is his term, but it suggests that the order is somehow prescribed, and it is not clear by whom or what. “Revisitation” is more apt, I believe.) I take this finding as a good indication that my own dialogue management approach in terms of hierarchical discourse units (which encode “order” in a similar fashion) is on the right track (see my own Semdial contribution for details).
Second, a more low-level analysis of the types of questions posed during the scheduling session indicates that, even though a large number of questions are either posed as explicit or implicit YesNo questions, or are otherwise answerable by “yes” and “no”, a very low proportion of them are ever treated as such. Here I see a strong divergence between human-human and human-machine dialogues, as the latter explicitly make use of YesNo questions to overcome technological shortcomings, e.g. posing them to guard against speech recognition errors and so produce more robust dialogues. Such “computer-speak” (“I think you said Boston. Is that right?”) is, of course, perceived as unnatural by human users/subjects. Perhaps posing the more common type of YesNo question in a spoken dialogue system, i.e. one that is aimed at collecting actual information (“Do you know how you want to travel?”), can rehabilitate (or “even out”) the more explicit kind used in systems today. To be determined.
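As a rough sketch of what mixing the two kinds of YesNo question could look like in a system, consider the Python snippet below. Everything in it is an assumption of mine for illustration: the function `choose_question`, the `confirm_below` threshold and its default value are hypothetical and not taken from any particular dialogue system.

```python
from typing import Optional


def choose_question(slot: str, value: Optional[str], asr_confidence: float,
                    confirm_below: float = 0.6) -> str:
    """Pick the next system utterance (hypothetical policy; the threshold is an assumption)."""
    if value is None:
        # Nothing recognized yet: ask an information-seeking YesNo question.
        return f"Do you know {slot}?"
    if asr_confidence < confirm_below:
        # Low recognition confidence: fall back to explicit confirmation.
        return f"I think you said {value}. Is that right?"
    # Confident enough: accept the value and move on without confirming.
    return f"OK, {value}."


print(choose_question("how you want to travel", None, 0.0))
print(choose_question("where you want to go", "Boston", 0.4))
```

The explicit confirmation is still there, but it only shows up when recognition confidence is low, so most YesNo questions the user hears would be of the information-seeking kind.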
