Article

Abstract:

As interactive voice response systems become more prevalent and provide increasingly complex functionality, it becomes clear that the challenges facing such systems are not solely in their synthesis and recognition capabilities. Issues such as the coordination of turn exchanges between system and user also play an important role in system usability. In particular, both systems and users have difficulty determining when the other is taking or relinquishing the turn. In this paper, we seek to identify turn-taking cues correlated with human-human turn exchanges which are automatically computable. We compare the presence of potential prosodic, acoustic, and lexico-syntactic turn-yielding cues in prosodic phrases preceding turn changes (smooth switches) vs. turn retentions (holds) vs. backchannels in the Columbia Games Corpus, a large corpus of task-oriented dialogues, to determine which features reliably distinguish between these three. We identify seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems. Testing Duncan's (1972) hypothesis that these turn-yielding cues are linearly correlated with the occurrence of turn-taking attempts, we further demonstrate that the greater the number of turn-yielding cues that are present, the greater the likelihood that a turn change will occur. We also identify six cues that precede backchannels, which will also be useful for IVR backchannel generation and recognition; these cues correlate with backchannel occurrence in a quadratic manner. We find similar results for overlapping and for non-overlapping speech. © 2010 Elsevier Ltd. All rights reserved.

Record:

Document: Article
Title: Turn-taking cues in task-oriented dialogue
Author: Gravano, A.; Hirschberg, J.
Affiliation: Departamento de Computación, FCEyN, Universidad de Buenos Aires, Argentina
Laboratorio de Investigaciones Sensoriales, Hospital de Clínicas, Universidad de Buenos Aires, Argentina
Department of Computer Science, Columbia University, New York, NY, United States
Keywords: Dialogue; IVR systems; Prosody; Turn-taking; Back channels; Columbia; Interactive voice response; Interactive voice response systems; System usability; Speech recognition
Year: 2011
Volume: 25
Issue: 3
First page: 601
Last page: 634
DOI: http://dx.doi.org/10.1016/j.csl.2010.10.003
Journal title: Computer Speech and Language
Abbreviated journal title: Comput Speech Lang
ISSN: 08852308
CODEN: CSPLE
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_08852308_v25_n3_p601_Gravano

References:

  • Abney, S., Partial parsing via finite-state cascades (1996) Journal of Natural Language Engineering, 2 (4), pp. 337-344
  • Atterer, M., Baumann, T., Schlangen, D., Towards incremental end-of-utterance detection in dialogue systems (2008) Coling, pp. 11-14, Manchester, UK
  • Beattie, G.W., The regulation of speaker turns in face-to-face conversation: Some implications for conversation in sound-only communication channels (1981) Semiotica, 34, pp. 55-70
  • Beattie, G.W., Turn-taking and interruption in political interviews: Margaret Thatcher and Jim Callaghan compared and contrasted (1982) Semiotica, 39, pp. 93-114
  • Beckman, M.E., Hirschberg, J., (1994) The ToBI Annotation Conventions, Ohio State Univ
  • Bhuta, T., Patrick, L., Garnett, J.D., Perceptual evaluation of voice quality and its correlation with acoustic measurements (2004) Journal of Voice, 18 (3), pp. 299-304, DOI 10.1016/j.jvoice.2003.12.004, PII S0892199703001735
  • Boersma, P., Weenink, D., (2001) Praat: Doing Phonetics by Computer, http://www.praat.org
  • Bull, M., Aylett, M., An analysis of the timing of turn-taking in a corpus of goal-oriented dialogue (1998) ICSLP
  • Cathcart, N., Carletta, J., Klein, E., A shallow model of backchannel continuers in spoken dialogue (2003) EACL, pp. 51-58
  • Charniak, E., Johnson, M., Edit detection and parsing for transcribed speech (2001) Proceedings of NAACL
  • Collins, M., Head-driven statistical models for natural language parsing (2003) Computational Linguistics, 29 (4), pp. 589-637, DOI 10.1162/089120103322753356
  • Cohen, J., A coefficient of agreement for nominal scales (1960) Educational and Psychological Measurement, 20, pp. 37-46
  • Cohen, W.W., Fast effective rule induction (1995) Proceedings of the Twelfth International Conference on Machine Learning
  • Cortes, C., Vapnik, V., Support vector networks (1995) Machine Learning, pp. 273-297
  • Cutler, E.A., Pearson, M., On the analysis of prosodic turn-taking cues (1986) Intonation in Discourse, pp. 139-156
  • Duncan, S., Some signals and rules for taking speaking turns in conversations (1972) Journal of Personality and Social Psychology, 23, pp. 283-292
  • Duncan, S., Toward a grammar for dyadic conversation (1973) Semiotica, 9, pp. 29-46
  • Duncan, S., On the structure of speaker-auditor interaction during speaking turns (1974) Language in Society, 3, pp. 161-180
  • Duncan, S., Interaction units during speaking turns in dyadic, face-to-face conversations (1975) Organization of Behavior in Face-to-Face Interaction, Mouton Publishers, The Hague
  • Duncan, S., Fiske, D., (1977) Face-To-Face Interaction: Research, Methods, and Theory, Lawrence Erlbaum Associates
  • Du Bois, J., Schuetze-Coburn, S., Cumming, S., Paolino, D., Outline of discourse transcription (1993) Talking Data: Transcription and Coding in Discourse Research
  • Edlund, J., Heldner, M., Gustafson, J., Utterance segmentation and turn-taking in spoken dialogue systems (2005) Sprachtechnologie Mobile Kommunikation und Linguistische Ressourcen, pp. 576-587
  • Eskenazi, L., Childers, D.G., Hicks, D.M., Acoustic correlates of vocal quality (1990) Journal of Speech and Hearing Research, 33 (2), pp. 298-306
  • Ferguson, N., Simultaneous speech, interruptions and dominance (1977) British Journal of Social and Clinical Psychology, 16 (4), pp. 295-302
  • Ferrer, L., Shriberg, E., Stolcke, A., A prosody-based approach to end-of-utterance detection that does not require speech recognition (2003) Proceedings of ICASSP
  • Ferrer, L., Shriberg, E., Stolcke, A., Is the speaker done yet? Faster and more accurate end-of-utterance detection using prosody (2002) Proceedings of the ICSLP, pp. 2061-2064
  • Ford, C., Thompson, S., Interactional units in conversation: Syntactic, intonational, and pragmatic resources for the management of turns (1996) Interaction and Grammar, pp. 134-184
  • Fry, D., Simple reaction-times to speech and non-speech stimuli (1975) Cortex, 11, pp. 355-360
  • Godfrey, J., Holliman, E., McDaniel, J., Switchboard: Telephone speech corpus for research and development (1992) IEEE International Conference on Acoustics, Speech, and Signal Processing
  • Goodwin, C., (1981) Conversational Organization: Interaction between Speakers and Hearers, Academic Press
  • Gravano, A., Benus, S., Hirschberg, J., Mitchell, S., Vovsha, I., Classification of discourse functions of affirmative words in spoken dialogue (2007) Proceedings of Interspeech
  • Heckerman, D., Geiger, D., Chickering, D., Learning Bayesian networks: The combination of knowledge and statistical data (1995) Machine Learning, 20, pp. 197-243
  • Hemphill, C., Godfrey, J., Doddington, G., The ATIS spoken language systems pilot corpus (1990) Proceedings of the Workshop on Speech and Natural Language, pp. 96-101
  • Hjalmarsson, A., On cue - Additive effects of turn-regulating phenomena in dialogue (2009) Diaholmia
  • Jefferson, G., Notes on a systematic deployment of the acknowledgement tokens "yeah" and "mm hm" (1984) Research on Language & Social Interaction, 17, pp. 197-216
  • Jensen, F., (1996) Introduction to Bayesian Networks, Springer-Verlag, New York
  • Jurafsky, D., Shriberg, E., Fox, B., Curl, T., Lexical, prosodic and syntactic cues for dialog acts (1998) Proceedings of ACL/COLING, Workshop on Discourse Relations and Discourse Markers, pp. 114-120
  • Kendon, A., Some functions of gaze-direction in social interaction (1967) Acta Psychologica, 26, pp. 22-63
  • Kendon, A., Some relationships between body motion and speech (1972) Studies in Dyadic Communication, pp. 177-210
  • Kitch, J.A., Oates, J., Greenwood, K., Performance effects on the voices of 10 choral tenors: Acoustic and perceptual findings (1996) Journal of Voice, 10 (2-3), pp. 217-227
  • Koehn, P., Abney, S., Hirschberg, J., Collins, M., Improving intonational phrasing with syntactic information (2000) Proceedings of ICASSP, Vol. 3, pp. 1289-1290
  • Koiso, H., Horiuchi, Y., Tutiya, S., Ichikawa, A., Den, Y., An Analysis of Turn-Taking and Backchannels Based on Prosodic and Syntactic Features in Japanese Map Task Dialogs (1998) Language and Speech, 41 (3-4), pp. 295-321
  • Lafferty, J., McCallum, A., Pereira, F., Conditional random fields: Probabilistic models for segmenting and labeling sequence data (2001) 18th International Conference on Machine Learning, pp. 282-289, Morgan Kaufmann, San Francisco, CA
  • Marcus, M., Marcinkiewicz, M., Santorini, B., Building a large annotated corpus of English: The Penn Treebank (1993) Computational Linguistics, 19, pp. 313-330
  • McNeill, D., (1992) Hand and Mind: What Gestures Reveal about Thought, University of Chicago Press
  • Mushin, I., Stirling, L., Fletcher, J., Wales, R., Discourse structure, grounding, and prosody in task-oriented dialogue (2003) Discourse Processes, 35, pp. 1-31
  • Novick, D., Sutton, S., An empirical model of acknowledgment for spoken-language systems (1994) Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp. 96-101, Morristown, NJ, USA
  • Ogden, R., Creaky voice and turn-taking in Finnish (2002) Colloquium of the British Association of Audiological Physicians
  • Pierrehumbert, J., (1980) The Phonology and Phonetics of English Intonation, Ph.D. Thesis, Massachusetts Institute of Technology
  • Pierrehumbert, J., Hirschberg, J., The meaning of intonational contours in the interpretation of discourse (1990) Intentions in Communication, pp. 271-311
  • Pitrelli, J.F., Beckman, M.E., Hirschberg, J., Evaluation of prosodic transcription labeling reliability in the ToBI framework (1994) Proceedings of ICSLP, pp. 123-126
  • Quinlan, J.R., (1993) C4.5: Programs for Machine Learning, Morgan Kaufmann
  • Rabiner, L., A tutorial on Hidden Markov Models and selected applications in speech recognition (1989) Proceedings of the IEEE, 77, pp. 257-286
  • Ratnaparkhi, A., Brill, E., Church, K., A maximum entropy model for part-of-speech tagging (1996) Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 133-142
  • Raux, A., Bohus, D., Langner, B., Black, A.W., Eskenazi, M., Doing research on a deployed spoken dialogue system: One year of Let's Go! experience (2006) Proceedings of Interspeech
  • Raux, A., Eskenazi, M., Optimizing endpointing thresholds using dialogue features in a spoken dialogue system (2008) SIGdial, Columbus, OH
  • Schegloff, E., Discourse as an interactional achievement: Some uses of "uh huh" and other things that come between sentences (1982) Analyzing Discourse: Text and Talk
  • Sacks, H., Schegloff, E.A., Jefferson, G., A simplest systematics for the organization of turn-taking for conversation (1974) Language, 50, pp. 696-735
  • Schaffer, D., The role of intonation as a cue to turn taking in conversation (1983) Journal of Phonetics, 11, pp. 243-257
  • Schlangen, D., From reaction to prediction: Experiments with computational models of turn-taking (2006) Proceedings of Interspeech
  • Shriberg, E., Stolcke, A., Jurafsky, D., Coccaro, N., Meteer, M., Bates, R., Taylor, P., Van Ess-Dykema, C., Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech? (1998) Language and Speech, 41 (3-4), pp. 443-492
  • Shriberg, E., Stolcke, A., Baron, D., Observations on overlap: Findings and implications for automatic processing of multi-party conversation (2001) Eurospeech, pp. 1359-1362
  • Stolcke, A., Ries, K., Coccaro, N., Shriberg, E., Bates, R., Jurafsky, D., Taylor, P., Meteer, M., Dialogue act modeling for automatic tagging and recognition of conversational speech (2000) Computational Linguistics, 26, pp. 339-373
  • Ten Bosch, L., Oostdijk, N., Boves, L., On temporal aspects of turn taking in conversational dialogues (2005) Speech Communication, 47 (1-2), pp. 80-86, DOI 10.1016/j.specom.2005.05.009, PII S0167639305001330
  • Vapnik, V.N., (1995) The Nature of Statistical Learning Theory, Springer-Verlag, New York
  • Ward, N., Tsukahara, W., Prosodic features which cue back-channel responses in English and Japanese (2000) Journal of Pragmatics, 32, pp. 1177-1207
  • Ward, N., Rivera, A., Ward, K., Novick, D., Root causes of lost time and user stress in a simple dialog system (2005) Interspeech
  • Wennerstrom, A., Siegel, A.F., Keeping the floor in multi-party conversations: Intonation, syntax, and pause (2003) Discourse Processes, 36, pp. 77-107
  • Wichmann, A., Caspers, J., Melodic cues to turn-taking in English: Evidence from perception (2001) Proceedings of the Second SIGdial Workshop on Discourse and Dialogue
  • Wightman, C., Shattuck-Hufnagel, S., Ostendorf, M., Price, P., Segmental durations in the vicinity of prosodic phrase boundaries (1992) The Journal of the Acoustical Society of America, 91, pp. 1707-1717
  • Witten, I., Frank, E., (2000) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann
  • Yngve, V., On getting a word in edgewise (1970) Proceedings of the Sixth Regional Meeting of the Chicago Linguistic Society, Vol. 6, pp. 657-677
  • Yuan, J., Liberman, M., Cieri, C., Towards an integrated understanding of speech overlaps in conversation (2007) ICPhS XVI, Saarbrücken, Germany

Citations:

---------- APA ----------
Gravano, A. & Hirschberg, J. (2011). Turn-taking cues in task-oriented dialogue. Computer Speech and Language, 25(3), 601-634.
http://dx.doi.org/10.1016/j.csl.2010.10.003
---------- CHICAGO ----------
Gravano, A., Hirschberg, J. "Turn-taking cues in task-oriented dialogue". Computer Speech and Language 25, no. 3 (2011): 601-634.
http://dx.doi.org/10.1016/j.csl.2010.10.003
---------- MLA ----------
Gravano, A., Hirschberg, J. "Turn-taking cues in task-oriented dialogue". Computer Speech and Language, vol. 25, no. 3, 2011, pp. 601-634.
http://dx.doi.org/10.1016/j.csl.2010.10.003
---------- VANCOUVER ----------
Gravano, A., Hirschberg, J. Turn-taking cues in task-oriented dialogue. Comput Speech Lang. 2011;25(3):601-634.
http://dx.doi.org/10.1016/j.csl.2010.10.003