Review of spoken dialogue systems

Authors

  • Ramón López-Cózar, Universidad de Granada
  • Zoraida Callejas, Universidad de Granada
  • David Griol, Universidad Carlos III de Madrid
  • José F. Quesada, Universidad de Sevilla

DOI:

https://doi.org/10.3989/loquens.2014.012

Keywords:

dialogue, language understanding, dialogue management, natural language generation, speech synthesis

Abstract


Spoken dialogue systems are computer programs developed to interact with users by speech in order to provide them with specific automated services. The interaction is carried out through dialogue turns, which many studies in the literature aim to make as similar as possible to those between humans in terms of naturalness, intelligence and affective content. In this paper we describe the fundamentals of these systems, including the main technologies employed in their development. We also trace the evolution of this technology and discuss some current applications. Moreover, we discuss development paradigms, including scripting languages and the development of conversational interfaces for mobile apps. Correct modelling of the user is a key aspect of this technology, which is why we also describe affective, personality and contextual models. Finally, we address some current research trends in verbal communication, multimodal interaction and dialogue management.
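One classic dialogue-management strategy covered by reviews of this kind is frame-based (slot-filling) management: the system keeps a frame of required slots and prompts for whichever slot is still empty. The sketch below is purely illustrative; the slot names, prompts and function names are assumptions for a hypothetical travel-booking domain, not taken from the paper.

```python
# Minimal frame-based (slot-filling) dialogue manager sketch.
# All domain details (slots, prompts) are illustrative assumptions.

FRAME = {"origin": None, "destination": None, "date": None}

PROMPTS = {
    "origin": "Where are you travelling from?",
    "destination": "Where are you travelling to?",
    "date": "What day would you like to travel?",
}

def next_action(frame):
    """Return the next system prompt, or a confirmation once all slots are filled."""
    for slot, value in frame.items():
        if value is None:
            return PROMPTS[slot]
    return "Confirm: {origin} to {destination} on {date}?".format(**frame)

def update(frame, slot, value):
    """Fill one slot with the value extracted by the language-understanding module."""
    new_frame = dict(frame)
    new_frame[slot] = value
    return new_frame

frame = dict(FRAME)
print(next_action(frame))                       # prompts for the first empty slot
frame = update(frame, "origin", "Granada")
frame = update(frame, "destination", "Madrid")
frame = update(frame, "date", "Monday")
print(next_action(frame))                       # all slots filled: confirmation
```

Real systems layer speech recognition, understanding, generation and synthesis around this loop, and statistical approaches (e.g., POMDP-based managers) replace the fixed slot order with learned policies.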

Published

2014-12-30

How to Cite

López-Cózar, R., Callejas, Z., Griol, D., & Quesada, J. F. (2014). Review of spoken dialogue systems. Loquens, 1(2), e012. https://doi.org/10.3989/loquens.2014.012

Section

Articles