King's speech: pronounce a foreign language with style


Georgios Athanasopoulos
Céline Lucas
Alessandro Cierro
Robin Guérit
Kaori Hagihara
Julie Chatelain
Sébastien Lugan
Benoît Macq

Abstract

Computer-assisted pronunciation training requires strategies that capture learners' attention and guide them along the learning pathway. In this paper, we introduce an immersive storytelling scenario for creating appropriate learning conditions. The proposed learning interaction is orchestrated by a spoken karaoke; we motivate this concept and describe our design. Driven by the requirements of the proposed scenario, we propose a modular architecture for immersive learning applications. We present our prototype system and our approach to processing the spoken and visual interaction modalities. Finally, we discuss how technological challenges can be addressed to enable the learner's self-evaluation.

Keywords: Immersive language learning, L2 pronunciation, Computer-assisted pronunciation training, Gamification, Audiovisual speech technology
