King's speech: pronounce a foreign language with style
Abstract
Computer-assisted pronunciation training requires strategies that capture learners' attention and guide them along the learning pathway. In this paper, we introduce an immersive storytelling scenario that creates appropriate learning conditions. The learning interaction is orchestrated by a spoken karaoke. We motivate the concept of spoken karaoke and describe our design. Driven by the requirements of this scenario, we propose a modular architecture for immersive learning applications. We present our prototype system and our approach to processing spoken and visual interaction modalities. Finally, we discuss how the technological challenges can be addressed to enable learners' self-evaluation.