Abstract:
In Computer Assisted Language Learning systems, pronunciation scoring consists of providing a score that grades the overall pronunciation quality of the speech uttered by a student. In this work, a log-likelihood ratio obtained with respect to two automatic speech recognition (ASR) models was used as the score. One model represents native pronunciation while the other captures non-native pronunciation. Different approaches to obtaining each model and different amounts of training data were analyzed. The best results were obtained by training an ASR system on a separate large corpus without pronunciation quality annotations and then adapting it to the native and non-native data, sequentially. Nevertheless, when models are trained directly on the native and non-native data, pronunciation scoring performance is similar. This is a surprising result considering that word error rates for these models are significantly worse, indicating that ASR performance is not a good predictor of pronunciation scoring performance on this system. © Springer International Publishing AG 2017.
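The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes that each model yields one log-likelihood per frame for the same utterance and that the ratio is averaged over frames; the abstract states only that a log-likelihood ratio between a native and a non-native model is used, so the frame-level normalization, the function name `llr_score`, and the sample values are hypothetical.

```python
import math
from typing import Sequence


def llr_score(native_loglik: Sequence[float],
              nonnative_loglik: Sequence[float]) -> float:
    """Frame-averaged log-likelihood ratio between two models.

    Both inputs hold per-frame log-likelihoods of the same utterance,
    one sequence per model. Averaging over frames is an assumption made
    here so that utterances of different lengths are comparable.
    """
    if len(native_loglik) != len(nonnative_loglik):
        raise ValueError("both models must score the same frames")
    if len(native_loglik) == 0:
        raise ValueError("at least one frame is required")
    # log p(x | native) - log p(x | non-native), summed over frames
    total = math.fsum(n - m for n, m in zip(native_loglik, nonnative_loglik))
    return total / len(native_loglik)


# Hypothetical per-frame log-likelihoods for a three-frame utterance.
native = [-4.0, -3.5, -5.0]
nonnative = [-4.5, -4.5, -5.5]
print(llr_score(native, nonnative))
```

Under this convention a positive score means the native model explains the utterance better than the non-native model, and a negative score means the opposite.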
Record:
Document: | Article |
Title: | Adaptation approaches for pronunciation scoring with sparse training data |
Author: | Landini, F.; Ferrer, L.; Franco, H.; Karpov A.; Mporas I.; Potapova R.; ASM Solutions Ltd. |
Affiliation: | Departamento de Computación, FCEN, Universidad de Buenos Aires, Buenos Aires, Argentina; Instituto de Investigación en Ciencias de la Computación (ICC), UBA-CONICET, Buenos Aires, Argentina; Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, United States |
Keywords: | Computer-assisted language learning; Log-likelihood ratio; MAP adaptation; Pronunciation scoring; Computer aided instruction; E-learning; Grading; Linguistics; Automatic speech recognition; Computer assisted language learning; Computer assisted language learning systems; Log likelihood ratio; MAP adaptation; Pronunciation quality; Pronunciation scoring; Scoring performance; Speech recognition |
Year: | 2017 |
Volume: | 10458 LNAI |
First page: | 87 |
Last page: | 97 |
DOI: | http://dx.doi.org/10.1007/978-3-319-66429-3_8 |
Journal title: | 19th International Conference on Speech and Computer, SPECOM 2017 |
Abbreviated journal title: | Lect. Notes Comput. Sci. |
ISSN: | 03029743 |
Record: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_03029743_v10458LNAI_n_p87_Landini |
References:
- Godfrey, J.J., Holliman, E.C., McDaniel, J., SWITCHBOARD: Telephone speech corpus for research and development (1992) Proceedings of ICASSP. IEEE, San Francisco
- Gauvain, J.-L., Lee, C.-H., Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains (1994) IEEE Trans. Speech Audio Process., 2, pp. 291-298
- Ronen, O., Neumeyer, L., Franco, H., Automatic detection of mispronunciation for language instruction (1997) Proceedings of EUROSPEECH, Rhodes
- Cieri, C., Miller, D., Walker, K., The fisher corpus: A resource for the next generations of speech-to-text (2004) LREC, Lisbon
- Franco, H., Ferrer, L., Bratt, H., Adaptive and discriminative modeling for improved mispronunciation detection (2014) Proceedings of ICASSP. IEEE, Florence
- Robertson, S., Munteanu, C., Penn, G., Pronunciation error detection for new language learners (2016) Proceedings of Interspeech, San Francisco
- Cucchiarini, C., Strik, H., Binnenpoorte, D., Boves, L., Pronunciation evaluation in read and spontaneous speech: A comparison between human ratings and automatic scores (2000) Proceedings of the New Sounds. Citeseer
- Hönig, F., Batliner, A., Nöth, E., Automatic assessment of non-native prosody annotation, modelling and evaluation (2012) Proceedings of ISADEPT
- Efron, B., Bootstrap methods: Another look at the jackknife (1979) Ann. Stat., 7, pp. 1-26
- Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Vesely, K., (2011) IEEE 2011 Workshop on Automatic Speech Recognition and Understanding
Citations:
---------- APA ----------
Landini, F., Ferrer, L., Franco, H., Karpov A., Mporas I., Potapova R. & ASM Solutions Ltd. (2017). Adaptation approaches for pronunciation scoring with sparse training data. 19th International Conference on Speech and Computer, SPECOM 2017, 10458 LNAI, 87-97.
http://dx.doi.org/10.1007/978-3-319-66429-3_8
---------- CHICAGO ----------
Landini, F., Ferrer, L., Franco, H., Karpov A., Mporas I., Potapova R., et al. "Adaptation approaches for pronunciation scoring with sparse training data." 19th International Conference on Speech and Computer, SPECOM 2017 10458 LNAI (2017): 87-97.
http://dx.doi.org/10.1007/978-3-319-66429-3_8
---------- MLA ----------
Landini, F., Ferrer, L., Franco, H., Karpov A., Mporas I., Potapova R., et al. "Adaptation approaches for pronunciation scoring with sparse training data." 19th International Conference on Speech and Computer, SPECOM 2017, vol. 10458 LNAI, 2017, pp. 87-97.
http://dx.doi.org/10.1007/978-3-319-66429-3_8
---------- VANCOUVER ----------
Landini, F., Ferrer, L., Franco, H., Karpov A., Mporas I., Potapova R., et al. Adaptation approaches for pronunciation scoring with sparse training data. Lect. Notes Comput. Sci. 2017;10458 LNAI:87-97.
http://dx.doi.org/10.1007/978-3-319-66429-3_8