Article

Landini, F.; Ferrer, L.; Franco, H.; Karpov, A.; Mporas, I.; Potapova, R.; ASM Solutions Ltd. "Adaptation approaches for pronunciation scoring with sparse training data" (2017) 19th International Conference on Speech and Computer, SPECOM 2017. 10458 LNAI:87-97

Abstract:

In Computer Assisted Language Learning systems, pronunciation scoring consists of providing a score that grades the overall pronunciation quality of the speech uttered by a student. In this work, a log-likelihood ratio obtained with respect to two automatic speech recognition (ASR) models was used as the score. One model represents native pronunciation, while the other captures non-native pronunciation. Different approaches to obtaining each model and different amounts of training data were analyzed. The best results were obtained by training an ASR system on a separate large corpus without pronunciation quality annotations and then adapting it sequentially to the native and non-native data. Nevertheless, when the models are trained directly on the native and non-native data, pronunciation scoring performance is similar. This is a surprising result, considering that the word error rates for these models are significantly worse, which indicates that ASR performance is not a good predictor of pronunciation scoring performance in this system. © Springer International Publishing AG 2017.
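The scoring idea in the abstract can be illustrated with a deliberately simplified sketch. It rests on assumptions that are not taken from the record: each ASR model is replaced by a single diagonal-covariance Gaussian mixture over feature frames; both class models are obtained by MAP-adapting only the means (in the relevance-factor form commonly derived from Gauvain and Lee) of one shared background mixture, which stands in for the system trained on the separate large corpus; and the score is the log-likelihood ratio averaged over frames. `DiagGMM`, `map_adapt_means`, `llr_score`, the relevance factor and the toy data are hypothetical, and the paper's sequential adaptation and actual ASR systems are not reproduced.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


class DiagGMM:
    """Gaussian mixture with diagonal covariances (hypothetical stand-in for an ASR model)."""

    def __init__(self, weights: Sequence[float], means: Sequence[Vector],
                 variances: Sequence[Vector]) -> None:
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def _component_logs(self, x: Vector) -> List[float]:
        # log(w_k) + log N(x; mu_k, diag(var_k)) for every component k
        logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                            for xi, m, v in zip(x, mu, var))
            logs.append(math.log(w) + ll)
        return logs

    def frame_loglik(self, x: Vector) -> float:
        # log p(x), computed with log-sum-exp for numerical stability
        logs = self._component_logs(x)
        top = max(logs)
        return top + math.log(sum(math.exp(t - top) for t in logs))

    def posteriors(self, x: Vector) -> List[float]:
        # Responsibilities gamma_k(x) = P(component k | x)
        logs = self._component_logs(x)
        top = max(logs)
        exps = [math.exp(t - top) for t in logs]
        total = sum(exps)
        return [e / total for e in exps]


def map_adapt_means(prior: DiagGMM, frames: Sequence[Vector],
                    relevance: float = 16.0) -> DiagGMM:
    """MAP-adapt only the means; weights and variances are copied from the prior."""
    dim = len(prior.means[0])
    counts = [0.0] * len(prior.weights)          # soft counts n_k
    sums = [[0.0] * dim for _ in prior.weights]  # sum_t gamma_k(x_t) * x_t
    for x in frames:
        for k, g in enumerate(prior.posteriors(x)):
            counts[k] += g
            for d in range(dim):
                sums[k][d] += g * x[d]
    # mu_k' = (sum_t gamma_k(x_t) x_t + r * mu_k) / (n_k + r): components that
    # see little adaptation data stay close to the prior mean.
    new_means = [[(sums[k][d] + relevance * prior.means[k][d]) / (counts[k] + relevance)
                  for d in range(dim)] for k in range(len(prior.weights))]
    return DiagGMM(prior.weights, new_means, prior.variances)


def llr_score(native: DiagGMM, nonnative: DiagGMM, frames: Sequence[Vector]) -> float:
    """Frame-averaged log p(x|native) - log p(x|non-native); higher = more native-like."""
    total = sum(native.frame_loglik(x) - nonnative.frame_loglik(x) for x in frames)
    return total / len(frames)


# Toy 1-D usage: one shared background model adapted to two small data sets.
background = DiagGMM([0.5, 0.5], [[0.0], [4.0]], [[1.0], [1.0]])
native = map_adapt_means(background, [[-0.5], [-0.3], [3.5], [3.7]], relevance=2.0)
nonnative = map_adapt_means(background, [[0.5], [0.3], [4.5], [4.3]], relevance=2.0)
print(llr_score(native, nonnative, [[-0.4], [3.6]]))  # positive: closer to native model
print(llr_score(native, nonnative, [[0.4], [4.4]]))   # negative: closer to non-native model
```

Ordering the ratio as native minus non-native makes larger scores mean more native-like speech, and averaging over frames keeps utterances of different lengths on a comparable scale. The relevance factor controls how far each adapted mean moves from the background model, which is the lever that matters when the native and non-native training sets are sparse.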

Record:

Document: Article
Title: Adaptation approaches for pronunciation scoring with sparse training data
Authors: Landini, F.; Ferrer, L.; Franco, H.; Karpov, A.; Mporas, I.; Potapova, R.; ASM Solutions Ltd.
Affiliation: Departamento de Computación, FCEN, Universidad de Buenos Aires, Buenos Aires, Argentina
Instituto de Investigación en Ciencias de la Computación (ICC), UBA-CONICET, Buenos Aires, Argentina
Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, United States
Keywords: Computer-assisted language learning; Log-likelihood ratio; MAP adaptation; Pronunciation scoring; Computer aided instruction; E-learning; Grading; Linguistics; Automatic speech recognition; Computer assisted language learning; Computer assisted language learning systems; Log likelihood ratio; Pronunciation quality; Scoring performance; Speech recognition
Year: 2017
Volume: 10458 LNAI
First page: 87
Last page: 97
DOI: http://dx.doi.org/10.1007/978-3-319-66429-3_8
Journal title: 19th International Conference on Speech and Computer, SPECOM 2017
Abbreviated journal title: Lect. Notes Comput. Sci.
ISSN: 0302-9743
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_03029743_v10458LNAI_n_p87_Landini

References:

  • Godfrey, J.J., Holliman, E.C., McDaniel, J., SWITCHBOARD: Telephone speech corpus for research and development (1992) Proceedings of ICASSP. IEEE, San Francisco
  • Gauvain, J.-L., Lee, C.-H., Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains (1994) IEEE Trans. Speech Audio Process., 2, pp. 291-298
  • Ronen, O., Neumeyer, L., Franco, H., Automatic detection of mispronunciation for language instruction (1997) Proceedings of EUROSPEECH, Rhodes
  • Cieri, C., Miller, D., Walker, K., The Fisher corpus: A resource for the next generations of speech-to-text (2004) LREC, Lisbon
  • Franco, H., Ferrer, L., Bratt, H., Adaptive and discriminative modeling for improved mispronunciation detection (2014) Proceedings of ICASSP. IEEE, Florence
  • Robertson, S., Munteanu, C., Penn, G., Pronunciation error detection for new language learners (2016) Proceedings of Interspeech, San Francisco
  • Cucchiarini, C., Strik, H., Binnenpoorte, D., Boves, L., Pronunciation evaluation in read and spontaneous speech: A comparison between human ratings and automatic scores (2000) Proceedings of New Sounds
  • Hönig, F., Batliner, A., Nöth, E., Automatic assessment of non-native prosody: Annotation, modelling and evaluation (2012) Proceedings of ISADEPT
  • Efron, B., Bootstrap methods: Another look at the Jackknife (1979) Ann. Stat., 7, pp. 1-26
  • Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Vesely, K., (2011) IEEE 2011 Workshop on Automatic Speech Recognition and Understanding

Citations:

---------- APA ----------
Landini, F., Ferrer, L., Franco, H., Karpov, A., Mporas, I., Potapova, R. & ASM Solutions Ltd. (2017). Adaptation approaches for pronunciation scoring with sparse training data. 19th International Conference on Speech and Computer, SPECOM 2017, 10458 LNAI, 87-97.
http://dx.doi.org/10.1007/978-3-319-66429-3_8
---------- CHICAGO ----------
Landini, F., Ferrer, L., Franco, H., Karpov, A., Mporas, I., Potapova, R., et al. "Adaptation approaches for pronunciation scoring with sparse training data." 19th International Conference on Speech and Computer, SPECOM 2017 10458 LNAI (2017): 87-97.
http://dx.doi.org/10.1007/978-3-319-66429-3_8
---------- MLA ----------
Landini, F., Ferrer, L., Franco, H., Karpov, A., Mporas, I., Potapova, R., et al. "Adaptation approaches for pronunciation scoring with sparse training data." 19th International Conference on Speech and Computer, SPECOM 2017, vol. 10458 LNAI, 2017, pp. 87-97.
http://dx.doi.org/10.1007/978-3-319-66429-3_8
---------- VANCOUVER ----------
Landini, F., Ferrer, L., Franco, H., Karpov, A., Mporas, I., Potapova, R., et al. Adaptation approaches for pronunciation scoring with sparse training data. Lect. Notes Comput. Sci. 2017;10458 LNAI:87-97.
http://dx.doi.org/10.1007/978-3-319-66429-3_8