Article


Abstract:

We present a system for detection of lexical stress in English words spoken by English learners. This system was designed to be part of the EduSpeak® computer-assisted language learning (CALL) software. The system uses both prosodic and spectral features to detect the level of stress (unstressed, primary or secondary) for each syllable in a word. Features are computed on the vowels and include normalized energy, pitch, spectral tilt, and duration measurements, as well as log-posterior probabilities obtained from the frame-level mel-frequency cepstral coefficients (MFCCs). Gaussian mixture models (GMMs) are used to represent the distribution of these features for each stress class. The system is trained on utterances by L1-English children and tested on English speech from L1-English children and L1-Japanese children with variable levels of English proficiency. Since it is trained on data from L1-English speakers, the system can be used on English utterances spoken by speakers of any L1 without retraining. Furthermore, automatically determined stress patterns are used as the intended target; therefore, hand-labeling of training data is not required. This allows us to use a large amount of data for training the system. Our algorithm results in an error rate of approximately 11% on English utterances from L1-English speakers and 20% on English utterances from L1-Japanese speakers. We show that all features, both spectral and prosodic, are necessary for achievement of optimal performance on the data from L1-English speakers; MFCC log-posterior probability features are the single best set of features, followed by duration, energy, pitch and finally, spectral tilt features. For English utterances from L1-Japanese speakers, energy, MFCC log-posterior probabilities and duration are the most important features. © 2015 Elsevier B.V.
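The abstract describes one GMM per stress class scoring a per-vowel feature vector. The decision rule this implies can be sketched as follows. This is a minimal illustration and not the authors' implementation: the two-dimensional feature vector (normalized duration and energy), the hand-set GMM parameters, the equal class priors, and the names `GMMS`, `gmm_log_likelihood`, and `classify` are all assumptions made for the example.

```python
import math

# Hypothetical, hand-set parameters (not from the paper): one diagonal-covariance
# GMM per stress class. Each component is (weight, means, variances) over an
# assumed 2-dim feature vector [normalized duration, normalized energy].
GMMS = {
    "unstressed": [(0.6, [-1.0, -1.0], [0.5, 0.5]), (0.4, [-0.3, -0.5], [0.4, 0.6])],
    "secondary": [(0.5, [0.0, 0.2], [0.5, 0.5]), (0.5, [0.4, -0.1], [0.6, 0.4])],
    "primary": [(0.7, [1.0, 1.0], [0.5, 0.5]), (0.3, [0.6, 1.4], [0.4, 0.6])],
}


def log_gauss_diag(x, means, variances):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def gmm_log_likelihood(x, components):
    """Log-likelihood of x under a GMM, using log-sum-exp for stability."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in components]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


def classify(x, gmms=GMMS):
    """Return the class whose GMM gives x the highest log-likelihood.

    Equal class priors are assumed, so the priors drop out of the comparison.
    """
    scores = {label: gmm_log_likelihood(x, comps) for label, comps in gmms.items()}
    return max(scores, key=scores.get), scores


label, scores = classify([1.0, 1.0])
print(label)
```

In a real system the GMM parameters would be estimated from training data rather than set by hand, and the feature vector would hold the full set of measurements listed in the abstract.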

Record:

Document: Article
Title: Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems
Author: Ferrer, L.; Bratt, H.; Richey, C.; Franco, H.; Abrash, V.; Precoda, K.
Affiliation: Speech Technology and Research Laboratory, SRI International, CA, United States
Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
Keywords: Computer-assisted language learning; Gaussian mixture models; Lexical stress detection; Mel frequency cepstral coefficients; Prosodic features; Computational linguistics; Computer aided instruction; Consumer products; E-learning; Feature extraction; Linguistics; Probability; Speech recognition; Computer assisted language learning; Gaussian Mixture Model; Mel frequency cepstral co-efficient; Prosodic features; Stress detection; Learning systems
Year: 2015
Volume: 69
Start page: 31
End page: 45
DOI: http://dx.doi.org/10.1016/j.specom.2015.02.002
Journal title: Speech Communication
Abbreviated journal title: Speech Commun
ISSN: 01676393
CODEN: SCOMD
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_01676393_v69_n_p31_Ferrer

References:

  • Ananthakrishnan, S., Narayanan, S., An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic-prosodic language model (2005) Proc. ICASSP, Philadelphia
  • Buntine, W., Learning classification trees (1992) Stat. Comput., 2 (2), pp. 63-73
  • Chen, L.-Y., Jang, J.-S., Stress detection of English words for a CAPT system using word-length dependent GMM-based Bayesian classifiers (2012) Interdisc. Inform. Sci., 18 (2), pp. 65-70
  • Chen, J.-Y., Wang, L., Automatic lexical stress detection for Chinese learners' of English (2010) 2010 7th International Symposium on Chinese Spoken Language Processing (ISCSLP)
  • Deshmukh, O.D., Verma, A., Nucleus-level clustering for word-independent syllable stress classification (2009) Speech Commun., 51 (12)
  • Doddala, H., Deshmukh, O.D., Verma, A., Role of nucleus based context in word-independent syllable stress classification (2011) Proc. ICASSP, Prague
  • Duda, R., Hart, P., Stork, D., (2001) Pattern Classification, Wiley
  • Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Proc. Interspeech, Lyon, France
  • Franco, H., Abrash, V., Precoda, K., Bratt, H., Rao, R., Butzberger, J., Rossier, R., Cesari, F., The SRI EduSpeak™ system: Recognition and pronunciation scoring for language learning (2000) Proceedings of InSTILL
  • Franco, H., Bratt, H., Rossier, R., Gadde, V.R., Shriberg, E., Abrash, V., Precoda, K., EduSpeak: A speech recognition and pronunciation scoring toolkit for computer-aided language learning applications (2010) Lang. Test., 27 (3), pp. 401-418
  • Lai, M., Chen, Y., Chu, M., Zhao, Y., Hu, F., A hierarchical approach to automatic stress detection in English sentences (2006) Proc. ICASSP, Toulouse
  • Li, C., Liu, J., Xia, S., English sentence stress detection system based on HMM framework (2007) Appl. Math. Comput., 185 (2)
  • Li, K., Qian, X., Kang, S., Meng, H., (2013) Lexical Stress Detection for L2 English Speech Using Deep Belief Networks
  • Lin, C., Wang, H., Language identification using pitch contour information (2005) Proc. ICASSP, 1, pp. 601-604, Philadelphia
  • Oxman, E., Golshtein, E., Detection of lexical stress using an iterative feature normalization method (2012) Afeka-AVIOS Speech Processing Conference 2012
  • Reynolds, D.A., Quatieri, T.F., Dunn, R.B., Speaker verification using adapted Gaussian mixture models (2000) Digit. Signal Process., 10, pp. 19-41
  • Sluijter, A., Van Heuven, V.J., Spectral balance as an acoustic correlate of linguistic stress (1996) J. Acoust. Soc. Am., 100
  • Talkin, D., (1995) Robust Algorithm for Pitch Tracking, Elsevier Science
  • Tepperman, J., Narayanan, S., Automatic syllable stress detection using prosodic features for pronunciation evaluation of language learners (2005) Proc. ICASSP, Philadelphia
  • Verma, A., Lal, K.L., Lo, Y.Y., Basak, J., Word independent model for syllable stress evaluation (2006) Proc. ICASSP, Toulouse
  • Zhao, J., Yuan, H., Liu, J., Xia, S., Automatic lexical stress detection using acoustic features for computer assisted language learning (2011) Proc. APSIPA ASC
  • Zhu, Y., Liu, J., Liu, R., Automatic lexical stress detection for English learning (2003) Proc. 2003 International Conference on Natural Language Processing and Knowledge Engineering, IEEE

Citations:

---------- APA ----------
Ferrer, L., Bratt, H., Richey, C., Franco, H., Abrash, V. & Precoda, K. (2015). Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems. Speech Communication, 69, 31-45.
http://dx.doi.org/10.1016/j.specom.2015.02.002
---------- CHICAGO ----------
Ferrer, L., Bratt, H., Richey, C., Franco, H., Abrash, V., Precoda, K. "Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems." Speech Communication 69 (2015): 31-45.
http://dx.doi.org/10.1016/j.specom.2015.02.002
---------- MLA ----------
Ferrer, L., Bratt, H., Richey, C., Franco, H., Abrash, V., Precoda, K. "Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems." Speech Communication, vol. 69, 2015, pp. 31-45.
http://dx.doi.org/10.1016/j.specom.2015.02.002
---------- VANCOUVER ----------
Ferrer, L., Bratt, H., Richey, C., Franco, H., Abrash, V., Precoda, K. Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems. Speech Commun. 2015;69:31-45.
http://dx.doi.org/10.1016/j.specom.2015.02.002