Conference paper

Ferrer, L.; Lei, Y.; McLaren, M.; Scheffer, N.; Chng E.S.; Li H.; Meng H.; Ma B.; Xie L.; Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat "Spoken language recognition based on senone posteriors" (2014) 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014:2150-2154

Abstract:

This paper explores in depth a recently proposed approach to spoken language recognition based on the estimated posteriors for a set of senones representing the phonetic space of one or more languages. A neural network (NN) is trained to estimate the posterior probabilities for the senones at a frame level. A feature vector is then derived for every sample using these posteriors. The effect of the language used in training the NN and the number of senones are studied. Speech-activity detection (SAD) and dimensionality reduction approaches are also explored and Gaussian and NN backends are compared. Results are presented on heavily degraded speech data. The proposed system is shown to give over 40% relative gain compared to a state-of-the-art language recognition system at sample durations from 3 to 120 seconds. Copyright © 2014 ISCA.
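The abstract does not say how the per-sample feature vector is built from the frame-level posteriors. As an illustration only, the sketch below assumes one simple option: average the log senone posteriors over the frames that a speech-activity detector marked as speech. The function name `sample_feature_vector`, the `floor` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import math


def sample_feature_vector(frame_posteriors, speech_mask, floor=1e-10):
    """Average the log senone posteriors over frames marked as speech.

    frame_posteriors: list of per-frame lists, one posterior per senone.
    speech_mask: list of booleans, True where a (hypothetical) SAD step
        labeled the frame as speech.
    floor: assumed lower bound applied before the log to avoid log(0).
    Returns one list with one entry per senone.
    """
    kept = [p for p, is_speech in zip(frame_posteriors, speech_mask) if is_speech]
    if not kept:
        raise ValueError("no speech frames selected")
    num_senones = len(kept[0])
    return [
        sum(math.log(max(frame[s], floor)) for frame in kept) / len(kept)
        for s in range(num_senones)
    ]


# Toy input: 3 frames, 3 senones; the last frame is treated as non-speech.
posteriors = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
mask = [True, True, False]
vector = sample_feature_vector(posteriors, mask)
```

The resulting vector has a fixed length (the number of senones) regardless of sample duration, so it can be passed to a dimensionality reduction step and then to a Gaussian or NN backend of the kind the abstract compares.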

Record:

Document: Conference paper
Title: Spoken language recognition based on senone posteriors
Author: Ferrer, L.; Lei, Y.; McLaren, M.; Scheffer, N.; Chng E.S.; Li H.; Meng H.; Ma B.; Xie L.; Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
Affiliation: Speech Technology and Research Laboratory, SRI International, CA, United States
Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina
Keywords: Speech communication; Activity detection; Dimensionality reduction; Feature vectors; Language recognition; Neural network (NN); Posterior probability; Speech data; Spoken language recognition; Speech recognition
Year: 2014
Start page: 2150
End page: 2154
Journal title: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014
Abbreviated journal title: Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN: 2308457X
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer

Citations:

---------- APA ----------
Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Chng E.S., Li H., Meng H., ..., Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat (2014). Spoken language recognition based on senone posteriors. 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014, 2150-2154.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer
---------- CHICAGO ----------
Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Chng E.S., Li H., et al. "Spoken language recognition based on senone posteriors." 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 (2014): 2150-2154.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer
---------- MLA ----------
Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Chng E.S., Li H., et al. "Spoken language recognition based on senone posteriors." 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014, 2014, pp. 2150-2154.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer
---------- VANCOUVER ----------
Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Chng E.S., Li H., et al. Spoken language recognition based on senone posteriors. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2014:2150-2154.
Available from: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer