Spoken language recognition based on senone posteriors

Ferrer, L.; Lei, Y.; McLaren, M.; Scheffer, N.

Ferrer, L.; Lei, Y.; McLaren, M.; Scheffer, N. "Spoken language recognition based on senone posteriors" (2014) 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014: 2150-2154

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer

We are working to add this article to the repository.

See the publisher's Open Access policy.

Abstract:

This paper explores in depth a recently proposed approach to spoken language recognition based on the estimated posteriors for a set of senones representing the phonetic space of one or more languages. A neural network (NN) is trained to estimate the posterior probabilities for the senones at a frame level. A feature vector is then derived for every sample using these posteriors. The effect of the language used in training the NN and the number of senones are studied. Speech-activity detection (SAD) and dimensionality reduction approaches are also explored and Gaussian and NN backends are compared. Results are presented on heavily degraded speech data. The proposed system is shown to give over 40% relative gain compared to a state-of-the-art language recognition system at sample durations from 3 to 120 seconds. Copyright © 2014 ISCA.
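The pipeline outlined in the abstract (frame-level senone posteriors, speech-activity filtering, one feature vector per sample) can be illustrated with a minimal sketch. The abstract does not say how the sample-level vector is derived, so the mean log-posterior pooling below is only an assumed stand-in, and the names `softmax`, `sample_feature`, and `relative_gain` are hypothetical. The last helper shows the arithmetic behind a "relative gain" figure.

```python
import math


def softmax(logits):
    """Turn one frame's raw network outputs into senone posteriors summing to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_feature(frame_logits, speech_mask, floor=1e-10):
    """Pool frame-level posteriors into one vector per sample.

    ASSUMPTION: averaging log-posteriors over speech frames is an
    illustrative choice, not the derivation used in the paper.
    """
    # Keep only the frames that the speech-activity mask marks as speech.
    kept = [softmax(f) for f, is_speech in zip(frame_logits, speech_mask) if is_speech]
    if not kept:
        raise ValueError("no speech frames in sample")
    n_senones = len(kept[0])
    # One dimension per senone: mean of the floored log-posteriors.
    return [
        sum(math.log(max(p[k], floor)) for p in kept) / len(kept)
        for k in range(n_senones)
    ]


def relative_gain(baseline_error, system_error):
    """Fraction by which the system reduces the baseline error."""
    return (baseline_error - system_error) / baseline_error


# Toy input: 3 frames, 2 senones; the third frame is marked as non-speech.
logits = [[0.0, 0.0], [1.0, 1.0], [5.0, 0.0]]
mask = [True, True, False]
feature = sample_feature(logits, mask)

# Made-up error rates, used only to show the arithmetic (not results from the paper).
gain = relative_gain(10.0, 6.0)
```

With these toy inputs, both retained frames have uniform posteriors, so each feature dimension equals log(0.5), and the made-up error rates of 10 and 6 correspond to a relative gain of 0.4, that is, 40%.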

Record:

Document:	Conference paper
Title:	Spoken language recognition based on senone posteriors
Author:	Ferrer, L.; Lei, Y.; McLaren, M.; Scheffer, N.
Affiliation:	Speech Technology and Research Laboratory, SRI International, CA, United States; Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina
Keywords:	Speech communication; Activity detection; Dimensionality reduction; Feature vectors; Language recognition; Neural network (NN); Posterior probability; Speech data; Spoken language recognition; Speech recognition
Year:	2014
First page:	2150
Last page:	2154
Journal title:	15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014
Abbreviated journal title:	Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN:	2308-457X
Record:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer

Citations:

---------- APA ----------

Ferrer, L., Lei, Y., McLaren, M., & Scheffer, N. (2014). Spoken language recognition based on senone posteriors. 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014, 2150-2154.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer

---------- CHICAGO ----------

Ferrer, L., Lei, Y., McLaren, M., and Scheffer, N. "Spoken language recognition based on senone posteriors." 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 (2014): 2150-2154.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer

---------- MLA ----------

Ferrer, L., Lei, Y., McLaren, M., and Scheffer, N. "Spoken language recognition based on senone posteriors." 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014, 2014, pp. 2150-2154.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer

---------- VANCOUVER ----------

Ferrer, L., Lei, Y., McLaren, M., Scheffer, N. Spoken language recognition based on senone posteriors. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2014:2150-2154.
Available from: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p2150_Ferrer