Conference paper

Lei, Y.; Ferrer, L.; McLaren, M.; Scheffer, N.; Chng E.S.; Li H.; Meng H.; Ma B.; Xie L.; Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat "A deep neural network speaker verification system targeting microphone speech" (2014) 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014:681-685
We are working to add this article to the repository
See the publisher's Open Access policy

Abstract:

We recently proposed the use of deep neural networks (DNN) in place of Gaussian Mixture models (GMM) in the i-vector extraction process for speaker recognition. We have shown significant accuracy improvements on the 2012 NIST speaker recognition evaluation (SRE) telephone conditions. This paper explores how this framework can be effectively used on the microphone speech conditions of the 2012 NIST SRE. In this new framework, the verification performance greatly depends on the data used for training the DNN. We show that training the DNN using both telephone and microphone speech data can yield significant improvements. An in-depth analysis of the influence of telephone speech data on the microphone conditions is also shown for both the DNN and GMM systems. We conclude by showing that the GMM system is always outperformed by the DNN system on the telephone-only and microphone-only conditions, and that the new DNN / i-vector framework can be successfully used providing a good match in the training data. Copyright © 2014 ISCA.

Record:

Document: Conference paper
Title: A deep neural network speaker verification system targeting microphone speech
Author: Lei, Y.; Ferrer, L.; McLaren, M.; Scheffer, N.; Chng E.S.; Li H.; Meng H.; Ma B.; Xie L.; Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat
Affiliation: Speech Technology and Research Laboratory, SRI International, CA, United States
Departamento de Computación, FCEN, Universidad de Buenos Aires, Argentina
Keywords: Deep neural networks; I-vectors; Microphone data; Speaker recognition; Microphones; Speech; Speech communication; Telephone sets; Accuracy Improvement; Gaussian Mixture Model; In-depth analysis; Speaker recognition evaluations; Speaker verification system; Speech recognition
Year: 2014
Start page: 681
End page: 685
Journal title: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014
Abbreviated journal title: Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN: 2308457X
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p681_Lei

References:

  • Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2010) IEEE Trans. ASLP, 19, pp. 788-798, May
  • Reynolds, D.A., Quatieri, T.F., Speaker verification using adapted Gaussian mixture models (2000) Digital Signal Processing, 10, pp. 19-41
  • Prince, S., Probabilistic linear discriminant analysis for inferences about identity (2007) ICCV-2007, pp. 1-8, IEEE
  • Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Kingsbury, B., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups (2012) Signal Processing Magazine, IEEE, 29 (6), pp. 82-97
  • Lei, Y., Scheffer, N., Ferrer, L., McLaren, M., A novel scheme for speaker recognition using a phonetically-aware deep neural network (2014) ICASSP-2014, IEEE
  • Kenny, P., Ouellet, P., Dehak, N., Gupta, V., Dumouchel, P., A study of inter-speaker variability in speaker verification (2008) IEEE Trans. ASLP, 16, pp. 980-988, July
  • Dahl, G., Yu, D., Deng, L., Acero, A., Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition (2012) IEEE Trans. ASLP, 20, pp. 30-42
  • Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Interspeech-2013, pp. 1981-1985
  • Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) HLT '94 Proceedings of the Workshop on Human Language Technology, pp. 307-312
  • NIST SRE12 Evaluation Plan, http://www.nist.gov/itl/iad/mig/upload/NISTSRE12evalplanv11-r0.pdf
  • Stolcke, A., Anguera, X., Boakye, K., Cetin, O., Janin, A., Magimai-Doss, M., Wooters, C., Zheng, J., The SRI-ICSI spring 2007 meeting and lecture recognition system (2008) Proc. NIST Rich Transcription Workshop, pp. 450-463, Springer Lecture Notes in Computer Science
  • Deng, L., Li, J., Huang, J., Yao, K., Yu, D., Seide, F., Seltzer, M., Acero, A., Recent advances in deep learning for speech research at Microsoft (2013) ICASSP-2013, pp. 8604-8608, IEEE

Citations:

---------- APA ----------
Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Chng E.S., Li H., Meng H.,..., Amazon; Baidu; et al.; Google; Temasek Laboratories at Nanyang Technological University (TL at NTU); WeChat (2014). A deep neural network speaker verification system targeting microphone speech. 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014, 681-685.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p681_Lei
---------- CHICAGO ----------
Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Chng E.S., Li H., et al. "A deep neural network speaker verification system targeting microphone speech". 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 (2014): 681-685.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p681_Lei
---------- MLA ----------
Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Chng E.S., Li H., et al. "A deep neural network speaker verification system targeting microphone speech". 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014, 2014, pp. 681-685.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p681_Lei
---------- VANCOUVER ----------
Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Chng E.S., Li H., et al. A deep neural network speaker verification system targeting microphone speech. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2014:681-685.
Available from: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v_n_p681_Lei