Advances in deep neural network approaches to speaker recognition

McLaren, M.; Lei, Y.; Ferrer, L.; The Institute of Electrical and Electronics Engineers Signal Processing Society

doi:10.1109/ICASSP.2015.7178885

Collection

Conference

McLaren, M.; Lei, Y.; Ferrer, L.; The Institute of Electrical and Electronics Engineers Signal Processing Society "Advances in deep neural network approaches to speaker recognition" (2015) 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015. 2015-August:4814-4818

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206149_v2015-August_n_p4814_McLaren

Abstract:

The recent application of deep neural networks (DNN) to speaker identification (SID) has resulted in significant improvements over current state-of-the-art on telephone speech. In this work, we report a similar achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features, and another that uses the DNN during feature modeling. Modeling is conducted using the DNN/i-vector framework, in which the traditional universal background model is replaced with a DNN. The recently proposed use of bottleneck features extracted from a DNN is also evaluated. Systems are first compared with a conventional universal background model (UBM) Gaussian mixture model (GMM) i-vector system on the clean conditions of the NIST 2012 speaker recognition evaluation corpus, where a lack of robustness to microphone speech is found. Several methods of DNN feature processing are then applied to bring significantly greater robustness to microphone speech. To direct future research, the DNN-based systems are also evaluated in the context of audio degradations including noise and reverberation. © 2015 IEEE.
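As a rough illustration of the DNN/i-vector idea mentioned in the abstract, the sketch below accumulates the zeroth- and first-order statistics that an i-vector extractor consumes, using per-frame class posteriors in place of UBM component posteriors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `softmax` and `sufficient_stats`, the dimensions, and the random arrays standing in for DNN outputs and acoustic features are all hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, turning raw network outputs into posteriors."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sufficient_stats(posteriors, features):
    """Accumulate per-class statistics over the frames of one utterance.

    posteriors: (T, K) array, one posterior distribution per frame.
    features:   (T, D) array of acoustic feature vectors.
    Returns the zeroth-order stats (K,) and first-order stats (K, D).
    """
    zeroth = posteriors.sum(axis=0)   # soft frame count per class
    first = posteriors.T @ features   # posterior-weighted feature sums
    return zeroth, first


# Hypothetical sizes: T frames, K output classes, D feature dimensions.
T, K, D = 200, 8, 5
rng = np.random.default_rng(0)
logits = rng.normal(size=(T, K))     # stand-in for DNN output activations
features = rng.normal(size=(T, D))   # stand-in for acoustic features

posteriors = softmax(logits)
zeroth, first = sufficient_stats(posteriors, features)
```

In a conventional system the `posteriors` array would come from the Gaussian components of the UBM; here only its source changes, and the accumulation step stays the same.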

Record:

Document:	Conference
Title:	Advances in deep neural network approaches to speaker recognition
Author:	McLaren, M.; Lei, Y.; Ferrer, L.; The Institute of Electrical and Electronics Engineers Signal Processing Society
Affiliation:	Speech Technology and Research Laboratory, SRI International, California, United States; Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina
Keywords:	bottleneck features; channel mismatch; deep neural networks; normalization; speaker recognition
Year:	2015
Volume:	2015-August
Start page:	4814
End page:	4818
DOI:	http://dx.doi.org/10.1109/ICASSP.2015.7178885
Journal title:	40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Abbreviated journal title:	ICASSP IEEE Int Conf Acoust Speech Signal Process Proc
ISSN:	15206149
CODEN:	IPROD
Record:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206149_v2015-August_n_p4814_McLaren

References:

Lei, Y., Scheffer, N., Ferrer, L., McLaren, M., A novel scheme for speaker recognition using a phonetically-aware deep neural network (2014) Proc. ICASSP
Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., A deep neural network speaker verification system targeting microphone speech (2014) Proc. Interspeech
Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2011) IEEE Trans. on Audio, Speech, and Language Processing, 19 (4), pp. 788-798
Song, Y., Jiang, B., Bao, Y., Wei, S., Dai, L., I-vector representation based on bottleneck features for language identification (2013) Electronics Letters, 49 (24), pp. 1569-1570
Ferrer, L., Lei, Y., McLaren, M., Study of senone-based deep neural network approaches for spoken language recognition (2014) Submitted to IEEE Trans. ASLP
Ferrer, L., Lei, Y., McLaren, M., Scheffer, N., Spoken language recognition based on senone posteriors (2014) Proc. Interspeech
McLaren, M., Lei, Y., Scheffer, N., Ferrer, L., Application of convolutional neural networks to speaker recognition in noisy conditions (2014) Proc. Interspeech
Matejka, P., Zhang, L., Ng, T., Mallidi, S.H., Glembek, O., Ma, J., Zhang, B., Neural network bottleneck features for language identification (2014) Proc. Speaker Odyssey
Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Speaker Odyssey
Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., Comparative study on the use of senone-based deep neural networks for speaker recognition (2014) Submitted to IEEE Trans. ASLP
Pelecanos, J., Sridharan, S., Feature warping for robust speaker verification (2001) Proc. Speaker Odyssey
Young, S.J., Odell, J.J., Woodland, P.C., Tree-based state tying for high accuracy acoustic modelling (1994) Proc. Workshop on Human Language Technology, pp. 307-312
McLaren, M., Scheffer, N., Ferrer, L., Lei, Y., Effective use of DCTs for contextualizing features for speaker recognition (2014) Proc. ICASSP
McLaren, M., Lei, Y., Improved speaker recognition using DCT coefficients as features (2015) Proc. ICASSP (Submitted)
Prince, S.J.D., Elder, J.H., Probabilistic linear discriminant analysis for inferences about identity (2007) Proc. ICCV. IEEE, pp. 1-8
Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Proc. Interspeech
(2012), http://www.nist.gov/itl/iad/mig/upload/NIST_SRE12_evalplan-v17-r1.pdf
Ferrer, L., Bratt, H., Burget, L., Cernocky, H., Glembek, O., Graciarena, M., Lawson, A., Plchot, O., Promoting robustness for speaker modeling in the community: The PRISM evaluation set (2011) Proc. NIST 2011 Workshop
Senoussaoui, M., Kenny, P., Brummer, N., De Villiers, E., Dumouchel, P., Mixture of PLDA models in i-vector space for gender independent speaker recognition (2011) Proc. Speech Communication and Technology
Lei, Y., Burget, L., Ferrer, L., Graciarena, M., Scheffer, N., Towards noise-robust speaker recognition using probabilistic linear discriminant analysis (2012) Proc. ICASSP, pp. 4253-4256

Citations:

---------- APA ----------

McLaren, M., Lei, Y., Ferrer, L. & The Institute of Electrical and Electronics Engineers Signal Processing Society (2015). Advances in deep neural network approaches to speaker recognition. 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, 2015-August, 4814-4818.
http://dx.doi.org/10.1109/ICASSP.2015.7178885

---------- CHICAGO ----------

McLaren, M., Lei, Y., Ferrer, L., The Institute of Electrical and Electronics Engineers Signal Processing Society. "Advances in deep neural network approaches to speaker recognition." 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 2015-August (2015): 4814-4818.
http://dx.doi.org/10.1109/ICASSP.2015.7178885

---------- MLA ----------

McLaren, M., Lei, Y., Ferrer, L., The Institute of Electrical and Electronics Engineers Signal Processing Society. "Advances in deep neural network approaches to speaker recognition." 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, vol. 2015-August, 2015, pp. 4814-4818.
http://dx.doi.org/10.1109/ICASSP.2015.7178885

---------- VANCOUVER ----------

McLaren, M., Lei, Y., Ferrer, L., The Institute of Electrical and Electronics Engineers Signal Processing Society. Advances in deep neural network approaches to speaker recognition. ICASSP IEEE Int Conf Acoust Speech Signal Process Proc. 2015;2015-August:4814-4818.
http://dx.doi.org/10.1109/ICASSP.2015.7178885