On the issue of calibration in DNN-based speaker recognition systems

McLaren, M.; Castan, D.; Ferrer, L.; Lawson, A.

doi:10.21437/Interspeech.2016-1134

McLaren, M.; Castan, D.; Ferrer, L.; Lawson, A. "On the issue of calibration in DNN-based speaker recognition systems" (2016) 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016. 08-12-September-2016:1825-1829

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p1825_McLaren

Abstract:

This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but provides a more computationally efficient system based on bottleneck features with improved discriminative power. Copyright © 2016 ISCA.
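
The hybrid alignment idea in the abstract can be illustrated with a minimal sketch. The abstract only states that bottleneck features are used for alignment and kept out of the first-order statistics; everything else here is an assumption made for illustration: the alignment model is taken to be a diagonal-covariance GMM over bottleneck features, the statistics are accumulated over a second, traditional feature stream (for example MFCCs), and the names `gmm_posteriors` and `hybrid_stats` are hypothetical rather than taken from the paper.

```python
import numpy as np


def gmm_posteriors(feats, weights, means, variances):
    """Per-frame component posteriors of a diagonal-covariance GMM.

    feats: (T, D); weights: (C,); means, variances: (C, D).
    """
    diff = feats[:, None, :] - means[None, :, :]                  # (T, C, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)     # (C,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)     # (T, C)
    log_like = np.log(weights)[None, :] - 0.5 * (log_det[None, :] + mahal)
    log_like -= log_like.max(axis=1, keepdims=True)               # numerical stability
    post = np.exp(log_like)
    return post / post.sum(axis=1, keepdims=True)


def hybrid_stats(align_feats, stats_feats, weights, means, variances):
    """Zeroth- and first-order Baum-Welch statistics with split feature roles.

    align_feats (assumed: bottleneck features) only decide the frame-to-component
    alignment; stats_feats (assumed: traditional features such as MFCCs) are the
    only features that enter the first-order statistics.
    """
    post = gmm_posteriors(align_feats, weights, means, variances)  # (T, C)
    zeroth = post.sum(axis=0)                                      # (C,)
    first = post.T @ stats_feats                                   # (C, D_stats)
    return zeroth, first


# Toy usage with random data; all dimensions are arbitrary placeholders.
rng = np.random.default_rng(0)
T, C, D_align, D_stats = 50, 4, 8, 5
bottleneck = rng.normal(size=(T, D_align))
mfcc = rng.normal(size=(T, D_stats))
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, D_align))
variances = np.ones((C, D_align))
zeroth, first = hybrid_stats(bottleneck, mfcc, weights, means, variances)
```

Because the posteriors of each frame sum to one, the zeroth-order statistics sum to the number of frames, and the first-order statistics have the dimensionality of the statistics stream rather than of the alignment stream.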

Record:

Document:	Conference
Title:	On the issue of calibration in DNN-based speaker recognition systems
Author:	McLaren, M.; Castan, D.; Ferrer, L.; Lawson, A.
Affiliation:	Speech Technology and Research Laboratory, SRI International, CA, United States; Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina
Keywords:	Bottleneck features; Calibration; Deep neural network; Mismatch; Speaker recognition; Alignment; Speech communication; Speech processing; Computationally efficient; Deep neural networks; Discriminative power; Speaker recognition system; Universal background model; Speech recognition
Year:	2016
Volume:	08-12-September-2016
Start page:	1825
End page:	1829
DOI:	http://dx.doi.org/10.21437/Interspeech.2016-1134
Source title:	17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
Abbreviated source title:	Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN:	2308457X
Record:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p1825_McLaren

References:

Lei, Y., Scheffer, N., Ferrer, L., McLaren, M., A novel scheme for speaker recognition using a phonetically aware deep neural network (2014) Proc. ICASSP
Richardson, F., Reynolds, D., Dehak, N., A unified deep neural network for speaker and language recognition (2015) Proc. Interspeech
Garcia-Romero, D., McCree, A., Insights into deep neural networks for speaker recognition (2015) Proc. Interspeech
Kenny, P., Gupta, V., Stafylakis, T., Ouellet, P., Alam, J., Deep neural networks for extracting Baum-Welch statistics for speaker recognition (2014) Proc. Speaker Odyssey
Lei, Y., Ferrer, L., McLaren, M., Scheffer, N., A deep neural network speaker verification system targeting microphone speech (2014) Proc. Interspeech
McLaren, M., Lei, Y., Ferrer, L., Advances in deep neural network approaches to speaker recognition (2015) Proc. IEEE ICASSP
Matejka, P., Zhang, L., Ng, T., Mallidi, S., Glembek, O., Ma, J., Zhang, B., Neural network bottleneck features for language identification (2014) Proc. Speaker Odyssey
Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Speaker Odyssey
Ferrer, L., Lei, Y., McLaren, M., Study of senone-based deep neural network approaches for spoken language recognition (2015) IEEE Trans. Audio Speech and Language Processing, submitted
Brummer, N., Du Preez, J., Application independent evaluation of speaker detection (2006) Computer Speech and Language, 20 (2-3), pp. 230-275
McLaren, M., Lawson, A., Ferrer, L., Lei, S.N., Trial-based calibration for speaker recognition in unseen conditions (2014) Proc. Odyssey 2014: The Speaker and Language Recognition Workshop
Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., Front-end factor analysis for speaker verification (2011) IEEE Trans. on Speech and Audio Processing, 19, pp. 788-798
Kenny, P., Boulianne, G., Ouellet, P., Dumouchel, P., Joint factor analysis versus eigenchannels in speaker recognition (2007) IEEE Trans. on Speech and Audio Processing, 15 (4), pp. 1435-1447
McLaren, M., Scheffer, N., Ferrer, L., Lei, Y., Effective use of DCTs for contextualizing features for speaker recognition (2014) Proc. ICASSP
McLaren, M., Ferrer, L., Lawson, A., Exploring the role of phonetic bottleneck features for speaker and language recognition (2016) Proc. IEEE ICASSP
Matejka, P., Glembek, O., Novotny, O., Plchot, O., Grezl, F., Burget, L., Cernocky, J., Analysis of DNN approaches to speaker identification (2016) Proc. IEEE ICASSP
Ferrer, L., Bratt, H., Burget, L., Cernocky, H., Glembek, O., Graciarena, M., Lawson, A., Plchot, O., Promoting robustness for speaker modeling in the community: The PRISM evaluation set (2011) Proc. NIST 2011 Workshop
McLaren, M., Ferrer, L., Castan, D., Lawson, A., The speakers in the wild (SITW) speaker recognition database (2016) Proc. Interspeech
Senoussaoui, M., Kenny, P., Brummer, N., De Villiers, E., Dumouchel, P., Mixture of PLDA models in i-vector space for gender independent speaker recognition (2011) Proc. Int. Conf. on Speech Communication and Technology
Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Proc. Interspeech
Garcia-Romero, D., Espy-Wilson, C., Analysis of i-vector length normalization in speaker recognition systems (2011) Proc. Interspeech, pp. 249-252
McLaren, M., Abrash, V., Graciarena, M., Lei, Y., Pesan, J., Improving robustness to compressed speech in speaker recognition (2013) Proc. Interspeech, pp. 3698-3702

Citations:

---------- APA ----------

McLaren, M., Castan, D., Ferrer, L., & Lawson, A. (2016). On the issue of calibration in DNN-based speaker recognition systems. 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, 08-12-September-2016, 1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134

---------- CHICAGO ----------

McLaren, M., Castan, D., Ferrer, L., Lawson, A. "On the issue of calibration in DNN-based speaker recognition systems." 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 08-12-September-2016 (2016): 1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134

---------- MLA ----------

McLaren, M., Castan, D., Ferrer, L., Lawson, A. "On the issue of calibration in DNN-based speaker recognition systems." 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, vol. 08-12-September-2016, 2016, pp. 1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134

---------- VANCOUVER ----------

McLaren, M., Castan, D., Ferrer, L., Lawson, A. On the issue of calibration in DNN-based speaker recognition systems. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2016;08-12-September-2016:1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134