Conference paper

McLaren, M.; Castan, D.; Ferrer, L.; Lawson, A. "On the issue of calibration in DNN-based speaker recognition systems" (2016) 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016. 08-12-September-2016:1825-1829

Abstract:

This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but provides a more computationally efficient system based on bottleneck features with improved discriminative power. Copyright © 2016 ISCA.
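
The hybrid alignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal-covariance GMM acting as the UBM over bottleneck features, and it assumes that a second, conventional feature stream (called `mfcc` here purely for illustration) supplies the first-order statistics. All function names, dimensions, and the random data are hypothetical.

```python
import numpy as np


def gmm_posteriors(feats, weights, means, variances):
    """Per-frame component posteriors of a diagonal-covariance GMM.

    feats: (T, D); weights: (C,); means, variances: (C, D).
    """
    diff = feats[:, None, :] - means[None, :, :]           # (T, C, D)
    log_lik = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)              # (T, C)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (C,)
    )
    log_post = np.log(weights) + log_lik
    # Normalize in the log domain so each row sums to one.
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    return np.exp(log_post)


def hybrid_stats(align_feats, stat_feats, weights, means, variances):
    """Baum-Welch statistics with separate alignment and statistics streams.

    The posteriors (alignment) come from align_feats only; the first-order
    statistics accumulate stat_feats weighted by those posteriors.
    """
    post = gmm_posteriors(align_feats, weights, means, variances)
    zero_order = post.sum(axis=0)      # N_c = sum_t gamma_t(c)
    first_order = post.T @ stat_feats  # F_c = sum_t gamma_t(c) * x_t
    return zero_order, first_order


# Toy data (assumed shapes, random values; not taken from the paper).
rng = np.random.default_rng(0)
T, C, D_bn, D_mfcc = 50, 4, 8, 5
bn = rng.normal(size=(T, D_bn))      # stand-in for bottleneck features
mfcc = rng.normal(size=(T, D_mfcc))  # stand-in for conventional features
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, D_bn))
variances = np.ones((C, D_bn))

N, F = hybrid_stats(bn, mfcc, weights, means, variances)
```

In this sketch the bottleneck stream never enters `first_order`, which is the property the abstract attributes to the hybrid framework. The dimensionality of the statistics is set by the second stream rather than by the bottleneck layer.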

Record:

Document: Conference paper
Title: On the issue of calibration in DNN-based speaker recognition systems
Author: McLaren, M.; Castan, D.; Ferrer, L.; Lawson, A.
Affiliation: Speech Technology and Research Laboratory, SRI International, CA, United States
Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina
Keywords: Bottleneck features; Calibration; Deep neural network; Mismatch; Speaker recognition; Alignment; Speech communication; Speech processing; Computationally efficient; Discriminative power; Speaker recognition system; Universal background model; Speech recognition
Year: 2016
Volume: 08-12-September-2016
Start page: 1825
End page: 1829
DOI: http://dx.doi.org/10.21437/Interspeech.2016-1134
Journal title: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
Abbreviated journal title: Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN: 2308-457X
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p1825_McLaren

Citations:

---------- APA ----------
McLaren, M., Castan, D., Ferrer, L., & Lawson, A. (2016). On the issue of calibration in DNN-based speaker recognition systems. 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, 08-12-September-2016, 1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134
---------- CHICAGO ----------
McLaren, M., Castan, D., Ferrer, L., Lawson, A. "On the issue of calibration in DNN-based speaker recognition systems". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 08-12-September-2016 (2016): 1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134
---------- MLA ----------
McLaren, M., Castan, D., Ferrer, L., Lawson, A. "On the issue of calibration in DNN-based speaker recognition systems". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, vol. 08-12-September-2016, 2016, pp. 1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134
---------- VANCOUVER ----------
McLaren, M., Castan, D., Ferrer, L., Lawson, A. On the issue of calibration in DNN-based speaker recognition systems. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2016;08-12-September-2016:1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134