Abstract:
This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both the PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but also provides a more computationally efficient system based on bottleneck features with improved discriminative power. Copyright © 2016 ISCA.
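The hybrid alignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a diagonal-covariance GMM fitted on the alignment features, and it assumes a second, frame-synchronous feature stream (for example MFCCs, which the abstract does not name) for the first-order statistics. The function names `gmm_posteriors` and `hybrid_stats` are hypothetical.

```python
import math


def gmm_posteriors(frame, weights, means, variances):
    """Posterior of each diagonal-covariance Gaussian given one frame."""
    log_likes = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        log_likes.append(ll)
    # Normalize in the log domain for numerical stability.
    top = max(log_likes)
    exps = [math.exp(ll - top) for ll in log_likes]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_stats(align_frames, stat_frames, weights, means, variances):
    """Zeroth- and first-order statistics with decoupled feature streams.

    align_frames: features used only to compute component posteriors
                  (bottleneck features in the hybrid setup).
    stat_frames:  features accumulated into the first-order statistics
                  (assumed here to be something like MFCCs).
    """
    if not stat_frames or len(align_frames) != len(stat_frames):
        raise ValueError("feature streams must be non-empty and frame-synchronous")
    n_comp = len(weights)
    dim = len(stat_frames[0])
    zeroth = [0.0] * n_comp
    first = [[0.0] * dim for _ in range(n_comp)]
    for a, s in zip(align_frames, stat_frames):
        post = gmm_posteriors(a, weights, means, variances)
        for c, gamma in enumerate(post):
            zeroth[c] += gamma              # N_c = sum_t gamma_t(c)
            for d, x in enumerate(s):
                first[c][d] += gamma * x    # F_c = sum_t gamma_t(c) * x_t
    return zeroth, first


# Toy data (illustrative only): two components over 1-D alignment
# features, with 2-D features for the first-order statistics.
weights = [0.5, 0.5]
means = [[-1.0], [1.0]]
variances = [[1.0], [1.0]]
align = [[-1.0], [1.0], [0.0]]
stats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
N, F = hybrid_stats(align, stats, weights, means, variances)
```

The point of the sketch is the decoupling: the posteriors come from one feature stream, while the accumulated vectors come from another, so the alignment features never enter the first-order statistics directly.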
Record:
Document type: Conference paper
Title: On the issue of calibration in DNN-based speaker recognition systems
Authors: McLaren, M.; Castan, D.; Ferrer, L.; Lawson, A.
Affiliation: Speech Technology and Research Laboratory, SRI International, CA, United States; Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina
Keywords: Bottleneck features; Calibration; Deep neural network; Mismatch; Speaker recognition; Alignment; Speech communication; Speech processing; Computationally efficient; Discriminative power; Speaker recognition system; Universal background model; Speech recognition
Year: 2016
Volume: 08-12-September-2016
First page: 1825
Last page: 1829
DOI: http://dx.doi.org/10.21437/Interspeech.2016-1134
Journal title: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
Abbreviated journal title: Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN: 2308457X
Record URL: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p1825_McLaren
Citations:
---------- APA ----------
McLaren, M., Castan, D., Ferrer, L., & Lawson, A. (2016). On the issue of calibration in DNN-based speaker recognition systems. 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, 08-12-September-2016, 1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134
---------- CHICAGO ----------
McLaren, M., Castan, D., Ferrer, L., Lawson, A. "On the issue of calibration in DNN-based speaker recognition systems". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 08-12-September-2016 (2016): 1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134
---------- MLA ----------
McLaren, M., Castan, D., Ferrer, L., Lawson, A. "On the issue of calibration in DNN-based speaker recognition systems". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, vol. 08-12-September-2016, 2016, pp. 1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134
---------- VANCOUVER ----------
McLaren, M., Castan, D., Ferrer, L., Lawson, A. On the issue of calibration in DNN-based speaker recognition systems. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2016;08-12-September-2016:1825-1829.
http://dx.doi.org/10.21437/Interspeech.2016-1134