Improving robustness of speaker recognition to new conditions using unlabeled data

Castan, D.; McLaren, M.; Ferrer, L.; Lawson, A.; Lozano-Diez, A.; Lacerda F.; Strombergsson S.; Wlodarczak M.; Heldner M.; Gustafson J.; House D.

doi:10.21437/Interspeech.2017-605

Castan, D.; McLaren, M.; Ferrer, L.; Lawson, A.; Lozano-Diez, A.; Lacerda F.; Strombergsson S.; Wlodarczak M.; Heldner M.; Gustafson J.; House D. "Improving robustness of speaker recognition to new conditions using unlabeled data" (2017) 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017. 2017-August:3737-3741

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2017-August_n_p3737_Castan

Abstract:

Unsupervised techniques for the adaptation of speaker recognition are important due to the problem of condition mismatch that is prevalent when applying speaker recognition technology to new conditions and the general scarcity of labeled 'in-domain' data. In the recent NIST 2016 Speaker Recognition Evaluation (SRE), symmetric score normalization (S-norm) and calibration using unlabeled in-domain data were shown to be beneficial. Because calibration requires speaker labels for training, speaker-clustering techniques were used to generate pseudo-speakers for learning calibration parameters in those cases where only unlabeled in-domain data was available. These methods performed well in SRE16. It is unclear, however, whether those techniques generalize well to other data sources. In this work, we first describe our SRI-CON-UAM team system submission for the NIST 2016 SRE and then benchmark these approaches on several distinctly different databases. Our analysis shows that while the benefit of S-norm is also observed across other datasets, applying speaker-clustered calibration provides considerably greater benefit to the system in the context of new acoustic conditions. Copyright © 2017 ISCA.
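
The abstract names S-norm but does not give its formula. As a rough illustration only, the sketch below uses the commonly cited form of symmetric score normalization: the raw trial score is z-normalized once against the scores of the enrollment side versus a cohort and once against the scores of the test side versus the same cohort, and the two results are averaged. Here the cohort is assumed to be drawn from unlabeled in-domain data, as the abstract suggests. The function name, argument names, and example values are hypothetical and are not taken from the paper.

```python
import statistics


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (S-norm) of a single trial score.

    raw_score: score of the enrollment model against the test segment.
    enroll_cohort_scores: scores of the enrollment model against each cohort segment.
    test_cohort_scores: scores of the test segment against each cohort segment.
    All names are illustrative assumptions, not the authors' implementation.
    """
    # Cohort statistics seen from the enrollment side.
    mu_e = statistics.fmean(enroll_cohort_scores)
    sigma_e = statistics.pstdev(enroll_cohort_scores)

    # Cohort statistics seen from the test side.
    mu_t = statistics.fmean(test_cohort_scores)
    sigma_t = statistics.pstdev(test_cohort_scores)

    # Average of the two z-normalized versions of the same raw score.
    return 0.5 * ((raw_score - mu_e) / sigma_e + (raw_score - mu_t) / sigma_t)


# Toy values chosen only to make the arithmetic easy to follow.
normalized = s_norm(2.0, [0.0, 1.0, 2.0, 3.0, 4.0], [-1.0, 1.0])
```

Because no speaker labels are needed to compute the cohort statistics, this step can use unlabeled in-domain data directly, whereas calibration needs labels and therefore relies on the pseudo-speakers produced by clustering.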

Record:

Document:	Conference paper
Title:	Improving robustness of speaker recognition to new conditions using unlabeled data
Author:	Castan, D.; McLaren, M.; Ferrer, L.; Lawson, A.; Lozano-Diez, A.; Lacerda F.; Strombergsson S.; Wlodarczak M.; Heldner M.; Gustafson J.; House D.
Affiliation:	Speech Technology and Research Laboratory, SRI International, California, United States; Instituto de Investigación en Ciencias de la Computación (ICC), CONICET-UBA, Argentina; Audias-ATVS, Universidad Autonoma de Madrid, Madrid, Spain
Keywords:	NIST SRE16; Score calibration; Score normalization; Trial-based calibration; Calibration; Speech communication; Acoustic conditions; Calibration parameters; Speaker clustering; Speaker recognition; Speaker recognition evaluations; Unsupervised techniques; Speech recognition
Year:	2017
Volume:	2017-August
Start page:	3737
End page:	3741
DOI:	http://dx.doi.org/10.21437/Interspeech.2017-605
Journal title:	18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Abbreviated journal title:	Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN:	2308457X
Record URL:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2017-August_n_p3737_Castan

Citations:

---------- APA ----------

Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., Strombergsson S., ..., House D. (2017). Improving robustness of speaker recognition to new conditions using unlabeled data. 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 2017-August, 3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605

---------- CHICAGO ----------

Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., et al. "Improving robustness of speaker recognition to new conditions using unlabeled data." 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 2017-August (2017): 3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605

---------- MLA ----------

Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., et al. "Improving robustness of speaker recognition to new conditions using unlabeled data." 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, vol. 2017-August, 2017, pp. 3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605

---------- VANCOUVER ----------

Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., et al. Improving robustness of speaker recognition to new conditions using unlabeled data. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2017;2017-August:3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605