Conference paper

Castan, D.; McLaren, M.; Ferrer, L.; Lawson, A.; Lozano-Diez, A.; Lacerda F.; Strombergsson S.; Wlodarczak M.; Heldner M.; Gustafson J.; House D. "Improving robustness of speaker recognition to new conditions using unlabeled data" (2017) 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017. 2017-August:3737-3741

Abstract:

Unsupervised techniques for adapting speaker recognition systems are important because condition mismatch is prevalent when speaker recognition technology is applied to new conditions, and labeled 'in-domain' data is generally scarce. In the recent NIST 2016 Speaker Recognition Evaluation (SRE16), symmetric score normalization (S-norm) and calibration using unlabeled in-domain data were shown to be beneficial. Because calibration requires speaker labels for training, speaker-clustering techniques were used to generate pseudo-speakers for learning calibration parameters in those cases where only unlabeled in-domain data was available. These methods performed well in SRE16. It is unclear, however, whether those techniques generalize well to other data sources. In this work, we first describe our SRI-CON-UAM team system submission for the NIST 2016 SRE and then benchmark these approaches on several distinctly different databases. Our analysis shows that while the benefit of S-norm is also observed across other datasets, applying speaker-clustered calibration provides considerably greater benefit to the system in the context of new acoustic conditions. Copyright © 2017 ISCA.
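
The abstract names symmetric score normalization (S-norm) without spelling it out. As a rough illustration only, the sketch below uses the common textbook formulation: a raw trial score is standardized once with the statistics of the enrollment side scored against a cohort, once with the statistics of the test side scored against the same cohort, and the two results are averaged. The function name, the variable names, and the toy cohort scores are assumptions made for this example; the record does not describe the authors' actual implementation or their cohort selection.

```python
from statistics import mean, pstdev


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization in its common textbook form.

    enroll_cohort_scores: scores of the enrollment model against cohort segments.
    test_cohort_scores:   scores of the test segment against cohort models.
    Both lists are hypothetical inputs; in the setting of the abstract the
    cohort would be drawn from unlabeled in-domain data.
    """
    mu_e, sd_e = mean(enroll_cohort_scores), pstdev(enroll_cohort_scores)
    mu_t, sd_t = mean(test_cohort_scores), pstdev(test_cohort_scores)
    # Assumes both cohorts have non-zero spread; a real system would guard this.
    enroll_part = (raw_score - mu_e) / sd_e
    test_part = (raw_score - mu_t) / sd_t
    # Average the two standardized scores so both sides are treated alike.
    return 0.5 * (enroll_part + test_part)


# Toy numbers, invented purely for illustration.
enroll_vs_cohort = [-1.0, 0.0, 1.0, 2.0]
test_vs_cohort = [0.0, 2.0, 4.0, 6.0]
normalized = s_norm(3.0, enroll_vs_cohort, test_vs_cohort)
print(normalized)
```

No speaker labels are needed for this step, which is why it can be applied with unlabeled in-domain data; calibration, by contrast, needs labels, hence the pseudo-speakers obtained by clustering that the abstract mentions.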

Record:

Document type: Conference paper
Title: Improving robustness of speaker recognition to new conditions using unlabeled data
Author: Castan, D.; McLaren, M.; Ferrer, L.; Lawson, A.; Lozano-Diez, A.; Lacerda F.; Strombergsson S.; Wlodarczak M.; Heldner M.; Gustafson J.; House D.
Affiliation: Speech Technology and Research Laboratory, SRI International, California, United States
Instituto de Investigación en Ciencias de la Computación (ICC), CONICET-UBA, Argentina
Audias-ATVS, Universidad Autonoma de Madrid, Madrid, Spain
Keywords: NIST SRE16; Score Calibration; Score Normalization; Trial-based Calibration; Calibration; Speech communication; Acoustic conditions; Calibration parameters; Speaker clustering; Speaker recognition; Speaker recognition evaluations; Unsupervised techniques; Speech recognition
Year: 2017
Volume: 2017-August
Start page: 3737
End page: 3741
DOI: http://dx.doi.org/10.21437/Interspeech.2017-605
Journal title: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Abbreviated journal title: Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN: 2308457X
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2017-August_n_p3737_Castan

References:

  • Brümmer, N., Du Preez, J., Application-independent evaluation of speaker detection (2006) Computer Speech & Language, 20 (2), pp. 230-275
  • McLaren, M., Lawson, A., Ferrer, L., Scheffer, N., Lei, Y., Trial-based calibration for speaker recognition in unseen conditions (2014) Odyssey 2014: The Speaker and Language Recognition Workshop
  • Auckenthaler, R., Carey, M., Lloyd-Thomas, H., Score normalization for text-independent speaker verification systems (2000) Digital Signal Processing, 10 (1), pp. 42-54
  • Shum, S., Dehak, N., Dehak, R., Glass, J., Unsupervised speaker adaptation based on the cosine similarity for text-independent speaker verification (2010) Odyssey, p. 16
  • Dehak, N., Dehak, R., Glass, J.R., Reynolds, D.A., Kenny, P., Cosine similarity scoring without score normalization techniques (2010) Odyssey, p. 15
  • Shum, S., Reynolds, D., Garcia-Romero, D., McCree, A., Unsupervised clustering approaches for domain adaptation in speaker recognition systems (2014) Proc. Odyssey
  • Garcia-Romero, D., Zhang, X., McCree, A., Povey, D., Improving speaker recognition performance in the domain adaptation challenge using deep neural networks (2014) Spoken Language Technology Workshop (SLT), 2014 IEEE, pp. 378-383
  • Glembek, O., Ma, J., Matejka, P., Zhang, B., Plchot, O., Burget, L., Matsoukas, S., Domain adaptation via within-class covariance correction in i-vector based speaker recognition systems (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference On, pp. 4032-4036
  • (2016) The NIST Year 2016 Speaker Recognition Evaluation Plan, www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016
  • Sturim, D., Reynolds, D., Speaker adaptive cohort selection for t-norm in text-independent speaker verification (2005) Proc. ICASSP
  • Brummer, N., Van Leeuwen, D., On calibration of language recognition scores (2006) Odyssey: Speaker and Language Recognition Workshop, pp. 1-8
  • Villalba, J., (2014) Advances on Speaker Recognition in Non-collaborative Environments, Ph.D. dissertation, University of Zaragoza
  • Brummer, N., (2006) Focal Toolkit, google.com/site/nikobrummer/focal
  • Brummer, N., (2010) Bosaris Toolkit, google.com/site/bosaristoolkit
  • Ferrer, L., Graciarena, M., Zymnis, A., Shriberg, E., System combination using auxiliary information for speaker verification (2008) Proc. ICASSP
  • Hautamaki, V., Lee, K., Kinnunen, T., Ma, B., Li, H., Regularized logistic regression fusion for speaker verification (2011) Proc. Interspeech
  • Ferrer, L., Burget, L., Plchot, O., Scheffer, N., A unified approach for audio characterization and its application to speaker recognition (2012) Odyssey, pp. 317-323
  • Kim, C., Stern, R., Power-Normalized cepstral coefficients (PNCC) for robust speech recognition (2012) Proc. ICASSP, pp. 4101-4104
  • Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V., A noise-robust system for NIST 2012 speaker recognition evaluation (2013) Proc. Interspeech
  • McLaren, M., Van Leeuwen, D., Source-Normalized LDA for robust speaker recognition using i-vectors from multiple speech sources (2012) IEEE Transactions on Audio, Speech, and Language Processing, 20 (3), pp. 755-766
  • McLaren, M., Ferrer, L., Castan, D., Lawson, A., Lozano-Diez, A., The SRI-CON-UAM NIST 2016 SRE system description (2016) Proc. SRE16 Workshop
  • Walker, K., Strassel, S., The RATS radio traffic collection system (2012) Odyssey, pp. 291-297
  • McLaren, M., Ferrer, L., Castan, D., Lawson, A., The speakers in the wild (SITW) speaker recognition database (2016) Proc. Interspeech

Citations:

---------- APA ----------
Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., Strombergsson S., ..., House D. (2017). Improving robustness of speaker recognition to new conditions using unlabeled data. 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 2017-August, 3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605
---------- CHICAGO ----------
Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., et al. "Improving robustness of speaker recognition to new conditions using unlabeled data". 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 2017-August (2017): 3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605
---------- MLA ----------
Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., et al. "Improving robustness of speaker recognition to new conditions using unlabeled data". 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, vol. 2017-August, 2017, pp. 3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605
---------- VANCOUVER ----------
Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., et al. Improving robustness of speaker recognition to new conditions using unlabeled data. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2017;2017-August:3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605