Abstract:
Unsupervised techniques for adapting speaker recognition systems are important because condition mismatch is prevalent when speaker recognition technology is applied to new conditions, and labeled 'in-domain' data is generally scarce. In the recent NIST 2016 Speaker Recognition Evaluation (SRE16), symmetric score normalization (S-norm) and calibration using unlabeled in-domain data were shown to be beneficial. Because calibration requires speaker labels for training, speaker-clustering techniques were used to generate pseudo-speakers for learning calibration parameters in those cases where only unlabeled in-domain data was available. These methods performed well in SRE16. It is unclear, however, whether those techniques generalize well to other data sources. In this work, we first describe our SRI-CON-UAM team system submission for the NIST 2016 SRE and then benchmark these approaches on several distinctly different databases. Our analysis shows that while the benefit of S-norm is also observed across other datasets, applying speaker-clustered calibration provides considerably greater benefit to the system in the context of new acoustic conditions. Copyright © 2017 ISCA.
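As an illustration of the symmetric score normalization (S-norm) mentioned in the abstract, the sketch below uses the commonly cited formulation: the raw trial score is z-normalized once with the statistics of the enrollment side scored against a cohort and once with the statistics of the test side scored against the same cohort, and the two results are averaged. It is not taken from the paper. The function name `s_norm`, its parameters, and the choice of the population standard deviation are assumptions made for this example, and the cohort is assumed to be drawn from the unlabeled in-domain data.

```python
from statistics import mean, pstdev


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (S-norm) of a single trial score.

    raw_score            -- score of the enrollment model against the test sample
    enroll_cohort_scores -- scores of the enrollment model against each cohort sample
    test_cohort_scores   -- scores of the test sample against each cohort sample

    All names are hypothetical; the cohort is assumed to come from
    unlabeled in-domain data.
    """
    # Cohort statistics seen from the enrollment side.
    mu_e, sd_e = mean(enroll_cohort_scores), pstdev(enroll_cohort_scores)
    # Cohort statistics seen from the test side.
    mu_t, sd_t = mean(test_cohort_scores), pstdev(test_cohort_scores)

    # Average of the two z-normalized versions of the same raw score.
    z_enroll = (raw_score - mu_e) / sd_e
    z_test = (raw_score - mu_t) / sd_t
    return 0.5 * (z_enroll + z_test)
```

For example, `s_norm(2.0, [0.0, 2.0], [1.0, 3.0])` averages a z-score of 1.0 from the enrollment side with a z-score of 0.0 from the test side. A practical system would also guard against a zero standard deviation, which this sketch does not do.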
Record:
Document: | Conference |
Title: | Improving robustness of speaker recognition to new conditions using unlabeled data |
Author: | Castan, D.; McLaren, M.; Ferrer, L.; Lawson, A.; Lozano-Diez, A.; Lacerda F.; Strombergsson S.; Wlodarczak M.; Heldner M.; Gustafson J.; House D. |
Affiliation: | Speech Technology and Research Laboratory, SRI International, California, United States; Instituto de Investigación en Ciencias de la Computación (ICC), CONICET-UBA, Argentina; Audias-ATVS, Universidad Autonoma de Madrid, Madrid, Spain |
Keywords: | NIST SRE16; Score Calibration; Score Normalization; Trial-based Calibration; Calibration; Speech communication; Acoustic conditions; Calibration parameters; Speaker clustering; Speaker recognition; Speaker recognition evaluations; Unsupervised techniques; Speech recognition |
Year: | 2017 |
Volume: | 2017-August |
Start page: | 3737 |
End page: | 3741 |
DOI: | http://dx.doi.org/10.21437/Interspeech.2017-605 |
Journal title: | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 |
Abbreviated journal title: | Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH |
ISSN: | 2308457X |
Record: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2017-August_n_p3737_Castan |
Citations:
---------- APA ----------
Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., Strombergsson S.,..., House D. (2017). Improving robustness of speaker recognition to new conditions using unlabeled data. 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, 2017-August, 3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605
---------- CHICAGO ----------
Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., et al. "Improving robustness of speaker recognition to new conditions using unlabeled data". 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 2017-August (2017): 3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605
---------- MLA ----------
Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., et al. "Improving robustness of speaker recognition to new conditions using unlabeled data". 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, vol. 2017-August, 2017, pp. 3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605
---------- VANCOUVER ----------
Castan, D., McLaren, M., Ferrer, L., Lawson, A., Lozano-Diez, A., Lacerda F., et al. Improving robustness of speaker recognition to new conditions using unlabeled data. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2017;2017-August:3737-3741.
http://dx.doi.org/10.21437/Interspeech.2017-605