A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions

Ferrer, L.; McLaren, M.; Sekhar, C.C.; Rao, P.; Ghosh, P.K.; Murthy, H.A.; Yegnanarayana, B.; Umesh, S.; Alku, P.; Prasanna, S.R.M.; Narayanan, S.

doi:10.21437/Interspeech.2018-1280

Ferrer, L.; McLaren, M.; Sekhar, C.C.; Rao, P.; Ghosh, P.K.; Murthy, H.A.; Yegnanarayana, B.; Umesh, S.; Alku, P.; Prasanna, S.R.M.; Narayanan, S. "A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions" (2018). 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, vol. 2018-September, pp. 82-86.

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2018-September_n_p82_Ferrer

Abstract:

Probabilistic linear discriminant analysis (PLDA) is the leading method for computing scores in speaker recognition systems. The method models the vectors representing each audio sample as a sum of three terms: one that depends on the speaker identity, one that models the within-speaker variability, and one that models any remaining variability. The last two terms are assumed to be independent across samples. We recently proposed an extension of the PLDA method, which we termed Joint PLDA (JPLDA), where the second term is considered dependent on the type of nuisance condition present in the data (e.g., the language or channel). The proposed method led to significant gains for multilanguage speaker recognition when taking language as the nuisance condition. In this paper, we present a generalization of this approach that allows for multiple nuisance terms. We show results using language and several nuisance conditions describing the acoustic characteristics of the sample and demonstrate that jointly including all these factors in the model leads to better results than including only language or acoustic condition factors. Overall, we obtain relative improvements in detection cost function between 5% and 47% for various systems and test conditions with respect to standard PLDA approaches. © 2018 International Speech Communication Association. All rights reserved.
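The scoring idea described in the abstract can be made concrete with a small sketch. If every term is Gaussian, each sample is x = mu + (speaker term) + (nuisance terms) + (residual), so a trial pair (x1, x2) is jointly Gaussian. The verification score is then the log-likelihood ratio between a same-speaker hypothesis, under which both samples share the speaker term, and a different-speaker hypothesis, under which they do not. For the multi-nuisance case, the sketch assumes, as one reading of the abstract rather than the paper's exact formulation, that each nuisance term is a latent vector shared by all samples with the same condition label for that factor, and that the labels are known at scoring time. The helper `pair_llr` and its parameter names are hypothetical; in practice the covariances would come from EM training (not shown), and the paper's own scoring formulation may differ, for example in how unknown conditions are handled. The joint densities are evaluated directly with NumPy, which is simple but slower than the closed forms normally used.

```python
import numpy as np


def gaussian_logpdf(z, cov):
    """Log density of a zero-mean multivariate Gaussian N(0, cov) at z."""
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    mahalanobis = z @ np.linalg.solve(cov, z)
    return -0.5 * (z.shape[0] * np.log(2.0 * np.pi) + logdet + mahalanobis)


def pair_llr(x1, x2, mu, spk_cov, res_cov, nuis_covs=(), cond1=(), cond2=()):
    """Hypothetical helper: same-speaker vs different-speaker log-likelihood ratio.

    spk_cov   -- covariance of the speaker term (V V^T in factor form)
    res_cov   -- covariance of the per-sample term, independent across samples
    nuis_covs -- one covariance per nuisance factor (U_k U_k^T), assumed here
                 to be shared by samples with the same condition label
    cond1/2   -- condition labels of each sample, one per nuisance factor
    """
    if not len(nuis_covs) == len(cond1) == len(cond2):
        raise ValueError("need one condition label per nuisance factor")
    zero = np.zeros_like(spk_cov)
    # Marginal covariance of one sample: every term contributes once.
    total = spk_cov + res_cov + sum(nuis_covs, zero)
    # Cross-covariance from nuisance factors on which both samples share a label.
    shared = sum((n for n, a, b in zip(nuis_covs, cond1, cond2) if a == b), zero)
    z = np.concatenate([x1 - mu, x2 - mu])

    def joint_cov(cross):
        # Covariance of the stacked pair [x1; x2] given a cross-covariance block.
        return np.block([[total, cross], [cross, total]])

    # Same speaker: the speaker term is shared as well; different: it is not.
    log_same = gaussian_logpdf(z, joint_cov(spk_cov + shared))
    log_diff = gaussian_logpdf(z, joint_cov(shared))
    return log_same - log_diff
```

With no nuisance factors, `res_cov` stands for the within-speaker plus residual covariance, which gives standard PLDA scoring. When the two samples carry different labels for every factor, each nuisance covariance simply adds to the per-sample noise, so the score equals standard PLDA with `res_cov + N_k`; under this sketch, the nuisance terms only change the score for factors on which the two samples share a condition.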

Record:

Document type:	Conference paper
Title:	A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions
Authors:	Ferrer, L.; McLaren, M.; Sekhar, C.C.; Rao, P.; Ghosh, P.K.; Murthy, H.A.; Yegnanarayana, B.; Umesh, S.; Alku, P.; Prasanna, S.R.M.; Narayanan, S.
Affiliation:	Instituto de Investigación en Ciencias de la Computación, CONICET-Universidad de Buenos Aires, Buenos Aires, Argentina; Speech Technology and Research Lab, SRI International, Menlo Park, United States
Keywords:	Probabilistic linear discriminant analysis; Speaker recognition; Cost functions; Discriminant analysis; Speech communication; Speech processing; Acoustic characteristic; Acoustic conditions; Joint modeling; Speaker recognition system; Speaker variability; Test condition; Speech recognition
Year:	2018
Volume:	2018-September
First page:	82
Last page:	86
DOI:	http://dx.doi.org/10.21437/Interspeech.2018-1280
Source title:	19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Abbreviated source title:	Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN:	2308457X
Record URL:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2018-September_n_p82_Ferrer
