Conference paper

Ferrer, L.; McLaren, M.; Sekhar C.C.; Rao P.; Ghosh P.K.; Murthy H.A.; Yegnanarayana B.; Umesh S.; Alku P.; Prasanna S.R.M.; Narayanan S. "A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions" (2018) 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018. 2018-September:82-86

Abstract:

Probabilistic linear discriminant analysis (PLDA) is the leading method for computing scores in speaker recognition systems. The method models the vectors representing each audio sample as a sum of three terms: one that depends on the speaker identity, one that models the within-speaker variability, and one that models any remaining variability. The last two terms are assumed to be independent across samples. We recently proposed an extension of the PLDA method, which we termed Joint PLDA (JPLDA), where the second term is considered dependent on the type of nuisance condition present in the data (e.g., the language or channel). The proposed method led to significant gains for multilanguage speaker recognition when taking language as the nuisance condition. In this paper, we present a generalization of this approach that allows for multiple nuisance terms. We show results using language and several nuisance conditions describing the acoustic characteristics of the sample and demonstrate that jointly including all these factors in the model leads to better results than including only language or acoustic condition factors. Overall, we obtain relative improvements in detection cost function between 5% and 47% for various systems and test conditions with respect to standard PLDA approaches. © 2018 International Speech Communication Association. All rights reserved.
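As an illustration of the additive structure described in the abstract, the following is a minimal sketch of a linear-Gaussian generative model with one speaker term, several nuisance terms, and a residual term. It is not the authors' implementation: the function name `sample_multi_nuisance`, the matrices `V` and `U`, the latent dimensions, and the standard-normal priors are all assumptions made for the example. Each nuisance latent vector is shared by every sample that carries the same condition label, which is one reading of "dependent on the type of nuisance condition".

```python
import numpy as np

def sample_multi_nuisance(speaker_ids, condition_ids, dim=4, spk_rank=2,
                          nui_rank=2, noise_std=0.1, seed=0):
    """Draw vectors as mean + speaker term + sum of nuisance terms + residual.

    Hypothetical sketch: all parameters are random, not trained.
    condition_ids has shape (n_samples, n_nuisance_types).
    """
    rng = np.random.default_rng(seed)
    speaker_ids = np.asarray(speaker_ids)
    condition_ids = np.asarray(condition_ids)
    n, n_types = condition_ids.shape

    mu = rng.normal(size=dim)               # global mean (assumed)
    V = rng.normal(size=(dim, spk_rank))    # speaker subspace (assumed)

    # One latent vector per distinct speaker, shared by that speaker's samples.
    spk_labels, spk_index = np.unique(speaker_ids, return_inverse=True)
    y = rng.normal(size=(len(spk_labels), spk_rank))
    m = mu + y[spk_index] @ V.T

    # One term per nuisance type; its latent vector is tied across all
    # samples that share the same label for that type.
    for j in range(n_types):
        U = rng.normal(size=(dim, nui_rank))
        labels, idx = np.unique(condition_ids[:, j], return_inverse=True)
        x = rng.normal(size=(len(labels), nui_rank))
        m = m + x[idx] @ U.T

    # Residual variability, independent across samples.
    return m + noise_std * rng.normal(size=(n, dim))

vectors = sample_multi_nuisance(
    ["spk_a", "spk_a", "spk_b"],
    [["en", "clean"], ["en", "clean"], ["es", "noisy"]],
)
```

With `noise_std=0.0`, two samples that share both the speaker and every condition label come out identical, which makes the tying of the latent variables easy to check by hand.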

Record:

Document: Conference paper
Title: A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions
Author: Ferrer, L.; McLaren, M.; Sekhar C.C.; Rao P.; Ghosh P.K.; Murthy H.A.; Yegnanarayana B.; Umesh S.; Alku P.; Prasanna S.R.M.; Narayanan S.
Affiliation: Instituto de Investigación en Ciencias de la Computación, CONICET-Universidad de Buenos Aires, Buenos Aires, Argentina
Speech Technology and Research Lab, SRI International, Menlo Park, United States
Keywords: Probabilistic linear discriminant analysis; Speaker recognition; Cost functions; Discriminant analysis; Speech communication; Speech processing; Acoustic characteristic; Acoustic conditions; Joint modeling; Speaker recognition system; Speaker variability; Test condition; Speech recognition
Year: 2018
Volume: 2018-September
Start page: 82
End page: 86
DOI: http://dx.doi.org/10.21437/Interspeech.2018-1280
Journal title: 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018
Abbreviated journal title: Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN: 2308457X
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2018-September_n_p82_Ferrer

References:

  • Prince, S., Probabilistic linear discriminant analysis for inferences about identity (2007) Proceedings of The International Conference on Computer Vision
  • Kenny, P., Bayesian speaker verification with heavy-tailed priors (2010) Proc. Odyssey-10, , Brno, Czech Republic, Jun. keynote presentation
  • Burget, L., Plchot, O., Cumani, S., Glembek, O., Matejka, P., Brümmer, N., Discriminatively trained probabilistic linear discriminant analysis for speaker verification (2011) Proc. ICASSP, , Prague, May
  • Brümmer, N., (2010) EM for Probabilistic LDA, , https://sites.google.com/site/nikobrummer/EMforPLDA.pdf, Tech. Rep
  • Senoussaoui, M., Kenny, P., Brümmer, N., De Villiers, E., Dumouchel, P., Mixture of PLDA models in i-vector space for gender-independent speaker recognition (2011) Proc. Interspeech, pp. 25-28. , Florence, Italy, Aug
  • Matejka, P., Glembek, O., Castaldo, F., Alam, J., Plchot, O., Kenny, P., Burget, L., Cernocky, J., Full-covariance UBM and heavy-tailed PLDA in i-vector speaker verification (2011) Proc. Interspeech, , Florence, Italy, Aug
  • Ferrer, L., (2017) Joint Probabilistic Linear Discriminant Analysis
  • Ferrer, L., McLaren, M., Joint PLDA for simultaneous modeling of two factors (2018) Journal of Machine Learning Research, , accepted for publication in
  • Garcia-Romero, D., Zhou, X., Espy-Wilson, C.Y., Multi-condition training of Gaussian PLDA models in i-vector space for noise and reverberation robust speaker recognition (2012) Proc. ICASSP. Kyoto: IEEE, pp. 4257-4260. , Mar
  • Lei, Y., Burget, L., Ferrer, L., Graciarena, M., Scheffer, N., Towards noise robust speaker recognition using probabilistic linear discriminant analysis (2012) Proc. ICASSP, , Kyoto, Mar
  • Li, P., Fu, Y., Mohammed, U., Elder, J., Prince, S., Probabilistic models for inference about identity (2012) IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (1), pp. 144-157
  • Mak, M.-W., Pang, X., Chien, J.-T., Mixture of plda for noise robust i-vector speaker verification (2016) IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 24 (1), pp. 130-142
  • Cieri, C., Corson, L., Graff, D., Walker, K., Resources for new research directions in speaker recognition: The mixer 3, 4 and 5 corpora (2007) Proc. Interspeech, , Antwerp, Belgium, Aug
  • Beck, S.D., Schwartz, R., Nakasone, H., A bilingual multi-modal voice corpus for language and speaker recognition (LASR) services (2004) Odyssey: The Speaker and Language Recognition Workshop
  • Sizov, A., Lee, K.A., Kinnunen, T., Unifying probabilistic linear discriminant analysis variants in biometric authentication (2014) Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), pp. 464-475. , Springer
  • Brümmer, N., (2010) EM for Simplified PLDA, , https://sites.google.com/site/nikobrummer/EMforSPLDA.pdf, Tech. Rep
  • Cumani, S., Plchot, O., Laface, P., On the use of i-vector posterior distributions in probabilistic linear discriminant analysis (2014) IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22 (4), pp. 846-857
  • Ferrer, L., (2018) Scoring Formulation for Multi-Condition Joint Plda
  • McLaren, M., Castan, D., Ferrer, L., Lawson, A., On the issue of calibration in DNN-based speaker recognition systems (2016) Proc. Interspeech, , San Francisco, September
  • Lei, Y., Scheffer, N., Ferrer, L., McLaren, M., A novel scheme for speaker recognition using a phonetically-aware deep neural network (2014) Proc. ICASSP, , Florence, Italy, May
  • Snyder, D., Ghahremani, P., Povey, D., Garcia-Romero, D., Carmiel, Y., Khudanpur, S., Deep neural network-based speaker embeddings for end-to-end speaker verification (2016) Spoken Language Technology Workshop (SLT), 2016 IEEE, pp. 165-170. , IEEE
  • Snyder, D., Garcia-Romero, D., Povey, D., Khudanpur, S., Deep neural network embeddings for text-independent speaker verification (2017) Proc. Interspeech, , Stockholm, August
  • McLaren, M., Castan, D., Nandwana, M., Ferrer, L., Yilmaz, E., How to train your speaker embeddings extractor (2018) Proc. of Speaker Odyssey, , Les Sables d'Olonne, France, June
  • Garcia-Romero, D., Espy-Wilson, C., Analysis of i-vector length normalization in speaker recognition systems (2011) Proc. Interspeech, , Florence, Italy, Aug
  • Ferrer, L., Bratt, H., Burget, L., Cernocky, H., Glembek, O., Graciarena, M., Lawson, A., Scheffer, N., Promoting robustness for speaker modeling in the community: The PRISM evaluation set (2011) Proceedings of SRE11 Analysis Workshop, , Atlanta, USA, Dec
  • NIST SRE10 Evaluation Plan, , http://www.itl.nist.gov/iad/mig/tests/sre/2010/NISTSRE10evalplan.r6.pdf

Citations:

---------- APA ----------
Ferrer, L., McLaren, M., Sekhar C.C., Rao P., Ghosh P.K., Murthy H.A., Yegnanarayana B., ..., Narayanan S. (2018). A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions. 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018, 2018-September, 82-86.
http://dx.doi.org/10.21437/Interspeech.2018-1280
---------- CHICAGO ----------
Ferrer, L., McLaren, M., Sekhar C.C., Rao P., Ghosh P.K., Murthy H.A., et al. "A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions". 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018 2018-September (2018): 82-86.
http://dx.doi.org/10.21437/Interspeech.2018-1280
---------- MLA ----------
Ferrer, L., McLaren, M., Sekhar C.C., Rao P., Ghosh P.K., Murthy H.A., et al. "A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions". 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018, vol. 2018-September, 2018, pp. 82-86.
http://dx.doi.org/10.21437/Interspeech.2018-1280
---------- VANCOUVER ----------
Ferrer, L., McLaren, M., Sekhar C.C., Rao P., Ghosh P.K., Murthy H.A., et al. A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2018;2018-September:82-86.
http://dx.doi.org/10.21437/Interspeech.2018-1280