Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option

Ferrer, L.; Nandwana, M.K.; McLaren, M.; Castan, D.; Lawson, A.

doi:10.1109/TASLP.2018.2875794

Ferrer, L.; Nandwana, M.K.; McLaren, M.; Castan, D.; Lawson, A. "Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option" (2019) IEEE/ACM Transactions on Audio Speech and Language Processing. 27(1):140-153

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_23299290_v27_n1_p140_Ferrer

Abstract:

The output scores of most speaker recognition systems are not directly interpretable as stand-alone values. For this reason, a calibration step is usually applied to the scores to convert them into proper likelihood ratios, which have a clear probabilistic interpretation. The standard calibration approach transforms the system scores with a linear function trained on data selected to closely match the evaluation conditions. This selection, however, is not feasible when the evaluation conditions are unknown. In previous work, we proposed a calibration approach for this scenario called trial-based calibration (TBC). TBC trains a separate calibration model for each test trial using data that are dynamically selected from a candidate training set to match the conditions of the trial. In this work, we extend the TBC method by proposing: 1) a new similarity metric for selecting training data that results in significant gains over the one proposed in the original work; 2) a new option that enables the system to reject a trial when not enough matched data are available for training the calibration model; and 3) the use of regularization to improve the robustness of the calibration models trained for each trial. We test the proposed algorithms on a development set composed of several conditions and on the Federal Bureau of Investigation multi-condition speaker recognition dataset. We demonstrate that the proposed approach reduces calibration loss to values close to 0 for most of the conditions when matched calibration data are available for selection, and that it can reject most of the trials for which relevant calibration data are unavailable. © 2014 IEEE.
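
The abstract describes the method only in outline, so the following is a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a precomputed similarity value between the test trial and each candidate calibration trial (the paper's actual similarity metric is not given here), a hypothetical similarity threshold and minimum per-class count for the reject decision, and an L2 penalty that pulls the linear calibration toward the identity map. The function names and all parameter values are assumptions made for illustration.

```python
import numpy as np

def train_linear_calibration(scores, labels, l2=0.01, prior=0.5, lr=0.1, steps=2000):
    """Fit llr = a * score + b by prior-weighted logistic regression.

    The L2 term pulls (a, b) toward (1, 0); this choice of regularizer is an
    assumption for the sketch, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)  # 1 = same speaker, 0 = different
    n_tar, n_non = labels.sum(), (1.0 - labels).sum()
    # Weight each trial so the two classes contribute according to the prior.
    w = np.where(labels == 1, prior / n_tar, (1.0 - prior) / n_non)
    logit_prior = np.log(prior / (1.0 - prior))
    a, b = 1.0, 0.0
    for _ in range(steps):
        z = a * scores + b + logit_prior
        p = 1.0 / (1.0 + np.exp(-z))          # posterior of "same speaker"
        err = w * (p - labels)                # gradient of weighted cross-entropy w.r.t. z
        a -= lr * (np.sum(err * scores) + l2 * (a - 1.0))
        b -= lr * (np.sum(err) + l2 * b)
    return a, b

def calibrate_trial(raw_score, similarity, cand_scores, cand_labels,
                    sim_threshold=0.5, min_per_class=20, **train_kwargs):
    """Return a calibrated LLR for one trial, or None to reject the trial.

    `similarity[i]` is a hypothetical measure of how well candidate trial i
    matches the conditions of the test trial.
    """
    similarity = np.asarray(similarity, dtype=float)
    cand_scores = np.asarray(cand_scores, dtype=float)
    cand_labels = np.asarray(cand_labels, dtype=float)
    keep = similarity >= sim_threshold        # select matched calibration data
    n_tar = int(cand_labels[keep].sum())
    n_non = int(keep.sum()) - n_tar
    if n_tar < min_per_class or n_non < min_per_class:
        return None                           # reject: not enough matched data
    a, b = train_linear_calibration(cand_scores[keep], cand_labels[keep], **train_kwargs)
    return a * raw_score + b

# Synthetic demonstration data (not from the paper).
rng = np.random.default_rng(0)
demo_scores = np.concatenate([rng.normal(2, 1, 200), rng.normal(-2, 1, 200)])
demo_labels = np.concatenate([np.ones(200), np.zeros(200)])
print(calibrate_trial(1.5, np.ones(400), demo_scores, demo_labels))   # matched data available
print(calibrate_trial(1.5, np.zeros(400), demo_scores, demo_labels))  # no matched data: rejected
```

Returning None for a rejected trial keeps the reject decision explicit, so a caller cannot mistake an unreliable score for a calibrated likelihood ratio.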

Record:

Document type:	Article
Title:	Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option
Authors:	Ferrer, L.; Nandwana, M.K.; McLaren, M.; Castan, D.; Lawson, A.
Affiliations:	Instituto de Investigación en Ciencias de la Computación, Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad de Buenos Aires, Buenos Aires, B105, Argentina; Speech Technology and Research Laboratory, SRI International, Menlo Park, CA 94025, United States
Keywords:	forensic voice comparison; Speaker recognition; trial-based calibration; Calibration; Data structures; Logistics; Mathematical transformations; Personnel training; Statistical tests; Computational model; Forensic voice comparisons; Forensics; Probabilistic interpretation; Similarity metrics; Speaker recognition; Speaker recognition system; Standard calibration; Speech recognition
Year:	2019
Volume:	27
Issue:	1
Start page:	140
End page:	153
DOI:	http://dx.doi.org/10.1109/TASLP.2018.2875794
Journal title:	IEEE/ACM Transactions on Audio Speech and Language Processing
Abbreviated journal title:	IEEE ACM Trans. Audio Speech Lang. Process.
ISSN:	2329-9290
Record URL:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_23299290_v27_n1_p140_Ferrer

Citations:

---------- APA ----------

Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D. & Lawson, A. (2019). Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option. IEEE/ACM Transactions on Audio Speech and Language Processing, 27(1), 140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794

---------- CHICAGO ----------

Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D., Lawson, A. "Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option." IEEE/ACM Transactions on Audio Speech and Language Processing 27, no. 1 (2019): 140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794

---------- MLA ----------

Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D., Lawson, A. "Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option." IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 27, no. 1, 2019, pp. 140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794

---------- VANCOUVER ----------

Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D., Lawson, A. Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option. IEEE ACM Trans. Audio Speech Lang. Process. 2019;27(1):140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794