Abstract:
The output scores of most speaker recognition systems are not directly interpretable as stand-alone values. For this reason, a calibration step is usually applied to the scores to convert them into proper likelihood ratios, which have a clear probabilistic interpretation. The standard calibration approach transforms the system scores with a linear function trained on data selected to closely match the evaluation conditions. This selection, though, is not feasible when the evaluation conditions are unknown. In previous work, we proposed a calibration approach for this scenario called trial-based calibration (TBC). TBC trains a separate calibration model for each test trial using data that is dynamically selected from a candidate training set to match the conditions of the trial. In this work, we extend the TBC method, proposing: 1) a new similarity metric for selecting training data that yields significant gains over the one proposed in the original work; 2) a new option that enables the system to reject a trial when not enough matched data are available for training the calibration model; and 3) the use of regularization to improve the robustness of the calibration models trained for each trial. We test the proposed algorithms on a development set composed of several conditions and on the Federal Bureau of Investigation multi-condition speaker recognition dataset. We demonstrate that the proposed approach reduces calibration loss to values close to 0 for most of the conditions when matched calibration data are available for selection, and that it can reject most of the trials for which relevant calibration data are unavailable. © 2014 IEEE.
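The pipeline described in the abstract (linear score calibration, per-trial selection of matched training data, a reject option, and regularization) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Euclidean distance between hypothetical condition vectors stands in for the similarity metric, `max_dist` and `min_per_class` are made-up thresholds, and the `l2` penalty simply pulls the parameters toward the identity transform.

```python
import math


def train_linear_calibration(scores, labels, l2=0.0, prior=0.5, steps=500, lr=0.5):
    """Fit llr = a * score + b by prior-weighted logistic regression.

    labels are 1 (same speaker) or 0 (different speakers). The l2 term is a
    hypothetical regularizer pulling (a, b) toward the identity (1, 0).
    Plain fixed-step gradient descent; assumes moderately scaled scores.
    """
    n_tgt = sum(labels)
    n_non = len(labels) - n_tgt
    # Each class contributes to the loss according to the chosen prior.
    weights = [prior / n_tgt if y == 1 else (1 - prior) / n_non for y in labels]
    logit_prior = math.log(prior / (1 - prior))
    a, b = 1.0, 0.0
    for _ in range(steps):
        grad_a, grad_b = l2 * (a - 1.0), l2 * b
        for s, y, w in zip(scores, labels, weights):
            # Posterior probability of "same speaker" under the current model.
            p = 1.0 / (1.0 + math.exp(-(a * s + b + logit_prior)))
            grad_a += w * (p - y) * s
            grad_b += w * (p - y)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def calibrate_trial(score, trial_cond, cand_scores, cand_labels, cand_conds,
                    max_dist=1.0, min_per_class=20, l2=0.01):
    """Return a calibrated LLR for one trial, or None to reject the trial.

    Candidate trials are selected when their (hypothetical) condition vector
    lies within max_dist of the test trial's condition vector.
    """
    keep = [i for i, c in enumerate(cand_conds)
            if math.dist(c, trial_cond) <= max_dist]
    sel_scores = [cand_scores[i] for i in keep]
    sel_labels = [cand_labels[i] for i in keep]
    n_tgt = sum(sel_labels)
    # Reject option: too little matched data of either class to trust a model.
    if min(n_tgt, len(sel_labels) - n_tgt) < min_per_class:
        return None
    a, b = train_linear_calibration(sel_scores, sel_labels, l2=l2)
    return a * score + b
```

Calling `calibrate_trial` with a condition vector far from every candidate returns `None`, which is the rejection case; otherwise it trains a small model on the selected subset and returns the transformed score.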
Record:
Document: | Article |
Title: | Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option |
Author: | Ferrer, L.; Nandwana, M.K.; McLaren, M.; Castan, D.; Lawson, A. |
Affiliation: | Instituto de Investigación en Ciencias de la Computación, Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad de Buenos Aires, Buenos Aires, B105, Argentina; Speech Technology and Research Laboratory, SRI International, Menlo Park, CA 94025, United States |
Keywords: | forensic voice comparison; Speaker recognition; trial-based calibration; Calibration; Data structures; Logistics; Mathematical transformations; Personnel training; Statistical tests; Computational model; Forensic voice comparisons; Forensics; Probabilistic interpretation; Similarity metrics; Speaker recognition system; Standard calibration; Speech recognition |
Year: | 2019 |
Volume: | 27 |
Issue: | 1 |
Start page: | 140 |
End page: | 153 |
DOI: | http://dx.doi.org/10.1109/TASLP.2018.2875794 |
Journal title: | IEEE/ACM Transactions on Audio Speech and Language Processing |
Abbreviated journal title: | IEEE ACM Trans. Audio Speech Lang. Process. |
ISSN: | 23299290 |
Record: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_23299290_v27_n1_p140_Ferrer |
Citations:
---------- APA ----------
Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D. & Lawson, A. (2019). Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option. IEEE/ACM Transactions on Audio Speech and Language Processing, 27(1), 140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794
---------- CHICAGO ----------
Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D., Lawson, A. "Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option". IEEE/ACM Transactions on Audio Speech and Language Processing 27, no. 1 (2019): 140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794
---------- MLA ----------
Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D., Lawson, A. "Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option". IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 27, no. 1, 2019, pp. 140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794
---------- VANCOUVER ----------
Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D., Lawson, A. Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option. IEEE ACM Trans. Audio Speech Lang. Process. 2019;27(1):140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794