Abstract:
In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We present results on three different development databases that we created from the provided data. We present system-development results for feature normalization; for feature fusion with acoustic, voicing, and channel bottleneck features; and finally for SAD bottleneck-feature fusion. We present a novel technique called test adaptive calibration, which is designed to improve decision-threshold selection for each test waveform. We present unsupervised test adaptation of the fusion component and describe its tight synergy with the test adaptive calibration component. Finally, we present results on the evaluation test data and show how the proposed techniques lead to significant gains on channels unseen during training. Copyright © 2016 ISCA.
Record:
Document: | Conference paper |
Title: | The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation |
Author: | Graciarena, M.; Ferrer, L.; Mitra, V. |
Affiliation: | SRI International, CA, United States; Departamento de Computación, FCEyN, Universidad de Buenos Aires, CONICET, Argentina |
Keywords: | Channel degradation; Noise robustness; Speech activity detection; Calibration; Speech; Speech communication; Speech processing; Testing; Adaptive calibration; Bottleneck features; Channel bottlenecks; Channel degradations; Decision threshold; Feature normalization; Speech activity detections; Speech recognition |
Year: | 2016 |
Volume: | 08-12-September-2016 |
Start page: | 3673 |
End page: | 3677 |
DOI: | http://dx.doi.org/10.21437/Interspeech.2016-550 |
Journal title: | 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 |
Abbreviated journal title: | Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH |
ISSN: | 2308457X |
Record: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p3673_Graciarena |
References:
- Walker, K., Strassel, S., The RATS radio traffic collection system (2012) Odyssey 2012: The Speaker and Language Recognition Workshop
- Davis, S.B., Mermelstein, P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences (1980) IEEE Transactions on Acoustics, Speech and Signal Processing, 28 (4), pp. 357-366
- Hermansky, H., Perceptual linear predictive (PLP) analysis of speech (1990) Journal of the Acoustical Society of America, 87, pp. 1738-1752, Apr
- Kim, C., Stern, R.M., Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring (2010) Proc. ICASSP, pp. 4574-4577
- Mitra, V., Franco, H., Graciarena, M., Mandal, A., Normalized amplitude modulation features for large vocabulary noise-robust speech recognition (2012) Proc. ICASSP 2012, pp. 4117-4120, March
- Sadjadi, S.O., Hansen, J.H., Unsupervised speech activity detection using voicing measures and perceptual spectral flux (2013) IEEE Signal Processing Letters, 20, pp. 197-200
- Ghahremani, P., BabaAli, B., Povey, D., Riedhammer, K., Trmal, J., Khudanpur, S., A pitch extraction algorithm tuned for automatic speech recognition (2014) Proc. ICASSP
- Ma, J., Improving the speech activity detection for the DARPA RATS phase-3 evaluation (2014) Interspeech
- Ryant, N., Liberman, M., Yuan, J., Speech activity detection on YouTube using deep neural networks (2013) Interspeech, Lyon, France, Aug
- Thomas, S., Saon, G., Van Segbroeck, M., Narayanan, S.S., Improvements to the IBM speech activity detection system for the DARPA RATS program (2015) Proc. ICASSP, Brisbane, Australia, May
- Ferrer, L., Graciarena, M., Mitra, V., A phonetically aware system for speech activity detection (2016) Proc. ICASSP, Shanghai, China, March
- Schwarz, G., Estimating the dimension of a model (1978) The Annals of Statistics, 6, pp. 461-464
Citations:
---------- APA ----------
Graciarena, M., Ferrer, L., & Mitra, V. (2016). The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation. 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, 08-12-September-2016, 3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550
---------- CHICAGO ----------
Graciarena, M., Ferrer, L., Mitra, V. "The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 08-12-September-2016 (2016): 3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550
---------- MLA ----------
Graciarena, M., Ferrer, L., Mitra, V. "The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, vol. 08-12-September-2016, 2016, pp. 3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550
---------- VANCOUVER ----------
Graciarena, M., Ferrer, L., Mitra, V. The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2016;08-12-September-2016:3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550