The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation

Graciarena, M.; Ferrer, L.; Mitra, V.; Morgan, N.; Georgiou, P.; Narayanan, S.; Metze, F.

doi:10.21437/Interspeech.2016-550

Graciarena, M.; Ferrer, L.; Mitra, V.; Morgan, N.; Georgiou, P.; Narayanan, S.; Metze, F. "The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation" (2016) 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016. 08-12-September-2016:3673-3677

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p3673_Graciarena

Abstract:

In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We present results on three different development databases that we created from the provided data. We present system-development results for feature normalization; for feature fusion with acoustic, voicing, and channel bottleneck features; and finally for SAD bottleneck-feature fusion. We present a novel technique called test adaptive calibration, which is designed to improve decision-threshold selection for each test waveform. We present unsupervised test adaptation of the fusion component and describe its tight synergy with the test adaptive calibration component. Finally, we present results on the evaluation test data and show how the proposed techniques lead to significant gains on channels unseen during training. Copyright © 2016 ISCA.

Record:

Document:	Conference paper
Title:	The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation
Author:	Graciarena, M.; Ferrer, L.; Mitra, V.; Morgan, N.; Georgiou, P.; Narayanan, S.; Metze, F.
Affiliation:	SRI International, CA, United States; Departamento de Computación, FCEyN, Universidad de Buenos Aires, CONICET, Argentina
Keywords:	Channel degradation; Noise robustness; Speech activity detection; Calibration; Speech; Speech communication; Speech processing; Testing; Adaptive calibration; Bottleneck features; Channel bottlenecks; Channel degradations; Decision threshold; Feature normalization; Speech activity detections; Speech recognition
Year:	2016
Volume:	08-12-September-2016
First page:	3673
Last page:	3677
DOI:	http://dx.doi.org/10.21437/Interspeech.2016-550
Journal title:	17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
Abbreviated journal title:	Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN:	2308457X
Record URL:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p3673_Graciarena

Citations:

---------- APA ----------

Graciarena, M., Ferrer, L., Mitra, V., Morgan, N., Georgiou, P., Narayanan, S., Metze, F. (2016). The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation. 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, 08-12-September-2016, 3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550

---------- CHICAGO ----------

Graciarena, M., Ferrer, L., Mitra, V., Morgan, N., Georgiou, P., et al. "The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 08-12-September-2016 (2016): 3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550

---------- MLA ----------

Graciarena, M., Ferrer, L., Mitra, V., Morgan, N., Georgiou, P., et al. "The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, vol. 08-12-September-2016, 2016, pp. 3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550

---------- VANCOUVER ----------

Graciarena, M., Ferrer, L., Mitra, V., Morgan, N., Georgiou, P., et al. The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2016;08-12-September-2016:3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550