Conference

Graciarena, M.; Ferrer, L.; Mitra, V. "The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation" (2016) 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016. 08-12-September-2016:3673-3677

Abstract:

In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We present results on three different development databases that we created from the provided data. We present system-development results for feature normalization; for feature fusion with acoustic, voicing, and channel bottleneck features; and finally for SAD bottleneck-feature fusion. We present a novel technique called test adaptive calibration, which is designed to improve decision-threshold selection for each test waveform. We present unsupervised test adaptation of the fusion component and describe its tight synergy to the test adaptive calibration component. Finally, we present results on the evaluation test data and show how the proposed techniques lead to significant gains on channels unseen during training. Copyright © 2016 ISCA.
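
To make the idea of test adaptive calibration more concrete, the sketch below illustrates one possible way to select a speech/non-speech decision threshold per test waveform: fit a two-component Gaussian mixture to that waveform's own frame-level SAD scores and label frames by the posterior of the higher-mean component. This is a purely illustrative assumption, not the method described in the paper; the function name, the GMM-based modelling, and all parameters are hypothetical.

    # Illustrative sketch only (NOT the paper's algorithm): adapt the
    # speech/non-speech decision threshold to each test waveform by modelling
    # that waveform's own frame-level SAD scores with a two-component Gaussian
    # mixture and thresholding the posterior of the higher-mean component.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def per_waveform_decisions(frame_scores, posterior_threshold=0.5):
        """frame_scores: 1-D array of per-frame SAD scores for one test waveform."""
        scores = np.asarray(frame_scores, dtype=float).reshape(-1, 1)
        # Fit a two-component GMM to this waveform's own score distribution.
        gmm = GaussianMixture(n_components=2, covariance_type="full",
                              random_state=0).fit(scores)
        speech = int(np.argmax(gmm.means_.ravel()))   # higher-mean component = speech
        post = gmm.predict_proba(scores)[:, speech]   # per-frame speech posterior
        return post >= posterior_threshold            # boolean speech/non-speech labels

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Synthetic scores: non-speech frames centred at -2, speech frames at +2.
        scores = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 200)])
        labels = per_waveform_decisions(scores)
        print(f"{labels.mean():.2%} of frames labelled as speech")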

Record:

Document: Conference
Title: The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation
Author: Graciarena, M.; Ferrer, L.; Mitra, V.
Affiliation: SRI International, CA, United States
Departamento de Computación, FCEyN, Universidad de Buenos Aires, CONICET, Argentina
Keywords: Channel degradation; Noise robustness; Speech activity detection; Calibration; Speech; Speech communication; Speech processing; Testing; Adaptive calibration; Bottleneck features; Channel bottlenecks; Decision threshold; Feature normalization; Speech recognition
Year: 2016
Volume: 08-12-September-2016
Start page: 3673
End page: 3677
DOI: http://dx.doi.org/10.21437/Interspeech.2016-550
Source title: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
Abbreviated source title: Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN: 2308-457X
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p3673_Graciarena

References:

  • Walker, K., Strassel, S., The RATS radio traffic collection system (2012) Odyssey 2012: The Speaker and Language Recognition Workshop
  • Davis, S.B., Mermelstein, P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences (1980) IEEE Transactions on Acoustics, Speech and Signal Processing, 28 (4), pp. 357-366
  • Hermansky, H., Perceptual linear predictive (PLP) analysis of speech (1990) Journal of the Acoustical Society of America, 87, pp. 1738-1752, Apr
  • Kim, C., Stern, R.M., Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring (2010) Proc. ICASSP, pp. 4574-4577
  • Mitra, V., Franco, H., Graciarena, M., Mandal, A., Normalized amplitude modulation features for large vocabulary noise-robust speech recognition (2012) Proc. ICASSP 2012, pp. 4117-4120, March
  • Sadjadi, S.O., Hansen, J.H., Unsupervised speech activity detection using voicing measures and perceptual spectral flux (2013) IEEE Signal Processing Letters, 20, pp. 197-200
  • Ghahremani, P., BabaAli, B., Povey, D., Riedhammer, K., Trmal, J., Khudanpur, S., A pitch extraction algorithm tuned for automatic speech recognition (2014) Proc. ICASSP
  • Ma, J., Improving the speech activity detection for the DARPA RATS phase-3 evaluation (2014) Interspeech
  • Ryant, N., Liberman, M., Yuan, J., Speech activity detection on YouTube using deep neural networks (2013) Interspeech, Lyon, France, Aug
  • Thomas, S., Saon, G., Van Segbroeck, M., Narayanan, S.S., Improvements to the IBM speech activity detection system for the DARPA RATS program (2015) Proc. ICASSP, Brisbane, Australia, May
  • Ferrer, L., Graciarena, M., Mitra, V., A phonetically aware system for speech activity detection (2016) Proc. ICASSP, Shanghai, China, March
  • Schwarz, G., Estimating the dimension of a model (1978) The Annals of Statistics, 6, pp. 461-464

Citations:

---------- APA ----------
Graciarena, M., Ferrer, L., & Mitra, V. (2016). The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation. 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, 08-12-September-2016, 3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550
---------- CHICAGO ----------
Graciarena, M., Ferrer, L., and Mitra, V. "The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 08-12-September-2016 (2016): 3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550
---------- MLA ----------
Graciarena, M., Ferrer, L., and Mitra, V. "The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, vol. 08-12-September-2016, 2016, pp. 3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550
---------- VANCOUVER ----------
Graciarena M., Ferrer L., Mitra V. The SRI system for the NIST OpenSAD 2015 speech activity detection evaluation. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2016;08-12-September-2016:3673-3677.
http://dx.doi.org/10.21437/Interspeech.2016-550