Conference paper

Ferrer, L.; Graciarena, M.; Mitra, V.; The Institute of Electrical and Electronics Engineers Signal Processing Society "A phonetically aware system for speech activity detection" (2016) 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016. 2016-May:5710-5714

Abstract:

Speech activity detection (SAD) is an essential component of most speech processing tasks and greatly influences the performance of the systems. Noise and channel distortions remain a challenge for SAD systems. In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program. On this challenging data, the best-performing systems are those based on deep neural networks (DNN) trained to predict speech/non-speech posteriors for each frame. We propose a novel two-stage approach to SAD that attempts to model phonetic information in the signal more explicitly than in current systems. In the first stage, a bottleneck DNN is trained to predict posteriors for senones. The activations at the bottleneck layer are then used as input to a second DNN, trained to predict the speech/non-speech posteriors. We test performance on two datasets, with matched and mismatched channels compared to those in the training data. On the matched channels, the proposed approach leads to gains of approximately 35% relative to our best single-stage DNN SAD system. On mismatched channels, the proposed system obtains comparable performance to our baseline, indicating more work needs to be done to improve robustness to mismatched data. © 2016 IEEE.
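
The two-stage data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the sigmoid hidden units, the linear bottleneck, and the random weights are all assumptions made here, and every function name is hypothetical. In the paper both networks are trained; the sketch only shows how a frame passes through stage 1 to the bottleneck and then through stage 2 to speech/non-speech posteriors.

```python
import math
import random

def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real system would learn these."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

# All sizes are illustrative assumptions, not values from the paper.
N_FEATS, N_HIDDEN, N_BOTTLENECK, N_SENONES = 20, 32, 8, 50

rng = random.Random(0)
# Stage 1: bottleneck DNN whose training target is senone posteriors.
s1_hidden = init_layer(N_FEATS, N_HIDDEN, rng)
s1_bottleneck = init_layer(N_HIDDEN, N_BOTTLENECK, rng)
s1_output = init_layer(N_BOTTLENECK, N_SENONES, rng)
# Stage 2: DNN mapping bottleneck activations to speech/non-speech posteriors.
s2_hidden = init_layer(N_BOTTLENECK, N_HIDDEN, rng)
s2_output = init_layer(N_HIDDEN, 2, rng)

def bottleneck_features(frame):
    """Stage 1 up to the bottleneck layer (assumed linear here)."""
    h = sigmoid(dense(frame, *s1_hidden))
    return dense(h, *s1_bottleneck)

def senone_posteriors(frame):
    """Full stage 1 output; needed only as the stage 1 training target."""
    return softmax(dense(bottleneck_features(frame), *s1_output))

def speech_posteriors(frame):
    """Stage 2: returns [P(speech), P(non-speech)] for one frame."""
    h = sigmoid(dense(bottleneck_features(frame), *s2_hidden))
    return softmax(dense(h, *s2_output))

frame = [rng.gauss(0.0, 1.0) for _ in range(N_FEATS)]
bn = bottleneck_features(frame)
post = speech_posteriors(frame)
```

In this sketch the senone output layer is not used when computing speech/non-speech posteriors: stage 2 reads only the bottleneck activations, which is what lets phonetic information from stage 1 reach the SAD decision.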

Record:

Document: Conference paper
Title:A phonetically aware system for speech activity detection
Author:Ferrer, L.; Graciarena, M.; Mitra, V.; The Institute of Electrical and Electronics Engineers Signal Processing Society
Affiliation:Speech Technology and Research Laboratory, SRI International, California, United States
Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina
Keywords:bottleneck features; deep neural networks; degraded channels; Speech activity detection
Year:2016
Volume:2016-May
Start page:5710
End page:5714
DOI: http://dx.doi.org/10.1109/ICASSP.2016.7472771
Journal title:41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Abbreviated journal title:ICASSP IEEE Int Conf Acoust Speech Signal Process Proc
ISSN:15206149
CODEN:IPROD
Record:https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206149_v2016-May_n_p5710_Ferrer

References:

  • Ramirez, J., Górriz, J.M., Segura, J.C., Voice activity detection (2007) Fundamentals and Speech Recognition System Robustness, INTECH Open Access Publisher
  • Ng, T., Zhang, B., Nguyen, L., Matsoukas, S., Zhou, X., Mesgarani, N., Vesely, K., Matejka, P., Developing a speech activity detection system for the DARPA RATS program (2012) Proc. Interspeech, Portland, USA, Sept
  • Saon, G., Thomas, S., Soltau, H., Ganapathy, S., Kingsbury, B., The IBM speech activity detection system for the DARPA RATS program (2013) Proc. Interspeech, Lyon, France, Aug
  • Graciarena, M., Alwan, A., Ellis, D., Franco, H., Ferrer, L., Hansen, J.H.L., Janin, A., Mitra, V., All for one: Feature combination for highly channel-degraded speech activity detection (2013) Proc. Interspeech, Lyon, France, Aug
  • Ma, J., Improving the speech activity detection for the DARPA RATS phase-3 evaluation (2014) Proc. Interspeech, Singapore, Sept
  • Thomas, S., Saon, G., Van Segbroeck, M., Narayanan, S.S., Improvements to the IBM speech activity detection system for the DARPA RATS program (2015) Proc. ICASSP, Brisbane, Australia, May
  • McLaren, M., Ferrer, L., Lawson, A., Exploring the role of phonetic bottleneck features for speaker and language recognition (2016) Submitted to ICASSP 2016
  • McLaren, M., Lei, Y., Ferrer, L., Advances in deep neural network approaches to speaker recognition (2015) Proc. ICASSP, Brisbane, Australia, May
  • Richardson, F., Reynolds, D.A., Dehak, N., A unified deep neural network for speaker and language recognition (2015) Proc. Interspeech, Dresden, Sept
  • Song, Y., Jiang, B., Bao, Y., Wei, S., Dai, L., Ivector representation based on bottleneck features for language identification (2013) Electronics Letters, 49 (24), pp. 1569-1570
  • Matejka, P., Zhang, L., Ng, T., Mallidi, S.H., Glembek, O., Ma, J., Zhang, B., Neural network bottleneck features for language identification (2014) Proc. Odyssey-14, Joensuu, Finland, June
  • Ferrer, L., Lei, Y., McLaren, M., Study of senone-based deep neural network approaches for spoken language recognition (2015) Submitted to IEEE Trans. Audio Speech and Language Processing
  • Walker, K., Strassel, S., The RATS radio traffic collection system (2012) Odyssey 2012: The Speaker and Language Recognition Workshop
  • Lei, Y., Ferrer, L., Lawson, A., McLaren, M., Scheffer, N., Application of convolutional neural networks to language identification in noisy conditions (2014) Proc. Odyssey-14, Joensuu, Finland, June
  • NIST SRE12 Evaluation Plan, http://www.nist.gov/itl/iad/mig/upload/NISTSRE12evalplan-v17-r1.pdf
  • NIST OpenSAD Evaluation Plan, http://www.nist.gov/itl/iad/mig/upload/OpenSADEvalPlanv8.pdf
  • Sainath, T.N., Vinyals, O., Senior, A., Sak, H., Convolutional, long short-term memory, fully connected deep neural networks (2015) Proc. ICASSP, Brisbane, Australia, May
  • Abdel-Hamid, O., Mohamed, A.R., Jiang, H., Deng, L., Penn, G., Yu, D., Convolutional neural networks for speech recognition (2014) Audio, Speech, and Language Processing, IEEE/ACM Transactions on, 22 (10), pp. 1533-1545

Citations:

---------- APA ----------
Ferrer, L., Graciarena, M., Mitra, V. & The Institute of Electrical and Electronics Engineers Signal Processing Society (2016) . A phonetically aware system for speech activity detection. 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, 2016-May, 5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771
---------- CHICAGO ----------
Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society "A phonetically aware system for speech activity detection" . 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 2016-May (2016) : 5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771
---------- MLA ----------
Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society "A phonetically aware system for speech activity detection" . 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, vol. 2016-May, 2016, pp. 5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771
---------- VANCOUVER ----------
Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society A phonetically aware system for speech activity detection. ICASSP IEEE Int Conf Acoust Speech Signal Process Proc. 2016;2016-May:5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771