A phonetically aware system for speech activity detection

Ferrer, L.; Graciarena, M.; Mitra, V.; The Institute of Electrical and Electronics Engineers Signal Processing Society

doi:10.1109/ICASSP.2016.7472771

Ferrer, L.; Graciarena, M.; Mitra, V.; The Institute of Electrical and Electronics Engineers Signal Processing Society "A phonetically aware system for speech activity detection" (2016) 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016. 2016-May:5710-5714

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206149_v2016-May_n_p5710_Ferrer

Abstract:

Speech activity detection (SAD) is an essential component of most speech processing tasks and greatly influences the performance of the systems. Noise and channel distortions remain a challenge for SAD systems. In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program. On this challenging data, the best-performing systems are those based on deep neural networks (DNN) trained to predict speech/non-speech posteriors for each frame. We propose a novel two-stage approach to SAD that attempts to model phonetic information in the signal more explicitly than in current systems. In the first stage, a bottleneck DNN is trained to predict posteriors for senones. The activations at the bottleneck layer are then used as input to a second DNN, trained to predict the speech/non-speech posteriors. We test performance on two datasets, with matched and mismatched channels compared to those in the training data. On the matched channels, the proposed approach leads to gains of approximately 35% relative to our best single-stage DNN SAD system. On mismatched channels, the proposed system obtains comparable performance to our baseline, indicating more work needs to be done to improve robustness to mismatched data. © 2016 IEEE.
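The two-stage data flow described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the weights are random and untrained, and the feature dimension, layer sizes, senone count, sigmoid hidden units, and every function name (`init_layers`, `forward`, `two_stage_sad`) are assumptions made for the example. The record does not state any of these details.

```python
import numpy as np

def init_layers(sizes, rng):
    """Random (untrained) weights and zero biases for a fully connected net."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, layers, stop_at=None):
    """Sigmoid hidden layers, softmax output layer.
    If stop_at is given, return the activations of that hidden layer."""
    h = x
    for i, (w, b) in enumerate(layers):
        z = h @ w + b
        if i == len(layers) - 1:
            return softmax(z)
        h = 1.0 / (1.0 + np.exp(-z))
        if stop_at is not None and i == stop_at:
            return h

def two_stage_sad(frames, stage1, stage2, bn_layer):
    """Stage 1 yields bottleneck activations; stage 2 maps them to
    speech/non-speech posteriors, one row per frame."""
    bottleneck = forward(frames, stage1, stop_at=bn_layer)
    return bottleneck, forward(bottleneck, stage2)

# Hypothetical dimensions, chosen only for illustration.
FEAT_DIM, BN_DIM, N_SENONES = 40, 32, 500
rng = np.random.default_rng(0)
stage1 = init_layers([FEAT_DIM, 256, BN_DIM, 256, N_SENONES], rng)  # senone DNN
stage2 = init_layers([BN_DIM, 64, 2], rng)                          # SAD DNN

frames = rng.standard_normal((10, FEAT_DIM))    # 10 synthetic feature frames
senone_post = forward(frames, stage1)           # stage-1 training target space
bottleneck, speech_post = two_stage_sad(frames, stage1, stage2, bn_layer=1)
print(bottleneck.shape, speech_post.shape)
```

In a real system the first network would be trained on senone targets and the second on speech/non-speech labels. Here only the forward pass is shown, to make clear that the second network never sees the senone posteriors themselves, only the bottleneck-layer activations.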

Record:

Document type:	Conference paper
Title:	A phonetically aware system for speech activity detection
Authors:	Ferrer, L.; Graciarena, M.; Mitra, V.; The Institute of Electrical and Electronics Engineers Signal Processing Society
Affiliation:	Speech Technology and Research Laboratory, SRI International, California, United States; Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina
Keywords:	bottleneck features; deep neural networks; degraded channels; speech activity detection
Year:	2016
Volume:	2016-May
Start page:	5710
End page:	5714
DOI:	http://dx.doi.org/10.1109/ICASSP.2016.7472771
Journal title:	41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Abbreviated journal title:	ICASSP IEEE Int Conf Acoust Speech Signal Process Proc
ISSN:	15206149
CODEN:	IPROD
Record URL:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206149_v2016-May_n_p5710_Ferrer

Citations:

---------- APA ----------

Ferrer, L., Graciarena, M., Mitra, V. & The Institute of Electrical and Electronics Engineers Signal Processing Society (2016). A phonetically aware system for speech activity detection. 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, 2016-May, 5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771

---------- CHICAGO ----------

Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society. "A phonetically aware system for speech activity detection." 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 2016-May (2016): 5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771

---------- MLA ----------

Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society. "A phonetically aware system for speech activity detection." 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, vol. 2016-May, 2016, pp. 5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771

---------- VANCOUVER ----------

Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society. A phonetically aware system for speech activity detection. ICASSP IEEE Int Conf Acoust Speech Signal Process Proc. 2016;2016-May:5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771