Abstract:
Speech activity detection (SAD) is an essential component of most speech processing tasks and greatly influences the performance of the systems. Noise and channel distortions remain a challenge for SAD systems. In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program. On this challenging data, the best-performing systems are those based on deep neural networks (DNNs) trained to predict speech/non-speech posteriors for each frame. We propose a novel two-stage approach to SAD that attempts to model phonetic information in the signal more explicitly than in current systems. In the first stage, a bottleneck DNN is trained to predict posteriors for senones. The activations at the bottleneck layer are then used as input to a second DNN, trained to predict the speech/non-speech posteriors. We test performance on two datasets, with matched and mismatched channels compared to those in the training data. On the matched channels, the proposed approach leads to relative gains of approximately 35% over our best single-stage DNN SAD system. On mismatched channels, the proposed system obtains performance comparable to our baseline, indicating that more work is needed to improve robustness to mismatched data. © 2016 IEEE.
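The two-stage data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the weights are random and untrained, and every layer size, the sigmoid activations, the bottleneck position, and the names `init_layers`, `forward`, `stage1`, and `stage2` are assumptions made purely for illustration. It shows only how bottleneck activations from a senone network could feed a second speech/non-speech network.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layers(sizes):
    """Random (untrained) weights and zero biases for a fully connected net."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, layers, stop_after=None):
    """Sigmoid hidden layers, softmax output layer.

    If stop_after is given, return the linear activations of that layer
    (0-based index) instead of running the whole network.
    """
    h = x
    for i, (w, b) in enumerate(layers):
        z = h @ w + b
        if i == stop_after:
            return z
        h = softmax(z) if i == len(layers) - 1 else 1.0 / (1.0 + np.exp(-z))
    return h

# All sizes below are hypothetical, not values taken from the paper.
FEAT_DIM, N_SENONES, BN_DIM = 40, 500, 30
BN_LAYER = 2  # index of the bottleneck layer within stage 1

# Stage 1: bottleneck DNN with senone posteriors as its output.
stage1 = init_layers([FEAT_DIM, 256, 256, BN_DIM, 256, N_SENONES])
# Stage 2: DNN that maps bottleneck activations to speech/non-speech posteriors.
stage2 = init_layers([BN_DIM, 128, 2])

frames = rng.standard_normal((100, FEAT_DIM))   # 100 frames of stand-in features
senone_post = forward(frames, stage1)           # per-frame senone posteriors
bottleneck = forward(frames, stage1, stop_after=BN_LAYER)
sad_post = forward(bottleneck, stage2)          # per-frame speech/non-speech posteriors
```

In a real system each stage would be trained on labeled data (senone alignments for the first, speech/non-speech labels for the second); the sketch covers only the forward pass.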
Record:
Document type: | Conference paper |
Title: | A phonetically aware system for speech activity detection |
Author: | Ferrer, L.; Graciarena, M.; Mitra, V.; The Institute of Electrical and Electronics Engineers Signal Processing Society |
Affiliation: | Speech Technology and Research Laboratory, SRI International, California, United States; Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina |
Keywords: | bottleneck features; deep neural networks; degraded channels; Speech activity detection |
Year: | 2016 |
Volume: | 2016-May |
First page: | 5710 |
Last page: | 5714 |
DOI: | http://dx.doi.org/10.1109/ICASSP.2016.7472771 |
Journal title: | 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 |
Abbreviated journal title: | ICASSP IEEE Int Conf Acoust Speech Signal Process Proc |
ISSN: | 1520-6149 |
CODEN: | IPROD |
Record: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206149_v2016-May_n_p5710_Ferrer |
Citations:
---------- APA ----------
Ferrer, L., Graciarena, M., Mitra, V. & The Institute of Electrical and Electronics Engineers Signal Processing Society (2016). A phonetically aware system for speech activity detection. 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, 2016-May, 5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771
---------- CHICAGO ----------
Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society. "A phonetically aware system for speech activity detection". 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 2016-May (2016): 5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771
---------- MLA ----------
Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society. "A phonetically aware system for speech activity detection". 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, vol. 2016-May, 2016, pp. 5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771
---------- VANCOUVER ----------
Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society. A phonetically aware system for speech activity detection. ICASSP IEEE Int Conf Acoust Speech Signal Process Proc. 2016;2016-May:5710-5714.
http://dx.doi.org/10.1109/ICASSP.2016.7472771