Mitigating the effects of non-stationary unseen noises on language recognition performance

Ferrer, L.; McLaren, M.; Lawson, A.; Graciarena, M.; Noth E.; Steidl S.; Moller S.; Ney H.; Mobius B.; Alibaba Group; Amazon; et al.; Facebook; Google; Telekom Innovation Laboratories

Ferrer, L.; McLaren, M.; Lawson, A.; Graciarena, M.; Noth E.; Steidl S.; Moller S.; Ney H.; Mobius B.; Alibaba Group; Amazon; et al.; Facebook; Google; Telekom Innovation Laboratories "Mitigating the effects of non-stationary unseen noises on language recognition performance" (2015) 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015. 2015-January:3446-3450

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2015-January_n_p3446_Ferrer

We are working to add this article to the repository.

Abstract:

We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance. The dataset is based on the data from the 2009 Language Recognition Evaluation organized by the National Institute of Standards and Technology (NIST). Randomly selected noises are added to these signals to achieve a chosen signal-to-noise ratio and percentage of corruption. We study the effect of these noises on LR performance as a function of these parameters and present some initial methods to mitigate the degradation, focusing on the speech activity detection (SAD) step. These methods include discarding the C0 coefficient from the features used for SAD, using a more stringent threshold on the SAD scores, thresholding the speech likelihoods returned by the model as an additional way of detecting noise, and a final model adaptation step. We show that a system optimized for clean speech is clearly suboptimal on this new dataset since the proposed methods lead to gains of up to 35% on the corrupted data, without knowledge of the test noises and with very little effect on clean data performance. Copyright © 2015 ISCA.
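
The abstract describes adding noise to reach a chosen signal-to-noise ratio and percentage of corruption. The sketch below illustrates one way such a mixing step could work; it is not the authors' procedure. The function names (`mean_power`, `add_noise`), the choice of a single contiguous corrupted region, and the measurement of SNR over that region only are all assumptions made for illustration.

```python
import math
import random


def mean_power(samples):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise(speech, noise, snr_db, corruption_fraction, rng=None):
    """Corrupt one contiguous region of `speech` with scaled `noise`.

    Hypothetical helper: the region placement and the region-level SNR
    definition are assumptions, not details taken from the paper.
    Returns the noisy signal and the (start, end) indices of the region.
    """
    rng = rng or random.Random(0)
    region_len = int(round(corruption_fraction * len(speech)))
    if region_len == 0:
        return list(speech), (0, 0)
    if len(noise) < region_len:
        raise ValueError("noise is shorter than the region to corrupt")

    # Pick where the corrupted region starts (both bounds inclusive).
    start = rng.randint(0, len(speech) - region_len)
    segment = noise[:region_len]

    p_speech = mean_power(speech[start:start + region_len])
    p_noise = mean_power(segment)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech region and noise must have non-zero power")

    # Choose gain g so that 10*log10(p_speech / (g**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))

    noisy = list(speech)
    for i in range(region_len):
        noisy[start + i] += gain * segment[i]
    return noisy, (start, start + region_len)


if __name__ == "__main__":
    gen = random.Random(1)
    speech = [math.sin(0.01 * i) for i in range(1000)]  # stand-in for speech
    noise = [gen.gauss(0, 1) for _ in range(1000)]      # stand-in for noise
    noisy, (a, b) = add_noise(speech, noise, snr_db=10.0, corruption_fraction=0.3)
    print("corrupted samples:", a, "to", b)
```

Measuring the SNR over the corrupted region rather than the whole file keeps the two parameters independent: changing the corruption percentage does not change how loud the noise is relative to the speech it overlaps.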

Record:

Document:	Conference
Title:	Mitigating the effects of non-stationary unseen noises on language recognition performance
Author:	Ferrer, L.; McLaren, M.; Lawson, A.; Graciarena, M.; Noth E.; Steidl S.; Moller S.; Ney H.; Mobius B.; Alibaba Group; Amazon; et al.; Facebook; Google; Telekom Innovation Laboratories
Affiliation:	Departamento de Computación, FCEN, Universidad de Buenos Aires and CONICET, Argentina; Speech Technology and Research Laboratory, SRI International, CA, United States
Keywords:	Non-stationary noise; Speech activity detection; Spoken language recognition; Computational linguistics; Signal detection; Speech; Speech communication; Statistical tests; Data performance; Language recognition; Model Adaptation; National Institute of Standards and Technology; Nonstationary noise; Signal-to-noise ratio (SNR); Speech activity detections; Speech recognition
Year:	2015
Volume:	2015-January
Start page:	3446
End page:	3450
Journal title:	16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Abbreviated journal title:	Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN:	2308457X
Record:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2015-January_n_p3446_Ferrer
