Minimizing annotation effort for adaptation of speech-activity detection systems

Ferrer, L.; Graciarena, M.; Morgan N.; Georgiou P.; Morgan N.; Narayanan S.; Metze F.; Amazon Alexa; Apple; eBay; et al.; Google; Microsoft

doi:10.21437/Interspeech.2016-247

Ferrer, L.; Graciarena, M.; Morgan N.; Georgiou P.; Morgan N.; Narayanan S.; Metze F.; Amazon Alexa; Apple; eBay; et al.; Google; Microsoft "Minimizing annotation effort for adaptation of speech-activity detection systems" (2016) 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016. 08-12-September-2016:3002-3006

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p3002_Ferrer

Abstract:

Annotating audio data for the presence and location of speech is a time-consuming and therefore costly task. This is mostly because annotation precision greatly affects the performance of the speech-activity detection (SAD) systems trained with this data, which means that the annotation process must be careful and detailed. Although significant amounts of data are already annotated for speech presence and are available to train SAD systems, these systems are known to perform poorly on channels that are not well-represented by the training data. However, obtaining representative audio samples from a new channel is relatively easy, and this data can be used for training a new SAD system or adapting one trained with larger amounts of mismatched data. This paper focuses on the problem of selecting the best-possible subset of available audio data given a budgeted time for annotation. We propose simple approaches for selection that lead to significant gains over naïve methods that merely select N full files at random. An approach that uses the frame-level scores from a baseline system to select regions such that the score distribution is uniformly sampled gives the best tradeoff across a variety of channel groups. Copyright © 2016 ISCA.
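
The selection idea in the last sentence of the abstract can be illustrated with a short sketch. The abstract does not give the region length, the way a region is scored, or the sampling procedure, so everything below is an assumption made for illustration and should not be read as the authors' implementation: regions are fixed-length and non-overlapping, each region is summarized by the mean of its frame-level baseline scores, the score range is split into equal-width bins, and regions are drawn round-robin from the bins until the annotation budget is spent. The names `select_regions_uniform`, `region_len`, `budget_frames` and `n_bins` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Region = Tuple[str, int, int]  # (file id, start frame, end frame exclusive)


def select_regions_uniform(
    scores: Dict[str, Sequence[float]],
    region_len: int,
    budget_frames: int,
    n_bins: int = 10,
    seed: int = 0,
) -> List[Region]:
    """Pick regions whose mean baseline scores cover the score range evenly.

    Assumed setup (not taken from the paper): fixed-length regions,
    mean score per region, equal-width bins, round-robin sampling.
    """
    rng = random.Random(seed)

    # Cut every file into fixed-length regions and score each by its mean.
    candidates: List[Tuple[float, Region]] = []
    for file_id, s in scores.items():
        for start in range(0, len(s) - region_len + 1, region_len):
            chunk = s[start:start + region_len]
            mean = sum(chunk) / region_len
            candidates.append((mean, (file_id, start, start + region_len)))
    if not candidates:
        return []

    # Equal-width bins over the observed range of region scores.
    lo = min(m for m, _ in candidates)
    hi = max(m for m, _ in candidates)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all means match
    bins: List[List[Region]] = [[] for _ in range(n_bins)]
    for mean, region in candidates:
        idx = min(int((mean - lo) / width), n_bins - 1)
        bins[idx].append(region)
    for b in bins:
        rng.shuffle(b)

    # Round-robin over bins so each score level contributes equally
    # until the annotation budget (in frames) is used up.
    selected: List[Region] = []
    used = 0
    while used + region_len <= budget_frames and any(bins):
        for b in bins:
            if b and used + region_len <= budget_frames:
                selected.append(b.pop())
                used += region_len
    return selected


if __name__ == "__main__":
    toy_scores = {"a": [0.0] * 10 + [1.0] * 10, "b": [0.5] * 20}
    for region in select_regions_uniform(toy_scores, region_len=5,
                                         budget_frames=15, n_bins=3):
        print(region)
```

Compared with picking N full files at random, a scheme of this kind spends the same annotation time on short regions spread over low, intermediate and high baseline scores, which is the contrast the abstract draws.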

Record:

Document:	Conference
Title:	Minimizing annotation effort for adaptation of speech-activity detection systems
Author:	Ferrer, L.; Graciarena, M.; Morgan N.; Georgiou P.; Morgan N.; Narayanan S.; Metze F.; Amazon Alexa; Apple; eBay; et al.; Google; Microsoft
Affiliation:	Departamento de Computación, FCEyN, Universidad de Buenos Aires, CONICET, Argentina; Speech Technology and Research Laboratory, SRI International, CA, United States
Keywords:	Active learning; Adaptation; Annotation; Speech-activity detection; Artificial intelligence; Budget control; Speech; Speech communication; Speech processing; Audio samples; Baseline systems; Simple approach; Speech activity detections; Training data; Speech recognition
Year:	2016
Volume:	08-12-September-2016
Start page:	3002
End page:	3006
DOI:	http://dx.doi.org/10.21437/Interspeech.2016-247
Journal title:	17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
Abbreviated journal title:	Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH
ISSN:	2308457X
Record:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v08-12-September-2016_n_p3002_Ferrer

References:

Ryant, N., Liberman, M., Yuan, J., Speech activity detection on YouTube using deep neural networks (2013) Proc. Interspeech, Lyon, France, Aug
Ma, J., Improving the speech activity detection for the DARPA RATS phase-3 evaluation (2014) Proc. Interspeech, Singapore, Sep
Thomas, S., Saon, G., Van Segbroeck, M., Narayanan, S.S., Improvements to the IBM speech activity detection system for the DARPA RATS program (2015) Proc. ICASSP, Brisbane, Australia, May
Ferrer, L., Graciarena, M., Mitra, V., A phonetically aware system for speech activity detection (2016) Proc. ICASSP, Shanghai, China, March
Settles, B., Active learning literature survey (2010) University of Wisconsin, Madison, 52 (55-66), p. 11
Gelly, G., Gauvain, J.-L., Minimum word error training of RNN-based voice activity detection (2015) Proc. Interspeech, Dresden, Sep
Walker, K., Strassel, S., The RATS radio traffic collection system (2012) Odyssey 2012: The Speaker and Language Recognition Workshop

Citations:

---------- APA ----------

Ferrer, L., Graciarena, M., Morgan N., Georgiou P., Morgan N., Narayanan S., Metze F.,..., Amazon Alexa; Apple; eBay; et al.; Google; Microsoft (2016). Minimizing annotation effort for adaptation of speech-activity detection systems. 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, 08-12-September-2016, 3002-3006.
http://dx.doi.org/10.21437/Interspeech.2016-247

---------- CHICAGO ----------

Ferrer, L., Graciarena, M., Morgan N., Georgiou P., Morgan N., Narayanan S., et al. "Minimizing annotation effort for adaptation of speech-activity detection systems". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 08-12-September-2016 (2016): 3002-3006.
http://dx.doi.org/10.21437/Interspeech.2016-247

---------- MLA ----------

Ferrer, L., Graciarena, M., Morgan N., Georgiou P., Morgan N., Narayanan S., et al. "Minimizing annotation effort for adaptation of speech-activity detection systems". 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, vol. 08-12-September-2016, 2016, pp. 3002-3006.
http://dx.doi.org/10.21437/Interspeech.2016-247

---------- VANCOUVER ----------

Ferrer, L., Graciarena, M., Morgan N., Georgiou P., Morgan N., Narayanan S., et al. Minimizing annotation effort for adaptation of speech-activity detection systems. Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH. 2016;08-12-September-2016:3002-3006.
http://dx.doi.org/10.21437/Interspeech.2016-247