Improving speech synthesis quality by reducing pitch peaks in the source recordings

Violante, L.; Rodríguez Zivic, P.; Gravano, A.

Violante, L.; Rodríguez Zivic, P.; Gravano, A. "Improving speech synthesis quality by reducing pitch peaks in the source recordings" (2013) 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013:502-506

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante

Abstract:

We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists in removing pronounced pitch peaks in the original recordings, which typically lead to noticeable discontinuities in the synthesized speech. We perceptually evaluated this method using two concatenative and two HMM-based synthesis systems, and found that using it on the source recordings managed to improve the naturalness of the synthesizers and had no effect on their intelligibility. © 2013 Association for Computational Linguistics.

Record:

Document:	Conference
Title:	Improving speech synthesis quality by reducing pitch peaks in the source recordings
Author:	Violante, L.; Rodríguez Zivic, P.; Gravano, A.
Affiliation:	Departamento de Computación, FCEyN, Universidad de Buenos Aires, Argentina
Keywords:	Computational linguistics; Continuous speech recognition; Speech synthesis; Corpus-based; HMM-based; Speech synthesizer; Synthesized speech; Audio recordings
Year:	2013
Start page:	502
End page:	506
Journal title:	2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013
Abbreviated journal title:	NAACL HLT - Conf. North Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol., Proc. Main Conf.
Record:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante

References:

Black, A.W., Lenzo, K.A., (2007) Building Synthetic Voices, http://festvox.org/bsv, Language Technologies Institute, Carnegie Mellon University
Black, A., Taylor, P., Caley, R., Clark, R., Richmond, K., King, S., Strom, V., Zen, H., (2001) The festival speech synthesis system
Boersma, P., Weenink, D., (2012) Praat: Doing Phonetics by Computer, http://www.praat.org/
Gurlekian, J., Colantoni, L., Torres, H., El alfabeto fonético SAMPA y el diseño de corpora fonéticamente balanceados (2001) Fonoaudiológica, 47, pp. 58-69
Gurlekian, J.A., Cossio-Mercado, C., Torres, H., Vaccari, M.E., Subjective evaluation of a high quality text-to-speech system for Argentine Spanish (2012) Proceedings of Iberspeech, Madrid, Spain
Moulines, E., Charpentier, F., Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones (1990) Speech Communication, 9 (5), pp. 453-467
Nye, P.W., Gaitenby, J.H., The intelligibility of synthetic monosyllabic words in short, syntactically normal sentences (1974) Haskins Laboratories Status Report on Speech Research, 37 (38), pp. 169-190
Schröder, M., Trouvain, J., The German text-to-speech synthesis system MARY: A tool for research, development and teaching (2003) International Journal of Speech Technology, 6 (4), pp. 365-377
Torres, H.M., Gurlekian, J.A., Automatic determination of phrase breaks for Argentine Spanish (2004) Speech Prosody 2004, International Conference
Viswanathan, M., Viswanathan, M., Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale (2005) Computer Speech & Language, 19 (1), pp. 55-83

Citations:

---------- APA ----------

Violante, L., Rodríguez Zivic, P. & Gravano, A. (2013). Improving speech synthesis quality by reducing pitch peaks in the source recordings. 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013, 502-506.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante

---------- CHICAGO ----------

Violante, L., Rodríguez Zivic, P., Gravano, A. "Improving speech synthesis quality by reducing pitch peaks in the source recordings". 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 (2013): 502-506.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante

---------- MLA ----------

Violante, L., Rodríguez Zivic, P., Gravano, A. "Improving speech synthesis quality by reducing pitch peaks in the source recordings". 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013, 2013, pp. 502-506.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante

---------- VANCOUVER ----------

Violante, L., Rodríguez Zivic, P., Gravano, A. Improving speech synthesis quality by reducing pitch peaks in the source recordings. NAACL HLT - Conf. North Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol., Proc. Main Conf. 2013:502-506.
Available from: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante