Abstract:
We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists of removing pronounced pitch peaks from the original recordings, since such peaks typically lead to noticeable discontinuities in the synthesized speech. We perceptually evaluated the method on two concatenative and two HMM-based synthesis systems, and found that applying it to the source recordings improved the naturalness of the synthesizers and had no effect on their intelligibility. © 2013 Association for Computational Linguistics.
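The abstract does not say how peaks are detected or how much they are lowered. As an illustration only, the sketch below applies a simple clipping approach to an F0 contour. It assumes the contour was extracted beforehand (for example with Praat, which is cited in the references). The function name `clip_pitch_peaks`, the utterance-median reference, and the 4-semitone threshold are all hypothetical choices, not the authors' procedure. Resynthesizing audio from the modified contour, for example with a pitch-synchronous technique like the Moulines and Charpentier method cited below, is not shown.

```python
from statistics import median
from typing import List


def clip_pitch_peaks(f0: List[float], max_semitones: float = 4.0) -> List[float]:
    """Return a copy of an F0 contour with pronounced peaks clipped.

    Hypothetical helper, not the method from the paper.
    f0: per-frame fundamental frequency in Hz, with 0.0 marking unvoiced frames.
    max_semitones: assumed threshold; voiced frames lying more than this many
    semitones above the median F0 of the utterance are lowered to that level.
    """
    voiced = [f for f in f0 if f > 0]
    if not voiced:
        # A fully unvoiced contour has no peaks to remove.
        return list(f0)
    ref = median(voiced)
    # A semitone is a frequency ratio of 2 ** (1 / 12), so the ceiling in Hz
    # is the reference frequency raised by max_semitones semitones.
    ceiling = ref * 2 ** (max_semitones / 12)
    # Unvoiced frames (0.0) are kept; voiced frames are capped at the ceiling.
    return [min(f, ceiling) if f > 0 else f for f in f0]


contour = [0.0, 100.0, 110.0, 120.0, 200.0, 115.0, 0.0]
print(clip_pitch_peaks(contour))
```

With this contour the median voiced F0 is 115 Hz, so only the 200 Hz frame exceeds the ceiling and is lowered to roughly 144.9 Hz. A global median keeps the sketch short; a local (windowed) reference would be more sensitive to peaks relative to their immediate context.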
Record:
Document: | Conference paper |
Title: | Improving speech synthesis quality by reducing pitch peaks in the source recordings |
Author: | Violante, L.; Rodríguez Zivic, P.; Gravano, A.; Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten |
Affiliation: | Departamento de Computación, FCEyN, Universidad de Buenos Aires, Argentina |
Keywords: | Computational linguistics; Continuous speech recognition; Speech synthesis; Corpus-based; HMM-based; Speech synthesizer; Synthesized speech; Audio recordings |
Year: | 2013 |
Start page: | 502 |
End page: | 506 |
Source title: | 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 |
Abbreviated source title: | NAACL HLT - Conf. North Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol., Proc. Main Conf. |
Record: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante |
References:
- Black, A.W., Lenzo, K.A. (2007). Building Synthetic Voices. Language Technologies Institute, Carnegie Mellon University. http://festvox.org/bsv
- Black, A., Taylor, P., Caley, R., Clark, R., Richmond, K., King, S., Strom, V., Zen, H. (2001). The Festival speech synthesis system.
- Boersma, P., Weenink, D. (2012). Praat: Doing Phonetics by Computer. http://www.praat.org/
- Gurlekian, J., Colantoni, L., Torres, H. (2001). El alfabeto fonético SAMPA y el diseño de corpora fonéticamente balanceados. Fonoaudiológica, 47, pp. 58-69.
- Gurlekian, J.A., Cossio-Mercado, C., Torres, H., Vaccari, M.E. (2012). Subjective evaluation of a high quality text-to-speech system for Argentine Spanish. Proceedings of Iberspeech, Madrid, Spain.
- Moulines, E., Charpentier, F. (1990). Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9 (5), pp. 453-467.
- Nye, P.W., Gaitenby, J.H. (1974). The intelligibility of synthetic monosyllabic words in short, syntactically normal sentences. Haskins Laboratories Status Report on Speech Research, 37 (38), pp. 169-190.
- Schröder, M., Trouvain, J. (2003). The German text-to-speech synthesis system MARY: A tool for research, development and teaching. International Journal of Speech Technology, 6 (4), pp. 365-377.
- Torres, H.M., Gurlekian, J.A. (2004). Automatic determination of phrase breaks for Argentine Spanish. Speech Prosody 2004, International Conference.
- Viswanathan, M., Viswanathan, M. (2005). Measuring speech quality for text-to-speech systems: Development and assessment of a modified mean opinion score (MOS) scale. Computer Speech & Language, 19 (1), pp. 55-83.
Citations:
---------- APA ----------
Violante, L., Rodríguez Zivic, P., Gravano, A. & Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten (2013). Improving speech synthesis quality by reducing pitch peaks in the source recordings. 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013, 502-506.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante
---------- CHICAGO ----------
Violante, L., Rodríguez Zivic, P., Gravano, A., Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten. "Improving speech synthesis quality by reducing pitch peaks in the source recordings." 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 (2013): 502-506.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante
---------- MLA ----------
Violante, L., Rodríguez Zivic, P., Gravano, A., Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten. "Improving speech synthesis quality by reducing pitch peaks in the source recordings." 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013, 2013, pp. 502-506.
Retrieved from https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante
---------- VANCOUVER ----------
Violante, L., Rodríguez Zivic, P., Gravano, A., Appen ButlerHill; et al.; ETS; Google; Microsoft Research; Rakuten. Improving speech synthesis quality by reducing pitch peaks in the source recordings. NAACL HLT - Conf. North Am. Chapter Assoc. Comput. Linguist.: Hum. Lang. Technol., Proc. Main Conf. 2013:502-506.
Available from: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97819372_v_n_p502_Violante