Article

Abstract:

The faculty of language depends on the interplay between the production and perception of speech sounds. A relevant open question is whether the dimensions that organize voice perception in the brain are acoustical or depend on properties of the vocal system that produced the sound. One of the main empirical difficulties in answering this question is generating sounds that vary along a continuum according to the anatomical properties of the vocal apparatus that produced them. Here we use a mathematical model that offers the unique possibility of synthesizing vocal sounds by controlling a small set of anatomically based parameters. In a first stage, the quality of the synthetic voice was evaluated. Using specific time traces for sub-glottal pressure and tension of the vocal folds, the synthetic voices generated perceptual responses that are indistinguishable from those of real speech. The synthesizer was then used to investigate how the auditory cortex responds to the perception of voice depending on the anatomy of the vocal apparatus. Our fMRI results show that sounds are perceived as human vocalizations when produced by a vocal system that follows a simple relationship between the size of the vocal folds and the vocal tract. We found that these anatomical parameters encode the perceptual vocal identity (male, female, child) and show that the brain areas that respond to human speech also encode vocal identity. On the basis of these results, we propose that this low-dimensional model of the vocal system is capable of generating realistic voices and represents a novel tool to explore voice perception with precise control of the anatomical variables that generate speech. Furthermore, the model provides an explanation of how auditory cortices encode voices in terms of the anatomical parameters of the vocal system. © 2016

Record:

Document: Article
Title: Exploring the anatomical encoding of voice with a mathematical model of the vocal system
Authors: Assaneo, M.F.; Sitt, J.; Varoquaux, G.; Sigman, M.; Cohen, L.; Trevisan, M.A.
Affiliations: Department of Physics, University of Buenos Aires-IFIBA CONICET, Ciudad Universitaria, Pab. 1, Buenos Aires, 1428EGA, Argentina
Department of Psychology, New York University, New York, NY 10003, United States
INSERM, Cognitive Neuroimaging Unit, Gif sur Yvette, France
Commissariat à l'Energie Atomique, Direction des Sciences du Vivant, I2BM, NeuroSpin Center, Gif sur Yvette, France
INSERM U1127, Institut du Cerveau et de la Moelle Épinière, Paris, France
CNRS UMR 7225, Institut du Cerveau et de la Moelle Épinière, Paris, France
Sorbonne Universités, UPMC Univ Paris 06, Paris, France
INRIA Parietal, Neurospin, CEA Saclay, bât 145, France
Integrative Neuroscience Lab, Physics dept. UBA-IFIBA CONICET, Pab. 1, Buenos Aires, 1428EGA, Argentina
University Torcuato Di Tella, Alm. Juan Saenz Valiente 1010, Buenos Aires, C1428BIJ, Argentina
AP-HP, Groupe Hospitalier Pitié-Salpêtrière, Department of Neurology, Paris, France
Keywords: Auditory cortex; Biomechanical model of the vocal system; Neural coding of voice; Voice identity; accuracy; adult; anatomical variation; Article; auditory cortex; child; comparative study; female; functional magnetic resonance imaging; human; human experiment; larynx; male; mathematical computing; mathematical model; phonetics; priority journal; speech perception; vocal apparatus; vocal cord; vocal fold pressure; vocal fold tension; vocalization; voice analysis; voice parameter; anatomic model; auditory stimulation; biological model; communication aid; computer simulation; glottis; nerve cell network; physiology; procedures; speech; voice; young adult; Acoustic Stimulation; Adult; Auditory Cortex; Communication Aids for Disabled; Computer Simulation; Female; Glottis; Humans; Male; Models, Anatomic; Models, Neurological; Nerve Net; Speech; Speech Acoustics; Speech Perception; Voice; Voice Quality; Young Adult
Year: 2016
Volume: 141
First page: 31
Last page: 39
DOI: http://dx.doi.org/10.1016/j.neuroimage.2016.07.033
Journal title: NeuroImage
Abbreviated journal title: NeuroImage
ISSN: 10538119
CODEN: NEIME
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_10538119_v141_n_p31_Assaneo

Citations:

---------- APA ----------
Assaneo, M.F., Sitt, J., Varoquaux, G., Sigman, M., Cohen, L., & Trevisan, M.A. (2016). Exploring the anatomical encoding of voice with a mathematical model of the vocal system. NeuroImage, 141, 31-39.
http://dx.doi.org/10.1016/j.neuroimage.2016.07.033
---------- CHICAGO ----------
Assaneo, M.F., Sitt, J., Varoquaux, G., Sigman, M., Cohen, L., Trevisan, M.A. "Exploring the anatomical encoding of voice with a mathematical model of the vocal system." NeuroImage 141 (2016): 31-39.
http://dx.doi.org/10.1016/j.neuroimage.2016.07.033
---------- MLA ----------
Assaneo, M.F., Sitt, J., Varoquaux, G., Sigman, M., Cohen, L., Trevisan, M.A. "Exploring the anatomical encoding of voice with a mathematical model of the vocal system." NeuroImage, vol. 141, 2016, pp. 31-39.
http://dx.doi.org/10.1016/j.neuroimage.2016.07.033
---------- VANCOUVER ----------
Assaneo, M.F., Sitt, J., Varoquaux, G., Sigman, M., Cohen, L., Trevisan, M.A. Exploring the anatomical encoding of voice with a mathematical model of the vocal system. NeuroImage. 2016;141:31-39.
http://dx.doi.org/10.1016/j.neuroimage.2016.07.033