Period of Residency: 08/01/2013 to 12/31/2014

Hani Camille Yehia holds a degree in Electronic Engineering (ITA, 1988), a Master's degree in Electronic and Computer Engineering (ITA, 1992) and a PhD in Electrical Engineering (University of Nagoya, Japan, 1997). He was a researcher at ATR Laboratories (Japan) from 1996 to 1998. He coordinated the Graduate Program in Electrical Engineering from 2005 to 2009, the Graduate Council of the School of Engineering at UFMG from 2007 to 2009 and INOVA-UFMG (Business Incubator) from 2011 to 2013. He is a professor at the Department of Electronic Engineering at UFMG and Coordinator of CEFALA – Center for the Study of Speech, Acoustics, Language and Music, where he conducts research on the audiovisual production and perception of speech and music. In addition, he participates in the coordination of CEMECH – Center for the Study of Human Movement, Expression and Behavior, which studies human movement from the point of view of both its gestural consequences (expression) and its underlying mechanisms of production (motor control), and of CTPMag – Center for Technology and Research in Magnetic Resonance at UFMG. In his work, he seeks to combine basic research in physics, neuroscience, linguistics and music with applied research in technologies for the coding, recognition and audiovisual synthesis of speech and music.


Speech, defined as the acoustic representation of language, serves as a bridge between low-level studies, which focus on the analysis of the signals used to transmit and process information, and high-level studies, which target the interpretation of the symbols that form the basis of human communication. In this context, this study seeks to develop and use appropriate tools for understanding speech from different points of view.

The first tool shows how to align the coordinate systems used to represent signals of different natures that nonetheless have some degree of dependence or coupling. Its use is exemplified in Yehia, Rubin and Vatikiotis-Bateson (1998), which analyzes the relationships among the geometry of the vocal tract, the movement of the face and the acoustics of speech.

Aligning coordinate systems, however, proves insufficient when studying signals whose dependence on one another involves a delay that fluctuates over time. To deal with this type of case, Barbosa, Vatikiotis-Bateson and Yehia (2012) show how correlation maps can be used to measure not only the degree of coupling between signals but also the fluctuation of the delay over time. This technique is applied in Teixeira, Loureiro, Wanderley and Yehia (2014), which analyzes the relationships between acoustic and movement aspects of musical performances by clarinetists.
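To make the idea of a correlation map concrete, the sketch below computes a simple sliding-window Pearson correlation between two signals over a range of lags. It is only an illustration under stated assumptions: the function name `correlation_map`, the rectangular window, and the synthetic test signals are hypothetical choices made here, and the procedure is not claimed to reproduce the exact formulation used in Barbosa, Vatikiotis-Bateson and Yehia (2012).

```python
import numpy as np

def correlation_map(x, y, max_lag, window):
    """Sliding-window Pearson correlation between x[t] and y[t + lag].

    Hypothetical helper for illustration only.
    Returns (lags, starts, cmap), where cmap[i, j] is the correlation for
    lags[i] computed on the window of x that begins at sample starts[j].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    # Keep only windows for which every lagged slice of y stays in bounds.
    starts = np.arange(max_lag, len(x) - window - max_lag + 1)
    cmap = np.zeros((len(lags), len(starts)))
    for i, lag in enumerate(lags):
        for j, s in enumerate(starts):
            xs = x[s:s + window]
            ys = y[s + lag:s + lag + window]
            xs = xs - xs.mean()
            ys = ys - ys.mean()
            denom = np.sqrt((xs ** 2).sum() * (ys ** 2).sum())
            cmap[i, j] = (xs * ys).sum() / denom if denom > 0 else 0.0
    return lags, starts, cmap

# Synthetic example (assumed data): y follows x with a delay of 5 samples
# in the first half of the recording and 12 samples in the second half.
rng = np.random.default_rng(0)
n = 600
x = rng.standard_normal(n)
y = np.where(np.arange(n) < 300, np.roll(x, 5), np.roll(x, 12))

lags, starts, cmap = correlation_map(x, y, max_lag=15, window=50)
# For each window, the lag with the highest correlation estimates the delay.
best = lags[np.argmax(cmap, axis=0)]
```

Each column of `cmap` shows how strongly the two signals are coupled at each lag for one window, and the ridge traced by `best` across columns shows how the delay drifts over time. A rectangular window is used here for simplicity; other weightings are possible.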

Sound and motion are relatively easy signals to measure. A deeper analysis of speech requires neurophysiological information, which is challenging to measure. Imaging techniques, such as functional magnetic resonance imaging, allow measurements with high spatial resolution, on the order of millimeters, but with very low temporal resolution, on the order of minutes. On the other hand, techniques for measuring electrical activity in the brain, such as surface electroencephalography, have high temporal resolution, on the order of milliseconds, but very poor spatial resolution, since multiple internal distributions of brain activity can generate the same pattern of surface electrical activity. The combination of these two techniques is analyzed in Souza, Yehia, Sato and Callan (2013), a pioneering study in which neural plasticity is measured during the learning process.

Finally, attention must be devoted to the relationships between objectively measurable physical quantities and their perceptual correlates. To this end, Vieira, Sansão and Yehia (2014) apply image processing techniques to spectrograms to measure the harmonics-to-noise ratio of dysphonic voices and then study its relationship with the perception of breathiness in the voice. Along the same lines, Maia, Yehia and Errico (2015) survey existing techniques for inferring quality of experience from parameters extracted from video signals.

In summary, the work carried out can be seen as some of the pieces of a large puzzle that represents human speech both as physically measurable signals and as the symbols on which human communication is based.