The human voice is the product of a complex biological process that is influenced by myriad factors. Due to the enormous number of parameters that play a role in its production, no two voices in the world are alike. This opens up the possibility of voice being both an identifier (a biometric) and a descriptor of the speaker. As an identifier, voice is potentially as unique as DNA and fingerprints. As a descriptor, voice is more revealing than DNA or fingerprints. It carries information that can be linked to the speaker's physical, physiological, demographic, medical, environmental and many other bio-relevant characteristics at the time the voice was produced. This presentation is the story of my journey through the emerging science of human profiling from voice. From hypothesis to algorithm, from inception to state of the art, that journey has meandered through voice forensics, acoustics, biology and many other fields. It has been studded with triumphs and riddled with challenges, many of which remain unsolved. My work builds on the hypothesis that if any factor whatsoever influences the human mind or body, and if a biological pathway exists between that influence and the voice production mechanism, then there must exist an effect on voice. The challenge lies in discovering and quantifying these effects. Recent breakthroughs include recreating the human face from voice in vacuo, mechanisms for biomarker discovery, and the reconstruction of different aspects of the human physical form in 3D from voice.
Dr. Rita Singh is an Associate Research Professor at the School of Computer Science at Carnegie Mellon University (CMU) in Pittsburgh, USA, and (by courtesy) at the Electrical and Computer Engineering Department at CMU. She is also affiliated with the Institute for Strategic Analysis and the Cyber Security & Privacy Institute (CyLab) at CMU. Her academic career spans more than two decades of research on a wide range of topics in speech and audio signal processing, multimedia forensics and cyber forensics. Her current work focuses on creating and developing the science of profiling humans from their voice, a new sub-area of Artificial Intelligence and Voice Forensics.
The technology pioneered by her group has led to two world firsts. In September 2018, her team created the world’s first live voice-based profiling system, demonstrated at the World Economic Forum in Tianjin, China. In 2019, her group also created the world’s first human voice generated from evidence in facial images, that of the artist Rembrandt. This work was commissioned by Walter Thompson Inc., the Rijksmuseum in Holland and ING Bank of Europe. At CMU, she teaches Computational Forensics and AI, Multimedia Processing, and Quantum Computing (all graduate-level courses in CS), and is the author of the book “Profiling Humans from their Voice,” published in June 2019 by Springer Nature, Singapore.