Efficient Acoustic Simulation for Learning-based Virtual and Real World Audio Processing
Zhenyu Tang
Friday, April 1, 2022, 12:30-2:30 pm
Sound propagation is commonly perceived as air pressure perturbations caused by vibrating or moving objects. The energy of sound is attenuated as it travels through the air over a distance, and as it is absorbed at object surfaces. Much research has focused on devising better acoustic simulation methods to model sound propagation more realistically. The benefits of accurate acoustic simulation include, but are not limited to, computer-aided acoustic design, acoustic optimization, synthetic speech data generation, and immersive audio-visual rendering for mixed reality. However, acoustic simulation remains underexplored for relevant virtual and real world audio processing applications. The main challenges in adopting accurate acoustic simulation methods are the tradeoff between accuracy and time-space cost, and the difficulty of acquiring and reconstructing acoustic scenes in the real world.
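As a minimal illustration of how surface absorption attenuates sound energy in a room (not a method from this dissertation), the classic Sabine formula estimates the reverberation time RT60 — the time for sound energy to decay by 60 dB — from room volume and surface absorption; the function name below is illustrative:

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine reverberation time: RT60 = 0.161 * V / A, where A is the
    total absorption, summed over (area_m2, absorption_coefficient) pairs."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption

# A 5m x 4m x 3m room with all surfaces at absorption coefficient 0.2:
surfaces = [(2 * (5*4 + 5*3 + 4*3), 0.2)]  # total surface area 94 m^2
print(round(sabine_rt60(5 * 4 * 3, surfaces), 2))  # → 0.51
```

More absorptive surfaces (larger coefficients) shrink RT60, which is one of the perceptual parameters that acoustic simulations aim to reproduce.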

In this dissertation, we propose novel methods to overcome the above challenges by leveraging the inferential power of deep neural networks and combining them with interactive acoustic simulation techniques. First, we develop a neural network model that can learn the acoustic scattering field of different objects given their 3D representation as input. This work facilitates the inclusion of wave acoustic scattering effects in interactive sound rendering applications, which used to be difficult without intensive pre-computation. Second, we incorporate a deep acoustic analysis neural network into the sound rendering pipeline to allow the generation of sounds that are perceptually consistent with real-world sounds. This is achieved by predicting acoustic parameters at run-time from real-world audio samples and optimizing simulation parameters accordingly. Finally, we build a pipeline that utilizes general 3D indoor scene datasets to generate high-quality acoustic room impulse responses, and demonstrate the usefulness of the generated data on several practical speech processing tasks. Our results demonstrate that by leveraging state-of-the-art physics-based acoustic simulation and deep learning techniques, realistic simulated data can be generated to enhance sound rendering quality in the virtual world and boost the performance of audio processing tasks in the real world.
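The data-generation idea in the final contribution rests on a standard signal-processing identity: reverberant speech can be synthesized by convolving a dry (anechoic) speech signal with a room impulse response. The following is a minimal sketch of that step only — the function name is illustrative and the dissertation's actual pipeline is not shown here:

```python
import numpy as np

def reverberate(dry_speech, rir):
    """Synthesize reverberant speech by convolving a dry signal with a
    room impulse response (RIR); mode='full' keeps the reverberant tail."""
    return np.convolve(dry_speech, rir, mode="full")

# Toy example: a unit impulse through a two-tap "RIR"
# (direct path, plus one echo two samples later at half amplitude).
dry = np.array([1.0, 0.0, 0.0])
rir = np.array([1.0, 0.0, 0.5])
print(reverberate(dry, rir))  # → [1.  0.  0.5 0.  0. ]
```

In practice the RIR comes from a simulator (or a measurement) and is tens of thousands of samples long, and such convolved speech can serve as training data for far-field speech processing tasks.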

Examining Committee:
Dean's Representative:
Dr. Dinesh Manocha    
Dr. Carol Espy-Wilson    
Dr. Ming C. Lin    
Dr. Ramani Duraiswami    
Dr. Nirupam Roy

Zhenyu Tang is a PhD student in the Department of Computer Science and a member of the GAMMA research group led by Professors Dinesh Manocha and Ming Lin. His research interests span computer graphics and audio-visual computing, with a focus on enhancing virtual/augmented reality experiences using physics-based simulation and learning-based methods. He received his Bachelor's degree (with Honors) from Zhejiang University in 2017.

This talk is organized by Tom Hurst