PhD Proposal: Algorithms and Tools for Audio Scene Analysis
Adam O’Donovan - University of Maryland, College Park
Monday, July 30, 2012, 2:00-3:00 pm
Abstract

THE PRELIMINARY ORAL EXAMINATION FOR THE DEGREE OF Ph.D. IN COMPUTER SCIENCE FOR

Adam O’Donovan

Humans have an innate ability to understand the world around them through a multitude of modalities, including the visual, auditory, olfactory (smell), gustatory (taste), and haptic (touch) systems. Arguably, the most important of these for understanding a scene are the auditory and visual systems. Classically, the visual system has been the main focus of scene-understanding research, with problems such as visual object identification, face recognition, and tracking receiving considerable investigation. Recently, analogous studies in acoustic scene understanding have gained significant attention. The highly diffractive nature of sound, however, leads to high computational loads when processing multiple streams of audio for scene understanding, such as those derived from a microphone array. On the other hand, because many of the algorithms involved in processing acoustic information can operate on multiple streams of data simultaneously, they are well suited to parallel hardware, and there is hope for real-time performance.
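The stream-level parallelism alluded to above can be illustrated with a minimal frequency-domain delay-and-sum beamformer, in which every steering direction and frequency bin is computed independently. This is only a NumPy sketch of the general technique, not code from the proposal; the function name, array geometry, and parameters are illustrative assumptions.

```python
import numpy as np

def delay_and_sum(frames, mic_pos, directions, fs=48000.0, c=343.0):
    """Frequency-domain delay-and-sum beamformer (far-field model).

    frames:     (M, N) array, one length-N snapshot per microphone
    mic_pos:    (M, 3) microphone coordinates in meters
    directions: (D, 3) unit look-direction vectors
    Returns a length-D array of output power, one value per direction.
    """
    M, N = frames.shape
    spectra = np.fft.rfft(frames, axis=1)            # (M, F)
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)           # (F,)
    # Far-field steering delay for each (direction, mic) pair
    taus = -directions @ mic_pos.T / c               # (D, M)
    # Phase shifts for every direction/mic/frequency at once;
    # each (direction, frequency) entry is independent of the others,
    # which is exactly the structure a GPU can exploit.
    phases = np.exp(2j * np.pi * taus[:, :, None] * freqs[None, None, :])
    summed = (phases * spectra[None, :, :]).sum(axis=1) / M  # (D, F)
    return (np.abs(summed) ** 2).sum(axis=1)         # power per direction
```

Evaluating the (direction, frequency) grid in one vectorized expression mirrors how the same computation would be distributed across GPU threads.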

In this proposal, novel GPU-accelerated algorithms and device implementations are presented for the analysis of complex audiovisual scenes. Our current and proposed contributions include:

1. Novel parallel algorithms, implemented on the GPU, for real-time processing of data collected from microphone arrays.

2. Scalable hardware architectures for building large-scale microphone/video camera arrays using standard PC interfaces.

3. Algorithms for the joint analysis of scenes using co-registered microphone array and video camera array data.

The goal of my research is to demonstrate that these algorithms map well onto the architecture of state-of-the-art GPUs, and that processing traditionally thought to be prohibitively slow can now be realized in real time. Additionally, we outline a scalable microphone array architecture that allows massively parallel microphone arrays to be created and connected to a PC through standard USB 2.0 connections. One such prototype, which we call the audio camera, is presented. Several experiments, including concert hall acoustic analysis, matched filter beamforming, and joint audio-visual tracking, demonstrate the utility of this approach for scene understanding.
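As a rough illustration of the matched filter beamforming experiment mentioned above, the frequency-domain form of the idea can be sketched as follows: instead of free-field steering delays, the beamformer weights are the conjugates of measured transfer functions from the focus point to each microphone. This is a generic sketch under that assumption, not the proposal's implementation; the function name and normalization are illustrative.

```python
import numpy as np

def matched_filter_beamform(spectra, H):
    """Matched-filter beamformer in the frequency domain.

    spectra: (M, F) microphone spectra for one snapshot
    H:       (M, F) measured transfer functions from the focus
             point to each of the M microphones
    Returns the (F,) spectrum focused on that point.
    """
    w = np.conj(H)                        # matched filter: channel conjugate
    norm = (np.abs(H) ** 2).sum(axis=0)   # unit gain for a matched source
    return (w * spectra).sum(axis=0) / np.maximum(norm, 1e-12)
```

By matching the full measured channel rather than a single direct-path delay, reflections (e.g., in a concert hall) add coherently at the output instead of acting as interference.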

Examining Committee:

Dr. Ramani Duraiswami - Chair

Dr. Larry S. Davis - Co-Chair

Dr. Neil Spring - Dept.’s Representative

Dr. Dmitry Zotkin - Committee Member

This talk is organized by Jeff Foster.