Talks

PhD Defense: Learning of Dense Optical Flow, Motion and Depth from Sparse Event Cameras

Chengxi Ye

5105 Brendan Iribe Center for Computer Science and Engineering (IRB)

Thursday, May 16, 2019, 4:00-6:00 pm

You are subscribed to this talk through .
You are watching this talk through .
You are subscribed to this talk. (unsubscribe, watch)
You are watching this talk. (unwatch, subscribe)
You are not subscribed to this talk. (watch, subscribe)

Abstract

With recent advances in the field of autonomous driving, autonomous agents need to safely navigate around humans or other moving objects in unconstrained, highly dynamic environments. In this thesis, we demonstrate the feasibility of reconstructing dense depth, optical flow and motion information from a neuromorphic imaging device, called the Dynamic Vision Sensor (DVS). The DVS only records sparse and asynchronous events when the changes of lighting occur at camera pixels. Our work is the first monocular pipeline that generates dense depth and optical flow from sparse event data only.

To tackle this problem of reconstructing dense information from sparse information, we introduce the Evenly-Cascaded convolutional Network (ECN), a bio-inspired multi-level, multi-resolution neural network architecture. The network features an evenly-shaped design, and utilization of both high and low level features.

With just 150k parameters, our self-supervised pipeline is able to surpass pipelines that are 100x larger. We evaluate our pipeline on the MVSEC self driving dataset and present results for depth, optical flow and and egomotion estimation in outdoor scenes. Due to the lightweight design, the inference part of the network runs at 250 FPS on a single GPU, making the pipeline ready for real-time robotics applications. Our experiments demonstrate significant improvements upon previous works that used deep learning on event data, as well as the ability of our pipeline to perform well during both day and night.

We also extend our pipeline to dynamic indoor scenes with independent moving objects. In addition to camera egomotion and a dense depth map, the network utilizes a mixture model to segment and compute per-object 3D translational velocities for moving objects. For this indoor task we are able to train a shallow network with just 40k parameters, which computes qualitative depth and egomotion.

Our analysis of the training shows modern neural networks are trained on tangled signals. This tangling effect can be imagined as a blurring introduced both by nature and by the training process. We propose to untangle the data with network deconvolution. We notice significantly better convergence without using any standard normalization techniques, which suggests that deconvolution is a promising approach.

Examining Committee:

Chair: Dr. Yiannis Aloimonos
Dean's rep: Dr. Timothy K. Horiuchi
Members: Dr. Cornelia Fermüller
Dr. Ramani Duraiswami
Dr. Dinesh Manocha
Dr. James A. Yorke

This talk is organized by Tom Hurst