PhD Defense: Learning of Dense Optical Flow, Motion and Depth from Sparse Event Cameras
Chengxi Ye
Thursday, May 16, 2019, 4:00-6:00 pm
With recent advances in autonomous driving, autonomous agents need to navigate safely around humans and other moving objects in unconstrained, highly dynamic environments. In this thesis, we demonstrate the feasibility of reconstructing dense depth, optical flow, and motion information from a neuromorphic imaging device called the Dynamic Vision Sensor (DVS). The DVS records only sparse, asynchronous events, which are triggered when the lighting changes at a camera pixel. Our work is the first monocular pipeline that generates dense depth and optical flow from sparse event data only.
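
To make the notion of sparse events concrete, here is a minimal sketch, not the sensor's actual circuitry or the thesis pipeline. It assumes the usual simplified model in which a pixel emits an event once its log-intensity change reaches a threshold, and it compares two whole frames rather than modelling asynchronous timestamps. The function name `events_between` and the parameters `threshold` and `eps` are hypothetical.

```python
import math

def events_between(prev_frame, next_frame, threshold=0.2, eps=1e-6):
    """Return (row, col, polarity) for every pixel whose log-intensity
    change reaches the threshold. Unchanged pixels produce no output,
    which is why the resulting data is sparse."""
    events = []
    for r, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for c, (p, n) in enumerate(zip(prev_row, next_row)):
            # eps (assumed) avoids log(0) for fully dark pixels.
            delta = math.log(n + eps) - math.log(p + eps)
            if abs(delta) >= threshold:
                events.append((r, c, 1 if delta > 0 else -1))
    return events

# Two tiny illustrative frames: only two of six pixels change.
prev_frame = [[0.5, 0.5, 0.5],
              [0.5, 0.5, 0.5]]
next_frame = [[0.5, 0.9, 0.5],
              [0.2, 0.5, 0.5]]

events = events_between(prev_frame, next_frame)
print(events)  # [(0, 1, 1), (1, 0, -1)]
```

Only the brightened and the darkened pixel appear in the output; a static scene yields an empty list, which is the sparsity that the dense reconstruction has to work from.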

To tackle the problem of reconstructing dense information from sparse input, we introduce the Evenly-Cascaded convolutional Network (ECN), a bio-inspired, multi-level, multi-resolution neural network architecture. The network features an evenly shaped design and uses both high-level and low-level features.

With just 150k parameters, our self-supervised pipeline surpasses pipelines that are 100x larger. We evaluate our pipeline on the MVSEC self-driving dataset and present results for depth, optical flow, and egomotion estimation in outdoor scenes. Due to the lightweight design, the inference part of the network runs at 250 FPS on a single GPU, making the pipeline ready for real-time robotics applications. Our experiments demonstrate significant improvements over previous works that used deep learning on event data, as well as the ability of our pipeline to perform well during both day and night.

We also extend our pipeline to dynamic indoor scenes with independently moving objects. In addition to estimating camera egomotion and a dense depth map, the network uses a mixture model to segment moving objects and compute their per-object 3D translational velocities. For this indoor task we are able to train a shallow network with just 40k parameters, which computes qualitative depth and egomotion.

Our analysis of the training shows that modern neural networks are trained on tangled signals. This tangling effect can be imagined as a blurring introduced both by nature and by the training process. We propose to untangle the data with network deconvolution. We observe significantly better convergence without using any standard normalization techniques, which suggests that deconvolution is a promising approach.
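
As a rough illustration of what untangling can mean, the sketch below assumes it can be read as decorrelation (whitening): centering the data and multiplying by the inverse square root of its covariance. This is a toy 2D version with a closed-form matrix square root, not the network deconvolution operator presented in the thesis. All function names and the sample `points` are hypothetical.

```python
import math

def covariance_2d(points):
    """Center 2D points; return them with covariance entries (a, b, d)
    of the symmetric matrix [[a, b], [b, d]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(p[0] - mx, p[1] - my) for p in points]
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    d = sum(y * y for _, y in centered) / n
    return centered, (a, b, d)

def inverse_sqrt_2x2(a, b, d):
    """Inverse square root of a symmetric positive-definite 2x2 matrix.
    Uses sqrt(M) = (M + s*I) / t with s = sqrt(det M) and
    t = sqrt(trace M + 2*s), then inverts the result."""
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    ra, rb, rd = (a + s) / t, b / t, (d + s) / t
    det = ra * rd - rb * rb
    return rd / det, -rb / det, ra / det

def whiten(points):
    """Remove the correlation between the two coordinates."""
    centered, (a, b, d) = covariance_2d(points)
    wa, wb, wd = inverse_sqrt_2x2(a, b, d)
    return [(wa * x + wb * y, wb * x + wd * y) for x, y in centered]

# Strongly correlated toy data (assumed values, for illustration only).
points = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5), (5.0, 9.5)]
whitened = whiten(points)
_, (var_x, cov_xy, var_y) = covariance_2d(whitened)
print(var_x, cov_xy, var_y)
```

After whitening, the covariance of the data is the identity up to floating-point error, so each coordinate carries independent, unit-variance signal.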
Examining Committee:
    Chair:       Dr. Yiannis Aloimonos
    Dean's rep:  Dr. Timothy K. Horiuchi
    Members:     Dr. Cornelia Fermüller
                 Dr. Ramani Duraiswami
                 Dr. Dinesh Manocha
                 Dr. James A. Yorke
This talk is organized by Tom Hurst