tag:talks.cs.umd.edu,2005:/lists/12/feedCVSS2024-03-28T04:39:22-04:00tag:talks.cs.umd.edu,2005:Talk/202012-02-15T10:14:53-05:002012-02-15T10:14:53-05:00https://talks.cs.umd.edu/talks/20Energy Minimization with Graph Cuts<a href="http://www.umiacs.umd.edu/~sameh/">Sameh Khamis - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, February 16, 2012, 4:00-5:00 pm<br><br><b>Abstract:</b> <p>
In this tutorial we describe how several computer vision problems can be intuitively formulated as Markov Random Fields. Inference in such models can be transformed into an energy minimization problem. Under some conditions, graph cut methods can be used to find the minimum of the energy function and, in turn, the most probable assignment for its variables. In addition, we will briefly cover some of the recent advances in the application of graph cuts to a wider set of energy functions.</p>
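The kind of energy these methods minimize can be made concrete with a toy example (an illustrative sketch, not the talk's implementation): for a binary labeling with submodular pairwise terms, graph cuts recover the exact minimum; on a tiny chain the same minimum can be verified by brute force.

```python
from itertools import product

# Toy binary MRF energy on a 4-pixel chain:
#   E(x) = sum_i unary[i][x_i] + lam * sum_i [x_i != x_{i+1}]
# Graph-cut methods minimize such submodular energies exactly via min-cut;
# the state space here is tiny, so brute force exhibits the same minimum.
unary = [(0, 2), (0, 2), (3, 0), (3, 0)]  # (cost of label 0, cost of label 1)
lam = 1.0                                 # Potts smoothness weight

def energy(x):
    data = sum(unary[i][xi] for i, xi in enumerate(x))
    smooth = lam * sum(x[i] != x[i + 1] for i in range(len(x) - 1))
    return data + smooth

best = min(product((0, 1), repeat=4), key=energy)
print(best, energy(best))  # (0, 0, 1, 1) with energy 1.0
```

Note how the smoothness term pulls the labeling toward a single boundary rather than per-pixel winners alone.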
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/452012-03-08T11:15:14-05:002012-03-08T11:15:48-05:00https://talks.cs.umd.edu/talks/45The Telluride Neuromorphic Workshop Experience<a href="http://www.umiacs.umd.edu/~cteo/">Ching Lik Teo - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, February 2, 2012, 4:00-5:00 pm<br><br><b>Abstract:</b> <p>
In this talk, I will present what we did as a group at the Telluride Neuromorphic Workshop 2011. I will explain the challenges we faced, the modules we used, and some results from the activity-description experiments we conducted on the robot.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/462012-03-08T11:16:58-05:002012-03-08T11:18:17-05:00https://talks.cs.umd.edu/talks/46A Complementary Local Feature Descriptor for Face Identification (CCS-POP)<a href="http://umiacs.umd.edu/~jhchoi/">Jonghyun Choi - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, February 9, 2012, 4:00-5:00 pm<br><br><b>Abstract:</b> <p>
In many descriptors, spatial intensity transforms are often packed into a histogram or encoded into binary strings to make them compact and insensitive to local misalignment. Discriminative information, however, might be lost during the process as a trade-off. To capture the lost pixel-wise local information, we propose a new feature descriptor, Circular Center Symmetric-Pairs of Pixels (CCS-POP). It concatenates the symmetric pixel differences centered at a pixel position along various orientations with various radii; it is a generalized form of Local Binary Patterns, its variants, and Pairs-of-Pixels (POP). Combining CCS-POP with existing descriptors achieves better face identification performance on the FRGC Ver. 1.0 and FERET datasets compared to state-of-the-art approaches.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/472012-03-08T11:17:46-05:002012-03-08T11:17:52-05:00https://talks.cs.umd.edu/talks/47Using Classifier Cascades for Scalable E-Mail Classification<a href="http://www.cs.umd.edu/~jay/">Jay Pujara - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, February 23, 2012, 4:00-5:00 pm<br><br><b>Abstract:</b> <p>
In many real-world scenarios, we must make judgments in the presence of computational constraints. One common computational constraint arises when the features used to make a judgment each have differing acquisition costs, but there is a fixed total budget for a set of judgments. Particularly when a large number of classifications must be made in real time, an intelligent strategy for optimizing accuracy versus computational cost is essential. E-mail classification is an area where accurate and timely results require such a trade-off. We identify two scenarios where intelligent feature acquisition can improve classifier performance. In granular classification we seek to classify e-mails with increasingly specific labels structured in a hierarchy, where each level of the hierarchy requires a different trade-off between cost and accuracy. In load-sensitive classification, we classify a set of instances within an arbitrary total budget for acquiring features. Our method, Adaptive Classifier Cascades (ACC), designs a policy to combine a series of base classifiers with increasing computational costs given a desired trade-off between cost and accuracy. Using this method, we learn a relationship between feature costs and label hierarchies (for granular classification) and cost budgets (for load-sensitive classification). We evaluate our method on real-world e-mail datasets with realistic estimates of feature acquisition cost, and we demonstrate superior results when compared to baseline classifiers that do not have a granular, cost-sensitive feature acquisition policy.</p>
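The cascade idea can be illustrated with a minimal two-stage sketch (a hypothetical toy; the scores, threshold, and costs are invented and are not ACC itself): a cheap first stage handles confident instances, and only uncertain ones incur the expensive feature-acquisition cost.

```python
CHEAP_COST, EXPENSIVE_COST = 1, 10  # invented relative feature-acquisition costs

def cascade_predict(x, threshold=0.8):
    """Return (label, cost). Escalate only when the cheap stage is uncertain."""
    s = x["cheap"]                         # stand-in score from a fast classifier
    if s >= threshold or s <= 1 - threshold:
        return s >= 0.5, CHEAP_COST        # confident: stop early
    # uncertain: pay for the expensive features and re-classify
    return x["expensive"] >= 0.5, CHEAP_COST + EXPENSIVE_COST

print(cascade_predict({"cheap": 0.95, "expensive": 0.4}))  # (True, 1)
print(cascade_predict({"cheap": 0.60, "expensive": 0.4}))  # (False, 11)
```

Averaged over a workload, the fraction of instances that escalate determines how the total budget is spent, which is the quantity a policy like ACC trades off against accuracy.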
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/482012-03-08T11:19:08-05:002012-03-08T11:19:08-05:00https://talks.cs.umd.edu/talks/48Example-Driven Manifold Priors for Image DeconvolutionJie Ni - University of Maryland<br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, March 8, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
Image restoration methods that exploit prior information about images to be estimated have been extensively studied, typically using the Bayesian framework. In this work, we consider the role of prior knowledge of the object class in the form of a patch manifold to address the deconvolution problem. Specifically, we incorporate unlabeled image data of the object class, say natural images, in the form of a patch-manifold prior for the object class. The manifold prior is implicitly estimated from the given unlabeled data. We show how the patch-manifold prior effectively exploits the available sample class data for regularizing the deconvolution problem. Furthermore, we derive a generalized cross-validation (GCV) function to automatically determine the regularization parameter at each iteration without explicitly knowing the noise variance. Extensive experiments show that this method performs better than many competitive image deconvolution methods.</p>
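The GCV idea can be sketched for the simpler ridge-regularized least-squares case (an illustration under standard assumptions, not the authors' exact deconvolution formulation): the score depends only on the residual and the hat matrix, so no noise-variance estimate is needed.

```python
import numpy as np

def gcv_score(X, y, lam):
    """GCV(lam) = n * ||(I - H)y||^2 / trace(I - H)^2 for ridge regression,
    with hat matrix H = X (X^T X + lam I)^{-1} X^T. No noise variance needed."""
    n, d = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    r = y - H @ y
    return n * (r @ r) / np.trace(np.eye(n) - H) ** 2

# Pick the regularization weight by minimizing GCV over a grid (synthetic data)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(50)
lams = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
best = min(lams, key=lambda l: gcv_score(X, y, l))
```

Minimizing this score per iteration, as the abstract describes, automates the choice of regularization strength.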
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/522012-03-13T19:45:02-04:002012-03-13T19:45:02-04:00https://talks.cs.umd.edu/talks/52Covariance Discriminative Learning: A Natural and Efficient Approach to Image Set Classification<a href="http://www.cs.umd.edu/~hmguo/">Huimin Guo - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, March 15, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
We introduce a novel discriminative learning approach to image set classification by modeling the image set with its natural second order statistic, i.e., covariance matrix. Since nonsingular covariance matrices, a.k.a. symmetric positive definite (SPD) matrices, lie on a Riemannian manifold, classical learning algorithms cannot be directly utilized to classify points on the manifold. By exploring an efficient metric for the SPD matrices, i.e., Log-Euclidean Distance (LED), we derive a kernel function that explicitly maps the covariance matrix from the Riemannian manifold to a Euclidean space. With this explicit mapping, any learning method devoted to vector space can be exploited in either linear or kernel formulation. Linear Discriminant Analysis (LDA) and Partial Least Squares (PLS) are considered in this paper for their feasibility for our specific problem. The proposed method is evaluated on two tasks: face recognition and object categorization. Extensive experimental results show not only the superiority of our method over state-of-the-art ones in both accuracy and efficiency, but also its stability to two real challenges: noisy set data and varying set size.</p>
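The LED metric itself is simple to state: map each SPD matrix through the matrix logarithm and take a Euclidean (Frobenius) distance there. A minimal numpy sketch, using the eigendecomposition of a symmetric matrix (an illustration, not the talk's full kernel pipeline):

```python
import numpy as np

def log_euclidean_distance(A, B):
    """LED between SPD matrices: Frobenius norm of the difference of matrix logs."""
    def logm_spd(M):
        # logm of an SPD matrix via eigendecomposition (eigenvalues are positive)
        w, V = np.linalg.eigh(M)
        return (V * np.log(w)) @ V.T
    return np.linalg.norm(logm_spd(A) - logm_spd(B), "fro")

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # an SPD covariance matrix
B = np.eye(2)
print(log_euclidean_distance(A, B))   # positive
print(log_euclidean_distance(A, A))   # zero up to rounding
```

Because the map into log-space is explicit, vector-space methods such as LDA or PLS can then be applied directly to the flattened log-matrices, which is the route the abstract describes.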
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/622012-03-26T23:05:46-04:002012-03-26T23:05:46-04:00https://talks.cs.umd.edu/talks/62Group Norms for Learning Latent Structural SVMs<a href="http://www.cs.umd.edu/~dchen/">Daozheng Chen - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, March 29, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
Latent variable models have been widely applied to many problems in machine learning and related fields such as computer vision and information retrieval. However, the complexity of the latent space in such models is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this work is to regularize the complexity of the latent space and learn which hidden states are really relevant for the prediction problem. To this end, we propose regularization with a group norm such as L1-L2 to estimate the parameters of a Latent Structural SVM. Our experiments on digit recognition show that our approach is indeed able to control the complexity of the latent space, resulting in significantly faster inference at test time without any loss in accuracy of the learnt model.</p>
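The L1-L2 group norm mentioned above is the group-lasso penalty: the L2 norm within each parameter group, summed (an L1 combination) across groups. A minimal sketch (groups and weights here are invented for illustration):

```python
import numpy as np

def group_l1_l2(w, groups):
    """L1-L2 (group lasso) norm: sum over groups of the L2 norm of each block.
    Penalizing this drives entire groups of latent-state parameters to zero."""
    return sum(np.linalg.norm(w[idx]) for idx in groups)

w = np.array([3.0, 4.0, 0.0, 0.0, 1.0])   # parameters for three latent states
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
print(group_l1_l2(w, groups))  # 5.0 + 0.0 + 1.0 = 6.0
```

The middle group contributes nothing, illustrating how the penalty can zero out a latent state wholesale rather than coordinate by coordinate.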
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/742012-04-03T10:21:09-04:002012-04-03T10:21:09-04:00https://talks.cs.umd.edu/talks/74Sparsity Inspired Unconstrained Iris Recognition<a href="http://www.umiacs.umd.edu/~jsp/">Jaishanker Pillai - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, April 5, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
Iris recognition is one of the most popular approaches for human authentication, since iris patterns are unique to each person and remain stable over long periods of time. However, existing algorithms for iris recognition require clean iris images, which limits their utility in unconstrained environments like surveillance. In this work, we develop an unconstrained iris recognition algorithm by modeling the inherent structure in clean iris images using sparse representations. The proposed algorithm recognizes the test image and also predicts the quality of acquisition. We further extend the introduced algorithm with a quality-based fusion framework, which combines the recognition results from multiple test images. Extensive evaluation on existing datasets clearly demonstrates the utility of the proposed algorithm for recognition and image quality estimation.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/822012-04-10T11:06:31-04:002012-04-10T11:06:31-04:00https://talks.cs.umd.edu/talks/82Ambiguities in Camera Self-CalibrationJun-Cheng Chen - University of Maryland<br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, April 12, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
Structure from motion (SfM) is the problem of computing the 3D scene and camera parameters from a video or collection of images. SfM problems can be further classified as calibrated and uncalibrated. In calibrated SfM, the internal camera parameters are known. This is a much easier problem than the uncalibrated case, where these parameters are unknown. Solving for the internal camera parameters is known as the camera self-calibration (or auto-calibration) problem. Critical motion sequences (CMS) are those sequences/videos from which the internal parameters cannot be determined uniquely, that is, there are many different settings of the internal parameters that give rise to the same video. In this talk, we will show that three cases of motion, (1) pure translation, (2) single rotation, and (3) single rotation about the X/Y/Z-axis plus translation, are CMS, and we will give necessary and sufficient conditions for a sequence not to be a CMS.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/882012-04-16T15:30:29-04:002012-04-16T15:30:29-04:00https://talks.cs.umd.edu/talks/88Facial Expression Analysis System<a href="http://www.umiacs.umd.edu/~taheri/">Sima Taheri - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, April 19, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
The goal of facial expression analysis is to create systems that can automatically analyze and recognize facial feature changes and facial motion due to facial expressions from visual information. This has been an active research topic for several years and has attracted the interest of many computer vision researchers and behavioral scientists, with applications in behavioral science, security, animation, and human-computer interaction. In this talk, I will briefly describe the components of a facial expression analysis system and review some previous work. Then I will talk about my work, View-Invariant Expression Analysis using Analytic Shape Manifolds and Structure-Preserving Sparse Decomposition for Facial Expression Analysis.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/1042012-04-24T11:22:46-04:002012-04-24T11:22:46-04:00https://talks.cs.umd.edu/talks/104Rendering massive virtual world using Clipmaps<a href="http://www.cs.umd.edu/~sujal/">Sujal Bista - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, April 26, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
Real-time rendering of a massive virtual world requires efficient management of textures, geometric structures, and a variety of visual effects. Despite recent improvements in Graphics Processing Units (GPUs), the currently available memory space and computational power are still not enough to store and process the textures and geometry used to represent a high-quality virtual world. One way to overcome this problem is to use Clipmaps, a hardware-accelerated approach that manages levels of detail (LOD) for objects, textures, and effects used to render a virtual world.</p>
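The core of any LOD scheme, clipmaps included, is picking a coarser level as the viewer moves away, roughly one level per doubling of distance. A hypothetical toy level-selection rule (the function name and parameters are invented, not the talk's renderer):

```python
import math

def clip_level(distance, finest_texel_size=1.0, num_levels=8):
    """Pick the pyramid level whose texel size roughly matches the viewing
    distance: one level coarser for each doubling of distance."""
    level = int(max(0.0, math.log2(max(distance, finest_texel_size))))
    return min(level, num_levels - 1)  # clamp to the coarsest available level

print([clip_level(d) for d in (0.5, 1, 2, 4, 100, 10_000)])  # [0, 0, 1, 2, 6, 7]
```

Because only a fixed-size window of each level is resident (the "clip" in clipmap), this keeps GPU memory bounded regardless of world size.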
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/1092012-04-30T20:34:49-04:002012-04-30T20:34:49-04:00https://talks.cs.umd.edu/talks/109Spatial Marked Point Processes in Computer VisionNazre Batool - University of Maryland<br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, May 3, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
The computer vision community is very familiar with Markov random field (MRF) modeling for numerous applications. In this talk, I will present an overview of the more general, but less popular, Markov point processes (MPP) and will highlight the connection between MRF and MPP and the advantages of MPP modeling for specific applications. Recently, several versions of MPP called ‘Marked’ point processes have been used in remote sensing/aerial imaging applications. I will discuss some of the applications and finally, present my recent work based on MPP.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/1972012-09-27T11:03:32-04:002012-09-28T11:49:14-04:00https://talks.cs.umd.edu/talks/197Using Google Street View to Identify Street-level Accessibility Problems<a href="http://kotarohara.com/">Kotaro Hara - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, October 11, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>Poorly maintained sidewalks, missing curb ramps, and other obstacles pose considerable accessibility challenges; however, there are currently few, if any, mechanisms to determine accessible areas of a city a priori. </p>
<p>In the first half of the presentation, I will talk about our investigation of the feasibility of using untrained crowd workers from Amazon Mechanical Turk (turkers) to find, label, and assess sidewalk accessibility problems in Google Street View imagery. Our work effectively demonstrates a promising new, highly scalable method for acquiring knowledge about sidewalk accessibility.</p>
<p>In the latter half, I will discuss future work as well as related open research questions in the field of computer vision.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/1622012-09-03T10:19:37-04:002012-09-03T10:19:37-04:00https://talks.cs.umd.edu/talks/162Face Alignment by Explicit Shape Regression<a href="http://www.umiacs.umd.edu/~kanazawa/">Angjoo Kanazawa - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, September 6, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
</p>
<div>
In this talk, we will go over the CVPR 2012 paper "Face Alignment by Explicit Shape Regression". I will review the paper and discuss its key concepts: cascaded regression, random ferns, shape-indexed image features, and correlation-based feature selection. Then I will discuss our hypothesis on why this seemingly simple method works so well, and how we can apply the method to similar problem domains, such as localizing dog and bird parts, along with their challenges.</div>
<div>
Abstract from the paper:</div>
<div>
We present a very efficient, highly accurate, “Explicit Shape Regression” approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and explicitly minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine during the test, without using a fixed parametric shape model as in most previous methods. To make the regression more effective and efficient, we design a two-level boosted regression, shape-indexed features and a correlation-based feature selection method. This combination enables us to learn accurate models from large training data in a short time (20 minutes for 2,000 training images), and run regression extremely fast in test (15 ms for an 87-landmark shape). Experiments on challenging data show that our approach significantly outperforms the state-of-the-art in terms of both accuracy and efficiency.</div>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/1632012-09-03T10:33:20-04:002012-09-03T10:33:20-04:00https://talks.cs.umd.edu/talks/163Combining Per-Frame and Per-Track Cues for Multi-Person Action Recognition<a href="http://www.umiacs.umd.edu/~sameh/">Sameh Khamis - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, September 13, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>
We propose a model to combine per-frame and per-track cues for action recognition. With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual's action in a scene and the flow of actions of an individual in a video sequence, inferring valid tracks in the process. Our motivation is based on the unlikely discordance of an action in a structured scene, both at the track level (e.g., a person jogging then dancing) and the frame level (e.g., a person jogging in a dance studio). While we can utilize sampling approaches for inference in our model, we instead devise a global inference algorithm by decomposing the problem and solving the subproblems exactly and efficiently, recovering a globally optimal joint solution in several cases. Finally, we improve on the state-of-the-art action recognition results for two publicly available datasets.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/1702012-09-10T13:00:45-04:002012-09-10T13:00:45-04:00https://talks.cs.umd.edu/talks/170Artificial Intelligence and Artificial Creativity Before 1900<a href="http://www.cs.umd.edu/~dss/">Doug Summers-Stay - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, September 20, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>I will talk about various inventions such as the Eureka, which generated Latin poetry in hexameter while playing "God Save the Queen"; the Homeoscope, a mechanical search engine invented by a Russian police clerk in 1832; the Componium, an orchestra-in-a-box which composed random variations on a melody; and others along the same lines. I'll also talk about how we could go beyond these techniques to build something really creative. This is a presentation of material I found when I was doing research for the book I published in January, Machinamenta.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/1752012-09-13T11:41:11-04:002012-09-13T11:41:11-04:00https://talks.cs.umd.edu/talks/175Attribute Discovery via Predictable and Discriminative Binary Codes<a href="http://www.cs.dartmouth.edu/~mrastegari">Mohammad Rastegari - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, September 27, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>We present images with binary codes in a way that balances discrimination and learnability of the codes. In our method, each image claims its own code in a way that maintains discrimination while being predictable from visual data. 
Category memberships are usually good proxies for visual similarity but should not be enforced as a hard constraint. Our method learns codes that maximize separability of categories unless there is strong visual evidence against it. Simple linear SVMs can achieve state-of-the-art results with our short codes. In fact, our method produces state-of-the-art results on Caltech256 with only 128-dimensional bit vectors and outperforms the state of the art when using longer codes. We also evaluate our method on ImageNet and show that our method outperforms state-of-the-art binary code methods on this large-scale dataset. Lastly, our codes can discover a discriminative set of attributes.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/1792012-09-14T10:42:18-04:002012-09-14T10:42:18-04:00https://talks.cs.umd.edu/talks/179Anomaly Detection on Railway Components using Sparse Representations<a href="http://www.umiacs.umd.edu/~gibert/">Xavier Gibert-Serra - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, October 4, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>High-speed rail (HSR) requires high levels of reliability of the track infrastructure. Automated visual inspection is useful for finding many anomalies such as cracks or chips on joint bars and concrete ties, but existing vision-based inspection systems often produce a high number of false detections, and are very sensitive to external factors such as changes in environmental conditions. For example, state-of-the-art algorithms used by the railroad industry nominally perform at a detection rate of 85% with a false alarm rate of 3%, and performance drops very quickly as image quality degrades. On the tie inspection problem, this false alarm rate would correspond to 2.6 detections per second at 125 MPH, which cannot be handled by an operator. 
These false detections have many causes, including variations in anomaly appearance, texture, partial occlusion, and noise, which existing algorithms cannot handle very well. To overcome these limitations, it is necessary to reformulate this joint detection and segmentation problem as a Blind Source Separation problem, and use a generative model that is robust to noise and is capable of handling missing data.</p>
<p>In signal and image processing, Sparse Representations (SR) is an efficient way of describing a signal as a linear combination of a small number of atoms (elementary signals) from a dictionary. In natural images, sparsity arises from the statistical dependencies of pixel values across the image. Therefore, statistical methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Independent Component Analysis (ICA) have been used for dimensionality reduction in several computer vision problems. Recent advances in SR theory have enabled methods that learn optimal dictionaries directly from training data. For example, K-SVD is a very well-known algorithm for automatically designing over-complete dictionaries for sparse representation.</p>
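Finding the sparse combination of atoms for a given signal can be sketched with a minimal greedy pursuit (an illustrative orthogonal matching pursuit in numpy; dictionary-learning methods such as K-SVD alternate a sparse-coding step like this with dictionary updates):

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: approximate y as a k-sparse
    combination of dictionary atoms (columns of D, assumed unit norm), k >= 1."""
    residual, support = y.copy(), []
    for _ in range(k):
        # pick the atom most correlated with the current residual
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # re-fit coefficients over the whole support by least squares
        x, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ x
    coeffs = np.zeros(D.shape[1])
    coeffs[support] = x
    return coeffs

D = np.eye(4)                      # trivial orthonormal dictionary, for illustration
y = np.array([0.0, 3.0, 0.0, 1.0])
print(omp(D, y, 2))                # recovers the 2-sparse coefficients exactly
```

With a learned over-complete dictionary in place of the identity, the same pursuit yields the sparse codes used as features in the detection pipeline described below.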
<p>In this detection problem, the anomalies have very well-defined structure and can therefore be represented sparsely in some subspace. In addition, the image background has very structured texture, so it is sparse with respect to a different frame. Theoretical results in mathematical geometric separation show that it is possible to separate these two image components (regular texture from contours) by minimizing the L1 norm of the coefficients in geometrically complementary frames. More recently, it has been shown that this problem can be solved efficiently using thresholding and total variation regularization. Our experiments show that the sparse coefficients extracted from the contour component can be converted into feature vectors that can be used to cluster and detect these anomalies.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/2012012-10-02T09:29:26-04:002012-10-02T09:29:26-04:00https://talks.cs.umd.edu/talks/201Dictionary-based Face Recognition from VideoYi-Chen Chen - University of Maryland<br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, October 25, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>The main challenge in recognizing faces in video is effectively exploiting the multiple frames of a face and the accompanying dynamic signature. One prominent method is based on extracting joint appearance and behavioral features. A second method models a person by temporal correlations of features in a video. Our approach introduces the concept of video-dictionaries for face recognition, which generalizes the work in sparse representation and dictionaries for faces in still images. Video-dictionaries are designed to implicitly encode temporal, pose, and illumination information. We demonstrate our method on the Face and Ocular Challenge Series (FOCS), which consists of unconstrained video sequences. 
We show that our method is efficient and performs significantly better than many competitive video-based face recognition algorithms.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/2062012-10-09T12:23:56-04:002012-10-09T12:23:56-04:00https://talks.cs.umd.edu/talks/206Dictionary learning methods for computer vision<a href="http://www.umiacs.umd.edu/~ashish/">Ashish Shrivastava - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, October 18, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>Sparse and redundant signal representations have recently gained much interest in image understanding. This is partly due to the fact that signals or images of interest are often sparse in some dictionary. These dictionaries can be either analytic or they can be learned directly from the data. In fact, it has been observed that learning a dictionary directly from data often leads to improved results in many practical applications such as classification and restoration. In this talk I will give a general overview of dictionary learning methods and talk in detail about my recent work on semi-supervised dictionary learning and non-linear supervised dictionary learning methods.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/2212012-10-25T16:12:03-04:002012-10-25T16:12:03-04:00https://talks.cs.umd.edu/talks/221Instance Level Multiple Instance Learning Using Similarity Preserving Quasi Cliques<a href="http://www.cs.umd.edu/~mrastega/">Mohammad Rastegari - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. 
Williams Building (AVW)</a><br>Thursday, November 1, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>In this work we introduce an instance-level approach to multiple instance learning. Our bottom-up approach learns a discriminative notion of similarity between instances in positive bags and uses it to form a discriminative similarity graph. We then introduce the notion of similarity-preserving quasi-cliques, which aims at discovering large quasi-cliques with high scores of within-clique similarities. We argue that such large cliques provide clues for inferring the underlying structure between positive instances. We use a ranking function that takes into account pairwise similarities coupled with the prospectiveness of edges to score all positive instances. We show that these scores lead to positive instance discovery. Our experimental evaluations show that our method outperforms state-of-the-art MIL methods at both bag-level and instance-level prediction on standard benchmark image and text datasets.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/2342012-11-27T08:43:36-05:002012-11-27T08:43:36-05:00https://talks.cs.umd.edu/talks/234Knowledge Adaptation in Visual DomainsFatemeh Mirrashed - University of Maryland<br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, November 29, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>The new machine learning techniques of transfer learning and domain adaptation have recently captured special attention in the computer vision community. In this talk we will take a look at some of the methods that have recently been adopted or developed for adapting learning in visual domains. 
We will also try to have an open discussion of some of the more ideological questions, such as better generalization versus adaptation. With an abundance of massive volumes of visual training data, should we keep designing algorithms that try to model all the possible variations in the visual world, or should we regard adaptation as an integral part of learning in visual domains?</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/2352012-11-27T14:12:10-05:002012-12-03T14:03:32-05:00https://talks.cs.umd.edu/talks/235Clustering Images with Algorithms and Humans<a href="http://www.umiacs.umd.edu/~arijit/">Arijit Biswas - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, December 13, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>In this talk, we address the problem of image clustering. Although there are many excellent clustering algorithms, effective clustering remains very challenging for datasets that contain many classes. We propose several approaches to cluster images accurately. First, we use pairwise constraints from humans to cluster images. An algorithm provides us with pairwise image similarities. We then actively obtain selected, more accurate pairwise similarities from humans. A novel method is developed to choose the most useful pairs to show a person, obtaining constraints that improve clustering. Second, we propose a new algorithm to cluster a subset of the images only (we call this subclustering), which will produce a few examples from each class. Subclustering will produce smaller but purer clusters. 
Finally, we make use of human input in an active subclustering algorithm to further improve results.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/2392012-11-29T13:39:09-05:002012-11-29T14:18:02-05:00https://talks.cs.umd.edu/talks/239Linear Dimensionality Reduction Methods for Object DetectionEjaz Ahmed - University of Maryland<br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, December 6, 2012, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>I'll talk about linear methods for dimensionality reduction, emphasizing PLS and CDF (composite discriminant factors). The focus on linear methods is motivated by the task of object detection, which can benefit from linear projections in various ways; I'll go through some of these issues as well. We propose a linear dimensionality reduction method, Composite Discriminant Factor (CDF) analysis, which searches for a discriminative but compact feature subspace that can be used as input to classifiers that suffer from problems such as multi-collinearity or the curse of dimensionality. The subspace selected by CDF maximizes the performance of the entire classification pipeline, and is chosen from a set of candidate subspaces that are each discriminative by various local measures, such as covariance between input features and output labels or the margin between positive and negative samples. Our method is based on Partial Least Squares (PLS) analysis, and can be viewed as a generalization of the PLS1 algorithm, designed to increase discrimination in classification tasks. While our experiments focus on improvements to object detection (in particular, pedestrians and vehicles), a task that often involves high dimensional features and benefits from fast linear approaches, we also demonstrate our approach on machine learning datasets from the UCI Machine Learning repository. 
Experimental results show that the proposed approach improves significantly over SVM in terms of accuracy and also over PLS in terms of compactness and efficiency, while maintaining or slightly improving accuracy.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/3612013-07-01T11:58:08-04:002013-07-18T12:45:53-04:00https://talks.cs.umd.edu/talks/361Probabilistic Event Calculus based on Markov Logic Networks(No abstract yet)<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/3622013-07-01T11:59:01-04:002013-07-18T12:45:49-04:00https://talks.cs.umd.edu/talks/362Graphical Models(No abstract yet)<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/3632013-07-01T11:59:43-04:002013-07-18T12:44:26-04:00https://talks.cs.umd.edu/talks/363Coupling Detection and Data Association for Multiple Object Tracking(No abstract yet)<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/3652013-07-01T12:01:27-04:002013-07-26T10:36:58-04:00https://talks.cs.umd.edu/talks/365Structured Learning and Prediction in Computer Vision(No abstract yet)<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/3662013-07-01T12:02:14-04:002013-08-05T14:02:23-04:00https://talks.cs.umd.edu/talks/366Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations(No abstract yet)<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/3672013-07-01T12:02:47-04:002013-08-12T12:24:42-04:00https://talks.cs.umd.edu/talks/367Gradient-Based Learning Applied to Document Recognition(No abstract yet)<br>This talk is part of the 
following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/3682013-07-01T12:03:25-04:002013-08-19T11:07:38-04:00https://talks.cs.umd.edu/talks/368Learning Deep Architectures for AI(No abstract yet)<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/4212013-09-17T16:27:01-04:002013-09-18T11:49:52-04:00https://talks.cs.umd.edu/talks/421Fast Image Prior<a href="http://www.umiacs.umd.edu/~mrastega/">Mohammad Rastegari - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, September 19, 2013, 4:30-5:30 pm<br><br><b>Abstract:</b> <p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">In this project we introduce a new method for learning an image prior that can be used for many applications in image reconstruction. We learn a generative model on natural image patches, similar to a Gaussian Mixture Model (GMM). The key idea of our approach is to force each component of our generative model to share the same set of basis vectors. This leads to much faster inference at test time. We used image denoising as our test bed for this image prior learning. Our experimental results show that we achieve about a 30x speedup over the state-of-the-art method while slightly improving denoising accuracy.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/4462013-09-30T13:59:50-04:002013-10-08T15:51:48-04:00https://talks.cs.umd.edu/talks/446A Sentence is Worth a Thousand Pixels<a href="http://www.umiacs.umd.edu/~bhokaal/">Abhishek Sharma - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. 
Williams Building (AVW)</a><br>Thursday, October 24, 2013, 4:30-5:30 pm<br><br><b>Abstract:</b> <p><span style="font-family: sans-serif; font-size: 13px; line-height: 19.046875px;">We are interested in holistic scene understanding where images are accompanied with text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent as well as semantic segmentation, and employs text as well as image information as input. We automatically parse the sentences and extract objects and their relationships, and incorporate them into the model, both via potentials as well as by re-ranking candidate detections. We demonstrate the effectiveness of our approach in the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual only model and detection improvements of 5% AP over deformable part-based models.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/4642013-10-08T15:54:19-04:002013-10-08T15:54:19-04:00https://talks.cs.umd.edu/talks/464A Context-free Manipulation Action Grammar and Manipulation Action Consequences Detection<a href="http://www.umiacs.umd.edu/~yzyang/">Yezhou Yang - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, October 10, 2013, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>Humanoid robots will need to learn the actions that humans perform. They will need to recognize these actions when they see them and they will need to perform these actions themselves. In this presentation I will introduce a manipulation grammar to perform this learning task. 
Context-free grammars in linguistics provide a simple and precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks; they also exactly capture the basic recursive structure of natural languages. Similarly, for manipulation actions, every complex activity is built from smaller blocks involving hands and their movements, as well as objects, tools and the monitoring of their state. Thus, interpreting a seen action is like understanding language, and executing an action from knowledge in memory is like producing language. Associated with the grammar, a parsing algorithm is proposed, which can be used bottom-up to interpret videos by dynamically creating a semantic tree structure, and top-down to create the motor commands for a robot to execute manipulation actions. Experiments on both tasks, i.e. a robot observing people performing manipulation actions, and a robot executing manipulation actions on a simulation platform, validate the proposed formalism.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/4712013-10-14T18:00:55-04:002013-10-14T18:00:55-04:00https://talks.cs.umd.edu/talks/471Ray Saliency: Bottom-Up Saliency for a Rotating and Zooming Camera<a href="http://garrettwarnell.com/">Garrett Warnell - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, October 17, 2013, 4:30-5:30 pm<br><br><b>Abstract:</b> <p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">We extend the classical notion of visual saliency to multi-image data collected using a stationary pan-tilt-zoom (PTZ) camera. 
We show why existing saliency methods are not effective for this type of data, and propose ray saliency: a modified notion of visual saliency that utilizes knowledge of the imaging process in order to appropriately incorporate the context provided by multiple images. We present a practical, mosaic-free method by which to quantify and calculate ray saliency, and demonstrate its usefulness on PTZ imagery.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/4982013-11-05T16:37:12-05:002013-11-05T16:37:12-05:00https://talks.cs.umd.edu/talks/498Cross-View Action Recognition Via a Transferable Dictionary Pair<a href="https://sites.google.com/site/jingjingzhengumd/">Jingjing Zheng - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, November 7, 2013, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>Discriminative appearance features are effective for recognizing actions in a fixed view, but generalize poorly to changes in viewpoint. We present a method for view-invariant action recognition based on sparse representations using a transferable dictionary pair. A transferable dictionary pair consists of two dictionaries that correspond to the source and target views respectively. The two dictionaries are learned simultaneously from pairs of videos taken at different views and aim to encourage each video in the pair to have the same sparse representation. Thus, the transferable dictionary pair links features between the two views that are useful for action recognition. Both unsupervised and supervised algorithms are presented for learning transferable dictionary pairs. Using the sparse representation as features, a classifier built in the source view can be directly transferred to the target view. 
We extend our approach to transferring an action model learned from multiple source views to one target view. We demonstrate the effectiveness of our approach on the multi-view IXMAS data set. Our results compare favorably to the state of the art.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5012013-11-12T21:39:31-05:002013-11-12T21:39:31-05:00https://talks.cs.umd.edu/talks/501Joint Sparse Representation for Multimodal Biometric Recognition<a href="http://www.umiacs.umd.edu/~sshekha/">Sumit Shekhar - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, November 14, 2013, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>In this talk, I will present work on a feature-level fusion method for multimodal biometric recognition. Traditional methods for combining outputs from different modalities are based on score-level or decision-level fusion. Feature-level fusion can be more discriminative, but has hardly been explored due to the challenges of differing feature outputs and high feature dimensions. Here, I will present a framework that uses joint sparsity to combine information, and show its application to multimodal biometric recognition, face recognition and video-based recognition.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5082013-11-20T17:58:38-05:002013-11-20T17:58:38-05:00https://talks.cs.umd.edu/talks/508Renaissance of Convolutional Neural Network - what, why and so?<a href="http://www.umiacs.umd.edu/~jhchoi/">Jonghyun Choi - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, November 21, 2013, 4:30-5:30 pm<br><br><b>Abstract:</b> <p>Deep networks based on convolutional neural networks have recently improved image classification accuracy significantly over state-of-the-art vision approaches. I will go through what a successful deep convolutional neural net looks like, why it has become popular again, and ongoing deep net research in other research groups. I will mostly go through the successful instance of a deep convolutional neural net tuned by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton, published in NIPS 2012.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5262013-12-13T00:34:20-05:002013-12-13T00:34:20-05:00https://talks.cs.umd.edu/talks/526Distance Learning Using the Triangle Inequality for Semi-supervised Clustering<a href="http://www.umiacs.umd.edu/~arijit/">Arijit Biswas - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, December 5, 2013, 4:30-5:30 pm<br><br><b>Abstract:</b> <p><span style="font-family: sans-serif; font-size: 13px; line-height: 19.046875px;">The success of semi-supervised clustering algorithms depends on how effectively supervision can be propagated to the unsupervised data. We propose a method for modifying all pairwise image distances when must-link or can't-link pairwise constraints are provided for only a few image pairs. 
These distances are used for clustering images. First, we formulate a brute-force Quadratic Programming (QP) method that modifies the distances such that the total change in distances is minimized but the final distances obey the triangle inequality. Then we propose a much faster version of the QP that can be applied to large datasets by enforcing only a selected subset of the inequalities. We prove that this still ensures that key qualitative properties of the distances are correctly computed. We run experiments on face, leaf and video image clustering and show that our proposed approach outperforms state-of-the-art methods for constrained clustering.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5392014-01-20T14:18:14-05:002014-01-27T17:23:43-05:00https://talks.cs.umd.edu/talks/539Scene and Video Understanding<a href="http://www.umiacs.umd.edu/~ajain/">Arpit Jain - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, January 30, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <div style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">There have been significant improvements in the accuracy of scene understanding due to a shift from recognizing objects ``in isolation'' to context-based recognition systems. Such systems improve recognition rates by augmenting appearance-based models of individual objects with contextual information based on pairwise relationships between objects. These pairwise relations incorporate common world knowledge such as co-occurrences and spatial arrangements of objects, scene layout, etc. However, these relations, even though consistent in the 3D world, change with the viewpoint of the scene. 
In this thesis, we will look at the problem of incorporating contextual information into scene understanding from two different perspectives: (a) ``what'' contextual relations are useful and ``how'' they should be incorporated into a Markov network during inference; and (b) jointly solving the segmentation and recognition problems using a multiple segmentation framework based on contextual information in conjunction with appearance matching. In the latter part of the thesis, we will investigate different representations for video understanding and propose a discriminative patch-based representation for videos. </div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: 13px;"> </div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">Our work departs from the traditional view of incorporating context into the scene understanding problem, where a fixed model for context is learned. We argue that context is scene dependent and propose a data-driven approach to predict the importance of edges and construct a Markov network for image analysis based on statistical models of global and local image features. Since not all contextual information is equally important, we also address the coupled problem of predicting the feature weights associated with each edge of a Markov network for the evaluation of context. We then address the problem of fixed segmentation while modelling context by using a multiple segmentation framework and formulating the problem as ``a jigsaw puzzle''. We formulate the problem as segment selection from a pool of segments (jigsaws), assigning each selected segment a class label. Previous multiple segmentation approaches used local appearance matching to select segments in a greedy manner. In contrast, our approach formulates a cost function based on contextual information in conjunction with appearance matching. This relaxed cost function formulation is minimized using an efficient quadratic programming solver, and an approximate solution is obtained by discretizing the relaxed solution.</div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: 13px;"> </div>
<div style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">Lastly, we propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification.</div><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5502014-01-31T10:24:44-05:002014-02-05T14:59:39-05:00https://talks.cs.umd.edu/talks/550Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group<a href="http://ravitejav.weebly.com/">Raviteja Vemulapalli - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, February 6, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p>Recently introduced cost-effective depth sensors coupled with the real-time skeleton estimation algorithm of Shotton et al. [16] have resulted in a renewed interest in skeleton-based human action recognition. Most of the earlier skeleton-based approaches used either the joint locations or the joint angles to represent a human skeleton. In this paper, we propose a new skeletal representation that explicitly models the 3D geometric relationships between various body parts using rotations and translations in 3D space. Since 3D rigid body motions are members of the special Euclidean group SE(3), the proposed skeletal representation lies in the Lie group SE(3) × ... × SE(3), which is a curved manifold. With the proposed representation, human actions can be modeled as curves in this Lie group. Since classification of curves in this Lie group is not an easy task, we map the action curves from the Lie group to its Lie algebra, which is a vector space. We then perform classification using a combination of dynamic time warping, Fourier temporal pyramid representation and linear SVM. Experimental results on three action datasets show that the proposed representation performs better than various other commonly-used skeletal representations. The proposed approach also outperforms various state-of-the-art skeleton-based human action recognition approaches.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5532014-02-02T15:53:07-05:002014-02-18T18:34:53-05:00https://talks.cs.umd.edu/talks/553Feedback Loop between High Level Semantics and Low Level VisionVarun Nagaraja - University of Maryland<br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, February 20, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">High level semantic analysis typically involves constructing a Markov network over detections from low level detectors to encode context and model relationships between them. In complex higher order networks (e.g. Markov Logic Networks), each detection can be part of many factors and the network size grows rapidly as a function of the number of detections. Hence, to keep the network size small, a threshold is applied on the confidence measures of the detections to discard the less likely detections. A practical challenge is deciding what thresholds to use to discard noisy detections. A high threshold will lead to a high false dismissal rate. 
A low threshold can result in many detections including mostly noisy ones which leads to a large network size and increased computational requirements.</span><br style="color: #222222; font-family: arial, sans-serif; font-size: 13px;"><br style="color: #222222; font-family: arial, sans-serif; font-size: 13px;"><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">We propose a feedback based incremental technique to tackle this problem, where we initialize the network with high confidence detections and then based on the high level semantics in the initial network, we can incrementally select the relevant missing low level detections. We show three different ways of selecting detections which are based on three scoring functions that bound the increase in the optimal value of the objective function of network, with varying degrees of accuracy and computational cost. We perform experiments with an event recognition task in one-on-one basketball videos that uses Markov Logic Networks.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5542014-02-02T15:53:25-05:002014-02-27T10:18:36-05:00https://talks.cs.umd.edu/talks/554Predictable Dual View Hashing and Domain Adaptive Classification<a href="http://www.umiacs.umd.edu/~mrastega/">Mohammad Rastegari - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, February 27, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">We propose a Predictable Dual-View Hashing (PDH) algorithm which embeds proximity of data samples in the original spaces. We create a cross-view hamming space with the ability to compare information from previously incomparable domains with a notion of 'predictability'. 
By performing comparative experimental analysis on two large datasets, PASCAL-Sentence and SUN-Attribute, we demonstrate the superiority of our method to the state-of-the-art dual-view binary code learning algorithms. We also propose an unsupervised domain adaptation method that exploits intrinsic compact structures of categories across different domains using binary attributes. Our method directly optimizes for classification in the target domain. The key insight is finding attributes that are discriminative across categories and predictable across domains. </span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5562014-02-02T15:54:04-05:002014-03-12T20:12:29-04:00https://talks.cs.umd.edu/talks/556Anomaly Detection on Outdoor Images Using Sparse Representations<a href="http://www.umiacs.umd.edu/~gibert/">Xavier Gibert Serra - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, March 13, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">The integrity of safety-critical infrastructure, such as railway tracks, roads, or bridges needs to be monitored regularly to prevent catastrophic failures. For example, federal regulations require visual inspection of all high speed tracks twice each week. Traditional manual inspection methods are time-consuming and prone to human error. With the availability of high-speed cameras, it is possible to survey large areas in less time. However, detecting cracks and other anomalies on these images is a particularly challenging problem because of the uncontrolled environment arising from differences in material composition, and superficial degradation caused by outdoor elements. 
Due to speed requirements, images acquired from a moving vehicle have limited resolution, causing the smallest of these cracks to be under-sampled in the transversal dimension. Therefore, these cracks get mixed with background texture, resulting in a negative signal-to-noise ratio. State-of-the-art methods are based on linear filters, which are only optimal under additive Gaussian noise assumptions. This problem of simultaneous detection and clustering of anomalies in textured images can be posed as a blind source separation problem, and by exploiting the mutual incoherence of the dictionaries of shearlets and isotropic wavelets, which sparsely represent cracks and texture respectively, we can separate each component using an iterative shrinkage algorithm. In this talk, I will present an integrated framework for image separation, feature extraction, clustering and classification that takes advantage of this decomposition.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5582014-02-02T15:54:48-05:002014-03-25T17:52:28-04:00https://talks.cs.umd.edu/talks/558Estimating 3D Face ModelsSwaminathan Sankaranarayanan - University of Maryland<br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, March 27, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p>In this talk, I will focus on the topic of 3D Face Model Estimation from Single Grayscale Images. This problem is usually formulated as a Shape from Shading problem involving assumptions about the Image Formation and the Illumination framework. I will review some of the state-of-the-art methods that attempt to solve this problem by using knowledge from existing 3D shape models of face images. I will then introduce the idea of using Sparse Depth Representations and motivate my method of formulating the Model Estimation problem as a Bilevel Sparse Coding Optimization. 
I will conclude my talk by explaining the algorithm used to solve the objective function and the issues that I am facing with it.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5592014-02-02T15:55:11-05:002014-04-02T11:03:53-04:00https://talks.cs.umd.edu/talks/559Affordance of Object Parts from Geometric Features<a href="https://sites.google.com/site/austinomyers/">Austin Myers - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, April 3, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p>Understanding affordance is a first step to a deeper understanding of the world, one in which a robot knows how an object and its parts can be used. To assist in everyday activities, robots must not only be able to recognize a tool, but also localize its parts and identify how each part is used. We propose a preliminary approach to jointly localize and identify the function, or affordances, of a tool’s parts for objects from known or completely novel categories. We combine superpixel segmentation, feature learning, and conditional random fields to provide precise 3D predictions of functional parts that can be used directly by a robot to interact with the world. To investigate this problem, we introduce a new RGB-D Part Affordance Dataset consisting of 105 kitchen, workshop, and garden tools with pixel-level affordance labels for over 10,000 RGB-D images. We analyze the effectiveness of different feature types, and show that geometric features are most important for successful affordance identification. 
We demonstrate that by identifying the affordances of tools at the level of parts, we can generalize to novel object categories and identify the useful parts of never before seen tools.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5602014-02-02T15:55:26-05:002014-04-10T13:14:58-04:00https://talks.cs.umd.edu/talks/560Tag Taxonomy Aware Dictionary Learning for Region Tagging<a href="https://sites.google.com/site/jingjingzhengumd/">Jingjing Zheng - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, April 10, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p><span style="color: #222222; font-family: arial, helvetica, sans-serif; font-size: 13px;">Tags of image regions are often arranged in a hierarchical taxonomy based on their semantic meanings. Using the given tag taxonomy, we propose to jointly learn multi-layer hierarchical dictionaries and corresponding linear classifiers for region tagging. Specifically, we generate a node-specific dictionary for each tag node in the taxonomy, and then concatenate the node-specific dictionaries from each level to construct a level-specific dictionary. The hierarchical semantic structure among tags is preserved in the relationship among node-dictionaries. Simultaneously, the sparse codes obtained using the level-specific dictionaries are summed up as the final feature representation to design a linear classifier. Our approach not only makes use of sparse codes obtained from higher levels to help learn the classifiers for lower levels, but also encourages the tag nodes from lower levels that have the same parent tag node to implicitly share sparse codes obtained from higher levels. 
Experimental results using three benchmark datasets show that the proposed approach yields the best performance over recently proposed methods.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5622014-02-02T15:56:05-05:002014-04-22T15:32:48-04:00https://talks.cs.umd.edu/talks/562Semantic Object Selection<a href="http://www.umiacs.umd.edu/~ejaz/">Ejaz Ahmed - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, April 24, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p><span style="color: #222222; font-family: arial, sans-serif; font-size: 13px;">Interactive object segmentation has great practical importance in computer vision. Many interactive methods have been proposed utilizing user input in the form of mouse clicks and mouse strokes, and often requiring a lot of user intervention. In this paper, we present a system with a far simpler input method: the user needs only give the name of the desired object. With the tag provided by the user we do a text query of an image database to gather exemplars of the object. Using object proposals and borrowing ideas from image retrieval and object detection, the object is localized in the target image. An appearance model generated from the exemplars and the location prior are used in an energy minimization framework to select the object. 
Our method outperforms the state-of-the-art on existing datasets and on a more challenging dataset we collected.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5642014-02-02T15:56:40-05:002014-05-15T14:10:03-04:00https://talks.cs.umd.edu/talks/564Two-Dimensional Phase Unwrapping for Interferometric Synthetic Aperture Radar<a href="http://garrettwarnell.com/">Garrett Warnell - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, May 8, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p><span style="font-family: sans-serif; font-size: 13px; line-height: 19.049999237060547px;">In this talk, I will focus on the topic of two-dimensional phase unwrapping for interferometric synthetic aperture radar (InSAR). I will give an overview of InSAR and the phase unwrapping problem, and review several classes of methods that have been proposed to solve it. I will then discuss the relationship between this problem and the common computer vision problem of depth inference from gradients. I'll conclude by discussing my ongoing work that formulates phase unwrapping as a sparse error correction problem.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/5652014-02-02T15:56:54-05:002014-05-15T14:10:37-04:00https://talks.cs.umd.edu/talks/565Sparse methods for robust and efficient recognition<a href="http://www.umiacs.umd.edu/~sshekha/">Sumit Shekhar - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, May 15, 2014, 3:30-4:30 pm<br><br><b>Abstract:</b> <p>In this talk, I will discuss two problems in visual recognition. In the first part, I will talk about the problem of low-resolution face recognition.
This problem arises in many scenarios, such as surveillance, where the probe images are low resolution but a high-resolution gallery image is available. I will describe a synthesis-based approach for low-resolution recognition and demonstrate results on different face datasets. In the second part, I will describe a new analysis framework for sparse coding, which has recently started receiving attention. I will describe its application to various recognition problems and also demonstrate that it is more efficient than the standard sparse coding framework.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/12042015-11-11T10:12:21-05:002015-11-11T10:13:01-05:00https://talks.cs.umd.edu/talks/1204Overview of Amazon Echo and Amazon’s speech group.Shiv Vitaladevuni - Amazon<br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Friday, November 13, 2015, 1:00-2:00 pm<br><br><b>Abstract:</b> <p><span style="font-family: arial, helvetica, sans-serif; font-size: medium;">Amazon Echo was launched a year ago as a voice-enabled device that responds to a variety of user commands and requests. One of Echo's distinguishing features is that it is a far-field speech device, i.e., users can talk to it from a distance, hands free, eyes free. We will present some of the key scientific challenges in developing this device and an overview of the research being done at Amazon's Speech group.</span></p><br><b>Bio:</b> <p><span style="font-family: arial, helvetica, sans-serif; font-size: medium;">Shiv Vitaladevuni is currently a Machine Learning Manager in the Amazon Echo Speech group at Cambridge, MA. He is an alumnus of the department and completed his Ph.D. in 2007 under Prof. Larry Davis. He previously held Research Scientist positions at Raytheon BBN Technologies and Howard Hughes Medical Institute.
He has worked in many fields, including bioinformatics, computer vision, and speech recognition.</span></p>
<p><span style="font-family: arial, helvetica, sans-serif; font-size: medium;"> </span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/12332015-12-03T18:42:23-05:002015-12-03T18:49:52-05:00https://talks.cs.umd.edu/talks/1233Learning 3D Deformation of Animals from 2D Images<a href="http://www.umiacs.umd.edu/~kanazawa/">Angjoo Kanazawa - Computer Vision Lab</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, December 3, 2015, 4:15-5:00 pm<br><br><b>Abstract:</b> <p><span style="color: #222222; font-family: arial, sans-serif; font-size: 12.8px;">Understanding how an animal can deform and articulate is essential for a realistic modification of its 3D model. In this paper, we show that such information can be learned from user-clicked 2D images and a template 3D model of the target animal. We present a volumetric deformation framework that produces a set of new 3D models by deforming a template 3D model according to a set of user-clicked images. Our framework is based on a novel locally-bounded deformation energy, where every local region has its own stiffness value that bounds how much distortion is allowed at that location. We jointly learn the local stiffness bounds as we deform the template 3D mesh to match each user-clicked image. We show that this seemingly complex task can be solved as a sequence of convex optimization problems. We demonstrate the effectiveness of our approach on cats and horses, which are highly deformable and articulated animals. 
Our framework produces new 3D models of animals that are significantly more plausible than those produced by methods without learned stiffness.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/12342015-12-03T18:47:52-05:002015-12-03T18:50:38-05:00https://talks.cs.umd.edu/talks/1234Automated Event Retrieval using Web Trained Detectors<a href="http://www.umiacs.umd.edu/~xintong">Xintong Han - Computer Vision Lab</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3450 A.V. Williams Building (AVW)</a><br>Thursday, December 10, 2015, 3:30-4:30 pm<br><br><b>Abstract:</b> <p><span style="font-family: sans-serif; font-size: 12.7px; line-height: 19.05px;">Complex event retrieval is a challenging research problem, especially when no training videos are available. An alternative to collecting training videos is to train a large semantic concept bank a priori. Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos. However, defining an exhaustive concept lexicon and pre-training it requires vast computational resources. Therefore, recent approaches automate concept discovery and training by leveraging large amounts of weakly annotated web data. Compact visually salient concepts are automatically obtained by the use of concept pairs or, more generally, n-grams. However, not all visually salient n-grams are necessarily useful for an event query: some combinations of concepts may be visually compact but irrelevant, and this drastically affects performance. We propose an event retrieval algorithm that constructs pairs of automatically discovered concepts and then prunes those concepts that are unlikely to be helpful for retrieval. Pruning depends both on the query and on the specific video instance being evaluated.
Our approach also addresses calibration and domain adaptation issues that arise when applying concept detectors to unseen videos. We demonstrate large improvements over other vision-based systems on the TRECVID MED 13 dataset. The speaker will also introduce some recent work by his group to deal with this problem.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/29562021-10-14T18:47:58-04:002021-10-14T18:47:58-04:00https://talks.cs.umd.edu/talks/2956Accelerating Atmospheric Turbulence Simulation for Deep Learning Algorithms<a href="https://engineering.purdue.edu/ChanGroup/stanleychan.html">Stanley H. Chan - Purdue University</a><br>Zoom: https://umd.zoom.us/j/92528272976<br>Wednesday, October 20, 2021, 2:00-3:15 pm<br><br><b>Abstract:</b> <p class="MsoNormal">Seeing through a turbulent atmosphere has been one of the biggest challenges for ground-to-ground long-range incoherent imaging systems. The literature is rich, dating back to Andrey Kolmogorov in the late 1940s and followed by a series of major developments by David Fried, Robert Noll, and others during the 1960s and 70s. However, even though we have a much better understanding of the atmosphere today, there remains a gap between the optics theory and image processing algorithms. In particular, training a deep neural network requires an accurate physical forward model that can synthesize training data at a large scale. Traditional wave propagation simulators are not an option here because they are computationally too expensive: a 256x256 grayscale image would take several minutes to simulate.</p>
<p class="MsoNormal">In this talk, I will discuss the lessons I learned over the past few years and present some of my own work. I will start by giving a brief introduction to the classical split-step propagation model that has been the backbone of many numerical wave simulators. Then I will present two new simulators my students and I invented at Purdue:</p>
<p class="MsoNormal">- Collapsed phase-over-aperture model (our first-generation simulator): The idea is to compress the propagation path into a single phase-screen where each pixel on the phase screen is modeled through a Zernike expansion over the aperture. To enable spatial correlations of the aberrations, we invented a mirroring technique that brings the multi-aperture angle-of-arrival model from the image plane to the object plane. With additional numerical inventions, we offer 20x speed-up compared to the traditional split-step propagation.</p>
<p class="MsoNormal">- Phase-to-Space transform (our second-generation simulator): The idea is to rewrite the spatially varying steps in the first-generation model by introducing a spatially invariant basis expansion. We overcome the difficulty of translating from the Zernike coefficients to the new basis coefficients via the phase-to-space transform we invented. Our phase-to-space transform is implemented via a shallow neural network. By combining with the collapsed model, we offer 1000x speed-up compared to the traditional split-step propagation.</p>
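For readers unfamiliar with the baseline these simulators accelerate: the classical split-step model alternates free-space propagation (applied in the Fourier domain) with thin random phase screens. The sketch below is a minimal, illustrative NumPy version only; it uses simple Gaussian phase screens and made-up parameters as stand-ins for the Kolmogorov-spectrum screens and calibrated optics of a real turbulence simulator, and is not the speaker's code.

```python
import numpy as np

def fresnel_propagate(field, wavelength, dx, dz):
    """Angular-spectrum (Fresnel) propagation of a 2D complex field over dz."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                  # spatial frequencies (1/m)
    FX, FY = np.meshgrid(fx, fx)
    # Unit-modulus Fresnel transfer function in the frequency domain
    H = np.exp(-1j * np.pi * wavelength * dz * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * H)

def split_step(field, wavelength, dx, dz, n_steps, rng, screen_strength=0.5):
    """Alternate free-space propagation with thin random phase screens.
    Gaussian screens are a toy stand-in for Kolmogorov-spectrum screens."""
    for _ in range(n_steps):
        field = fresnel_propagate(field, wavelength, dx, dz)
        phase = screen_strength * rng.standard_normal(field.shape)
        field = field * np.exp(1j * phase)        # thin-screen phase distortion
    return field

# Usage: propagate a plane wave through 10 screens spaced 100 m apart.
rng = np.random.default_rng(0)
n, dx = 256, 5e-3                                 # 256x256 grid, 5 mm sampling
field = np.ones((n, n), dtype=complex)
out = split_step(field, wavelength=550e-9, dx=dx, dz=100.0,
                 n_steps=10, rng=rng)
intensity = np.abs(out)**2                        # distorted intensity pattern
```

Because both the transfer function and the phase screens have unit modulus, total energy is conserved; the cost is two FFTs per screen, which is why simulating many screens per image is slow and why the collapsed single-screen models above offer such large speed-ups.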
<p class="MsoNormal">As an image processing / computer vision person, I will explain the turbulence physics using the language we are familiar with. I will discuss the potential benefits of the new simulators for future deep learning algorithms on this topic.</p><br><b>Bio:</b> <p><span style="font-size: 11.0pt; line-height: 107%; font-family: 'Calibri',sans-serif; ">Stanley H. Chan is the Elmore Associate Professor of Electrical and Computer Engineering at Purdue. He received the BEng degree in Electrical Engineering from the University of Hong Kong in 2007 and the PhD in Electrical Engineering from University of California, San Diego in 2011. Upon graduation, he went to Harvard and did a postdoc in Electrical Engineering and Statistics. Dr. Chan does research in photon-limited imaging and imaging through atmospheric turbulence. He is an associate editor of IEEE Transactions on Computational Imaging, and a former associate editor of OSA Optics Express. Dr. Chan is very pleased to share his undergraduate textbook <em>Introduction to Probability for Data Science</em> (</span><span style="font-size: 11.0pt; line-height: 107%; font-family: 'Courier New'; ">https://probability4datascience.com/</span><span style="font-size: 11.0pt; line-height: 107%; font-family: 'Calibri',sans-serif; ">), which is a free textbook to all students around the world. 
He welcomes your feedback.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/30812022-02-21T18:27:48-05:002022-02-22T13:59:42-05:00https://talks.cs.umd.edu/talks/3081Rethinking Common Assumptions to Mitigate Racial Bias in Face Recognition DatasetsAlex Hanson - UMD<br><a href="https://maps.umd.edu/map/index.html?Nav=Hide&MapView=Detailed&NoWelcome=True&LocationType=Building&LocationName=432">4105 Brendan Iribe Center for Computer Science and Engineering (IRB)</a><br>Wednesday, March 2, 2022, 1:00-1:30 pm<br><br><b>Abstract:</b> <p><span style="color: #1d1c1d; font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;">In attempting to mitigate racial bias in face recognition, one common practice is for researchers to racially balance the training dataset. In fact, this is the motivation behind two popular datasets in this space — RFW and FairFace. In this work, we test this assumption by training skewed subsets of RFW. Surprisingly, some heavily skewed subsets outperform their balanced counterparts in both accuracy and fairness across race. </span></p>
<p><span style="color: #1d1c1d; font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;">This work received the best-paper runner-up award at the HTCV Workshop at ICCV 2021.<br><br>Those who would like to join remotely can use the following Zoom link:</span></p>
<p><span style="color: #1d1c1d; font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;"><a class="c-link" style="box-sizing: inherit; text-decoration-line: none;" href="https://umd.zoom.us/j/94332767827?pwd=RElnZElhNjNualliaG5pMG9zUHBKQT09" rel="noopener noreferrer">https://umd.zoom.us/j/94332767827?pwd=RElnZElhNjNualliaG5pMG9zUHBKQT09</a></span></p>
<br><b>Bio:</b> <p><span style="color: #1d1c1d; font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;">Alex Hanson is a third-year PhD student advised by Abhinav Shrivastava. He is funded by the NDSEG fellowship.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/30822022-02-21T18:29:08-05:002022-02-22T14:00:44-05:00https://talks.cs.umd.edu/talks/3082Neural RadiositySaeed Hadadan - UMD<br><a href="https://maps.umd.edu/map/index.html?Nav=Hide&MapView=Detailed&NoWelcome=True&LocationType=Building&LocationName=432">4105 Brendan Iribe Center for Computer Science and Engineering (IRB)</a><br>Wednesday, March 2, 2022, 1:30-2:00 pm<br><br><b>Abstract:</b> <p><span style="color: #1d1c1d; font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;">In this talk, I will present "Neural Radiosity," a method for solving the rendering equation using a single neural network. Inspired by radiosity techniques, it builds on a generic approach to solving Fredholm equations of the second kind with a neural network. Our network can be used for animation generation or novel view synthesis; more generally, it is a radiance function that satisfies the full solution of the rendering equation with global illumination.</span></p>
<p><span style="color: #1d1c1d; font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;">Those who would like to join remotely can use the following Zoom link:</span></p>
<p><span style="color: #1d1c1d; font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;"><a class="c-link" style="box-sizing: inherit; text-decoration-line: none;" href="https://umd.zoom.us/j/94332767827?pwd=RElnZElhNjNualliaG5pMG9zUHBKQT09" rel="noopener noreferrer">https://umd.zoom.us/j/94332767827?pwd=RElnZElhNjNualliaG5pMG9zUHBKQT09</a></span></p><br><b>Bio:</b> <p><span style="color: #1d1c1d; font-family: Slack-Lato, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;">I am a third-year PhD student in the UMD CS department working with Prof. Matthias Zwicker. My main research interest is the use of neural networks in computer graphics, with a focus on physically-based rendering.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/31552022-04-10T20:25:31-04:002022-04-10T20:59:44-04:00https://talks.cs.umd.edu/talks/3155Computational Imaging with Multiply Scattered Photons<a href="https://sites.google.com/view/adithyapediredla/">Adithya Pediredla - Carnegie Mellon University</a><br><a href="https://maps.umd.edu/map/index.html?Nav=Hide&MapView=Detailed&NoWelcome=True&LocationType=Building&LocationName=432">4105 Brendan Iribe Center for Computer Science and Engineering (IRB)</a><br>Wednesday, April 13, 2022, 1:00-2:00 pm<br><br><b>Abstract:</b> <p><span style="color: #1d1c1d; font-family: Slack-Lato, Slack-Fractions, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;">Computational imaging has advanced to a point where the next significant milestone is to image in the presence of multiply-scattered light.
Though traditionally treated as noise, multiply-scattered light carries information that can enable previously impossible imaging capabilities, such as imaging around corners and deep inside tissue. The combinatorial complexity of multiply-scattered light transport makes it necessary to use increasingly complex imaging systems to make imaging with multiply-scattered light feasible; examples include time-of-flight, structured-light, and acousto-optic imaging systems. The combined complexity of physics and systems makes the optimization of imaging of multiply-scattered light a challenging, high-dimensional design problem.</span></p>
<p><span style="color: #1d1c1d; font-family: Slack-Lato, Slack-Fractions, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;">In my research, I utilize graphics and physics-based rendering to explore this complex design space and create imaging systems that optimally sense multiply-scattered light. I will show two examples of this approach. First, I will discuss how to develop rendering tools for time-of-flight cameras, and how to use these tools to design and build optimal time-of-flight systems for non-line-of-sight imaging. Second, I will discuss how to simulate continuously-refractive radiative transfer and use such simulations to optimize acousto-optic systems for imaging inside tissue and other scattering media.</span></p>
<p>This talk will be in person. Remote participants can join via <a style="color: #1a73e8; font-family: Roboto, Arial, sans-serif; letter-spacing: 0.2px; white-space: pre-wrap;" href="https://umd.zoom.us/j/91943253641?pwd=R0NNbWM2UHJudkRScjlvMjNnSmhNUT09" rel="noopener">https://umd.zoom.us/j/91943253641?pwd=R0NNbWM2UHJudkRScjlvMjNnSmhNUT09</a></p><br><b>Bio:</b> <p><span style="color: #1d1c1d; font-family: Slack-Lato, Slack-Fractions, appleLogo, sans-serif; font-size: 15px; font-variant-ligatures: common-ligatures; background-color: #f8f8f8;">Adithya Pediredla is currently a project scientist at Carnegie Mellon University and will join Dartmouth College as an assistant professor in January 2023. His research interests span computational imaging, physics-based rendering, and their combined use for imaging with multiply-scattered light. He received his Ph.D. in 2019 from Rice University, where his thesis received the Ralph Budd best engineering thesis award. He received his Master's degree from the Indian Institute of Science, where he received the Prof. K. R. Kambati memorial gold medal, and an innovative student project award from the Indian National Academy of Engineering. He completed his undergraduate studies at the National Institute of Technology, Warangal, India, where he received Institute and N.
Ramarao memorial gold medals for academic excellence.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/32902022-09-30T16:18:15-04:002022-10-02T00:03:07-04:00https://talks.cs.umd.edu/talks/3290Innovations in Computer Vision and Computational Imaging can Enable Fairness in Medical Devices<a href="https://www.ee.ucla.edu/achuta-kadambi/">Achuta Kadambi - UCLA </a><br><a href="https://maps.umd.edu/map/index.html?Nav=Hide&MapView=Detailed&NoWelcome=True&LocationType=Building&LocationName=432">4105 Brendan Iribe Center for Computer Science and Engineering (IRB)</a><br>Monday, October 3, 2022, 1:00-2:00 pm<br><br><b>Abstract:</b> <p><span style="color: #0b5394; font-family: Arial, Helvetica, sans-serif; font-size: small;">Today, billions of light-based medical sensors are used by hospitals to measure quantities like blood flow, temperature, oxygenation and more. Clinical decision-making is partially based on the measurements from these sensors - so it’s important that these sensors measure data robustly. Unfortunately, the accuracy of light-based devices varies across demographics. Just as a soap dispenser may not always work for those with dark skin, a light-based medical device has fundamental challenges at the light-matter interface, leading to degradations in signal-to-noise ratio (SNR) and measurement accuracy. To solve this problem, and make devices more inclusive and even more accurate (for everyone), we need to rethink the sensing process, e.g., what wavelengths are used, what computer algorithms are used, and how datasets are generated. The parameters involved in designing a medical device are complex and high-dimensional, and ultimately only one design can be built. Differentiable computer algorithms are developed that backpropagate gradients to sensing parameters, enabling “learning” of sensing configurations.
Learning how to design sensors can be applied to a wide variety of equitable imaging systems that measure heart rate and blood volume (contact-free and wirelessly). Of course, learning requires data, and real data is scarce. We also discuss human digital twin data pipelines that model melanin content and the theoretical benefits of minority inclusion on dataset composition. The fusion of novel sensors, physically-based simulators, and AI pipelines leads to novel medical systems deployed at UCLA Hospital to reduce bias (against minority groups), while improving accuracy (for everyone). We close by discussing how the devices studied in this talk are only scratching the surface of biosensors that can be redesigned with increasing fairness and accuracy in mind.<br><br>Zoom link: </span><a style="color: #1155cc; font-family: Arial, Helvetica, sans-serif; font-size: small;" href="https://umd.zoom.us/j/7316339020" rel="noopener">https://umd.zoom.us/j/<wbr></wbr>7316339020</a></p><br><b>Bio:</b> <p><span style="color: #0b5394; font-family: Arial, Helvetica, sans-serif; font-size: small;">Achuta Kadambi received the PhD degree from MIT in 2018 and joined UCLA as Assistant Professor in EECS. He has received the NSF CAREER, DARPA Young Faculty Award, and Army Young Investigator Award. He is also the 2022 IEEE-HKN Outstanding Young Professional under 35, and has been on the Forbes 30 under 30 list.
Kadambi holds 19 issued US patents for inventions in computational imaging and has co-authored a textbook in Computational Imaging (MIT Press, available as a free PDF).</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/36092023-09-20T09:55:05-04:002023-09-20T09:55:05-04:00https://talks.cs.umd.edu/talks/3609When Compressive Imaging Met Deep Learning<a href="https://sterngroup.weebly.com/">Adrian Stern - Ben-Gurion University</a><br><a href="https://maps.umd.edu/map/index.html?Nav=Hide&MapView=Detailed&NoWelcome=True&LocationType=Building&LocationName=432">3137 Brendan Iribe Center for Computer Science and Engineering (IRB)</a><br>Wednesday, October 4, 2023, 1:30-2:30 pm<br><br><b>Abstract:</b> <p>In this talk we will review the interplay between Deep Learning (DL) and Compressive Sampling (CS). We will provide an overview of prominent DL-based CS reconstruction algorithms, with a specific emphasis on practical implementation considerations. We will also discuss joint sensing-and-reconstruction DL optimization approaches for various sensing matrix types. The effectiveness of this design will be demonstrated through face compressive imaging using only a few samples. Additionally, we will explore how CS can safeguard DL from adversarial attacks.</p><br><b>Bio:</b> <p>Adrian Stern is a Professor at the School of ECE at Ben-Gurion University in Israel, where he serves as the School's Deputy Head for Research. Previously, he served as the head of the Electro-Optical Engineering Department. He has held visiting scholar and professor positions at MIT and UConn.</p>
<p>Dr. Stern has published over 200 technical articles in leading peer-reviewed journals and conference proceedings. His current research interests include scientific deep learning, compressive imaging and optical sensing, 3D imaging, hyperspectral imaging, remote sensing, and deep learning security.</p>
<p>Dr. Stern is an elected Fellow of Optica (formerly the Optical Society of America) and of SPIE. He chaired and co-chaired several SPIE and OSA conferences. He has been an editor for several journals and is the editor of the book Optical Compressive Imaging.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>tag:talks.cs.umd.edu,2005:Talk/37512024-02-07T18:47:49-05:002024-02-07T18:47:49-05:00https://talks.cs.umd.edu/talks/3751Computational Optics for 3D Display Design<a href="http://imagesci.ece.cmu.edu/">Aswin Sankaranarayanan - CMU</a><br><a href="https://maps.umd.edu/map/index.html?Nav=Hide&MapView=Detailed&NoWelcome=True&LocationType=Building&LocationName=432">4105 Brendan Iribe Center for Computer Science and Engineering (IRB)</a><br>Monday, February 12, 2024, 12:00-1:00 pm<br><br><b>Abstract:</b> <p><span style="color: #222222; font-family: Arial, Helvetica, sans-serif; font-size: small;">Digitization of reality is at the cusp of widespread adoption, and 3D displays are at the forefront of enabling such extended reality systems. For an immersive experience, a 3D display must faithfully reproduce the visual cues pertaining to vergence, accommodation, occlusion, and motion parallax.
I will talk about recent work on computational display designs that enable such features in near-eye displays.</span></p><br><b>Bio:</b> <p><span style="color: #222222; font-family: Arial, Helvetica, sans-serif; font-size: small;">Aswin C. Sankaranarayanan is a professor in the ECE department at CMU, where he leads the Image Science Lab. His research interests are broadly in computational photography, signal processing and vision. His doctoral research was at the University of Maryland, where his dissertation won the distinguished dissertation award from the ECE department in 2009. Aswin is the recipient of best paper awards at SIGGRAPH 2023, CVPR 2019 and at ICCP in 2021 and 2022, the Dean’s Early Career Fellowship, the Spira Teaching award, the NSF CAREER award, the Eta Kappa Nu Excellence in Teaching award, and the Herschel Rich Invention award from Rice University.</span></p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/12">CVSS</a><br>