Models, code, and papers for "Yang Gao":

Deep Learning for Spacecraft Pose Estimation from Photorealistic Rendering

Aug 29, 2019
Pedro F. Proença, Yang Gao

On-orbit proximity operations in space rendezvous, docking and debris removal require precise and robust 6D pose estimation under a wide range of lighting conditions and against a highly textured background, i.e., the Earth. This paper investigates leveraging deep learning and photorealistic rendering for monocular pose estimation of known uncooperative spacecraft. We first present a simulator built on Unreal Engine 4, named URSO, to generate labeled images of spacecraft orbiting the Earth, which can be used to train and evaluate neural networks. Second, we propose a deep learning framework for pose estimation based on orientation soft classification, which allows modelling orientation ambiguity as a mixture of Gaussians. This framework was evaluated both on URSO datasets and on the ESA pose estimation challenge, where our best model achieved 3rd place on the synthetic test set and 2nd place on the real test set. Moreover, our results show the impact of several architectural and training choices, and we demonstrate qualitatively how models trained on URSO datasets perform on real images from space.

* Adding more related work and references 
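
As a rough illustration of orientation soft classification, here is a minimal numpy sketch for a single orientation angle: the continuous label is encoded as a Gaussian-smoothed distribution over discrete bins, and orientation ambiguity (e.g. a two-fold symmetry) becomes a mixture of Gaussians over those bins. The bin count, kernel width, and 1D setting are illustrative assumptions; the paper works with full 3D orientation.

```python
import numpy as np

N_BINS = 64
bin_centers = np.linspace(-np.pi, np.pi, N_BINS, endpoint=False)

def soft_labels(angle, sigma=0.1):
    """Encode a continuous angle as a Gaussian-smoothed distribution over bins."""
    d = np.angle(np.exp(1j * (bin_centers - angle)))  # wrapped angular distance
    w = np.exp(-0.5 * (d / sigma) ** 2)
    return w / w.sum()

def soft_cross_entropy(logits, target):
    """Cross-entropy against a soft target distribution (stable log-softmax)."""
    logp = logits - logits.max()
    logp -= np.log(np.exp(logp).sum())
    return -(target * logp).sum()

# Ambiguity under a two-fold symmetry as a mixture of Gaussians:
target = 0.5 * soft_labels(0.3) + 0.5 * soft_labels(0.3 + np.pi)
print(soft_cross_entropy(np.random.default_rng(0).normal(size=N_BINS), target))
```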

Asymmetric Valleys: Beyond Sharp and Flat Local Minima

Apr 07, 2019
Haowei He, Gao Huang, Yang Yuan

Despite the non-convex nature of their loss functions, deep neural networks are known to generalize well when optimized with stochastic gradient descent (SGD). Recent work conjectures that SGD with proper configuration is able to find wide and flat local minima, which have been proposed to be associated with good generalization performance. In this paper, we observe that the local minima of modern deep networks have more structure than simply being flat or sharp. Specifically, at a local minimum there exist many asymmetric directions such that the loss increases abruptly along one side and slowly along the opposite side; we formally define such minima as asymmetric valleys. Under mild assumptions, we prove that for asymmetric valleys, a solution biased towards the flat side generalizes better than the exact minimizer. Further, we show that simply averaging the weights along the SGD trajectory implicitly gives rise to such biased solutions. This provides a theoretical explanation for the intriguing phenomenon observed by Izmailov et al. (2018). In addition, we find empirically that batch normalization (BN) appears to be a major cause of asymmetric valleys.

* submitted to ICML 2019 
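
A minimal 1-D sketch of the averaging effect: along an asymmetric direction the loss rises sharply on one side of the minimum and slowly on the other, and averaging the noisy SGD iterates over the tail of the trajectory yields a solution biased towards the flat side. The loss shape, noise level, and schedule here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(w):
    # Asymmetric valley at w = 0: sharp side for w < 0, flat side for w > 0.
    g = 2.0 * w if w < 0 else 0.2 * w
    return g + rng.normal(scale=1.0)        # stochastic gradient noise

w, w_sum, n = 0.0, 0.0, 0
for step in range(20000):
    w -= 0.05 * grad(w)
    if step >= 10000:                       # average only the trajectory's tail
        w_sum += w
        n += 1

print(f"last iterate {w:+.3f}, SGD average {w_sum / n:+.3f} (flat side is w > 0)")
```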

Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry

Jul 05, 2018
Pedro F. Proença, Yang Gao

This paper presents CAPE, a method to extract plane and cylinder segments from organized point clouds by operating on a grid of planar cells; it processes 640x480 depth images on a single CPU core at an average of 300 Hz. Compared to state-of-the-art plane extraction, CAPE's latency is both more consistent and 4-10 times lower, depending on the scene. We also demonstrate empirically that applying CAPE to visual odometry improves trajectory estimation on scenes made of cylindrical surfaces (e.g. tunnels), whereas a plane extraction approach that is not curve-aware deteriorates performance on these scenes. To use these geometric primitives in visual odometry, we propose extending a probabilistic RGB-D odometry framework based on points, lines and planes to cylinder primitives. In this framework, CAPE runs on fused depth maps, and cylinder parameters are modelled probabilistically to account for uncertainty and to weight the pose optimization residuals accordingly.

* Accepted to IROS 2018 
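
The grid-of-planar-cells idea can be sketched as follows: partition an organized point cloud into fixed-size cells, fit a plane to each cell via PCA (the smallest principal direction is the normal), and keep cells whose fit residual is low. Cell size, the planarity threshold, and the synthetic cloud are illustrative assumptions; CAPE additionally grows and merges cells into plane and cylinder segments.

```python
import numpy as np

H, W, CELL = 480, 640, 20
# Synthetic organized cloud: a slanted plane plus depth noise, shape (H, W, 3).
u, v = np.meshgrid(np.arange(W), np.arange(H))
cloud = np.stack([u, v, 0.001 * u + 0.002 * v], axis=-1).astype(float)
cloud[..., 2] += np.random.default_rng(0).normal(scale=0.01, size=(H, W))

planar_cells = []
for i in range(0, H, CELL):
    for j in range(0, W, CELL):
        pts = cloud[i:i + CELL, j:j + CELL].reshape(-1, 3)
        centroid = pts.mean(axis=0)
        # Smallest principal direction of the centered points = plane normal.
        _, s, vt = np.linalg.svd(pts - centroid, full_matrices=False)
        normal, rms = vt[-1], s[-1] / np.sqrt(len(pts))
        if rms < 0.05:                      # planarity test on fit residual
            planar_cells.append((i, j, normal, centroid))

print(f"{len(planar_cells)} planar cells found")
```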

Probabilistic RGB-D Odometry based on Points, Lines and Planes Under Depth Uncertainty

Jan 18, 2018
Pedro F. Proença, Yang Gao

This work proposes a robust visual odometry method for structured environments that combines point features with line and plane segments, extracted with an RGB-D camera. Noisy depth maps are processed by a probabilistic depth fusion framework based on Mixtures of Gaussians to denoise them and derive the depth uncertainty, which is then propagated throughout the visual odometry pipeline. Probabilistic 3D plane and line fitting solutions model the uncertainties of the feature parameters, and pose is estimated by combining the three types of primitives based on their uncertainties. Performance evaluation on RGB-D sequences collected in this work and on two public RGB-D datasets (TUM and ICL-NUIM) shows the benefit of using the proposed depth fusion framework and of combining the three feature types, particularly in scenes with low-textured surfaces, dynamic objects and missing depth measurements.

* Major update: more results, depth filter released as open source, 34 pages 
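
A minimal sketch of per-pixel probabilistic depth fusion, assuming a constant Gaussian sensor-noise model: each new measurement updates a per-pixel mean and variance by precision weighting, yielding both a denoised depth and an uncertainty to propagate downstream. The paper's framework is richer (a Mixture of Gaussians per pixel); this single-Gaussian update is an illustrative simplification.

```python
import numpy as np

def fuse(depth_frames, sigma=0.02):
    mu = depth_frames[0].copy()
    var = np.full_like(mu, sigma ** 2)
    for z in depth_frames[1:]:
        valid = ~np.isnan(z)                          # missing depth: skip update
        k = var[valid] / (var[valid] + sigma ** 2)    # Kalman-style gain
        mu[valid] += k * (z[valid] - mu[valid])
        var[valid] *= (1.0 - k)
    return mu, var                                    # fused depth and variance

frames = [1.0 + np.random.default_rng(i).normal(scale=0.02, size=(4, 4))
          for i in range(10)]
mu, var = fuse(frames)
print(mu.round(3), var.max())
```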

SPLODE: Semi-Probabilistic Point and Line Odometry with Depth Estimation from RGB-D Camera Motion

Aug 09, 2017
Pedro F. Proença, Yang Gao

Active depth cameras suffer from several limitations that cause incomplete and noisy depth maps and may consequently degrade the performance of RGB-D odometry. To address this issue, this paper presents a visual odometry method based on point and line features that leverages both measurements from a depth sensor and depth estimates from camera motion. Depth estimates are generated continuously by a probabilistic depth estimation framework for both types of features, to compensate for the lack of depth measurements and for inaccurate feature depth associations. The framework explicitly models the uncertainty of triangulating depth from both point and line observations in order to validate and obtain precise estimates. Furthermore, depth measurements are exploited by propagating them through a depth map registration module and by using a frame-to-frame motion estimation method that considers 3D-to-2D and 2D-to-3D reprojection errors independently. Results on RGB-D sequences captured in large indoor and outdoor scenes, where depth sensor limitations are critical, show that combining depth measurements and estimates through our approach overcomes the absence and inaccuracy of depth measurements.

* IROS 2017 
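
The value of modelling triangulation uncertainty can be sketched with first-order error propagation in a simple two-view geometry: depth z = f*b/d degrades quadratically with distance for a fixed disparity noise, so estimates can be validated or gated by their predicted standard deviation. The pure-translation geometry and pixel-noise level below are illustrative assumptions.

```python
import numpy as np

def triangulate_depth(f, baseline, disparity, sigma_px=0.5):
    """Depth z = f*b/d and its std via first-order error propagation:
    sigma_z = |dz/dd| * sigma_d = (z**2 / (f*b)) * sigma_d."""
    z = f * baseline / disparity
    sigma_z = (z ** 2) / (f * baseline) * sigma_px
    return z, sigma_z

z, sz = triangulate_depth(f=525.0, baseline=0.10, disparity=5.0)
print(f"depth {z:.2f} m, std {sz:.3f} m")
# A gating test could reject estimates whose std exceeds a threshold.
```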

Probabilistic Combination of Noisy Points and Planes for RGB-D Odometry

May 18, 2017
Pedro F. Proença, Yang Gao

This work proposes a visual odometry method that combines point and plane primitives extracted from a noisy depth camera. Depth measurement uncertainty is modelled and propagated from the extraction of geometric primitives to the frame-to-frame motion estimation, where pose is optimized by weighting the residuals of 3D point and plane matches according to their uncertainties. Results on an RGB-D dataset show that the combination of points and planes through the proposed method performs well in poorly textured environments, where point-based odometry is bound to fail.

* Accepted to TAROS 2017 
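
Uncertainty-weighted residuals can be sketched on a stripped-down problem: estimating a pure translation from noisy 3D point matches, with each residual weighted by the inverse of its variance. Rotation and plane residuals are omitted for brevity, and the noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
src = rng.uniform(-1, 1, size=(50, 3))
t_true = np.array([0.1, -0.2, 0.05])
var = rng.uniform(1e-4, 1e-2, size=50)            # per-match depth variance
dst = src + t_true + rng.normal(size=(50, 3)) * np.sqrt(var)[:, None]

# Closed-form weighted least squares for pure translation:
# argmin_t sum_i ||dst_i - src_i - t||^2 / var_i.
w = 1.0 / var
t_hat = (w[:, None] * (dst - src)).sum(axis=0) / w.sum()
print("estimated translation:", t_hat.round(3))
```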

Canonical dual solutions to nonconvex radial basis neural network optimization problem

Feb 18, 2013
Vittorio Latorre, David Yang Gao

Radial Basis Function Neural Networks (RBFNNs) are tools widely used in regression problems. One of their principal drawbacks is that training with supervision of both the centers and the weights is a highly non-convex optimization problem, which poses fundamental difficulties for traditional optimization theory and methods. This paper presents a generalized canonical duality theory for solving this challenging problem. We demonstrate that, by sequential canonical dual transformations, the nonconvex optimization problem of the RBFNN can be reformulated as a canonical dual problem (without duality gap). Both global optimal solutions and local extrema can be classified. Several applications to one of the most widely used radial basis functions, the Gaussian function, are illustrated. Our results show that even in the one-dimensional case, the global minimizer of the nonconvex problem may not be the best solution for the RBFNN, and that canonical duality theory is a promising tool for solving general neural network training problems.

* 10 pages, 7 figures, one table 
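
For concreteness, here is the nonconvex training problem in miniature: a Gaussian RBF network fitted by gradient descent over both the output weights and the centers. This is a local method, unlike the paper's duality-based global analysis; the data, basis count, width, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(100, 1))
y = np.sin(2 * x[:, 0])                       # regression target
K, beta, lr = 5, 4.0, 0.2                     # basis count, width, step size
c = rng.uniform(-2, 2, size=(1, K))           # centers, trained as well
w = np.zeros(K)                               # output weights

for _ in range(3000):
    phi = np.exp(-beta * (x - c) ** 2)        # (100, K) Gaussian features
    r = phi @ w - y                           # residuals
    w -= lr * phi.T @ r / len(x)
    c -= lr * (2 * beta * (x - c) * phi * (r[:, None] * w)).sum(0) / len(x)

mse = float(((np.exp(-beta * (x - c) ** 2) @ w - y) ** 2).mean())
print("final MSE:", mse)   # a local solution; duality targets the global one
```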

Interactive Text Ranking with Bayesian Optimisation: A Case Study on Community QA and Summarisation

Nov 29, 2019
Edwin Simpson, Yang Gao, Iryna Gurevych

For many NLP applications, such as question answering and summarisation, the goal is to select the best solution from a large space of candidates to meet a particular user's needs. To address the lack of user-specific training data, we propose an interactive text ranking approach that actively selects pairs of candidates, from which the user selects the best. Unlike previous strategies, which attempt to learn a ranking across the whole candidate space, our method employs Bayesian optimisation to focus the user's labelling effort on high quality candidates and integrates prior knowledge in a Bayesian manner to cope better with small data scenarios. We apply our method to community question answering (cQA) and extractive summarisation, finding that it significantly outperforms existing interactive approaches. We also show that the ranking function learned by our method is an effective reward function for reinforcement learning, which improves the state of the art for interactive summarisation.
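
A toy stand-in for the interactive loop: keep a score and an uncertainty per candidate, query a (simulated) user on the most promising uncertain pair, and update. The paper uses Gaussian process preference learning with Bayesian optimisation; the scalar update rule below is an illustrative simplification of that machinery.

```python
import numpy as np

rng = np.random.default_rng(1)
true_quality = rng.normal(size=20)          # hidden utility per candidate
mu = np.zeros(20)                           # estimated score per candidate
var = np.ones(20)                           # uncertainty per candidate

for _ in range(30):
    ucb = mu + np.sqrt(var)                 # optimism: mean plus std
    i, j = np.argsort(ucb)[-2:]             # pair the two most promising
    winner, loser = (i, j) if true_quality[i] > true_quality[j] else (j, i)
    step = 0.5 * np.sqrt(var[winner] + var[loser])
    mu[winner] += step                      # crude preference update
    mu[loser] -= step
    var[winner] *= 0.8                      # a label reduces uncertainty
    var[loser] *= 0.8

print("estimated best:", int(np.argmax(mu)),
      "true best:", int(np.argmax(true_quality)))
```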


SelectNet: Learning to Sample from the Wild for Imbalanced Data Training

May 23, 2019
Yunru Liu, Tingran Gao, Haizhao Yang

Supervised learning from training data with imbalanced class sizes, a scenario commonly encountered in real applications such as anomaly/fraud detection, has long been considered a significant challenge in machine learning. Motivated by recent progress in curriculum and self-paced learning, we adopt a semi-supervised learning paradigm by training a deep neural network, referred to as SelectNet, to selectively add unlabelled data together with their predicted labels to the training dataset. Unlike existing techniques for class-imbalanced training data, such as resampling, cost-sensitive learning, and margin-based learning, SelectNet provides an end-to-end approach for learning from important unlabelled data "in the wild" that most likely belong to the under-sampled classes, thus gradually mitigating the imbalance in the data used for training the classifier. We demonstrate the efficacy of SelectNet through extensive numerical experiments on standard computer vision datasets.
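
The selection idea can be sketched with a plain self-training loop: repeatedly add pool examples confidently predicted to belong to the minority class, with their predicted labels, and refit. The fixed confidence threshold and logistic model are illustrative stand-ins; SelectNet learns the selection network end-to-end.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Imbalanced labelled set: 200 negatives, 10 positives in 2-D.
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 200 + [1] * 10)
pool = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])

clf = LogisticRegression().fit(X, y)
for _ in range(5):                            # a few self-training rounds
    p = clf.predict_proba(pool)[:, 1]
    pick = p > 0.9                            # confident minority predictions
    if not pick.any():
        break
    X = np.vstack([X, pool[pick]])
    y = np.concatenate([y, np.ones(pick.sum(), dtype=int)])
    pool = pool[~pick]
    clf = LogisticRegression().fit(X, y)

print("minority fraction after selection:", (y == 1).mean().round(3))
```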


A Forest from the Trees: Generation through Neighborhoods

Feb 04, 2019
Yang Li, Tianxiang Gao, Junier Oliva

In this work, we propose to learn a generative model using both learned features (through a latent space) and memories (through neighbors). Although human learning makes seamless use of both learned perceptual features and instance recall, current generative learning paradigms only make use of one of these two components. Take, for instance, flow models, which learn a latent space of invertible features that follow a simple distribution. Conversely, kernel density techniques use instances to shift a simple distribution into an aggregate mixture model. Here we propose multiple methods to enhance the latent space of a flow model with neighborhood information. Not only does our proposed framework represent a more human-like approach by leveraging both learned features and memories, but it may also be viewed as a step forward in non-parametric methods. The efficacy of our model is shown empirically with standard image datasets. We observe compelling results and a significant improvement over baselines.
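
A minimal sketch of combining a latent space with neighborhoods: sample by perturbing the latent code of a randomly chosen training point (a kernel density in latent space) and mapping back through the inverse of the flow. The fixed affine map below stands in for a learned flow and is an illustrative simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.0, 1.5]])            # invertible "flow" stand-in
A_inv = np.linalg.inv(A)
data = rng.normal(size=(100, 2)) @ A_inv.T        # training set in x-space
z_train = data @ A.T                              # latent codes of the data

def sample(n, bandwidth=0.3):
    idx = rng.integers(len(z_train), size=n)      # pick random training neighbors
    z = z_train[idx] + rng.normal(scale=bandwidth, size=(n, 2))
    return z @ A_inv.T                            # invert the flow back to x-space

print(sample(5).round(2))
```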


N-fold Superposition: Improving Neural Networks by Reducing the Noise in Feature Maps

May 03, 2018
Yang Liu, Qiang Qu, Chao Gao

Since the use of the Fully Connected (FC) layer limits the performance of Convolutional Neural Networks (CNNs), this paper develops a method to improve the coupling between the convolution layers and the FC layer by reducing the noise in Feature Maps (FMs). Our approach has three steps. First, we split all the FMs into n equal blocks. Then, the weighted summation of the FMs at the same position across all blocks constitutes a new block of FMs. Finally, we replicate this new block n times and concatenate the copies as the input to the FC layer. This sharing of FMs noticeably reduces their noise and prevents any single FM from dominating particular hidden-layer weights, thereby guarding the network against overfitting to some extent. Using Fermat's lemma, we prove that this method widens the range of the loss function's global minima, which makes it easier for neural networks to converge and accelerates the convergence process. The method adds only a few coefficients and thus does not significantly increase the number of network parameters, and experiments demonstrate that it increases convergence speed and improves the classification performance of neural networks.

* 7 pages, 5 figures, submitted to ICALIP 2018 
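
The three steps translate directly into array operations; here is a minimal numpy sketch with uniform block weights (an illustrative choice; the weighting can be learned).

```python
import numpy as np

def n_fold_superposition(fms, n, weights=None):
    c, h, w = fms.shape
    assert c % n == 0, "channel count must divide evenly into n blocks"
    blocks = fms.reshape(n, c // n, h, w)           # split FMs into n blocks
    if weights is None:
        weights = np.full(n, 1.0 / n)               # uniform weighting
    merged = np.tensordot(weights, blocks, axes=1)  # weighted sum of blocks
    return np.concatenate([merged] * n, axis=0)     # replicate n copies

fms = np.random.default_rng(0).normal(size=(64, 8, 8))
out = n_fold_superposition(fms, n=4)
print(out.shape)   # (64, 8, 8): same size as input, denoised by averaging
```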

Cross-Domain Adversarial Auto-Encoder

Apr 17, 2018
Haodi Hou, Jing Huo, Yang Gao

In this paper, we propose the Cross-Domain Adversarial Auto-Encoder (CDAAE) to address cross-domain image inference, generation and transformation. We assume that images from different domains share the same latent code space for content while having separate latent code spaces for style. The proposed framework maps cross-domain data to a latent code vector consisting of a content part and a style part. The latent code vector is matched to a prior distribution so that we can generate meaningful samples from any part of the prior space. Consequently, given a sample from one domain, our framework can generate various samples of the other domain with the same content as the input. This distinguishes the proposed framework from current work on cross-domain transformation. Moreover, the framework can be trained with both labeled and unlabeled data, which also makes it suitable for domain adaptation. Experimental results on the SVHN, MNIST and CASIA datasets show that the proposed framework achieves visually appealing performance on image generation tasks. We also demonstrate that it achieves superior results for domain adaptation. Code for our experiments is available at https://github.com/luckycallor/CDAAE.

* Under review as a conference paper of KDD 2018 
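
The latent layout can be sketched abstractly: encode an input to a (content, style) code, then decode the content with a style drawn from the other domain's prior. The random linear encoder/decoder below are illustrative stand-ins for the trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C, S = 16, 4, 2                     # input, content, style dimensions
enc_A = rng.normal(size=(C + S, D))    # domain-A encoder (stand-in)
dec_B = rng.normal(size=(D, C + S))    # domain-B decoder (stand-in)

x_a = rng.normal(size=D)               # a sample from domain A
z = enc_A @ x_a
content, _style_a = z[:C], z[C:]       # split the latent code
style_b = rng.normal(size=S)           # draw a style from the prior
x_b = dec_B @ np.concatenate([content, style_b])   # cross-domain sample
print(x_b.shape)
```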

Speaker identification from the sound of the human breath

Dec 04, 2017
Wenbo Zhao, Yang Gao, Rita Singh

This paper examines the speaker identification potential of breath sounds in continuous speech. Speech is largely produced during exhalation. In order to replenish air in the lungs, speakers must periodically inhale. When inhalation occurs in the midst of continuous speech, it is generally through the mouth. Intra-speech breathing behavior has been the subject of much study, including the patterns, cadence, and variations in energy levels. However, an often ignored characteristic is the sound produced during the inhalation phase of this cycle. Intra-speech inhalation is rapid and energetic, performed with open mouth and glottis, effectively exposing the entire vocal tract to enable maximum intake of air. This results in vocal tract resonances evoked by turbulence that are characteristic of the speaker's speech-producing apparatus. Consequently, the sounds of inhalation are expected to carry information about the speaker's identity. Moreover, unlike other spoken sounds which are subject to active control, inhalation sounds are generally more natural and less affected by voluntary influences. The goal of this paper is to demonstrate that breath sounds are indeed bio-signatures that can be used to identify speakers. We show that these sounds by themselves can yield remarkably accurate speaker recognition with appropriate feature representations and classification frameworks.

* 5 pages, 3 figures 
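
As a rough illustration of such a pipeline, the sketch below synthesizes "breath" noise shaped by a speaker-specific resonance, summarizes each clip with log band energies, and trains a classifier over speakers. The resonator signal model and band-energy features are illustrative stand-ins for the paper's feature representations and classification frameworks.

```python
import numpy as np
from scipy.signal import lfilter
from sklearn.svm import SVC

rng = np.random.default_rng(0)
SR = 16000

def fake_breath(speaker):
    """Half a second of noise filtered by a speaker-specific resonance."""
    w0 = 2 * np.pi * (300 + 150 * speaker) / SR
    a = [1.0, -2 * 0.98 * np.cos(w0), 0.98 ** 2]   # two-pole resonator
    return lfilter([1.0], a, rng.normal(size=SR // 2))

def feature(x):
    """Log band energies of the magnitude spectrum (40 coarse bands)."""
    spec = np.abs(np.fft.rfft(x))[:4000]
    return np.log(spec.reshape(40, -1).mean(axis=1) + 1e-8)

X = np.array([feature(fake_breath(s)) for s in range(3) for _ in range(20)])
y = np.repeat(np.arange(3), 20)
clf = SVC().fit(X[::2], y[::2])                    # train on half the clips
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```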

Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation

Jul 31, 2017
Zhenheng Yang, Jiyang Gao, Ram Nevatia

In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos. This is an important and challenging task, as finding accurate human actions in both the temporal and spatial dimensions is important for analyzing large-scale video data. To tackle this problem, we propose a cascade proposal and location anticipation (CPLA) model for frame-level action detection. Our model has several salient points: (1) a cascade region proposal network (casRPN) is adopted for action proposal generation and shows better localization accuracy than a single region proposal network (RPN); (2) action spatio-temporal consistencies are exploited via a location anticipation network (LAN), so frame-level action detection is not conducted independently. Frame-level detections are then linked by solving a linking score maximization problem and temporally trimmed into spatio-temporal action tubes. We demonstrate the effectiveness of our model on the challenging UCF101 and LIRIS-HARL datasets, achieving state-of-the-art performance on both.

* Accepted at BMVC 2017 (oral) 
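
The linking step can be sketched as dynamic programming over per-frame detections: maximize the summed detection scores plus an overlap-consistency term between consecutive frames, then trace back the best tube. The equal weighting of score and IoU terms and the random boxes below are illustrative assumptions.

```python
import numpy as np

def iou(a, b):
    x1, y1 = np.maximum(a[:2], b[:2])
    x2, y2 = np.minimum(a[2:], b[2:])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def link(boxes, scores):
    """boxes: list of (K, 4) arrays per frame; scores: list of (K,) arrays."""
    best, back = [scores[0]], []
    for t in range(1, len(boxes)):
        trans = np.array([[iou(bi, bj) for bj in boxes[t]]
                          for bi in boxes[t - 1]])
        total = best[-1][:, None] + trans + scores[t][None, :]
        back.append(total.argmax(axis=0))       # best predecessor per box
        best.append(total.max(axis=0))
    path = [int(best[-1].argmax())]             # trace back the best tube
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
boxes = [np.sort(rng.uniform(0, 1, (3, 4)), axis=-1) for _ in range(5)]
scores = [rng.uniform(size=3) for _ in range(5)]
print(link(boxes, scores))                      # chosen box index per frame
```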

RED: Reinforced Encoder-Decoder Networks for Action Anticipation

Jul 16, 2017
Jiyang Gao, Zhenheng Yang, Ram Nevatia

Action anticipation aims to detect an action before it happens. Many real-world applications in robotics and surveillance depend on this predictive capability. Current methods address the problem by first anticipating visual representations of future frames and then categorizing the anticipated representations into actions. However, their anticipation is based on a single past frame's representation, which ignores the history trend, and they can only anticipate a fixed time into the future. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module provides sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on the TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all three.
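
The earliness incentive can be sketched in a few lines: correct predictions earn more reward the earlier in the anticipation horizon they occur. The linear decay below is an illustrative shaping choice, not the paper's exact reward.

```python
import numpy as np

def earliness_reward(pred_seq, label, horizon):
    """Reward each correct prediction, weighted to favour early steps."""
    weights = np.linspace(1.0, 0.2, horizon)     # early steps weigh more
    correct = (np.asarray(pred_seq) == label).astype(float)
    return float((weights * correct).sum())

# A sequence that becomes correct late earns less than one correct early on.
print(earliness_reward([2, 2, 2, 1, 2], label=2, horizon=5))
print(earliness_reward([1, 1, 2, 2, 2], label=2, horizon=5))
```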


Robust Online Multi-Task Learning with Correlative and Personalized Structures

Jun 06, 2017
Peng Yang, Peilin Zhao, Xin Gao

Multi-Task Learning (MTL) can enhance a classifier's generalization performance by learning multiple related tasks simultaneously. Conventional MTL works in an offline or batch setting and suffers from expensive training cost and poor scalability. To address these inefficiencies, online learning techniques have been applied to MTL problems. However, most existing online MTL algorithms constrain task relatedness to a presumed structure via a single weight matrix, a strict restriction that does not always hold in practice. In this paper, we propose a robust online MTL framework that overcomes this restriction by decomposing the weight matrix into two components: the first captures the low-rank common structure among tasks via a nuclear norm, and the second identifies the personalized patterns of outlier tasks via a group lasso. Theoretical analysis shows the proposed algorithm achieves sub-linear regret with respect to the best linear model in hindsight. Even though this framework performs well, the nuclear norm, which simply adds all nonzero singular values together, may not be a good low-rank approximation. To improve the results, we use a log-determinant function as a non-convex rank approximation, optimized by a gradient scheme that yields a closed-form solution for the refined problem. Experimental results on a number of real-world applications verify the efficacy of our method.
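
The decomposition suggests two standard proximal steps, sketched below: singular-value soft-thresholding for the nuclear-norm (low-rank) component and row-wise group soft-thresholding for the group-lasso (outlier-task) component. The thresholds and random matrix are illustrative; the paper's online algorithm and log-determinant refinement are not shown.

```python
import numpy as np

def prox_nuclear(U, tau):
    """Soft-threshold the singular values: prox of tau * ||U||_*."""
    u, s, vt = np.linalg.svd(U, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0.0)) @ vt

def prox_group_lasso(V, tau):
    """Shrink whole task rows: prox of tau * sum of row norms."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return V * scale

W = np.random.default_rng(0).normal(size=(10, 5))   # tasks x features
print("rank after nuclear prox:", np.linalg.matrix_rank(prox_nuclear(W, 2.0)))
print("zeroed task rows:",
      int((np.abs(prox_group_lasso(W, 2.0)).sum(axis=1) == 0).sum()))
```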


Cascaded Boundary Regression for Temporal Action Detection

May 02, 2017
Jiyang Gao, Zhenheng Yang, Ram Nevatia

Temporal action detection in long videos is an important problem. State-of-the-art methods address it by applying action classifiers on sliding windows. Although sliding windows may contain an identifiable portion of an action, they do not necessarily cover the entire action instance, which leads to inferior performance. We adapt a two-stage temporal action detection pipeline with a Cascaded Boundary Regression (CBR) model: class-agnostic proposals and specific actions are detected in the first and second stages, respectively. CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows. The salient aspect of the refinement process is that, inside each stage, the temporal boundaries are adjusted in a cascaded way, by feeding the refined windows back to the system for further boundary refinement. We test CBR on THUMOS-14 and TVSeries and achieve state-of-the-art performance on both datasets. The performance gain is especially remarkable under high IoU thresholds; e.g., mAP at tIoU=0.5 on THUMOS-14 improves from 19.0% to 31.0%.
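
The cascade can be sketched with a stand-in regressor that steps a window toward a known ground-truth segment, feeding each refined window back for another round; this feedback loop is the core of the cascaded refinement, while the oracle regressor and step fraction are illustrative stand-ins for the learned network.

```python
import numpy as np

gt = np.array([12.0, 31.0])                 # true (start, end) in seconds

def regress_offsets(window):
    # Stand-in regressor: predict a fraction of the remaining offset.
    return 0.6 * (gt - window)

window = np.array([5.0, 40.0])              # initial sliding window
for step in range(4):                       # cascade: feed the refinement back
    window = window + regress_offsets(window)
    print(f"step {step}: start={window[0]:.2f} end={window[1]:.2f}")
```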

