Models, code, and papers for "Hang Yang":

Depth Image Upsampling based on Guided Filter with Low Gradient Minimization

Nov 12, 2018
Hang Yang, Zhongbo Zhang

In this paper, we present a novel upsampling framework to enhance the spatial resolution of the depth image. In our framework, the upscaling of a low-resolution depth image is guided by a corresponding intensity images, we formulate it as a cost aggregation problem with the guided filter. However, the guided filter does not make full use of the properties of the depth image. Since depth images have quite sparse gradients, it inspires us to regularize the gradients for improving depth upscaling results. Statistics show a special property of depth images, that is, there is a non-ignorable part of pixels whose horizontal or vertical derivatives are equal to $\pm 1$. Considering this special property, we propose a low gradient regularization method which reduces the penalty for horizontal or vertical derivative $\pm1$. The proposed low gradient regularization is integrated with the guided filter into the depth image upsampling method. Experimental results demonstrate the effectiveness of our proposed approach both qualitatively and quantitatively compared with the state-of-the-art methods.

* 28 pages, 7figures 

  Click for Model/Code and Paper
Density-based Clustering with Best-scored Random Forest

Jun 24, 2019
Hanyuan Hang, Yuchao Cai, Hanfang Yang

Single-level density-based approach has long been widely acknowledged to be a conceptually and mathematically convincing clustering method. In this paper, we propose an algorithm called "best-scored clustering forest" that can obtain the optimal level and determine corresponding clusters. The terminology "best-scored" means to select one random tree with the best empirical performance out of a certain number of purely random tree candidates. From the theoretical perspective, we first show that consistency of our proposed algorithm can be guaranteed. Moreover, under certain mild restrictions on the underlying density functions and target clusters, even fast convergence rates can be achieved. Last but not least, comparisons with other state-of-the-art clustering methods in the numerical experiments demonstrate accuracy of our algorithm on both synthetic data and several benchmark real data sets.

  Click for Model/Code and Paper
A Robust Local Binary Similarity Pattern for Foreground Object Detection

Oct 18, 2018
Dongdong Zeng, Ming Zhu, Hang Yang

Accurate and fast extraction of the foreground object is one of the most significant issues to be solved due to its important meaning for object tracking and recognition in video surveillance. Although many foreground object detection methods have been proposed in the recent past, it is still regarded as a tough problem due to illumination variations and dynamic backgrounds challenges. In this paper, we propose a robust foreground object detection method with two aspects of contributions. First, we propose a robust texture operator named Robust Local Binary Similarity Pattern (RLBSP), which shows strong robustness to illumination variations and dynamic backgrounds. Second, a combination of color and texture features are used to characterize pixel representations, which compensate each other to make full use of their own advantages. Comprehensive experiments evaluated on the CDnet 2012 dataset demonstrate that the proposed method performs favorably against state-of-the-art methods.

* 2 pages 

  Click for Model/Code and Paper
An Adaptive Parameter Estimation for Guided Filter based Image Deconvolution

Sep 06, 2016
Hang Yang, Zhongbo Zhang, Yujing Guan

Image deconvolution is still to be a challenging ill-posed problem for recovering a clear image from a given blurry image, when the point spread function is known. Although competitive deconvolution methods are numerically impressive and approach theoretical limits, they are becoming more complex, making analysis, and implementation difficult. Furthermore, accurate estimation of the regularization parameter is not easy for successfully solving image deconvolution problems. In this paper, we develop an effective approach for image restoration based on one explicit image filter - guided filter. By applying the decouple of denoising and deblurring techniques to the deconvolution model, we reduce the optimization complexity and achieve a simple but effective algorithm to automatically compute the parameter in each iteration, which is based on Morozov's discrepancy principle. Experimental results demonstrate that the proposed algorithm outperforms many state-of-the-art deconvolution methods in terms of both ISNR and visual quality.

* 10 pages, 8 figures 

  Click for Model/Code and Paper
Guided Filter based Edge-preserving Image Non-blind Deconvolution

Sep 07, 2016
Hang Yang, Ming Zhu, Zhongbo Zhang, Heyan Huang

In this work, we propose a new approach for efficient edge-preserving image deconvolution. Our algorithm is based on a novel type of explicit image filter - guided filter. The guided filter can be used as an edge-preserving smoothing operator like the popular bilateral filter, but has better behaviors near edges. We propose an efficient iterative algorithm with the decouple of deblurring and denoising steps in the restoration process. In deblurring step, we proposed two cost function which could be computed with fast Fourier transform efficiently. The solution of the first one is used as the guidance image, and another solution will be filtered in next step. In the denoising step, the guided filter is used with the two obtained images for efficient edge-preserving filtering. Furthermore, we derive a simple and effective method to automatically adjust the regularization parameter at each iteration. We compare our deconvolution algorithm with many competitive deconvolution techniques in terms of ISNR and visual quality.

* The 20th IEEE International Conference on Image Processing (ICIP),2013,4593--4596 
* 4 pages, 3 figures, ICIP 2013. arXiv admin note: text overlap with arXiv:1609.01380 

  Click for Model/Code and Paper
Towards Noise-Robust Neural Networks via Progressive Adversarial Training

Sep 17, 2019
Hang Yu, Aishan Liu, Xianglong Liu, Jichen Yang, Chongzhi Zhang

Adversarial examples, intentionally designed inputs tending to mislead deep neural networks, have attracted great attention in the past few years. Although a series of defense strategies have been developed and achieved encouraging model robustness, most of them are still vulnerable to the more commonly witnessed corruptions, e.g., Gaussian noise, blur, etc., in the real world. In this paper, we theoretically and empirically discover the fact that there exists an inherent connection between adversarial robustness and corruption robustness. Based on the fundamental discovery, this paper further proposes a more powerful training method named Progressive Adversarial Training (PAT) that adds diversified adversarial noises progressively during training, and thus obtains robust model against both adversarial examples and corruptions through higher training data complexity. Meanwhile, we also theoretically find that PAT can promise better generalization ability. Experimental evaluation on MNIST, CIFAR-10 and SVHN show that PAT is able to enhance the robustness and generalization of the state-of-the-art network structures, performing comprehensively well compared to various augmentation methods. Moreover, we also propose Mixed Test to evaluate model generalization ability more fairly.

  Click for Model/Code and Paper
Context Gates for Neural Machine Translation

Mar 08, 2017
Zhaopeng Tu, Yang Liu, Zhengdong Lu, Xiaohua Liu, Hang Li

In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation while target contexts affect the fluency. Intuitively, generation of a content word should rely more on the source context and generation of a functional word should rely more on the target context. Due to the lack of effective control over the influence from source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points.

* Accepted by TACL 2017 

  Click for Model/Code and Paper
Neural Machine Translation with Reconstruction

Nov 21, 2016
Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li

Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack of adequacy. It has been widely observed that NMT tends to repeatedly translate some source words while mistakenly ignoring other words. To alleviate this problem, we propose a novel encoder-decoder-reconstructor framework for NMT. The reconstructor, incorporated into the NMT model, manages to reconstruct the input source sentence from the hidden layer of the output target sentence, to ensure that the information in the source side is transformed to the target side as much as possible. Experiments show that the proposed framework significantly improves the adequacy of NMT output and achieves superior translation result over state-of-the-art NMT and statistical MT systems.

* Accepted by AAAI 2017 

  Click for Model/Code and Paper
Modeling Coverage for Neural Machine Translation

Aug 06, 2016
Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li

Attention mechanism has enhanced state-of-the-art Neural Machine Translation (NMT) by jointly learning to align and translate. It tends to ignore past alignment information, however, which often leads to over-translation and under-translation. To address this problem, we propose coverage-based NMT in this paper. We maintain a coverage vector to keep track of the attention history. The coverage vector is fed to the attention model to help adjust future attention, which lets NMT system to consider more about untranslated source words. Experiments show that the proposed approach significantly improves both translation quality and alignment quality over standard attention-based NMT.

* Add subjective evaluation on top of ACL version: 25% of source words are under-translated by NMT 

  Click for Model/Code and Paper
Simultaneous Block-Sparse Signal Recovery Using Pattern-Coupled Sparse Bayesian Learning

Nov 06, 2017
Hang Xiao, Zhengli Xing, Linxiao Yang, Jun Fang, Yanlun Wu

In this paper, we consider the block-sparse signals recovery problem in the context of multiple measurement vectors (MMV) with common row sparsity patterns. We develop a new method for recovery of common row sparsity MMV signals, where a pattern-coupled hierarchical Gaussian prior model is introduced to characterize both the block-sparsity of the coefficients and the statistical dependency between neighboring coefficients of the common row sparsity MMV signals. Unlike many other methods, the proposed method is able to automatically capture the block sparse structure of the unknown signal. Our method is developed using an expectation-maximization (EM) framework. Simulation results show that our proposed method offers competitive performance in recovering block-sparse common row sparsity pattern MMV signals.

  Click for Model/Code and Paper
Empirical Study on Deep Learning Models for Question Answering

Nov 20, 2015
Yang Yu, Wei Zhang, Chung-Wei Hang, Bing Xiang, Bowen Zhou

In this paper we explore deep learning models with memory component or attention mechanism for question answering task. We combine and compare three models, Neural Machine Translation, Neural Turing Machine, and Memory Networks for a simulated QA data set. This paper is the first one that uses Neural Machine Translation and Neural Turing Machines for solving QA tasks. Our results suggest that the combination of attention and memory have potential to solve certain QA problem.

  Click for Model/Code and Paper
Weighted Bilinear Coding over Salient Body Parts for Person Re-identification

Apr 30, 2018
Qin Zhou, Heng Fan, Hang Su, Hua Yang, Shibao Zheng, Haibin Ling

Deep convolutional neural networks (CNNs) have demonstrated dominant performance in person re-identification (Re-ID). Existing CNN based methods utilize global average pooling (GAP) to aggregate intermediate convolutional features for Re-ID. However, this strategy only considers the first-order statistics of local features and treats local features at different locations equally important, leading to sub-optimal feature representation. To deal with these issues, we propose a novel \emph{weighted bilinear coding} (WBC) model for local feature aggregation in CNN networks to pursue more representative and discriminative feature representations. In specific, bilinear coding is used to encode the channel-wise feature correlations to capture richer feature interactions. Meanwhile, a weighting scheme is applied on the bilinear coding to adaptively adjust the weights of local features at different locations based on their importance in recognition, further improving the discriminability of feature aggregation. To handle the spatial misalignment issue, we use a salient part net to derive salient body parts, and apply the WBC model on each part. The final representation, formed by concatenating the WBC eoncoded features of each part, is both discriminative and resistant to spatial misalignment. Experiments on three benchmarks including Market-1501, DukeMTMC-reID and CUHK03 evidence the favorable performance of our method against other state-of-the-art methods.

* This manuscript is under consideration at Pattern Recognition Letters 

  Click for Model/Code and Paper
Feasibility Study: Moving Non-Homogeneous Teams in Congested Video Game Environments

Oct 04, 2017
Hang Ma, Jingxing Yang, Liron Cohen, T. K. Satish Kumar, Sven Koenig

Multi-agent path finding (MAPF) is a well-studied problem in artificial intelligence, where one needs to find collision-free paths for agents with given start and goal locations. In video games, agents of different types often form teams. In this paper, we demonstrate the usefulness of MAPF algorithms from artificial intelligence for moving such non-homogeneous teams in congested video game environments.

* To appear in AIIDE 17 

  Click for Model/Code and Paper
Gated Fusion Network for Joint Image Deblurring and Super-Resolution

Jul 27, 2018
Xinyi Zhang, Hang Dong, Zhe Hu, Wei-Sheng Lai, Fei Wang, Ming-Hsuan Yang

Single-image super-resolution is a fundamental task for vision applications to enhance the image quality with respect to spatial resolution. If the input image contains degraded pixels, the artifacts caused by the degradation could be amplified by super-resolution methods. Image blur is a common degradation source. Images captured by moving or still cameras are inevitably affected by motion blur due to relative movements between sensors and objects. In this work, we focus on the super-resolution task with the presence of motion blur. We propose a deep gated fusion convolution neural network to generate a clear high-resolution frame from a single natural image with severe blur. By decomposing the feature extraction step into two task-independent streams, the dual-branch design can facilitate the training process by avoiding learning the mixed degradation all-in-one and thus enhance the final high-resolution prediction results. Extensive experiments demonstrate that our method generates sharper super-resolved images from low-resolution inputs with high computational efficiency.

* Accepted as an oral presentation at BMVC 2018 

  Click for Model/Code and Paper
Robust and Efficient Graph Correspondence Transfer for Person Re-identification

May 15, 2018
Qin Zhou, Heng Fan, Hua Yang, Hang Su, Shibao Zheng, Shuang Wu, Haibin Ling

Spatial misalignment caused by variations in poses and viewpoints is one of the most critical issues that hinders the performance improvement in existing person re-identification (Re-ID) algorithms. To address this problem, in this paper, we present a robust and efficient graph correspondence transfer (REGCT) approach for explicit spatial alignment in Re-ID. Specifically, we propose to establish the patch-wise correspondences of positive training pairs via graph matching. By exploiting both spatial and visual contexts of human appearance in graph matching, meaningful semantic correspondences can be obtained. To circumvent the cumbersome \emph{on-line} graph matching in testing phase, we propose to transfer the \emph{off-line} learned patch-wise correspondences from the positive training pairs to test pairs. In detail, for each test pair, the training pairs with similar pose-pair configurations are selected as references. The matching patterns (i.e., the correspondences) of the selected references are then utilized to calculate the patch-wise feature distances of this test pair. To enhance the robustness of correspondence transfer, we design a novel pose context descriptor to accurately model human body configurations, and present an approach to measure the similarity between a pair of pose context descriptors. Meanwhile, to improve testing efficiency, we propose a correspondence template ensemble method using the voting mechanism, which significantly reduces the amount of patch-wise matchings involved in distance calculation. With aforementioned strategies, the REGCT model can effectively and efficiently handle the spatial misalignment problem in Re-ID. Extensive experiments on five challenging benchmarks, including VIPeR, Road, PRID450S, 3DPES and CUHK01, evidence the superior performance of REGCT over other state-of-the-art approaches.

* Tech. Report. The source code is available at arXiv admin note: text overlap with arXiv:1804.00242 

  Click for Model/Code and Paper
SPLATNet: Sparse Lattice Networks for Point Cloud Processing

May 09, 2018
Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz

We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Naively applying convolutions on this lattice scales poorly, both in terms of memory and computational cost, as the size of the lattice increases. Instead, our network uses sparse bilateral convolutional layers as building blocks. These layers maintain efficiency by using indexing structures to apply convolutions only on occupied parts of the lattice, and allow flexible specifications of the lattice structure enabling hierarchical and spatially-aware feature learning, as well as joint 2D-3D reasoning. Both point-based and image-based representations can be easily incorporated in a network with such layers and the resulting model can be trained in an end-to-end manner. We present results on 3D segmentation tasks where our approach outperforms existing state-of-the-art techniques.

* Camera-ready, accepted to CVPR 2018 (oral); project website: 

  Click for Model/Code and Paper
MANAS: Multi-Agent Neural Architecture Search

Sep 05, 2019
Fabio Maria Carlucci, Pedro M Esperança, Marco Singh, Victor Gabillon, Antoine Yang, Hang Xu, Zewei Chen, Jun Wang

The Neural Architecture Search (NAS) problem is typically formulated as a graph search problem where the goal is to learn the optimal operations over edges in order to maximise a graph-level global objective. Due to the large architecture parameter space, efficiency is a key bottleneck preventing NAS from its practical use. In this paper, we address the issue by framing NAS as a multi-agent problem where agents control a subset of the network and coordinate to reach optimal architectures. We provide two distinct lightweight implementations, with reduced memory requirements (1/8th of state-of-the-art), and performances above those of much more computationally expensive methods. Theoretically, we demonstrate vanishing regrets of the form O(sqrt(T)), with T being the total number of rounds. Finally, aware that random search is an, often ignored, effective baseline we perform additional experiments on 3 alternative datasets and 2 network configurations, and achieve favourable results in comparison.

  Click for Model/Code and Paper
Spine-Inspired Continuum Soft Exoskeleton for Stoop Lifting Assistance

Jul 04, 2019
Xiaolong Yang, Tzu-Hao Huang, Hang Hu, Shuangyue Yu, Sainan Zhang, Xianlian Zhou, Alessandra Carriero, Guang Yue, Hao Su

Back injuries are the most prevalent work-related musculoskeletal disorders and represent a major cause of disability. Although innovations in wearable robots aim to alleviate this hazard, the majority of existing exoskeletons are obtrusive because the rigid linkage design limits natural movement, thus causing ergonomic risk. Moreover, these existing systems are typically only suitable for one type of movement assistance, not ubiquitous for a wide variety of activities. To fill in this gap, this paper presents a new wearable robot design approach continuum soft exoskeleton. This spine-inspired wearable robot is unobtrusive and assists both squat and stoops while not impeding walking motion. To tackle the challenge of the unique anatomy of spine that is inappropriate to be simplified as a single degree of freedom joint, our robot is conformal to human anatomy and it can reduce multiple types of forces along the human spine such as the spinae muscle force, shear, and compression force of the lumbar vertebrae. We derived kinematics and kinetics models of this mechanism and established an analytical biomechanics model of human-robot interaction. Quantitative analysis of disc compression force, disc shear force and muscle force was performed in simulation. We further developed a virtual impedance control strategy to deliver force control and compensate hysteresis of Bowden cable transmission. The feasibility of the prototype was experimentally tested on three healthy subjects. The root mean square error of force tracking is 6.63 N (3.3 % of the 200N peak force) and it demonstrated that it can actively control the stiffness to the desired value. This continuum soft exoskeleton represents a feasible solution with the potential to reduce back pain for multiple activities and multiple forces along the human spine.

* IROS 2019 
* 8 pages, 13 figures 

  Click for Model/Code and Paper
TorontoCity: Seeing the World with a Million Eyes

Dec 01, 2016
Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun

In this paper we introduce the TorontoCity benchmark, which covers the full greater Toronto area (GTA) with 712.5 $km^2$ of land, 8439 $km$ of road and around 400,000 buildings. Our benchmark provides different perspectives of the world captured from airplanes, drones and cars driving around the city. Manually labeling such a large scale dataset is infeasible. Instead, we propose to utilize different sources of high-precision maps to create our ground truth. Towards this goal, we develop algorithms that allow us to align all data sources with the maps while requiring minimal human supervision. We have designed a wide variety of tasks including building height estimation (reconstruction), road centerline and curb extraction, building instance segmentation, building contour extraction (reorganization), semantic labeling and scene type classification (recognition). Our pilot study shows that most of these tasks are still difficult for modern convolutional neural networks.

  Click for Model/Code and Paper