Models, code, and papers for "Cheng Wang":

RRA: Recurrent Residual Attention for Sequence Learning

Sep 12, 2017
Cheng Wang

In this paper, we propose a recurrent neural network (RNN) with residual attention (RRA) to learn long-range dependencies from sequential data. We propose to add residual connections across timesteps to RNN, which explicitly enhances the interaction between current state and hidden states that are several timesteps apart. This also allows training errors to be directly back-propagated through residual connections and effectively alleviates gradient vanishing problem. We further reformulate an attention mechanism over residual connections. An attention gate is defined to summarize the individual contribution from multiple previous hidden states in computing the current state. We evaluate RRA on three tasks: the adding problem, pixel-by-pixel MNIST classification and sentiment analysis on the IMDB dataset. Our experiments demonstrate that RRA yields better performance, faster convergence and more stable training compared to a standard LSTM network. Furthermore, RRA shows highly competitive performance to the state-of-the-art methods.

* 9 pages, 7 figures 

  Click for Model/Code and Paper
Feature Learning based Deep Supervised Hashing with Pairwise Labels

Apr 21, 2016
Wu-Jun Li, Sheng Wang, Wang-Cheng Kang

Recent years have witnessed wide application of hashing for large-scale image retrieval. However, most existing hashing methods are based on hand-crafted features which might not be optimally compatible with the hashing procedure. Recently, deep hashing methods have been proposed to perform simultaneous feature learning and hash-code learning with deep neural networks, which have shown better performance than traditional hashing methods with hand-crafted features. Most of these deep hashing methods are supervised whose supervised information is given with triplet labels. For another common application scenario with pairwise labels, there have not existed methods for simultaneous feature learning and hash-code learning. In this paper, we propose a novel deep hashing method, called deep pairwise-supervised hashing(DPSH), to perform simultaneous feature learning and hash-code learning for applications with pairwise labels. Experiments on real datasets show that our DPSH method can outperform other methods to achieve the state-of-the-art performance in image retrieval applications.

* IJCAI 2016 

  Click for Model/Code and Paper
Visually-Aware Fashion Recommendation and Design with Generative Image Models

Nov 07, 2017
Wang-Cheng Kang, Chen Fang, Zhaowen Wang, Julian McAuley

Building effective recommender systems for domains like fashion is challenging due to the high level of subjectivity and the semantic complexity of the features involved (i.e., fashion styles). Recent work has shown that approaches to `visual' recommendation (e.g.~clothing, art, etc.) can be made more accurate by incorporating visual signals directly into the recommendation objective, using `off-the-shelf' feature representations derived from deep networks. Here, we seek to extend this contribution by showing that recommendation performance can be significantly improved by learning `fashion aware' image representations directly, i.e., by training the image representation (from the pixel level) and the recommender system jointly; this contribution is related to recent work using Siamese CNNs, though we are able to show improvements over state-of-the-art recommendation techniques such as BPR and variants that make use of pre-trained visual features. Furthermore, we show that our model can be used \emph{generatively}, i.e., given a user and a product category, we can generate new images (i.e., clothing items) that are most consistent with their personal taste. This represents a first step towards building systems that go beyond recommending existing items from a product corpus, but which can be used to suggest styles and aid the design of new products.

* 10 pages, 6 figures. Accepted by ICDM'17 as a long paper 

  Click for Model/Code and Paper
CosRec: 2D Convolutional Neural Networks for Sequential Recommendation

Aug 27, 2019
An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, Julian McAuley

Sequential patterns play an important role in building modern recommender systems. To this end, several recommender systems have been built on top of Markov Chains and Recurrent Models (among others). Although these sequential models have proven successful at a range of tasks, they still struggle to uncover complex relationships nested in user purchase histories. In this paper, we argue that modeling pairwise relationships directly leads to an efficient representation of sequential features and captures complex item correlations. Specifically, we propose a 2D convolutional network for sequential recommendation (CosRec). It encodes a sequence of items into a three-way tensor; learns local features using 2D convolutional filters; and aggregates high-order interactions in a feedforward manner. Quantitative results on two public datasets show that our method outperforms both conventional methods and recent sequence-based approaches, achieving state-of-the-art performance on various evaluation metrics.

* To appear in CIKM-2019, code at 

  Click for Model/Code and Paper
Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

Oct 03, 2019
Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid

Recent work has shown that CNN-based depth and ego-motion estimators can be learned using unlabelled monocular videos. However, the performance is limited by unidentified moving objects that violate the underlying static scene assumption in geometric image reconstruction. More significantly, due to lack of proper constraints, networks output scale-inconsistent results over different samples, i.e., the ego-motion network cannot provide full camera trajectories over a long video sequence because of the per-frame scale ambiguity. This paper tackles these challenges by proposing a geometry consistency loss for scale-consistent predictions and an induced self-discovered mask for handling moving objects and occlusions. Since we do not leverage multi-task learning like recent works, our framework is much simpler and more efficient. Comprehensive evaluation results demonstrate that our depth estimator achieves the state-of-the-art performance on the KITTI dataset. Moreover, we show that our ego-motion network is able to predict a globally scale-consistent camera trajectory for long video sequences, and the resulting visual odometry accuracy is competitive with the recent model that is trained using stereo videos. To the best of our knowledge, this is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.

* Accepted to NeurIPS 2019. Code is available at 

  Click for Model/Code and Paper
Duckiefloat: a Collision-Tolerant Resource-Constrained Blimp for Long-Term Autonomy in Subterranean Environments

Oct 31, 2019
Yi-Wei Huang, Chen-Lung Lu, Kuan-Lin Chen, Po-Sheng Ser, Jui-Te Huang, Yu-Chia Shen, Pin-Wei Chen, Po-Kai Chang, Sheng-Cheng Lee, Hsueh-Cheng Wang

There are several challenges for search and rescue robots: mobility, perception, autonomy, and communication. Inspired by the DARPA Subterranean (SubT) Challenge, we propose an autonomous blimp robot, which has the advantages of low power consumption and collision-tolerance compared to other aerial vehicles like drones. This is important for search and rescue tasks that usually last for one or more hours. However, the underground constrained passages limit the size of blimp envelope and its payload, making the proposed system resource-constrained. Therefore, a careful design consideration is needed to build a blimp system with on-board artifact search and SLAM. In order to reach long-term operation, a failure-aware algorithm with minimal communication to human supervisor to have situational awareness and send control signals to the blimp when needed.

  Click for Model/Code and Paper
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Feb 21, 2019
Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework.

  Click for Model/Code and Paper
State-Regularized Recurrent Neural Networks

Jan 25, 2019
Cheng Wang, Mathias Niepert

Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, it is difficult to understand what exactly they learn. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) nonregular languages such as balanced parentheses, palindromes, and the copy task where external memory is required; and (3) real-word sequence learning tasks for sentiment analysis, visual object recognition, and language modeling. We show that state-regularization (a) simplifies the extraction of finite state automata modeling an RNN's state transition dynamics; (b) forces RNNs to operate more like automata with external memory and less like finite state machines; (c) makes RNNs have better interpretability and explainability.

* 20 pages 

  Click for Model/Code and Paper
How we can control the crack to propagate along the specified path feasibly?

Nov 14, 2017
Zhenxing Cheng, Hu Wang

A controllable crack propagation (CCP) strategy is suggested. It is well known that crack always leads the failure by crossing the critical domain in engineering structure. Therefore, the CCP method is proposed to control the crack to propagate along the specified path, which is away from the critical domain. To complete this strategy, two optimization methods are engaged. Firstly, a back propagation neural network (BPNN) assisted particle swarm optimization (PSO) is suggested. In this method, to improve the efficiency of CCP, the BPNN is used to build the metamodel instead of the forward evaluation. Secondly, the popular PSO is used. Considering the optimization iteration is a time consuming process, an efficient reanalysis based extended finite element methods (X-FEM) is used to substitute the complete X-FEM solver to calculate the crack propagation path. Moreover, an adaptive subdomain partition strategy is suggested to improve the fitting accuracy between real crack and specified paths. Several typical numerical examples demonstrate that both optimization methods can carry out the CCP. The selection of them should be determined by the tradeoff between efficiency and accuracy.

* 17 pages, 29 figures, Crack propagation path, Reanalysis solver, Back propagation neural network, Particle swarm optimization, Extended finite element method 

  Click for Model/Code and Paper
Fixed-point Factorized Networks

Aug 29, 2017
Peisong Wang, Jian Cheng

In recent years, Deep Neural Networks (DNN) based methods have achieved remarkable performance in a wide range of tasks and have been among the most powerful and widely used techniques in computer vision. However, DNN-based methods are both computational-intensive and resource-consuming, which hinders the application of these methods on embedded systems like smart phones. To alleviate this problem, we introduce a novel Fixed-point Factorized Networks (FFN) for pretrained models to reduce the computational complexity as well as the storage requirement of networks. The resulting networks have only weights of -1, 0 and 1, which significantly eliminates the most resource-consuming multiply-accumulate operations (MACs). Extensive experiments on large-scale ImageNet classification task show the proposed FFN only requires one-thousandth of multiply operations with comparable accuracy.

  Click for Model/Code and Paper
Free-body Gesture Tracking and Augmented Reality Improvisation for Floor and Aerial Dance

Sep 15, 2015
Tammuz Dubnov, Cheng-i Wang

This paper describes an updated interactive performance system for floor and Aerial Dance that controls visual and sonic aspects of the presentation via a depth sensing camera (MS Kinect). In order to detect, measure and track free movement in space, 3 degree of freedom (3-DOF) tracking in space (on the ground and in the air) is performed using IR markers with a method for multi target tracking capabilities added and described in detail. An improved gesture tracking and recognition system, called Action Graph (AG), is described in the paper. Action Graph uses an efficient incremental construction from a single long sequence of movement features and automatically captures repeated sub-segments in the movement from start to finish with no manual interaction needed with other advanced capabilities discussed as well. By using the new model for the gesture we can unify an entire choreography piece by dynamically tracking and recognizing gestures and sub-portions of the piece. This gives the performer the freedom to improvise based on a set of recorded gestures/portions of the choreography and have the system dynamically respond in relation to the performer within a set of related rehearsed actions, an ability that has not been seen in any other system to date.

* 8 pages. Technical paper 

  Click for Model/Code and Paper
Implicit Semantic Data Augmentation for Deep Networks

Sep 27, 2019
Yulin Wang, Xuran Pan, Shiji Song, Hong Zhang, Cheng Wu, Gao Huang

In this paper, we propose a novel implicit semantic data augmentation (ISDA) approach to complement traditional augmentation techniques like flipping, translation or rotation. Our work is motivated by the intriguing property that deep networks are surprisingly good at linearizing features, such that certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., adding sunglasses or changing backgrounds. As a consequence, translating training samples along many semantic directions in the feature space can effectively augment the dataset to improve generalization. To implement this idea effectively and efficiently, we first perform an online estimate of the covariance matrix of deep features for each class, which captures the intra-class semantic variations. Then random vectors are drawn from a zero-mean normal distribution with the estimated covariance to augment the training data in that class. Importantly, instead of augmenting the samples explicitly, we can directly minimize an upper bound of the expected cross-entropy (CE) loss on the augmented training set, leading to a highly efficient algorithm. In fact, we show that the proposed ISDA amounts to minimizing a novel robust CE loss, which adds negligible extra computational cost to a normal training procedure. Although being simple, ISDA consistently improves the generalization performance of popular deep models (ResNets and DenseNets) on a variety of datasets, e.g., CIFAR-10, CIFAR-100 and ImageNet. Code for reproducing our results are available at

* Accepted by NeurIPS 2019 

  Click for Model/Code and Paper
Self-Attentive Sequential Recommendation

Aug 20, 2018
Wang-Cheng Kang, Julian McAuley

Sequential dynamics are a key feature of many modern recommender systems, which seek to capture the `context' of users' activities on the basis of actions they have performed recently. To capture such patterns, two approaches have proliferated: Markov Chains (MCs) and Recurrent Neural Networks (RNNs). Markov Chains assume that a user's next action can be predicted on the basis of just their last (or last few) actions, while RNNs in principle allow for longer-term semantics to be uncovered. Generally speaking, MC-based methods perform best in extremely sparse datasets, where model parsimony is critical, while RNNs perform better in denser datasets where higher model complexity is affordable. The goal of our work is to balance these two goals, by proposing a self-attention based sequential model (SASRec) that allows us to capture long-term semantics (like an RNN), but, using an attention mechanism, makes its predictions based on relatively few actions (like an MC). At each time step, SASRec seeks to identify which items are `relevant' from a user's action history, and use them to predict the next item. Extensive empirical studies show that our method outperforms various state-of-the-art sequential models (including MC/CNN/RNN-based approaches) on both sparse and dense datasets. Moreover, the model is an order of magnitude more efficient than comparable CNN/RNN-based models. Visualizations on attention weights also show how our model adaptively handles datasets with various density, and uncovers meaningful patterns in activity sequences.

* Accepted by ICDM'18 as a long paper 

  Click for Model/Code and Paper
Self-attention-based BiGRU and capsule network for named entity recognition

Jan 30, 2020
Jianfeng Deng, Lianglun Cheng, Zhuowei Wang

Named entity recognition(NER) is one of the tasks of natural language processing(NLP). In view of the problem that the traditional character representation ability is weak and the neural network method is unable to capture the important sequence information. An self-attention-based bidirectional gated recurrent unit(BiGRU) and capsule network(CapsNet) for NER is proposed. This model generates character vectors through bidirectional encoder representation of transformers(BERT) pre-trained model. BiGRU is used to capture sequence context features, and self-attention mechanism is proposed to give different focus on the information captured by hidden layer of BiGRU. Finally, we propose to use CapsNet for entity recognition. We evaluated the recognition performance of the model on two datasets. Experimental results show that the model has better performance without relying on external dictionary information.

  Click for Model/Code and Paper
Deep Reinforcement Learning for Foreign Exchange Trading

Aug 21, 2019
Chun-Chieh Wang, Yun-Cheng Tsai

Reinforcement learning can interact with the environment and is suitable for applications in decision control systems. Therefore, we used the reinforcement learning method to establish a foreign exchange transaction, avoiding the long-standing problem of unstable trends in deep learning predictions. In the system design, we optimized the Sure-Fire statistical arbitrage policy, set three different actions, encoded the continuous price over a period of time into a heat-map view of the Gramian Angular Field (GAF) and compared the Deep Q Learning (DQN) and Proximal Policy Optimization (PPO) algorithms. To test feasibility, we analyzed three currency pairs, namely EUR/USD, GBP/USD, and AUD/USD. We trained the data in units of four hours from 1 August 2018 to 30 November 2018 and tested model performance using data between 1 December 2018 and 31 December 2018. The test results of the various models indicated that favorable investment performance was achieved as long as the model was able to handle complex and random processes and the state was able to describe the environment, validating the feasibility of reinforcement learning in the development of trading strategies.

  Click for Model/Code and Paper
Accelerating Mini-batch SARAH by Step Size Rules

Jun 20, 2019
Zhuang Yang, Zengping Chen, Cheng Wang

StochAstic Recursive grAdient algoritHm (SARAH), originally proposed for convex optimization and also proven to be effective for general nonconvex optimization, has received great attention due to its simple recursive framework for updating stochastic gradient estimates. The performance of SARAH significantly depends on the choice of step size sequence. However, SARAH and its variants often employ a best-tuned step size by mentor, which is time consuming in practice. Motivated by this gap, we proposed a variant of the Barzilai-Borwein (BB) method, referred to as the Random Barzilai-Borwein (RBB) method, to calculate step size for SARAH in the mini-batch setting, thereby leading to a new SARAH method: MB-SARAH-RBB. We prove that MB-SARAH-RBB converges linearly in expectation for strongly convex objective functions. We analyze the complexity of MB-SARAH-RBB and show that it is better than the original method. Numerical experiments on standard data sets indicate that MB-SARAH-RBB outperforms or matches state-of-the-art algorithms.

  Click for Model/Code and Paper
Fully Automatic Brain Tumor Segmentation using a Normalized Gaussian Bayesian Classifier and 3D Fluid Vector Flow

May 01, 2019
Tao Wang, Irene Cheng, Anup Basu

Brain tumor segmentation from Magnetic Resonance Images (MRIs) is an important task to measure tumor responses to treatments. However, automatic segmentation is very challenging. This paper presents an automatic brain tumor segmentation method based on a Normalized Gaussian Bayesian classification and a new 3D Fluid Vector Flow (FVF) algorithm. In our method, a Normalized Gaussian Mixture Model (NGMM) is proposed and used to model the healthy brain tissues. Gaussian Bayesian Classifier is exploited to acquire a Gaussian Bayesian Brain Map (GBBM) from the test brain MR images. GBBM is further processed to initialize the 3D FVF algorithm, which segments the brain tumor. This algorithm has two major contributions. First, we present a NGMM to model healthy brains. Second, we extend our 2D FVF algorithm to 3D space and use it for brain tumor segmentation. The proposed method is validated on a publicly available dataset.

* ICIP 2010 

  Click for Model/Code and Paper
Multi-Scale Body-Part Mask Guided Attention for Person Re-identification

Apr 24, 2019
Honglong Cai, Zhiguan Wang, Jinxing Cheng

Person re-identification becomes a more and more important task due to its wide applications. In practice, person re-identification still remains challenging due to the variation of person pose, different lighting, occlusion, misalignment, background clutter, etc. In this paper, we propose a multi-scale body-part mask guided attention network (MMGA), which jointly learns whole-body and part body attention to help extract global and local features simultaneously. In MMGA, body-part masks are used to guide the training of corresponding attention. Experiments show that our proposed method can reduce the negative influence of variation of person pose, misalignment and background clutter. Our method achieves rank-1/mAP of 95.0%/87.2% on the Market1501 dataset, 89.5%/78.1% on the DukeMTMC-reID dataset, outperforming current state-of-the-art methods.

  Click for Model/Code and Paper
Super-resolution based generative adversarial network using visual perceptual loss function

Apr 24, 2019
Xuan Zhu, Yue Cheng, Rongzhi Wang

In recent years, perceptual-quality driven super-resolution methods show satisfactory results. However, super-resolved images have uncertain texture details and unpleasant artifact. We build a novel perceptual loss function composed of morphological components adversarial loss and color adversarial loss and salient content loss to ameliorate these problems. The adversarial loss is applied to constrain color and morphological components distribution of super-resolved images and the salient content loss highlights the perceptual similarity of feature-rich regions. Experiments show that proposed method achieves significant improvements in terms of perceptual index and visual quality compared with the state-of-the-art methods.

* 11 pages, 7 figures, 1 table 

  Click for Model/Code and Paper