Models, code, and papers for "Si Chen":

Adaptive Deep Metric Embeddings for Person Re-Identification under Occlusions

Feb 07, 2020
Wanxiang Yang, Yan Yan, Si Chen

Person re-identification (ReID) under occlusions is a challenging problem in video surveillance. Most of existing person ReID methods take advantage of local features to deal with occlusions. However, these methods usually independently extract features from the local regions of an image without considering the relationship among different local regions. In this paper, we propose a novel person ReID method, which learns the spatial dependencies between the local regions and extracts the discriminative feature representation of the pedestrian image based on Long Short-Term Memory (LSTM), dealing with the problem of occlusions. In particular, we propose a novel loss (termed the adaptive nearest neighbor loss) based on the classification uncertainty to effectively reduce intra-class variations while enlarging inter-class differences within the adaptive neighborhood of the sample. The proposed loss enables the deep neural network to adaptively learn discriminative metric embeddings, which significantly improve the generalization capability of recognizing unseen person identities. Extensive comparative evaluations on challenging person ReID datasets demonstrate the significantly improved performance of the proposed method compared with several state-of-the-art methods.

* 6 pages, 3 figures 

  Click for Model/Code and Paper
Deep Tracking: Visual Tracking Using Deep Convolutional Networks

Dec 13, 2015
Meera Hahn, Si Chen, Afshin Dehghan

In this paper, we study a discriminatively trained deep convolutional network for the task of visual tracking. Our tracker utilizes both motion and appearance features that are extracted from a pre-trained dual stream deep convolution network. We show that the features extracted from our dual-stream network can provide rich information about the target and this leads to competitive performance against state of the art tracking methods on a visual tracking benchmark.


  Click for Model/Code and Paper
Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation

Feb 07, 2020
Yihan Du, Yan Yan, Si Chen, Yang Hua

In recent years, deep learning based visual tracking methods have obtained great success owing to the powerful feature representation ability of Convolutional Neural Networks (CNNs). Among these methods, classification-based tracking methods exhibit excellent performance while their speeds are heavily limited by the expensive computation for massive proposal feature extraction. In contrast, matching-based tracking methods (such as Siamese networks) possess remarkable speed superiority. However, the absence of online updating renders these methods unadaptable to significant object appearance variations. In this paper, we propose a novel real-time visual tracking method, which adopts an object-adaptive LSTM network to effectively capture the video sequential dependencies and adaptively learn the object appearance variations. For high computational efficiency, we also present a fast proposal selection strategy, which utilizes the matching-based tracking method to pre-estimate dense proposals and selects high-quality ones to feed to the LSTM network for classification. This strategy efficiently filters out some irrelevant proposals and avoids the redundant computation for feature extraction, which enables our method to operate faster than conventional classification-based tracking methods. In addition, to handle the problems of sample inadequacy and class imbalance during online tracking, we adopt a data augmentation technique based on the Generative Adversarial Network (GAN) to facilitate the training of the LSTM network. Extensive experiments on four visual tracking benchmarks demonstrate the state-of-the-art performance of our method in terms of both tracking accuracy and speed, which exhibits great potentials of recurrent structures for visual tracking.


  Click for Model/Code and Paper
GaterNet: Dynamic Filter Selection in Convolutional Neural Network via a Dedicated Global Gating Network

Nov 27, 2018
Zhourong Chen, Yang Li, Samy Bengio, Si Si

The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample it is processing. In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs). The problem is interesting because the idea of forcing different parts of the model to learn from different types of samples may help us acquire better filters in CNNs, improve the model generalization performance and potentially increase the interpretability of model behavior. We propose a novel yet simple framework called GaterNet, which involves a backbone and a gater network. The backbone network is a regular CNN that performs the major computation needed for making a prediction, while a global gater network is introduced to generate binary gates for selectively activating filters in the backbone network based on each input. Extensive experiments on CIFAR and ImageNet datasets show that our models consistently outperform the original models with a large margin. On CIFAR-10, our model also improves upon state-of-the-art results.

* Google Research 

  Click for Model/Code and Paper
Multi-task Learning of Cascaded CNN for Facial Attribute Classification

May 03, 2018
Ni Zhuang, Yan Yan, Si Chen, Hanzi Wang

Recently, facial attribute classification (FAC) has attracted significant attention in the computer vision community. Great progress has been made along with the availability of challenging FAC datasets. However, conventional FAC methods usually firstly pre-process the input images (i.e., perform face detection and alignment) and then predict facial attributes. These methods ignore the inherent dependencies among these tasks (i.e., face detection, facial landmark localization and FAC). Moreover, some methods using convolutional neural network are trained based on the fixed loss weights without considering the differences between facial attributes. In order to address the above problems, we propose a novel multi-task learning of cas- caded convolutional neural network method, termed MCFA, for predicting multiple facial attributes simultaneously. Specifically, the proposed method takes advantage of three cascaded sub-networks (i.e., S_Net, M_Net and L_Net corresponding to the neural networks under different scales) to jointly train multiple tasks in a coarse-to-fine manner, which can achieve end-to-end optimization. Furthermore, the proposed method automatically assigns the loss weight to each facial attribute based on a novel dynamic weighting scheme, thus making the proposed method concentrate on predicting the more difficult facial attributes. Experimental results show that the proposed method outperforms several state-of-the-art FAC methods on the challenging CelebA and LFWA datasets.


  Click for Model/Code and Paper
Joint Deep Learning of Facial Expression Synthesis and Recognition

Feb 06, 2020
Yan Yan, Ying Huang, Si Chen, Chunhua Shen, Hanzi Wang

Recently, deep learning based facial expression recognition (FER) methods have attracted considerable attention and they usually require large-scale labelled training data. Nonetheless, the publicly available facial expression databases typically contain a small amount of labelled data. In this paper, to overcome the above issue, we propose a novel joint deep learning of facial expression synthesis and recognition method for effective FER. More specifically, the proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions. To increase the diversity of the training images, FESGAN is elaborately designed to generate images with new identities from a prior distribution. Secondly, an expression recognition network is jointly learned with the pre-trained FESGAN in a unified framework. In particular, the classification loss computed from the recognition network is used to simultaneously optimize the performance of both the recognition network and the generator of FESGAN. Moreover, in order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm to reduce the intra-class variations of images from the same class, which can significantly improve the final performance. Extensive experimental results on public facial expression databases demonstrate the superiority of the proposed method compared with several state-of-the-art FER methods.


  Click for Model/Code and Paper
AdversarialNAS: Adversarial Neural Architecture Search for GANs

Dec 04, 2019
Chen Gao, Yunpeng Chen, Si Liu, Zhenxiong Tan, Shuicheng Yan

Neural Architecture Search (NAS) that aims to automate the procedure of architecture design has achieved promising results in many computer vision fields. In this paper, we propose an AdversarialNAS method specially tailored for Generative Adversarial Networks (GANs) to search for a superior generative model on the task of unconditional image generation. The proposed method leverages an adversarial searching mechanism to search for the architectures of generator and discriminator simultaneously in a differentiable manner. Therefore, the searching algorithm considers the relevance and balance between the two networks leading to search for a superior generative model. Besides, AdversarialNAS does not need any extra evaluation metric to evaluate the performance of the architecture in each searching iteration, which is very efficient and can take only 1 GPU day to search for an optimal network architecture in a large search space ($10^{38}$). Experiments demonstrate the effectiveness and superiority of our method. The discovered generative model sets a new state-of-the-art FID score of $10.87$ and highly competitive Inception Score of $8.74$ on CIFAR-10. Its transferability is also proven by setting new state-of-the-art FID score of $26.98$ and Inception score of $9.63$ on STL-10. Our code will be released to facilitate the related academic and industrial study.


  Click for Model/Code and Paper
Progressive Cluster Purification for Transductive Few-shot Learning

Jun 10, 2019
Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan

Few-shot learning aims to learn to generalize a classifier to novel classes with limited labeled data. Transductive inference that utilizes unlabeled test set to deal with low-data problem has been employed for few-shot learning in recent literature. Yet, these methods do not explicitly exploit the manifold structures of semantic clusters, which is inefficient for transductive inference. In this paper, we propose a novel Progressive Cluster Purification (PCP) method for transductive few-shot learning. The PCP can progressively purify the cluster by exploring the semantic interdependency in the individual cluster space. Specifically, the PCP consists of two-level operations: inter-class classification and intra-class transduction. The inter-class classification partitions all the test samples into several clusters by comparing the test samples with the prototypes. The intra-class transduction effectively explores trustworthy test samples for each cluster by modeling data relations within a cluster as well as among different clusters. Then, it refines the prototypes to better represent the real distribution of semantic clusters. The refined prototypes are used to remeasure all the test instances and purify each cluster. Furthermore, the inter-class classification and the intra-class transduction are extremely flexible to be repeated several times to progressively purify the clusters. Experimental results are provided on two datasets: miniImageNet dataset and tieredImageNet dataset. The comparison results demonstrate the effectiveness of our approach and show that our approach outperforms the state-of-the-art methods on both datasets.


  Click for Model/Code and Paper
An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition

Mar 29, 2019
Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan

Skeleton-based action recognition is an important task that requires the adequate understanding of movement characteristics of a human action from the given skeleton sequence. Recent studies have shown that exploring spatial and temporal features of the skeleton sequence is vital for this task. Nevertheless, how to effectively extract discriminative spatial and temporal features is still a challenging problem. In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data. The proposed AGC-LSTM can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between spatial and temporal domains. We also present a temporal hierarchical architecture to increases temporal receptive fields of the top AGC-LSTM layer, which boosts the ability to learn the high-level semantic representation and significantly reduces the computation cost. Furthermore, to select discriminative spatial information, the attention mechanism is employed to enhance information of key joints in each AGC-LSTM layer. Experimental results on two datasets are provided: NTU RGB+D dataset and Northwestern-UCLA dataset. The comparison results demonstrate the effectiveness of our approach and show that our approach outperforms the state-of-the-art methods on both datasets.

* Accepted by CVPR2019 

  Click for Model/Code and Paper
The Global Convergence Analysis of the Bat Algorithm Using a Markovian Framework and Dynamical System Theory

Mar 27, 2019
Si Chen, Guo-Hua Peng, Xing-Shi He, Xin-She Yang

The bat algorithm (BA) has been shown to be effective to solve a wider range of optimization problems. However, there is not much theoretical analysis concerning its convergence and stability. In order to prove the convergence of the bat algorithm, we have built a Markov model for the algorithm and proved that the state sequence of the bat population forms a finite homogeneous Markov chain, satisfying the global convergence criteria. Then, we prove that the bat algorithm can have global convergence. In addition, in order to enhance the convergence performance of the algorithm, we have designed an updated model using the dynamical system theory in terms of a dynamic matrix, and the parameter ranges for the algorithm stability are then obtained. We then use some benchmark functions to demonstrate that BA can indeed achieve global optimality efficiently for these functions.

* Expert Systems with Applications, vol. 114, 173--182 (2018) 
* 17 pages, 3 figures 

  Click for Model/Code and Paper
Active Deep Q-learning with Demonstration

Dec 06, 2018
Si-An Chen, Voot Tangkaratt, Hsuan-Tien Lin, Masashi Sugiyama

Recent research has shown that although Reinforcement Learning (RL) can benefit from expert demonstration, it usually takes considerable efforts to obtain enough demonstration. The efforts prevent training decent RL agents with expert demonstration in practice. In this work, we propose Active Reinforcement Learning with Demonstration (ARLD), a new framework to streamline RL in terms of demonstration efforts by allowing the RL agent to query for demonstration actively during training. Under the framework, we propose Active Deep Q-Network, a novel query strategy which adapts to the dynamically-changing distributions during the RL training process by estimating the uncertainty of recent states. The expert demonstration data within Active DQN are then utilized by optimizing supervised max-margin loss in addition to temporal difference loss within usual DQN training. We propose two methods of estimating the uncertainty based on two state-of-the-art DQN models, namely the divergence of bootstrapped DQN and the variance of noisy DQN. The empirical results validate that both methods not only learn faster than other passive expert demonstration methods with the same amount of demonstration and but also reach super-expert level of performance across four different tasks.


  Click for Model/Code and Paper
Multi-label Learning Based Deep Transfer Neural Network for Facial Attribute Classification

May 03, 2018
Ni Zhuang, Yan Yan, Si Chen, Hanzi Wang, Chunhua Shen

Deep Neural Network (DNN) has recently achieved outstanding performance in a variety of computer vision tasks, including facial attribute classification. The great success of classifying facial attributes with DNN often relies on a massive amount of labelled data. However, in real-world applications, labelled data are only provided for some commonly used attributes (such as age, gender); whereas, unlabelled data are available for other attributes (such as attraction, hairline). To address the above problem, we propose a novel deep transfer neural network method based on multi-label learning for facial attribute classification, termed FMTNet, which consists of three sub-networks: the Face detection Network (FNet), the Multi-label learning Network (MNet) and the Transfer learning Network (TNet). Firstly, based on the Faster Region-based Convolutional Neural Network (Faster R-CNN), FNet is fine-tuned for face detection. Then, MNet is fine-tuned by FNet to predict multiple attributes with labelled data, where an effective loss weight scheme is developed to explicitly exploit the correlation between facial attributes based on attribute grouping. Finally, based on MNet, TNet is trained by taking advantage of unsupervised domain adaptation for unlabelled facial attribute classification. The three sub-networks are tightly coupled to perform effective facial attribute classification. A distinguishing characteristic of the proposed FMTNet method is that the three sub-networks (FNet, MNet and TNet) are constructed in a similar network structure. Extensive experimental results on challenging face datasets demonstrate the effectiveness of our proposed method compared with several state-of-the-art methods.


  Click for Model/Code and Paper
Going Wider: Recurrent Neural Network With Parallel Cells

May 03, 2017
Danhao Zhu, Si Shen, Xin-Yu Dai, Jiajun Chen

Recurrent Neural Network (RNN) has been widely applied for sequence modeling. In RNN, the hidden states at current step are full connected to those at previous step, thus the influence from less related features at previous step may potentially decrease model's learning ability. We propose a simple technique called parallel cells (PCs) to enhance the learning ability of Recurrent Neural Network (RNN). In each layer, we run multiple small RNN cells rather than one single large cell. In this paper, we evaluate PCs on 2 tasks. On language modeling task on PTB (Penn Tree Bank), our model outperforms state of art models by decreasing perplexity from 78.6 to 75.3. On Chinese-English translation task, our model increases BLEU score for 0.39 points than baseline model.


  Click for Model/Code and Paper
Quadratic Projection Based Feature Extraction with Its Application to Biometric Recognition

Mar 25, 2016
Yan Yan, Hanzi Wang, Si Chen, Xiaochun Cao, David Zhang

This paper presents a novel quadratic projection based feature extraction framework, where a set of quadratic matrices is learned to distinguish each class from all other classes. We formulate quadratic matrix learning (QML) as a standard semidefinite programming (SDP) problem. However, the con- ventional interior-point SDP solvers do not scale well to the problem of QML for high-dimensional data. To solve the scalability of QML, we develop an efficient algorithm, termed DualQML, based on the Lagrange duality theory, to extract nonlinear features. To evaluate the feasibility and effectiveness of the proposed framework, we conduct extensive experiments on biometric recognition. Experimental results on three representative biometric recogni- tion tasks, including face, palmprint, and ear recognition, demonstrate the superiority of the DualQML-based feature extraction algorithm compared to the current state-of-the-art algorithms


  Click for Model/Code and Paper
PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection

Dec 30, 2019
Yue Liao, Si Liu, Fei Wang, Yanjie Chen, Jiashi Feng

We propose a single-stage Human-Object Interaction (HOI) detection method that has outperformed all existing methods on HICO-DET dataset at 37 fps on a single Titan XP GPU. It is the first real-time HOI detection method. Conventional HOI detection methods are composed of two stages, i.e., human-object proposals generation, and proposals classification. Their effectiveness and efficiency are limited by the sequential and separate architecture. In this paper, we propose a Parallel Point Detection and Matching (PPDM) HOI detection framework. In PPDM, an HOI is defined as a point triplet < human point, interaction point, object point>. Human and object points are the center of the detection boxes, and the interaction point is the midpoint of the human and object points. PPDM contains two parallel branches, namely point detection branch and point matching branch. The point detection branch predicts three points. Simultaneously, the point matching branch predicts two displacements from the interaction point to its corresponding human and object points. The human point and the object point originated from the same interaction point are considered as matched pairs. In our novel parallel architecture, the interaction points implicitly provide context and regularization for human and object detection. The isolated detection boxes are unlikely to form meaning HOI triplets are suppressed, which increases the precision of HOI detection. Moreover, the matching between human and object detection boxes is only applied around limited numbers of filtered candidate interaction points, which saves much computational cost. Additionally, we build a new applicationoriented database named HOI-A, which severs as a good supplement to the existing datasets. The source code and the dataset will be made publicly available to facilitate the development of HOI detection.

* Tech Report 

  Click for Model/Code and Paper
Distraction-Based Neural Networks for Document Summarization

Oct 26, 2016
Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang

Distributed representation learned with neural networks has recently shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to help model larger spans of text, e.g., documents, is intriguing, and further investigation would still be desirable. This paper aims to enhance neural network models for such a purpose. A typical problem of document-level modeling is automatic summarization, which aims to model documents in order to generate summaries. In this paper, we propose neural models to train computers not just to pay attention to specific regions and content of input documents with attention models, but also distract them to traverse between different content of a document so as to better grasp the overall meaning for summarization. Without engineering any features, we train the models on two large datasets. The models achieve the state-of-the-art performance, and they significantly benefit from the distraction modeling, particularly when input documents are long.

* IJCAI, 2016 
* Published in IJCAI-2016: the 25th International Joint Conference on Artificial Intelligence 

  Click for Model/Code and Paper
Several Experiments on Investigating Pretraining and Knowledge-Enhanced Models for Natural Language Inference

Apr 27, 2019
Tianda Li, Xiaodan Zhu, Quan Liu, Qian Chen, Zhigang Chen, Si Wei

Natural language inference (NLI) is among the most challenging tasks in natural language understanding. Recent work on unsupervised pretraining that leverages unsupervised signals such as language-model and sentence prediction objectives has shown to be very effective on a wide range of NLP problems. It would still be desirable to further understand how it helps NLI; e.g., if it learns artifacts in data annotation or instead learn true inference knowledge. In addition, external knowledge that does not exist in the limited amount of NLI training data may be added to NLI models in two typical ways, e.g., from human-created resources or an unsupervised pretraining paradigm. We runs several experiments here to investigate whether they help NLI in the same way, and if not,how?


  Click for Model/Code and Paper
Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

Oct 29, 2018
Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh

Neural language models have been widely used in various NLP tasks, including machine translation, next word prediction and conversational agents. However, it is challenging to deploy these models on mobile devices due to their slow prediction speed, where the bottleneck is to compute top candidates in the softmax layer. In this paper, we introduce a novel softmax layer approximation algorithm by exploiting the clustering structure of context vectors. Our algorithm uses a light-weight screening model to predict a much smaller set of candidate words based on the given context, and then conducts an exact softmax only within that subset. Training such a procedure end-to-end is challenging as traditional clustering methods are discrete and non-differentiable, and thus unable to be used with back-propagation in the training process. Using the Gumbel softmax, we are able to train the screening model end-to-end on the training set to exploit data distribution. The algorithm achieves an order of magnitude faster inference than the original softmax layer for predicting top-$k$ words in various tasks such as beam search in machine translation or next words prediction. For example, for machine translation task on German to English dataset with around 25K vocabulary, we can achieve 20.4 times speed up with 98.9\% precision@1 and 99.3\% precision@5 with the original softmax layer prediction, while state-of-the-art ~\citep{MSRprediction} only achieves 6.7x speedup with 98.7\% precision@1 and 98.1\% precision@5 for the same task.


  Click for Model/Code and Paper
Neural Natural Language Inference Models Enhanced with External Knowledge

Jun 23, 2018
Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei

Modeling natural language inference is a very challenging task. With the availability of large annotated data, it has recently become feasible to train complex models such as neural-network-based inference models, which have shown to achieve the state-of-the-art performance. Although there exist relatively large annotated data, can machines learn all knowledge needed to perform natural language inference (NLI) from these data? If not, how can neural-network-based NLI models benefit from external knowledge and how to build NLI models to leverage it? In this paper, we enrich the state-of-the-art neural natural language inference models with external knowledge. We demonstrate that the proposed models improve neural NLI models to achieve the state-of-the-art performance on the SNLI and MultiNLI datasets.

* Accepted by ACL 2018 

  Click for Model/Code and Paper
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

Jun 18, 2018
Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-jui Hsieh

Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses. As a case study, a state-of-the-art neural language model usually consists of one or more recurrent layers sandwiched between an embedding layer used for representing input tokens and a softmax layer for generating output tokens. For problems with a very large vocabulary size, the embedding and the softmax matrices can account for more than half of the model size. For instance, the bigLSTM model achieves state-of- the-art performance on the One-Billion-Word (OBW) dataset with around 800k vocabulary, and its word embedding and softmax matrices use more than 6GBytes space, and are responsible for over 90% of the model parameters. In this paper, we propose GroupReduce, a novel compression method for neural language models, based on vocabulary-partition (block) based low-rank matrix approximation and the inherent frequency distribution of tokens (the power-law distribution of words). The experimental results show our method can significantly outperform traditional compression methods such as low-rank approximation and pruning. On the OBW dataset, our method achieved 6.6 times compression rate for the embedding and softmax matrices, and when combined with quantization, our method can achieve 26 times compression rate, which translates to a factor of 12.8 times compression for the entire model with very little degradation in perplexity.


  Click for Model/Code and Paper