Models, code, and papers for "Wenxuan Wang":

Detect or Track: Towards Cost-Effective Video Object Detection/Tracking

Nov 13, 2018
Hao Luo, Wenxuan Xie, Xinggang Wang, Wenjun Zeng

State-of-the-art object detectors and trackers are developing fast. Trackers are in general more efficient than detectors but bear the risk of drifting. A question is hence raised -- how to improve the accuracy of video object detection/tracking by utilizing the existing detectors and trackers within a given time budget? A baseline is frame skipping -- detecting every N-th frames and tracking for the frames in between. This baseline, however, is suboptimal since the detection frequency should depend on the tracking quality. To this end, we propose a scheduler network, which determines to detect or track at a certain frame, as a generalization of Siamese trackers. Although being light-weight and simple in structure, the scheduler network is more effective than the frame skipping baselines and flow-based approaches, as validated on ImageNet VID dataset in video object detection/tracking.

* Accepted to AAAI 2019 

  Click for Model/Code and Paper
PointPWC-Net: A Coarse-to-Fine Network for Supervised and Self-Supervised Scene Flow Estimation on 3D Point Clouds

Nov 27, 2019
Wenxuan Wu, Zhiyuan Wang, Zhuwen Li, Wei Liu, Li Fuxin

We propose a novel end-to-end deep scene flow model, called PointPWC-Net, on 3D point clouds in a coarse-to-fine fashion. Flow computed at the coarse level is upsampled and warped to a finer level, enabling the algorithm to accommodate for large motion without a prohibitive search space. We introduce novel cost volume, upsampling, and warping layers to efficiently handle 3D point cloud data. Unlike traditional cost volumes that require exhaustively computing all the cost values on a high-dimensional grid, our point-based formulation discretizes the cost volume onto input 3D points, and a PointConv operation efficiently computes convolutions on the cost volume. Experiment results on FlyingThings3D outperform the state-of-the-art by a large margin. We further explore novel self-supervised losses to train our model and achieve comparable results to state-of-the-art trained with supervised loss. Without any fine-tuning, our method also shows great generalization ability on KITTI Scene Flow 2015 dataset, outperforming all previous methods.


  Click for Model/Code and Paper
Neural Rule Grounding for Low-Resource Relation Extraction

Sep 20, 2019
Wenxuan Zhou, Hongtao Lin, Ziqi Wang, Leonardo Neves, Xiang Ren

While deep neural models have gained successes on information extraction tasks, they become less reliable when the amount of labeled data is limited. In this paper, we study relation extraction (RE) under low-resource setting, where only some (hand-built) labeling rules are provided for learning a neural model over a large, unlabeled corpus. To overcome the low-coverage issue of current bootstrapping methods (i.e., hard grounding of rules), we propose a Neural Rule Grounding (REGD) framework for jointly learning a relation extraction module (with flexible neural architecture) and a sentence-rule soft matching module. The soft matching module extends the coverage of rules on semantically similar instances and augments the learning on unlabeled corpus. Experiments on two public datasets demonstrate the effectiveness of REGD when compared with both rule-based and semi-supervised baselines. Additionally, the learned soft matching module is able to predict on new relations with unseen rules, and can provide interpretation on matching results.


  Click for Model/Code and Paper
Learning Shaping Strategies in Human-in-the-loop Interactive Reinforcement Learning

Nov 10, 2018
Chao Yu, Tianpei Yang, Wenxuan Zhu, Dongxu wang, Guangliang Li

Providing reinforcement learning agents with informationally rich human knowledge can dramatically improve various aspects of learning. Prior work has developed different kinds of shaping methods that enable agents to learn efficiently in complex environments. All these methods, however, tailor human guidance to agents in specialized shaping procedures, thus embodying various characteristics and advantages in different domains. In this paper, we investigate the interplay between different shaping methods for more robust learning performance. We propose an adaptive shaping algorithm which is capable of learning the most suitable shaping method in an on-line manner. Results in two classic domains verify its effectiveness from both simulated and real human studies, shedding some light on the role and impact of human factors in human-robot collaborative learning.


  Click for Model/Code and Paper
Learning to Annotate: Modularizing Data Augmentation for Text Classifiers with Natural Language Explanations

Nov 07, 2019
Ziqi Wang, Yujia Qin, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, Xiang Ren

Deep neural networks usually require massive labeled data, which restricts their applications in scenarios where data annotation is expensive. Natural language (NL) explanations have been demonstrated very useful additional supervision, which can provide sufficient domain knowledge for generating more labeled data over new instances, while the annotation time only doubles. However, directly applying them for augmenting model learning encounters two challenges: (1) NL explanations are unstructured and inherently compositional. (2) NL explanations often have large numbers of linguistic variants, resulting in low recall and limited generalization ability. In this paper, we propose a novel Neural EXecution Tree (NEXT) framework to augment training data for text classification using NL explanations. After transforming NL explanations into executable logical forms by semantic parsing, NEXT generalizes different types of actions specified by the logical forms for labeling data instances, which substantially increases the coverage of each NL explanation. Experiments on two NLP tasks (relation extraction and sentiment analysis) demonstrate its superiority over baseline methods. Its extension to multi-hop question answering achieves performance gain with light annotation effort.

* 16 pages, 7 figures, 13 tables 

  Click for Model/Code and Paper
Learning to Annotate: Modularizing Data Augmentation for TextClassifiers with Natural Language Explanations

Nov 04, 2019
Ziqi Wang, Yujia Qin, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, Xiang Ren

Deep neural networks usually require massive labeled data, which restricts their applications in scenarios where data annotation is expensive. Natural language (NL) explanations have been demonstrated very useful additional supervision, which can provide sufficient domain knowledge for generating more labeled data over new instances, while the annotation time only doubles. However, directly applying them for augmenting model learning encounters two challenges: (1) NL explanations are unstructured and inherently compositional. (2) NL explanations often have large numbers of linguistic variants, resulting in low recall and limited generalization ability. In this paper, we propose a novel Neural EXecution Tree (NEXT) framework to augment training data for text classification using NL explanations. After transforming NL explanations into executable logical forms by semantic parsing, NEXT generalizes different types of actions specified by the logical forms for labeling data instances, which substantially increases the coverage of each NL explanation. Experiments on two NLP tasks (relation extraction and sentiment analysis) demonstrate its superiority over baseline methods. Its extension to multi-hop question answering achieves performance gain with light annotation effort.

* 16 pages, 7 figures, 13 tables 

  Click for Model/Code and Paper
A Fine-Grained Facial Expression Database for End-to-End Multi-Pose Facial Expression Recognition

Jul 25, 2019
Wenxuan Wang, Qiang Sun, Tao Chen, Chenjie Cao, Ziqi Zheng, Guoqiang Xu, Han Qiu, Yanwei Fu

The recent research of facial expression recognition has made a lot of progress due to the development of deep learning technologies, but some typical challenging problems such as the variety of rich facial expressions and poses are still not resolved. To solve these problems, we develop a new Facial Expression Recognition (FER) framework by involving the facial poses into our image synthesizing and classification process. There are two major novelties in this work. First, we create a new facial expression dataset of more than 200k images with 119 persons, 4 poses and 54 expressions. To our knowledge this is the first dataset to label faces with subtle emotion changes for expression recognition purpose. It is also the first dataset that is large enough to validate the FER task on unbalanced poses, expressions, and zero-shot subject IDs. Second, we propose a facial pose generative adversarial network (FaPE-GAN) to synthesize new facial expression images to augment the data set for training purpose, and then learn a LightCNN based Fa-Net model for expression classification. Finally, we advocate four novel learning tasks on this dataset. The experimental results well validate the effectiveness of the proposed approach.

* 10 pages, 8 figures 

  Click for Model/Code and Paper
Pose-Normalized Image Generation for Person Re-identification

Apr 25, 2018
Xuelin Qian, Yanwei Fu, Tao Xiang, Wenxuan Wang, Jie Qiu, Yang Wu, Yu-Gang Jiang, Xiangyang Xue

Person Re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and learning discriminative identity-sensitive and view-invariant features in the presence of large pose variations. In this work, we address both problems by proposing a novel deep person image generation model for synthesizing realistic person images conditional on the pose. The model is based on a generative adversarial network (GAN) designed specifically for pose normalization in re-id, thus termed pose-normalization GAN (PN-GAN). With the synthesized images, we can learn a new type of deep re-id feature free of the influence of pose variations. We show that this feature is strong on its own and complementary to features learned with the original images. Importantly, under the transfer learning setting, we show that our model generalizes well to any new re-id dataset without the need for collecting any training data for model fine-tuning. The model thus has the potential to make re-id model truly scalable.

* 10 pages, 5 figures 

  Click for Model/Code and Paper