Models, code, and papers for "Xiaodong Liu":

Multi-Task Deep Neural Networks for Natural Language Understanding

Jan 31, 2019
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. MT-DNN not only leverages large amounts of cross-task data, but also benefits from a regularization effect that leads to more general representations in order to adapt to new tasks and domains. MT-DNN extends the model proposed in Liu et al. (2015) by incorporating a pre-trained bidirectional transformer language model, known as BERT (Devlin et al., 2018). MT-DNN obtains new state-of-the-art results on ten NLU tasks, including SNLI, SciTail, and eight out of nine GLUE tasks, pushing the GLUE benchmark to 82.2% (1.8% absolute improvement). We also demonstrate using the SNLI and SciTail datasets that the representations learned by MT-DNN allow domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations. Our code and pre-trained models will be made publicly available.

* 10 pages, 2 figures and 5 tables 

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

Apr 20, 2019
Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks. Although ensemble learning can improve model performance, serving an ensemble of large DNNs such as MT-DNN can be prohibitively expensive. Here we apply the knowledge distillation method (Hinton et al., 2015) in the multi-task learning setting. For each task, we train an ensemble of different MT-DNNs (teacher) that outperforms any single model, and then train a single MT-DNN (student) via multi-task learning to distill knowledge from these ensemble teachers. We show that the distilled MT-DNN significantly outperforms the original MT-DNN on 7 out of 9 GLUE tasks, pushing the GLUE benchmark (single model) to 83.7% (1.5% absolute improvement, based on the GLUE leaderboard at https://gluebenchmark.com/leaderboard as of April 1, 2019). The code and pre-trained models will be made publicly available at https://github.com/namisan/mt-dnn.

* 8 pages, 2 figures and 3 tables 
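
The teacher/student setup described in the abstract above can be illustrated with a minimal sketch. It assumes the ensemble's soft targets are the average of the teachers' class probabilities and that the student is scored with cross-entropy against those targets; the function names, the temperature parameter, and the toy logits are hypothetical and are not taken from the MT-DNN code base.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(teacher_logits, temperature=1.0):
    """Average the class probabilities predicted by each teacher (assumed scheme)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def soft_cross_entropy(student_logits, soft_targets):
    """Cross-entropy of the student's distribution against the soft targets."""
    q = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(soft_targets, q))

# Toy example: three hypothetical teachers on a 3-class task.
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5], [2.5, 0.0, -1.5]]
targets = ensemble_soft_targets(teachers)

# A student that agrees with the teachers gets a lower loss than one that does not.
loss_close = soft_cross_entropy([2.0, 0.5, -1.0], targets)
loss_far = soft_cross_entropy([-1.0, 0.5, 2.0], targets)
```

In practice such a soft-target loss is often combined with the usual hard-label loss; the abstract does not say how the two are weighted, so that part is left out here.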

Nonconvex Rectangular Matrix Completion via Gradient Descent without $\ell_{2,\infty}$ Regularization

Jan 18, 2019
Ji Chen, Dekai Liu, Xiaodong Li

The analysis of nonconvex matrix completion has recently attracted much attention in the community of machine learning thanks to its computational convenience. Existing analysis on this problem, however, usually relies on $\ell_{2,\infty}$ projection or regularization that involves unknown model parameters, although they are observed to be unnecessary in numerical simulations, see, e.g. Zheng and Lafferty [2016]. In this paper, we extend the analysis of the vanilla gradient descent for positive semidefinite matrix completion proposed in Ma et al. [2017] to the rectangular case, and more significantly, improve the required sampling complexity from $\widetilde{O}(r^3)$ to $\widetilde{O}(r^2)$. Our technical ideas and contributions are potentially useful in improving the leave-one-out analysis in other related problems.
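
As a rough illustration of what "vanilla gradient descent" means for the rectangular case, the sketch below factorizes the unknown matrix as X Y^T and runs plain gradient descent on the squared error over observed entries only. It is an assumption-laden toy: the step size, the small random initialization, the iteration count, and the absence of any balancing or regularization term are choices made for this example and may differ from the algorithm analyzed in the paper.

```python
import random

def observed_loss(X, Y, observed):
    """Half the squared error of X Y^T on the observed entries."""
    total = 0.0
    for (i, j), value in observed.items():
        pred = sum(a * b for a, b in zip(X[i], Y[j]))
        total += 0.5 * (pred - value) ** 2
    return total

def complete_matrix(observed, n_rows, n_cols, rank, step=0.1, iters=500, seed=0):
    """Plain gradient descent on the factors; no projection or regularizer."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Y = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(iters):
        grad_X = [[0.0] * rank for _ in range(n_rows)]
        grad_Y = [[0.0] * rank for _ in range(n_cols)]
        # Accumulate gradients from every observed entry.
        for (i, j), value in observed.items():
            residual = sum(a * b for a, b in zip(X[i], Y[j])) - value
            for k in range(rank):
                grad_X[i][k] += residual * Y[j][k]
                grad_Y[j][k] += residual * X[i][k]
        # Simultaneous update of both factors.
        for i in range(n_rows):
            for k in range(rank):
                X[i][k] -= step * grad_X[i][k]
        for j in range(n_cols):
            for k in range(rank):
                Y[j][k] -= step * grad_Y[j][k]
    return X, Y

# Toy rank-1 ground truth (4 x 3) with three entries hidden.
u = [1.0, -0.5, 0.8, 0.3]
v = [0.9, -0.4, 0.7]
observed = {(i, j): u[i] * v[j]
            for i in range(4) for j in range(3) if (i + j) % 4 != 3}

X0, Y0 = complete_matrix(observed, 4, 3, rank=1, iters=0)
initial_loss = observed_loss(X0, Y0, observed)
X, Y = complete_matrix(observed, 4, 3, rank=1)
final_loss = observed_loss(X, Y, observed)
```

Comparing `final_loss` with `initial_loss` shows whether the descent made progress on the observed entries; the sampling-complexity guarantees discussed in the abstract concern when such iterates also recover the hidden entries.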


Stochastic Answer Networks for Natural Language Inference

Apr 21, 2018
Xiaodong Liu, Kevin Duh, Jianfeng Gao

We propose a stochastic answer network (SAN) to explore multi-step inference strategies in Natural Language Inference. Rather than directly predicting the results given the inputs, the model maintains a state and iteratively refines its predictions. Our experiments show that SAN achieves the state-of-the-art results on three benchmarks: Stanford Natural Language Inference (SNLI) dataset, MultiGenre Natural Language Inference (MultiNLI) dataset and Quora Question Pairs dataset.

* 5 pages, 1 figure 

MLFcGAN: Multi-level Feature Fusion based Conditional GAN for Underwater Image Color Correction

Feb 13, 2020
Xiaodong Liu, Zhi Gao, Ben M. Chen

Color correction for underwater images has received increasing interest, due to its critical role in making mature vision algorithms usable in underwater scenarios. Inspired by the success of deep convolutional neural network (DCNN) techniques in many vision tasks, especially their strength in extracting features at multiple scales, we propose a deep multi-scale feature fusion network based on the conditional generative adversarial network (GAN) for underwater image color correction. In our network, multi-scale features are extracted first, and the local features at each scale are then augmented with global features. This design was verified to facilitate more effective and faster network learning, resulting in better performance in both color correction and detail preservation. We conducted extensive experiments and compared our method with state-of-the-art approaches quantitatively and qualitatively, showing that it achieves significant improvements.

* This paper has been accepted to the journal IEEE Geoscience and Remote Sensing Letters 

A Hybrid Neural Network Model for Commonsense Reasoning

Jul 27, 2019
Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao

This paper proposes a hybrid neural network (HNN) model for commonsense reasoning. An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers. HNN obtains new state-of-the-art results on three classic commonsense reasoning tasks, pushing the WNLI benchmark to 89%, the Winograd Schema Challenge (WSC) benchmark to 75.1%, and the PDP60 benchmark to 90.0%. An ablation study shows that language models and semantic similarity models are complementary approaches to commonsense reasoning, and HNN effectively combines the strengths of both. The code and pre-trained models will be publicly available at https://github.com/namisan/mt-dnn.

* 9 pages, 3 figures, 6 tables 

Deep Reinforcement Learning for Unmanned Aerial Vehicle-Assisted Vehicular Networks

Jul 27, 2019
Ming Zhu, Xiao-Yang Liu, Xiaodong Wang

Unmanned aerial vehicles (UAVs) are envisioned to complement the 5G communication infrastructure in future smart cities. Hot spots easily appear in road intersections, where effective communication among vehicles is challenging. UAVs may serve as relays with the advantages of low price, easy deployment, line-of-sight links, and flexible mobility. In this paper, we study a UAV-assisted vehicular network where the UAV jointly adjusts its transmission power and bandwidth allocation under 3D flight to maximize the total throughput. First, we formulate a Markov Decision Process (MDP) problem by modeling the mobility of the UAV/vehicles and the state transitions. Second, we solve the target problem using a deep reinforcement learning method, namely the deep deterministic policy gradient, and propose three solutions with different control objectives. We then extend the proposed solutions by considering the energy consumption of 3D flight. Third, in a simplified model with small state and action spaces, we verify the optimality of the proposed algorithms. By comparing with two baseline schemes, we demonstrate the effectiveness of the proposed algorithms in a realistic model.

* 13 pages, 13 figures 

Deep Reinforcement Learning for Unmanned Aerial Vehicle-Assisted Vehicular Networks in Smart Cities

Jul 09, 2019
Ming Zhu, Xiao-Yang Liu, Xiaodong Wang

Unmanned aerial vehicles (UAVs) are envisioned to complement the 5G communication infrastructure in future smart cities. Hot spots easily appear in road intersections, where effective communication among vehicles is challenging. UAVs may serve as relays with the advantages of low price, easy deployment, line-of-sight links, and flexible mobility. In this paper, we study a UAV-assisted vehicular network where the UAV jointly adjusts its transmission power and bandwidth allocation under 3D flight to maximize the total throughput. First, we formulate a Markov Decision Process (MDP) problem by modeling the mobility of the UAV/vehicles and the state transitions. Second, we solve the target problem using a deep reinforcement learning method, namely the deep deterministic policy gradient, and propose three solutions with different control objectives. We then extend the proposed solutions by considering the energy consumption of 3D flight. Third, in a simplified model with small state and action spaces, we verify the optimality of the proposed algorithms. By comparing with two baseline schemes, we demonstrate the effectiveness of the proposed algorithms in a realistic model.

* 12 pages, 13 figures 

An Online Ride-Sharing Path Planning Strategy for Public Vehicle Systems

Dec 27, 2017
Ming Zhu, Xiao-Yang Liu, Xiaodong Wang

As efficient traffic-management platforms, public vehicle (PV) systems are envisioned to be a promising approach to reducing traffic congestion and pollution in future smart cities. PV systems provide online/dynamic peer-to-peer ride-sharing services with the goal of serving a sufficient number of customers with the minimum number of vehicles and the lowest possible cost. A key component of the PV system is the online ride-sharing scheduling strategy. In this paper, we propose an efficient path planning strategy that focuses on a limited potential search area for each vehicle by filtering out the requests that violate the passenger service quality level, so that the global search is reduced to a local search. We analyze the performance of the proposed solution, such as the reduction ratio of computational complexity. Simulations based on the Manhattan taxi data set show that the computing time is reduced by 22% compared with the exhaustive search method under the same service quality performance.

* 12 pages 

Multi-Modality Cascaded Fusion Technology for Autonomous Driving

Feb 08, 2020
Hongwu Kuang, Xiaodong Liu, Jingwei Zhang, Zicheng Fang

Multi-modality fusion is key to the stability of autonomous driving systems. In this paper, we propose a general multi-modality cascaded fusion framework that exploits the advantages of decision-level and feature-level fusion, utilizing target position, size, velocity, appearance and confidence to achieve accurate fusion results. In the fusion process, dynamic coordinate alignment (DCA) is conducted to reduce the error between sensors from different modalities. In addition, since the calculation of the affinity matrix is the core module of sensor fusion, we propose an affinity loss that improves the performance of the deep affinity network (DAN). Finally, the proposed step-by-step cascaded fusion framework is more interpretable and flexible than end-to-end fusion methods. Extensive experiments on the nuScenes [2] dataset show that our approach achieves state-of-the-art performance.


Stochastic Answer Networks for Machine Reading Comprehension

May 15, 2018
Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao

We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension. Compared to previous work such as ReasoNet, which used reinforcement learning to determine the number of steps, the unique feature of SAN is the use of a kind of stochastic prediction dropout on the answer module (final layer) of the neural network during training. We show that this simple trick improves robustness and achieves results competitive with the state of the art on the Stanford Question Answering Dataset (SQuAD), the Adversarial SQuAD, and the Microsoft MAchine Reading COmprehension Dataset (MS MARCO).

* 11 pages, 5 figures, Accepted to ACL 2018 
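
The "stochastic prediction dropout" mentioned in the abstract above can be sketched as follows. The sketch assumes the answer module produces one probability distribution per reasoning step, that whole steps are randomly dropped during training, and that the surviving steps are averaged; the function name, the drop rate, and the toy distributions are hypothetical, and the abstract itself does not spell out these details.

```python
import random

def average_step_predictions(step_probs, drop_rate=0.0, rng=None):
    """Average per-step distributions, randomly dropping whole steps.

    With drop_rate=0.0 (inference) every step is kept. During training a
    positive drop_rate removes each step's prediction independently.
    """
    rng = rng or random.Random()
    kept = [p for p in step_probs if rng.random() >= drop_rate]
    if not kept:
        # Always keep at least one step so the average is defined.
        kept = [rng.choice(step_probs)]
    n = len(kept)
    return [sum(p[k] for p in kept) / n for k in range(len(kept[0]))]

# Toy example: three reasoning steps over three candidate answers.
steps = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]]

train_pred = average_step_predictions(steps, drop_rate=0.4, rng=random.Random(0))
infer_pred = average_step_predictions(steps)  # all steps averaged
```

Because each step's prediction may be dropped, no single step can be relied on exclusively, which is one plausible reading of why the abstract reports improved robustness.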

An Empirical Analysis of Multiple-Turn Reasoning Strategies in Reading Comprehension Tasks

Nov 09, 2017
Yelong Shen, Xiaodong Liu, Kevin Duh, Jianfeng Gao

Reading comprehension (RC) is a challenging task that requires synthesis of information across sentences and multiple turns of reasoning. Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets. The RC model is an end-to-end neural network with iterative attention, and uses reinforcement learning to dynamically control the number of turns. We find that multiple-turn reasoning outperforms single-turn reasoning for all question and answer types; further, we observe that enabling a flexible number of turns generally improves upon a fixed multiple-turn strategy. We achieve results competitive with the state of the art on these two datasets.


Implicit Discourse Relation Classification via Multi-Task Neural Networks

Mar 09, 2016
Yang Liu, Sujian Li, Xiaodong Zhang, Zhifang Sui

Without discourse connectives, classifying implicit discourse relations is a challenging task and a bottleneck for building a practical discourse parser. Previous research usually makes use of one kind of discourse framework such as PDTB or RST to improve the classification performance on discourse relations. Actually, under different discourse annotation frameworks, there exist multiple corpora which have internal connections. To exploit the combination of different discourse corpora, we design related discourse classification tasks specific to a corpus, and propose a novel Convolutional Neural Network embedded multi-task learning system to synthesize these tasks by learning both unique and shared representations for each task. The experimental results on the PDTB implicit discourse relation classification task demonstrate that our model achieves significant gains over baseline systems.

* This is the pre-print version of a paper accepted by AAAI-16 

Attentive Tensor Product Learning

Nov 01, 2018
Qiuyuan Huang, Li Deng, Dapeng Wu, Chang Liu, Xiaodong He

This paper proposes a new architecture, Attentive Tensor Product Learning (ATPL), to represent grammatical structures in deep learning models. ATPL exploits Tensor Product Representations (TPR), a structured neural-symbolic model developed in cognitive science, aiming to integrate deep learning with explicit language structures and rules. The key ideas of ATPL are: 1) unsupervised learning of role-unbinding vectors of words via a TPR-based deep neural network; 2) employing attention modules to compute TPR; and 3) integration of TPR with typical deep learning architectures including Long Short-Term Memory (LSTM) and Feedforward Neural Network (FFNN). The novelty of our approach lies in its ability to extract the grammatical structure of a sentence by using role-unbinding vectors, which are obtained in an unsupervised manner. This ATPL approach is applied to 1) image captioning, 2) part of speech (POS) tagging, and 3) constituency parsing of a sentence. Experimental results demonstrate the effectiveness of the proposed approach.


Multi-Task Learning for Machine Reading Comprehension

Sep 18, 2018
Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao

We propose a multi-task learning framework to jointly train a Machine Reading Comprehension (MRC) model on multiple datasets across different domains. Key to the proposed method is to learn robust and general contextual representations with the help of out-domain data in a multi-task framework. Empirical study shows that the proposed approach is orthogonal to the existing pre-trained representation models, such as word embedding and language models. Experiments on the Stanford Question Answering Dataset (SQuAD), the Microsoft MAchine Reading COmprehension Dataset (MS MARCO), NewsQA and other datasets show that our multi-task learning approach achieves significant improvement over state-of-the-art models in most MRC tasks.

* 9 pages, 2 figures, 7 tables 

Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation

Sep 14, 2018
Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz

We investigate two crucial and closely related aspects of CNNs for optical flow estimation: models and training. First, we design a compact but effective CNN model, called PWC-Net, according to simple and well-established principles: pyramidal processing, warping, and cost volume processing. PWC-Net is 17 times smaller in size, 2 times faster in inference, and 11% more accurate on Sintel final than the recent FlowNet2 model. It is the winning entry in the optical flow competition of the robust vision challenge. Next, we experimentally analyze the sources of our performance gains. In particular, we use the same training procedure of PWC-Net to retrain FlowNetC, a sub-network of FlowNet2. The retrained FlowNetC is 56% more accurate on Sintel final than the previously trained one and even 5% more accurate than the FlowNet2 model. We further improve the training procedure and increase the accuracy of PWC-Net on Sintel by 10% and on KITTI 2012 and 2015 by 20%. Our newly trained model parameters and training protocols will be available on https://github.com/NVlabs/PWC-Net


PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

Jun 25, 2018
Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz

We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024x436) images. Our models are available on https://github.com/NVlabs/PWC-Net.

* CVPR 2018 camera ready version (with github link to Caffe and PyTorch code) 
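
The warp-then-correlate step described in the abstract above can be illustrated with a one-dimensional toy. This is a simplified sketch, not the PWC-Net implementation: it assumes nearest-index warping with clamping at the borders, a correlation normalized by the channel count, and features laid out along a single axis; all function names and the toy features are hypothetical.

```python
def warp_1d(features, flow):
    """Sample second-image features at x + flow[x] (nearest index, clamped)."""
    n = len(features)
    warped = []
    for x in range(n):
        src = min(max(x + int(round(flow[x])), 0), n - 1)
        warped.append(features[src])
    return warped

def cost_volume_1d(feat1, feat2_warped, max_disp):
    """Correlate feat1[x] with feat2_warped[x + d] for every |d| <= max_disp."""
    n = len(feat1)
    channels = len(feat1[0])
    volume = []
    for x in range(n):
        costs = []
        for d in range(-max_disp, max_disp + 1):
            y = x + d
            if 0 <= y < n:
                corr = sum(a * b for a, b in zip(feat1[x], feat2_warped[y])) / channels
            else:
                corr = 0.0  # out-of-range positions contribute no match
            costs.append(corr)
        volume.append(costs)
    return volume

def best_displacement(costs, max_disp):
    """Displacement with the highest correlation at one position."""
    return costs.index(max(costs)) - max_disp

# Toy features: the second "image" is the first shifted right by one position.
feat1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
feat2 = [[0, 0, 0, 0], feat1[0], feat1[1], feat1[2]]

# Without warping, the best match at x=1 sits one position to the right.
no_warp = cost_volume_1d(feat1, warp_1d(feat2, [0, 0, 0, 0]), max_disp=1)
residual_before = best_displacement(no_warp[1], max_disp=1)

# Warping with the correct flow (+1) moves the best match to zero displacement.
with_warp = cost_volume_1d(feat1, warp_1d(feat2, [1, 1, 1, 1]), max_disp=1)
residual_after = best_displacement(with_warp[1], max_disp=1)
```

The toy shows why warping helps: once the second image's features are warped by a good flow estimate, only a small residual displacement remains, so a narrow search range in the cost volume suffices.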
