Models, code, and papers for "Yang Liu":

##### Beyond The Wall Street Journal: Anchoring and Comparing Discourse Signals across Genres

Sep 02, 2019
Yang Liu

Recent research on discourse relations has found that they are cued not only by discourse markers (DMs) but also by other textual signals and that signaling information is indicative of genres. While several corpora exist with discourse relation signaling information such as the Penn Discourse Treebank (PDTB, Prasad et al. 2008) and the Rhetorical Structure Theory Signalling Corpus (RST-SC, Das and Taboada 2018), they both annotate the Wall Street Journal (WSJ) section of the Penn Treebank (PTB, Marcus et al. 1993), which is limited to the news domain. Thus, this paper adapts the signal identification and anchoring scheme (Liu and Zeldes, 2019) to three more genres, examines the distribution of signaling devices across relations and genres, and provides a taxonomy of indicative signals found in this dataset.

* 10 pages. In Proceedings of 7th Workshop on Discourse Relation Parsing and Treebanking (DISRPT) at NAACL-HLT 2019, Minneapolis, MN
##### Intelligent Processing in Vehicular Ad hoc Networks: a Survey

Mar 28, 2019
Yang Liu

The intelligent Processing technique is more and more attractive to researchers due to its ability to deal with key problems in Vehicular Ad hoc networks. However, several problems in applying intelligent processing technologies in VANETs remain open. The existing applications are comprehensively reviewed and discussed, and classified into different categories in this paper. Their strategies, advantages/disadvantages, and performances are elaborated. By generalizing different tactics in various applications related to different scenarios of VANETs and evaluating their performances, several promising directions for future research have been suggested.

* 11pages, 5 figures
##### Fine-tune BERT for Extractive Summarization

Mar 25, 2019
Yang Liu

BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. Our system is the state of the art on the CNN/Dailymail dataset, outperforming the previous best-performed system by 1.65 on ROUGE-L. The codes to reproduce our results are available at https://github.com/nlpyang/BertSum

##### Research on the Brain-inspired Cross-modal Neural Cognitive Computing Framework

May 31, 2018
Yang Liu

To address modeling problems of brain-inspired intelligence, this thesis is focused on researching in the semantic-oriented framework design for multimedia and multimodal information. The Multimedia Neural Cognitive Computing (MNCC) model was designed based on the nervous mechanism and cognitive architecture. Furthermore, the semantic-oriented hierarchical Cross-modal Neural Cognitive Computing (CNCC) framework was proposed based on MNCC model, and formal description and analysis for CNCC framework was given. It would effectively improve the performance of semantic processing for multimedia and cross-modal information, and has far-reaching significance for exploration and realization brain-inspired computing.

##### DP-LSTM: Differential Privacy-inspired LSTM for Stock Prediction Using Financial News

Dec 20, 2019
Xinyi Li, Yinchuan Li, Hongyang Yang, Liuqing Yang, Xiao-Yang Liu

Stock price prediction is important for value investments in the stock market. In particular, short-term prediction that exploits financial news articles is promising in recent years. In this paper, we propose a novel deep neural network DP-LSTM for stock price prediction, which incorporates the news articles as hidden information and integrates difference news sources through the differential privacy mechanism. First, based on the autoregressive moving average model (ARMA), a sentiment-ARMA is formulated by taking into consideration the information of financial news articles in the model. Then, an LSTM-based deep neural network is designed, which consists of three components: LSTM, VADER model and differential privacy (DP) mechanism. The proposed DP-LSTM scheme can reduce prediction errors and increase the robustness. Extensive experiments on S&P 500 stocks show that (i) the proposed DP-LSTM achieves 0.32% improvement in mean MPA of prediction result, and (ii) for the prediction of the market index S&P 500, we achieve up to 65.79% improvement in MSE.

* arXiv admin note: text overlap with arXiv:1908.01112
##### Practical Deep Reinforcement Learning Approach for Stock Trading

Dec 02, 2018
Zhuoran Xiong, Xiao-Yang Liu, Shan Zhong, Hongyang Yang, Anwar Walid

Stock trading strategy plays a crucial role in investment companies. However, it is challenging to obtain optimal strategy in the complex and dynamic stock market. We explore the potential of deep reinforcement learning to optimize stock trading strategy and thus maximize investment return. 30 stocks are selected as our trading stocks and their daily prices are used as the training and trading market environment. We train a deep reinforcement learning agent and obtain an adaptive trading strategy. The agent's performance is evaluated and compared with Dow Jones Industrial Average and the traditional min-variance portfolio allocation strategy. The proposed deep reinforcement learning approach is shown to outperform the two baselines in terms of both the Sharpe ratio and cumulative returns.

##### Learning to Teach

May 09, 2018
Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

Teaching plays a very important role in our society, by spreading human knowledge and educating our next generations. A good teacher will select appropriate teaching materials, impact suitable methodologies, and set up targeted examinations, according to the learning behaviors of the students. In the field of artificial intelligence, however, one has not fully explored the role of teaching, and pays most attention to machine \emph{learning}. In this paper, we argue that equal attention, if not more, should be paid to teaching, and furthermore, an optimization framework (instead of heuristics) should be used to obtain good teaching strategies. We call this approach learning to teach'. In the approach, two intelligent agents interact with each other: a student model (which corresponds to the learner in traditional machine learning algorithms), and a teacher model (which determines the appropriate data, loss function, and hypothesis space to facilitate the training of the student model). The teacher model leverages the feedback from the student model to optimize its own teaching strategies by means of reinforcement learning, so as to achieve teacher-student co-evolution. To demonstrate the practical value of our proposed approach, we take the training of deep neural networks (DNN) as an example, and show that by using the learning to teach techniques, we are able to use much less training data and fewer iterations to achieve almost the same accuracy for different kinds of DNN models (e.g., multi-layer perceptron, convolutional neural networks and recurrent neural networks) under various machine learning tasks (e.g., image classification and text understanding).

* ICLR 2018
##### A Causal Perspective to Unbiased Conversion Rate Estimation on Data Missing Not at Random

In modern e-commerce and advertising recommender systems, ongoing research works attempt to optimize conversion rate (CVR) estimation, and increase the gross merchandise volume. Even though the state-of-the-art CVR estimators adopt deep learning methods, their model performances are still subject to sample selection bias and data sparsity issues. Conversion labels of exposed items in training dataset are typically missing not at random due to selection bias. Empirically, data sparsity issue causes the performance degradation of model with large parameter space. In this paper, we proposed two causal estimators combined with multi-task learning, and aim to solve sample selection bias (SSB) and data sparsity (DS) issues in conversion rate estimation. The proposed estimators adjust for the MNAR mechanism as if they are trained on a "do dataset" where users are forced to click on all exposed items. We evaluate the causal estimators with billion data samples. Experiment results demonstrate that the proposed CVR estimators outperform other state-of-the-art CVR estimators. In addition, empirical study shows that our methods are cost-effective with large scale dataset.

* 16 pages, 6 figures, 4 tables
##### Information Scaling Law of Deep Neural Networks

Feb 13, 2018
Xiao-Yang Liu

With the rapid development of Deep Neural Networks (DNNs), various network models that show strong computing power and impressive expressive power are proposed. However, there is no comprehensive informational interpretation of DNNs from the perspective of information theory. Due to the nonlinear function and the uncertain number of layers and neural units used in the DNNs, the network structure shows nonlinearity and complexity. With the typical DNNs named Convolutional Arithmetic Circuits (ConvACs), the complex DNNs can be converted into mathematical formula. Thus, we can use rigorous mathematical theory especially the information theory to analyse the complicated DNNs. In this paper, we propose a novel information scaling law scheme that can interpret the network's inner organization by information theory. First, we show the informational interpretation of the activation function. Secondly, we prove that the information entropy increases when the information is transmitted through the ConvACs. Finally, we propose the information scaling law of ConvACs through making a reasonable assumption.

* 7 pages, 5 figures
##### Exploiting Unlabeled Data for Neural Grammatical Error Detection

Nov 29, 2016
Zhuoran Liu, Yang Liu

Identifying and correcting grammatical errors in the text written by non-native writers has received increasing attention in recent years. Although a number of annotated corpora have been established to facilitate data-driven grammatical error detection and correction approaches, they are still limited in terms of quantity and coverage because human annotation is labor-intensive, time-consuming, and expensive. In this work, we propose to utilize unlabeled data to train neural network based grammatical error detection models. The basic idea is to cast error detection as a binary classification problem and derive positive and negative training examples from unlabeled data. We introduce an attention-based neural network to capture long-distance dependencies that influence the word being detected. Experiments show that the proposed approach significantly outperforms SVMs and convolutional networks with fixed-size context window.

##### An Online Approach to Dynamic Channel Access and Transmission Scheduling

Apr 04, 2015
Yang Liu, Mingyan Liu

Making judicious channel access and transmission scheduling decisions is essential for improving performance as well as energy and spectral efficiency in multichannel wireless systems. This problem has been a subject of extensive study in the past decade, and the resulting dynamic and opportunistic channel access schemes can bring potentially significant improvement over traditional schemes. However, a common and severe limitation of these dynamic schemes is that they almost always require some form of a priori knowledge of the channel statistics. A natural remedy is a learning framework, which has also been extensively studied in the same context, but a typical learning algorithm in this literature seeks only the best static policy, with performance measured by weak regret, rather than learning a good dynamic channel access policy. There is thus a clear disconnect between what an optimal channel access policy can achieve with known channel statistics that actively exploits temporal, spatial and spectral diversity, and what a typical existing learning algorithm aims for, which is the static use of a single channel devoid of diversity gain. In this paper we bridge this gap by designing learning algorithms that track known optimal or sub-optimal dynamic channel access and transmission scheduling policies, thereby yielding performance measured by a form of strong regret, the accumulated difference between the reward returned by an optimal solution when a priori information is available and that by our online algorithm. We do so in the context of two specific algorithms that appeared in [1] and [2], respectively, the former for a multiuser single-channel setting and the latter for a single-user multichannel setting. In both cases we show that our algorithms achieve sub-linear regret uniform in time and outperforms the standard weak-regret learning algorithms.

* 10 pages, to appear in MobiHoc 2015
##### Group Learning and Opinion Diffusion in a Broadcast Network

Sep 14, 2013
Yang Liu, Mingyan Liu

We analyze the following group learning problem in the context of opinion diffusion: Consider a network with $M$ users, each facing $N$ options. In a discrete time setting, at each time step, each user chooses $K$ out of the $N$ options, and receive randomly generated rewards, whose statistics depend on the options chosen as well as the user itself, and are unknown to the users. Each user aims to maximize their expected total rewards over a certain time horizon through an online learning process, i.e., a sequence of exploration (sampling the return of each option) and exploitation (selecting empirically good options) steps. Within this context we consider two group learning scenarios, (1) users with uniform preferences and (2) users with diverse preferences, and examine how a user should construct its learning process to best extract information from other's decisions and experiences so as to maximize its own reward. Performance is measured in {\em weak regret}, the difference between the user's total reward and the reward from a user-specific best single-action policy (i.e., always selecting the set of options generating the highest mean rewards for this user). Within each scenario we also consider two cases: (i) when users exchange full information, meaning they share the actual rewards they obtained from their choices, and (ii) when users exchange limited information, e.g., only their choices but not rewards obtained from these choices.

##### Randomized Wagering Mechanisms

Sep 17, 2018
Yiling Chen, Yang Liu, Juntao Wang

Wagering mechanisms are one-shot betting mechanisms that elicit agents' predictions of an event. For deterministic wagering mechanisms, an existing impossibility result has shown incompatibility of some desirable theoretical properties. In particular, Pareto optimality (no profitable side bet before allocation) can not be achieved together with weak incentive compatibility, weak budget balance and individual rationality. In this paper, we expand the design space of wagering mechanisms to allow randomization and ask whether there are randomized wagering mechanisms that can achieve all previously considered desirable properties, including Pareto optimality. We answer this question positively with two classes of randomized wagering mechanisms: i) one simple randomized lottery-type implementation of existing deterministic wagering mechanisms, and ii) another family of simple and randomized wagering mechanisms which we call surrogate wagering mechanisms, which are robust to noisy ground truth. This family of mechanisms builds on the idea of learning with noisy labels (Natarajan et al. 2013) as well as a recent extension of this idea to the information elicitation without verification setting (Liu and Chen 2018). We show that a broad family of randomized wagering mechanisms satisfy all desirable theoretical properties.

##### Very Long Natural Scenery Image Prediction by Outpainting

Dec 29, 2019
Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan

Comparing to image inpainting, image outpainting receives less attention due to two challenges in it. The first challenge is how to keep the spatial and content consistency between generated images and original input. The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input. To solve the two problems, we devise some innovative modules, named Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our designed encoder-decoder structure. By this design, our network can generate highly realistic outpainting prediction effectively and efficiently. Other than that, our method can generate new images with very long sizes while keeping the same style and semantic content as the given input. To test the effectiveness of the proposed architecture, we collect a new scenery dataset with diverse, complicated natural scenes. The experimental results on this dataset have demonstrated the efficacy of our proposed network. The code and dataset are available from https://github.com/z-x-yang/NS-Outpainting.

* ICCV-19
##### A Neural Approach to Discourse Relation Signal Detection

Jan 08, 2020
Amir Zeldes, Yang Liu

Previous data-driven work investigating the types and distributions of discourse relation signals, including discourse markers such as 'however' or phrases such as 'as a result' has focused on the relative frequencies of signal words within and outside text from each discourse relation. Such approaches do not allow us to quantify the signaling strength of individual instances of a signal on a scale (e.g. more or less discourse-relevant instances of 'and'), to assess the distribution of ambiguity for signals, or to identify words that hinder discourse relation identification in context ('anti-signals' or 'distractors'). In this paper we present a data-driven approach to signal detection using a distantly supervised neural network and develop a metric, {\Delta}s (or 'delta-softmax'), to quantify signaling strength. Ranging between -1 and 1 and relying on recent advances in contextualized words embeddings, the metric represents each word's positive or negative contribution to the identifiability of a relation in specific instances in context. Based on an English corpus annotated for discourse relations using Rhetorical Structure Theory and signal type annotations anchored to specific tokens, our analysis examines the reliability of the metric, the places where it overlaps with and differs from human judgments, and the implications for identifying features that neural models may need in order to perform better on automatic discourse relation classification.

* D&D
##### Causally Denoise Word Embeddings Using Half-Sibling Regression

Nov 24, 2019
Zekun Yang, Tianlin Liu

Distributional representations of words, also known as word vectors, have become crucial for modern natural language processing tasks due to their wide applications. Recently, a growing body of word vector postprocessing algorithm has emerged, aiming to render off-the-shelf word vectors even stronger. In line with these investigations, we introduce a novel word vector postprocessing scheme under a causal inference framework. Concretely, the postprocessing pipeline is realized by Half-Sibling Regression (HSR), which allows us to identify and remove confounding noise contained in word vectors. Compared to previous work, our proposed method has the advantages of interpretability and transparency due to its causal inference grounding. Evaluated on a battery of standard lexical-level evaluation tasks and downstream sentiment analysis tasks, our method reaches state-of-the-art performance.

* Accepted by AAAI 2020
##### A GMM based algorithm to generate point-cloud and its application to neuroimaging

Nov 05, 2019
Liu Yang, Rudrasis Chakraborty

Recent years have witnessed the emergence of 3D medical imaging techniques with the development of 3D sensors and technology. Due to the presence of noise in image acquisition, registration researchers focused on an alternative way to represent medical images. An alternative way to analyze medical imaging is by understanding the 3D shapes represented in terms of point-cloud. Though in the medical imaging community, 3D point-cloud processing is not a go-to'' choice, it is a `natural'' way to capture 3D shapes. However, as the number of samples for medical images are small, researchers have used pre-trained models to fine-tune on medical images. Furthermore, due to different modality in medical images, standard generative models can not be used to generate new samples of medical images. In this work, we use the advantage of point-cloud representation of 3D structures of medical images and propose a Gaussian mixture model-based generation scheme. Our proposed method is robust to outliers. Experimental validation has been performed to show that the proposed scheme can generate new 3D structures using interpolation techniques, i.e., given two 3D structures represented as point-clouds, we can generate point-clouds in between. We have also generated new point-clouds for subjects with and without dementia and show that the generated samples are indeed closely matched to the respective training samples from the same class.

##### An "augmentation-free" rotation invariant classification scheme on point-cloud and its application to neuroimaging

Nov 05, 2019
Liu Yang, Rudrasis Chakraborty

Recent years have witnessed the emergence and increasing popularity of 3D medical imaging techniques with the development of 3D sensors and technology. However, achieving geometric invariance in the processing of 3D medical images is computationally expensive but nonetheless essential due to the presence of possible errors caused by rigid registration techniques. An alternative way to analyze medical imaging is by understanding the 3D shapes represented in terms of point-cloud. Though in the medical imaging community, 3D point-cloud processing is not a "go-to" choice, it is a canonical way to preserve rotation invariance. Unfortunately, due to the presence of discrete topology, one can not use the standard convolution operator on point-cloud. To the best of our knowledge, the existing ways to do "convolution" can not preserve the rotation invariance without explicit data augmentation. Therefore, we propose a rotation invariant convolution operator by inducing topology from hypersphere. Experimental validation has been performed on publicly available OASIS dataset in terms of classification accuracy between subjects with (without) dementia, demonstrating the usefulness of our proposed method in terms of model complexity, classification accuracy, and last but most important invariance to rotations.

* arXiv admin note: text overlap with arXiv:1910.13050 and arXiv:1911.01705
##### Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates

Oct 08, 2019
Yang Liu, Hongyi Guo

Learning with noisy labels is a common problem in supervised learning. Existing approaches require practitioners to specify \emph{noise rates}, i.e., a set of parameters controlling the severity of label noises in the problem. In this work, we introduce a technique to learn from noisy labels that does not require a priori specification of the noise rates. In particular, we introduce a new family of loss functions that we name as \emph{peer loss} functions. Our approach then uses a standard empirical risk minimization (ERM) framework with peer loss functions. Peer loss functions associate each training sample with a certain form of "peer" samples, which evaluate a classifier' predictions jointly. We show that, under mild conditions, performing ERM with peer loss functions on the noisy dataset leads to the optimal or a near optimal classifier as if performing ERM over the clean training data, which we do not have access to. To our best knowledge, this is the first result on "learning with noisy labels without knowing noise rates" with theoretical guarantees. We pair our results with an extensive set of experiments, where we compare with state-of-the-art techniques of learning with noisy labels. Our results show that peer loss functions based method consistently outperforms the baseline benchmarks. Peer loss provides a way to simplify model development when facing potentially noisy training labels, and can be promoted as a robust candidate loss function in such situations.

##### Machine Truth Serum

Sep 28, 2019
Tianyi Luo, Yang Liu

Wisdom of the crowd revealed a striking fact that the majority answer from a crowd is often more accurate than any individual expert. We observed the same story in machine learning--ensemble methods leverage this idea to combine multiple learning algorithms to obtain better classification performance. Among many popular examples is the celebrated Random Forest, which applies the majority voting rule in aggregating different decision trees to make the final prediction. Nonetheless, these aggregation rules would fail when the majority is more likely to be wrong. In this paper, we extend the idea proposed in Bayesian Truth Serum that "a surprisingly more popular answer is more likely the true answer" to classification problems. The challenge for us is to define or detect when an answer should be considered as being "surprising". We present two machine learning aided methods which aim to reveal the truth when it is minority instead of majority who has the true answer. Our experiments over real-world datasets show that better classification performance can be obtained compared to always trusting the majority voting. Our proposed methods also outperform popular ensemble algorithms. Our approach can be generically applied as a subroutine in ensemble methods to replace majority voting rule.