Research papers and code for "Yang Liu":
Intelligent processing techniques are increasingly attractive to researchers because of their ability to address key problems in Vehicular Ad hoc Networks (VANETs). However, several problems in applying intelligent processing technologies to VANETs remain open. In this paper, the existing applications are comprehensively reviewed, discussed, and classified into different categories, and their strategies, advantages/disadvantages, and performance are elaborated. By generalizing the tactics used in various applications across different VANET scenarios and evaluating their performance, several promising directions for future research are suggested.

* 11 pages, 5 figures
BERT, a pre-trained Transformer model, has achieved ground-breaking performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple variant of BERT, for extractive summarization. Our system is the state of the art on the CNN/Dailymail dataset, outperforming the previous best-performing system by 1.65 on ROUGE-L. The code to reproduce our results is available at https://github.com/nlpyang/BertSum
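
As a rough illustration of the extractive setup, the sketch below scores per-sentence vectors with a logistic layer and keeps the top-k sentences. The embeddings and weights are random placeholders; this is not the BERTSUM code (that lives in the linked repository).

```python
import numpy as np

rng = np.random.default_rng(0)
n_sents, dim, k = 10, 768, 3

sent_vecs = rng.normal(size=(n_sents, dim))          # stand-in per-sentence [CLS] vectors
w, b = rng.normal(size=dim) / np.sqrt(dim), 0.0      # stand-in trained scoring layer

scores = 1 / (1 + np.exp(-(sent_vecs @ w + b)))      # salience score per sentence
summary_ids = np.sort(np.argsort(-scores)[:k])       # top-k sentences, in document order
print("selected sentence indices:", summary_ids)
```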

To address the modeling problems of brain-inspired intelligence, this thesis focuses on the design of a semantic-oriented framework for multimedia and multimodal information. The Multimedia Neural Cognitive Computing (MNCC) model was designed based on the nervous mechanism and cognitive architecture. Furthermore, a semantic-oriented hierarchical Cross-modal Neural Cognitive Computing (CNCC) framework was proposed based on the MNCC model, and a formal description and analysis of the CNCC framework were given. The framework can effectively improve the performance of semantic processing for multimedia and cross-modal information, and has far-reaching significance for the exploration and realization of brain-inspired computing.

Stock trading strategy plays a crucial role in investment companies. However, it is challenging to obtain an optimal strategy in the complex and dynamic stock market. We explore the potential of deep reinforcement learning to optimize stock trading strategies and thus maximize investment return. Thirty stocks are selected as our trading stocks, and their daily prices are used as the training and trading market environment. We train a deep reinforcement learning agent and obtain an adaptive trading strategy. The agent's performance is evaluated and compared with the Dow Jones Industrial Average and the traditional min-variance portfolio allocation strategy. The proposed deep reinforcement learning approach is shown to outperform the two baselines in terms of both the Sharpe ratio and cumulative returns.
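
A minimal, self-contained illustration of the reinforcement-learning-for-trading idea: a tabular Q-learner on synthetic prices with a hold-cash/hold-stock action space. This is a toy stand-in, not the paper's deep RL agent or its 30-stock environment.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic daily prices (geometric random walk, always positive)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, size=2000)))
returns = np.diff(prices) / prices[:-1]

n_states, n_actions = 3, 2           # state: last return down/flat/up; action: cash/long
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def state(r, tol=1e-3):
    return 0 if r < -tol else (2 if r > tol else 1)

for t in range(1, len(returns)):
    s = state(returns[t - 1])
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    reward = a * returns[t]          # a long position earns the next return
    s2 = state(returns[t])
    Q[s, a] += alpha * (reward + gamma * Q[s2].max() - Q[s, a])

print("learned Q-values (rows: down/flat/up; cols: cash/long):")
print(np.round(Q, 4))
```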

Teaching plays a very important role in our society, by spreading human knowledge and educating our next generations. A good teacher will select appropriate teaching materials, impart suitable methodologies, and set up targeted examinations, according to the learning behaviors of the students. In the field of artificial intelligence, however, the role of teaching has not been fully explored, and most attention is paid to machine learning. In this paper, we argue that equal attention, if not more, should be paid to teaching, and furthermore, an optimization framework (instead of heuristics) should be used to obtain good teaching strategies. We call this approach "learning to teach". In the approach, two intelligent agents interact with each other: a student model (which corresponds to the learner in traditional machine learning algorithms), and a teacher model (which determines the appropriate data, loss function, and hypothesis space to facilitate the training of the student model). The teacher model leverages the feedback from the student model to optimize its own teaching strategies by means of reinforcement learning, so as to achieve teacher-student co-evolution. To demonstrate the practical value of our proposed approach, we take the training of deep neural networks (DNN) as an example, and show that by using the learning-to-teach techniques, we are able to use much less training data and fewer iterations to achieve almost the same accuracy for different kinds of DNN models (e.g., multi-layer perceptrons, convolutional neural networks, and recurrent neural networks) under various machine learning tasks (e.g., image classification and text understanding).
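
The interaction loop can be sketched with a bandit "teacher" choosing which data pool to feed a logistic-regression "student", rewarded by the student's validation improvement. All components here are simplified stand-ins; the paper trains the teacher with reinforcement learning and uses deep student networks.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
w_true = rng.normal(size=d)

def make_pool(n, noise):
    # pools differ in label noise: some are more useful to the student than others
    X = rng.normal(size=(n, d))
    y = (X @ w_true + rng.normal(0, noise, n) > 0).astype(float)
    return X, y

pools = [make_pool(200, s) for s in (0.1, 1.0, 5.0)]
X_val, y_val = make_pool(500, 0.01)

def val_loss(w):
    p = 1 / (1 + np.exp(-(X_val @ w)))
    return -np.mean(y_val * np.log(p + 1e-9) + (1 - y_val) * np.log(1 - p + 1e-9))

prefs = np.zeros(len(pools))              # teacher's preferences over pools
w = np.zeros(d)                           # student parameters
for step in range(500):
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    k = rng.choice(len(pools), p=probs)   # teacher picks a data pool
    X, y = pools[k]
    i = rng.integers(len(y))
    p = 1 / (1 + np.exp(-(X[i] @ w)))
    before = val_loss(w)
    w -= 0.5 * (p - y[i]) * X[i]          # one SGD step for the student
    reward = before - val_loss(w)         # teacher reward: validation improvement
    onehot = np.eye(len(pools))[k]
    prefs += 10.0 * reward * (onehot - probs)   # gradient-bandit update

print("teacher preferences (higher = preferred):", np.round(prefs, 3))
```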

* ICLR 2018
With the rapid development of Deep Neural Networks (DNNs), various network models with strong computing power and impressive expressive power have been proposed. However, there is no comprehensive informational interpretation of DNNs from the perspective of information theory. Because of the nonlinear functions and the uncertain numbers of layers and neural units used in DNNs, the network structure exhibits nonlinearity and complexity. Using a representative class of DNNs, Convolutional Arithmetic Circuits (ConvACs), complex DNNs can be converted into a mathematical formulation, so rigorous mathematical theory, especially information theory, can be used to analyze them. In this paper, we propose a novel information scaling law scheme that interprets the network's inner organization through information theory. First, we give the informational interpretation of the activation function. Second, we prove that the information entropy increases when information is transmitted through ConvACs. Finally, we derive the information scaling law of ConvACs under a reasonable assumption.
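
As a toy illustration of measuring activation entropy (not the paper's ConvAC derivation), one can bin activations and compute the Shannon entropy of the resulting histogram; the ReLU below is an illustrative nonlinearity, not the ConvAC product pooling.

```python
import numpy as np

rng = np.random.default_rng(3)

def hist_entropy(x, bins=64):
    """Shannon entropy (bits) of values binned into a histogram."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log2(p)).sum()

x = rng.normal(size=(100_000, 8))           # stand-in input features
W = rng.normal(size=(8, 8)) / np.sqrt(8)    # random linear layer
h = np.maximum(x @ W, 0)                    # activations after the nonlinearity

print("entropy of inputs :", round(hist_entropy(x.ravel()), 3), "bits")
print("entropy of outputs:", round(hist_entropy(h.ravel()), 3), "bits")
```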

* 7 pages, 5 figures
Identifying and correcting grammatical errors in text written by non-native writers has received increasing attention in recent years. Although a number of annotated corpora have been established to facilitate data-driven grammatical error detection and correction approaches, they are still limited in quantity and coverage because human annotation is labor-intensive, time-consuming, and expensive. In this work, we propose to utilize unlabeled data to train neural-network-based grammatical error detection models. The basic idea is to cast error detection as a binary classification problem and derive positive and negative training examples from unlabeled data. We introduce an attention-based neural network to capture long-distance dependencies that influence the word being detected. Experiments show that the proposed approach significantly outperforms SVMs and convolutional networks with a fixed-size context window.
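
A small sketch of deriving labeled examples from unlabeled text by corruption, the core idea above; the confusion table is an illustrative choice, not the paper's exact procedure.

```python
import random

random.seed(0)
# illustrative confusion table; real systems use richer corruption operations
CONFUSIONS = {"a": "the", "the": "a", "is": "are", "are": "is"}

def corrupt(tokens):
    """Return a corrupted copy of the sentence and the corrupted position."""
    idx = [i for i, t in enumerate(tokens) if t in CONFUSIONS]
    if not idx:
        return None
    i = random.choice(idx)
    out = list(tokens)
    out[i] = CONFUSIONS[out[i]]
    return out, i

sentence = "the cat is on a mat".split()
pairs = [(sentence, [0] * len(sentence))]      # original sentence: all words correct
result = corrupt(sentence)
if result is not None:
    toks, i = result
    labels = [0] * len(toks)
    labels[i] = 1                              # the corrupted word is the error
    pairs.append((toks, labels))

for toks, labels in pairs:
    print(list(zip(toks, labels)))
```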

Making judicious channel access and transmission scheduling decisions is essential for improving performance as well as energy and spectral efficiency in multichannel wireless systems. This problem has been a subject of extensive study in the past decade, and the resulting dynamic and opportunistic channel access schemes can bring potentially significant improvement over traditional schemes. However, a common and severe limitation of these dynamic schemes is that they almost always require some form of a priori knowledge of the channel statistics. A natural remedy is a learning framework, which has also been extensively studied in the same context; however, a typical learning algorithm in this literature seeks only the best static policy, with performance measured by weak regret, rather than learning a good dynamic channel access policy. There is thus a clear disconnect between what an optimal channel access policy can achieve with known channel statistics, actively exploiting temporal, spatial, and spectral diversity, and what a typical existing learning algorithm aims for, namely the static use of a single channel devoid of diversity gain. In this paper we bridge this gap by designing learning algorithms that track known optimal or sub-optimal dynamic channel access and transmission scheduling policies, thereby yielding performance measured by a form of strong regret: the accumulated difference between the reward returned by an optimal solution when a priori information is available and that returned by our online algorithm. We do so in the context of two specific algorithms that appeared in [1] and [2], respectively, the former for a multiuser single-channel setting and the latter for a single-user multichannel setting. In both cases we show that our algorithms achieve sub-linear regret uniform in time and outperform the standard weak-regret learning algorithms.

* 10 pages, to appear in MobiHoc 2015
We analyze the following group learning problem in the context of opinion diffusion: Consider a network with $M$ users, each facing $N$ options. In a discrete time setting, at each time step, each user chooses $K$ out of the $N$ options and receives randomly generated rewards, whose statistics depend on the options chosen as well as on the user itself, and are unknown to the users. Each user aims to maximize their expected total reward over a certain time horizon through an online learning process, i.e., a sequence of exploration (sampling the return of each option) and exploitation (selecting empirically good options) steps. Within this context we consider two group learning scenarios, (1) users with uniform preferences and (2) users with diverse preferences, and examine how a user should construct its learning process to best extract information from others' decisions and experiences so as to maximize its own reward. Performance is measured in weak regret, the difference between the user's total reward and the reward from a user-specific best single-action policy (i.e., always selecting the set of options generating the highest mean rewards for this user). Within each scenario we also consider two cases: (i) when users exchange full information, meaning they share the actual rewards they obtained from their choices, and (ii) when users exchange limited information, e.g., only their choices but not the rewards obtained from these choices.
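
A minimal sketch of scenario (1) with full information exchange: users with identical preferences pool their observations and run UCB1 on the shared statistics. Fixing K=1 and using Bernoulli rewards are simplifying assumptions of the sketch, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, T = 3, 5, 5000
mu = rng.uniform(0.2, 0.8, size=N)        # unknown mean reward of each option

counts = np.ones(N)                       # pooled pull counts (one initial pull each)
sums = rng.binomial(1, mu).astype(float)  # pooled reward sums
total_reward = 0.0
for t in range(1, T + 1):
    for user in range(M):
        ucb = sums / counts + np.sqrt(2 * np.log(M * t) / counts)
        a = int(np.argmax(ucb))
        r = rng.binomial(1, mu[a])        # uniform preferences: same means for all users
        counts[a] += 1                    # full information exchange: every user's
        sums[a] += r                      # observation updates the shared statistics
        total_reward += r

print("best option:", int(np.argmax(mu)), "| most pulled:", int(np.argmax(counts)))
print("weak regret per user:", round(T * mu.max() - total_reward / M, 1))
```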

Wagering mechanisms are one-shot betting mechanisms that elicit agents' predictions of an event. For deterministic wagering mechanisms, an existing impossibility result has shown the incompatibility of several desirable theoretical properties. In particular, Pareto optimality (no profitable side bet before allocation) cannot be achieved together with weak incentive compatibility, weak budget balance, and individual rationality. In this paper, we expand the design space of wagering mechanisms to allow randomization and ask whether there are randomized wagering mechanisms that can achieve all previously considered desirable properties, including Pareto optimality. We answer this question positively with two classes of randomized wagering mechanisms: i) a simple randomized lottery-type implementation of existing deterministic wagering mechanisms, and ii) a family of simple randomized wagering mechanisms, which we call surrogate wagering mechanisms, that are robust to noisy ground truth. The latter family builds on the idea of learning with noisy labels (Natarajan et al. 2013) as well as a recent extension of this idea to the information elicitation without verification setting (Liu and Chen 2018). We show that a broad family of randomized wagering mechanisms satisfy all desirable theoretical properties.
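
One plausible lottery-type randomization, sketched below for intuition: compute the payouts of a deterministic weighted-score wagering mechanism, then award the entire pot to a single agent with probability proportional to payout, so each agent's expected payment matches the deterministic mechanism. This is a sketch of the general idea, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(5)
wagers = np.array([10.0, 20.0, 5.0])    # each agent's wager
reports = np.array([0.9, 0.6, 0.2])     # reported probabilities of the event
outcome = 1                             # realized binary outcome

score = 1 - (reports - outcome) ** 2    # normalized Brier score in [0, 1]
# deterministic weighted-score wagering payout (budget balanced, nonnegative)
payout = wagers * (1 + score - np.dot(wagers, score) / wagers.sum())

pot = wagers.sum()
winner = rng.choice(len(wagers), p=payout / pot)  # payouts sum to the pot
print("deterministic payouts:", np.round(payout, 2))
print(f"lottery: agent {winner} takes the whole pot of {pot}")
```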

In this paper, we develop a neural summarization model which can effectively process multiple input documents, augmenting the Transformer architecture with the ability to encode documents in a hierarchical manner. We represent cross-document relationships via an attention mechanism which allows information to be shared, as opposed to simply concatenating text spans and processing them as a flat sequence. Our model learns latent dependencies among textual units, but can also take advantage of explicit graph representations focusing on similarity or discourse relations. Empirical results on the WikiSum dataset demonstrate that the proposed architecture brings substantial improvements over several strong baselines.
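
A tiny sketch of the information-sharing idea: paragraph vectors attend over paragraphs of all input documents rather than being processed as one flat concatenation. The embeddings are random placeholders, and this single attention layer is far simpler than the paper's hierarchical Transformer.

```python
import numpy as np

rng = np.random.default_rng(6)
docs, paras, dim = 4, 3, 64
H = rng.normal(size=(docs * paras, dim))    # paragraph representations, all documents

scores = H @ H.T / np.sqrt(dim)             # cross-document similarities
np.fill_diagonal(scores, -np.inf)           # attend to the *other* paragraphs only
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
H_shared = H + attn @ H                     # residual cross-document update
print("updated paragraph representations:", H_shared.shape)
```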

* to appear at ACL 2019
In this paper, we address the question answering challenge with the SQuAD 2.0 dataset. We design a model architecture which leverages BERT's capability of context-aware word embeddings and BiDAF's context interactive exploration mechanism. By integrating these two state-of-the-art architectures, our system extracts contextual word representations at the word and character levels, for better comprehension of both question and context and their correlations. We also propose an original joint posterior probability predictor module and its associated loss functions. Our best model so far obtains an F1 score of 75.842% and an EM score of 72.24% on the test PCE leaderboard.
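
The span-prediction logic can be sketched as follows: form the joint probability of every valid (start, end) pair from the start and end distributions, and compare the best span against a no-answer score (SQuAD 2.0 allows unanswerable questions). The logits are random placeholders, and the paper's joint posterior predictor may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(7)
L, max_len = 30, 10                       # context length, max answer length

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

p_start = softmax(rng.normal(size=L))     # stand-ins for the model's outputs
p_end = softmax(rng.normal(size=L))
p_noans = 0.4                             # stand-in no-answer probability

joint = np.outer(p_start, p_end)          # joint[i, j] = P(start=i) * P(end=j)
mask = np.triu(np.ones((L, L)), k=0) - np.triu(np.ones((L, L)), k=max_len)
joint *= mask                             # keep spans with 0 <= end - start < max_len

i, j = np.unravel_index(np.argmax(joint), joint.shape)
if joint[i, j] > p_noans:
    print(f"predicted span: tokens {i}..{j} (p = {joint[i, j]:.3f})")
else:
    print("predicted: no answer")
```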

Contactless and online palmprint identification offers improved user convenience, hygiene, and user security, and is highly desirable in a range of applications. This technical report details an accurate and generalizable deep learning-based framework to detect and recognize humans using contactless palmprint images in the wild. Our network is based on a fully convolutional network that generates deeply learned residual features. We design a soft-shifted triplet loss function to more effectively learn discriminative palmprint features. Online palmprint identification also requires a contactless palm detector, which is adapted and trained from the Faster R-CNN architecture to detect the palmprint region under varying backgrounds. Our reproducible experimental results on publicly available contactless palmprint databases suggest that the proposed framework consistently outperforms several classical and state-of-the-art palmprint recognition methods. More importantly, the model presented in this report offers superior generalization capability, unlike other popular methods in the literature, as it does not require database-specific parameter tuning.
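
For reference, the standard triplet loss that this family of losses builds on is shown below; the paper's soft-shifted variant modifies this form, and the feature vectors here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(8)
anchor, positive, negative = rng.normal(size=(3, 128))  # placeholder features

def triplet_loss(a, p, n, margin=0.2):
    d_ap = np.sum((a - p) ** 2)   # squared distance to an image of the same palm
    d_an = np.sum((a - n) ** 2)   # squared distance to a different palm
    return max(0.0, d_ap - d_an + margin)

print("triplet loss:", round(triplet_loss(anchor, positive, negative), 4))
```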

Previously, researchers have paid little attention to creating unambiguous morpheme embeddings independent of the corpus, although such information plays an important role in expressing the exact meanings of words for parataxis languages like Chinese. In this paper, after constructing a Chinese lexical and semantic ontology based on word-formation, we propose a novel approach to implanting structured rational knowledge into distributed representations at the morpheme level, naturally avoiding heavy disambiguation in the corpus. We design a template to create instances as pseudo-sentences merely from the pieces of knowledge about morphemes built into the lexicon. To exploit hierarchical information and tackle the data sparseness problem, an instance proliferation technique is applied based on similarity to expand the collection of pseudo-sentences. The distributed representations for morphemes can then be trained on these pseudo-sentences using word2vec. For evaluation, we validate the paradigmatic and syntagmatic relations of the morpheme embeddings, and apply the obtained embeddings to word similarity measurement, achieving significant improvements over the classical models of more than 5 Spearman points, or 8 percentage points, which shows very promising prospects for the adoption of this new source of knowledge.
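
A sketch of the pseudo-sentence idea: turn lexicon entries about each morpheme into short token sequences and train word2vec on them. The toy lexicon and template below are invented for illustration; the paper derives them from a Chinese word-formation ontology and adds instance proliferation.

```python
from gensim.models import Word2Vec

lexicon = {                          # hypothetical morpheme knowledge entries
    "hua2": ["flower", "plant", "blossom"],
    "mu4": ["tree", "plant", "wood"],
    "shui3": ["water", "liquid", "river"],
}

# template: morpheme followed by its related concepts, as one pseudo-sentence
pseudo_sentences = [[m] + concepts for m, concepts in lexicon.items()]

model = Word2Vec(pseudo_sentences, vector_size=32, window=5, min_count=1,
                 sg=1, epochs=200, seed=0)
print(model.wv.similarity("hua2", "mu4"))    # both share the "plant" concept
print(model.wv.similarity("hua2", "shui3"))  # unrelated morphemes
```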

* AAAI 2019
Recommendation systems can shape user demand, which can be exploited to boost caching gain. In this paper, we jointly optimize content caching and recommendation at base stations to maximize the caching gain while not compromising user preference. We first propose a model to capture the impact of recommendation on user demands, which is controlled by a user-specific psychological threshold. We then formulate a joint caching and recommendation problem maximizing the successful offloading probability, which is a mixed-integer programming problem. We develop a hierarchical iterative algorithm to solve the problem when the threshold is known. Since the user threshold is unknown in practice, we proceed to propose an $\varepsilon$-greedy algorithm that finds the solution by learning the threshold through interactions with users. Simulation results show that the proposed algorithms improve the successful offloading probability compared with prior works with and without recommendation. The $\varepsilon$-greedy algorithm learns the user threshold quickly, and achieves more than a $1-\varepsilon$ fraction of the performance obtained by the algorithm with known threshold.
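
The threshold-learning loop can be sketched as a standard $\varepsilon$-greedy bandit over candidate threshold values, with a toy acceptance model standing in for the paper's psychological-threshold model.

```python
import numpy as np

rng = np.random.default_rng(9)
true_threshold = 0.6                      # unknown user threshold
candidates = np.linspace(0.1, 0.9, 9)     # candidate threshold estimates
eps = 0.1
value = np.zeros(len(candidates))         # estimated offloading reward per candidate
count = np.zeros(len(candidates))

for t in range(3000):
    if rng.random() < eps:                # explore a random candidate
        k = int(rng.integers(len(candidates)))
    else:                                 # exploit the best estimate so far
        k = int(np.argmax(value))
    # toy model: a recommendation succeeds (content offloaded) only if the
    # assumed threshold is at least the user's true one
    reward = float(rng.random() < 0.9) if candidates[k] >= true_threshold else 0.0
    count[k] += 1
    value[k] += (reward - value[k]) / count[k]   # running-average update

print("learned threshold estimate:", candidates[int(np.argmax(value))])
```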

* Accepted by IEEE GLOBECOM 2018
Bubble segmentation and size detection algorithms have been developed in recent years because of their high efficiency and accuracy in measuring bubbly two-phase flows. In this work, we propose an architecture called bubble generative adversarial networks (BubGAN) for generating realistic synthetic images, which can be used as training or benchmarking data for developing advanced image processing algorithms. BubGAN is trained initially on a labeled bubble dataset consisting of ten thousand images. By learning the distribution of these bubbles, BubGAN can generate more realistic bubbles than the conventional models used in the literature. The trained BubGAN is conditioned on bubble feature parameters and has full control of bubble properties in terms of aspect ratio, rotation angle, circularity, and edge ratio. A dataset of one million bubbles was pre-generated using the trained BubGAN. One can then assemble realistic bubbly flow images using this dataset and the associated image processing tool. These images carry detailed bubble information and therefore require no additional manual labeling, making them more useful than images from a conventional GAN, which are generated without labels. The tool can be used to provide benchmarking and training data for existing image processing algorithms and to guide the future development of bubble detection algorithms.
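
Structurally, conditioning a generator on bubble feature parameters looks like the skeleton below: noise and the four-dimensional feature vector are concatenated and decoded into an image. Layer sizes are arbitrary, the network is untrained, and this is not the BubGAN architecture.

```python
import torch
import torch.nn as nn

class ConditionalBubbleGenerator(nn.Module):
    """Generator conditioned on [aspect ratio, rotation, circularity, edge ratio]."""

    def __init__(self, z_dim=64, n_features=4, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_features, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, img_size * img_size), nn.Tanh(),
        )

    def forward(self, z, features):
        # concatenate noise with the bubble feature vector, then decode an image
        x = self.net(torch.cat([z, features], dim=1))
        return x.view(-1, 1, self.img_size, self.img_size)

gen = ConditionalBubbleGenerator()
z = torch.randn(8, 64)                 # noise batch
feats = torch.rand(8, 4)               # bubble feature parameters in [0, 1]
print(gen(z, feats).shape)             # torch.Size([8, 1, 32, 32])
```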

* 20 pages, 15 figures
This work explores the query complexity of property testing for general piecewise functions on the real line, in the active and passive property testing settings. The results are proven under an abstract zero-measure crossings condition, which has as special cases piecewise constant functions and piecewise polynomial functions. We find that, in the active testing setting, the query complexity of testing general piecewise functions is independent of the number of pieces. We also identify the optimal dependence on the number of pieces in the query complexity of passive testing in the special case of piecewise constant functions.

We study a special case of the problem of statistical learning without the i.i.d. assumption. Specifically, we suppose a learning method is presented with a sequence of data points, is required to make a prediction (e.g., a classification) for each one, and can then observe the loss incurred by this prediction. We go beyond traditional analyses, which have focused on stationary mixing processes or nonstationary product processes, by combining these two relaxations to allow nonstationary mixing processes. We are particularly interested in the case of $\beta$-mixing processes, with the sum of changes in marginal distributions growing sublinearly in the number of samples. Under these conditions, we propose a learning method, and establish that for bounded VC subgraph classes, the cumulative excess risk grows sublinearly in the number of predictions, at a quantified rate.

Recovering a scene's 3D structure and camera pose from a video sequence is an exciting task. Most current solutions divide it into two parts: monocular depth recovery and camera pose estimation. Monocular depth recovery is often studied as an independent problem, with better depth estimates then used to solve for pose, while the camera pose itself is still estimated by traditional SLAM (Simultaneous Localization And Mapping) methods in most cases. Unsupervised approaches to monocular depth recovery and pose estimation have benefited from the study in [1] and achieved good results. In this paper, we improve the method of [1]. Our emphasis is on improving the underlying idea and related theory, introducing more reasonable inter-frame constraints, and finally synthesizing the camera trajectory from inter-frame pose estimates in a unified world coordinate system. Our results achieve better performance.
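
The final step, synthesizing a trajectory in a unified world coordinate system, amounts to composing the 4x4 inter-frame SE(3) estimates by matrix multiplication; a sketch with made-up relative poses:

```python
import numpy as np

def pose(tx, yaw):
    """4x4 homogeneous transform: small yaw rotation plus forward translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[0, 3] = tx
    return T

relative_poses = [pose(1.0, 0.05) for _ in range(10)]  # frame-to-frame estimates

T_world = np.eye(4)                        # camera-to-world at the first frame
trajectory = [T_world[:3, 3].copy()]
for T_rel in relative_poses:
    T_world = T_world @ T_rel              # accumulate into world coordinates
    trajectory.append(T_world[:3, 3].copy())

print(np.round(np.array(trajectory), 2))   # camera positions along the path
```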

* 6 pages, 5 figures, 1 table
We study information elicitation without verification (IEWV) and ask the following question: can we achieve truthfulness in dominant strategy in IEWV? This paper considers two elicitation settings. In the first setting, the mechanism designer has access to a random variable that is a noisy or proxy version of the ground truth, with known biases. The second setting is the standard peer prediction setting, where agents' reports are the only source of information the mechanism designer has. We introduce surrogate scoring rules (SSR) for the first setting, which use the noisy ground truth to evaluate the quality of elicited information, and show that SSR achieve truthful elicitation in dominant strategy. Building upon SSR, we develop a multi-task mechanism, dominant truth serum (DTS), to achieve truthful elicitation in dominant strategy when the mechanism designer only has access to agents' reports (the second setting). The method relies on an estimation procedure to accurately estimate the average bias in the reports of other agents. With an accurate estimate, a random peer agent's report serves as a noisy ground truth, and SSR can then be applied to achieve truthfulness in dominant strategy. A salient feature of SSR and DTS is that they both quantify the quality or value of information despite the lack of ground truth, just as proper scoring rules do in the with-verification setting. Our work complements both the strictly proper scoring rule literature, by solving the case where the mechanism designer only has access to a noisy or proxy version of the ground truth, and the peer prediction literature, by achieving truthful elicitation in dominant strategy.
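
In the spirit of the noisy-label correction that SSR build on (Natarajan et al. 2013), a surrogate score can be sketched as below; the Brier-style base score and the exact form are illustrative assumptions, not necessarily the paper's definitions.

```python
def base_score(report, label):
    """Quadratic (Brier-style) score of a probabilistic report; higher is better."""
    return 1 - (report - label) ** 2

def surrogate_score(report, z, rho0, rho1):
    """Noise-corrected score: unbiased for the clean score when the observed
    label z flips the truth with rates rho0 = P(z=1|y=0) and rho1 = P(z=0|y=1)."""
    if z == 1:
        num = (1 - rho0) * base_score(report, 1) - rho1 * base_score(report, 0)
    else:
        num = (1 - rho1) * base_score(report, 0) - rho0 * base_score(report, 1)
    return num / (1 - rho0 - rho1)

# a confident, correct report still scores well against a noisy ground truth
print(surrogate_score(0.9, z=1, rho0=0.2, rho1=0.3))
```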
