Research papers and code for "Lihong Wang":
Approximate linear programming (ALP) is one of the major algorithmic families for solving large-scale Markov decision processes (MDPs). In this work, we study a primal-dual formulation of the ALP and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts (bi)linear models to represent the high-dimensional value function and state-action distributions, using given state and action features; its run-time complexity depends on the number of features, not the size of the underlying MDP. Second, it operates in a fully online fashion without having to store any samples, and thus has a minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.
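
To make the primal-dual idea concrete, here is a minimal sketch of a stochastic saddle-point update with a linear value function, assuming a generic sampling oracle; the feature generators, step sizes, and update rules are illustrative placeholders, not the paper's exact bilinear $\pi$ learning algorithm. Note how the per-iteration cost depends only on the feature dimension, matching the scalability claim.

```python
import numpy as np

rng = np.random.default_rng(0)
d, gamma, eta = 8, 0.95, 0.05      # feature dimension, discount, step size
w = np.zeros(d)                    # primal variable: linear value parameters
theta = np.zeros(d)                # dual variable: occupancy-measure parameters

def sampling_oracle():
    """Placeholder for the paper's sampling oracle: returns features of a
    sampled state-action pair, the observed reward, and next-state features."""
    psi_sa = rng.normal(size=d)    # state-action features
    phi_s2 = rng.normal(size=d)    # next-state features
    return psi_sa, rng.normal(), phi_s2

for t in range(10_000):
    psi_sa, r, phi_s2 = sampling_oracle()
    # Bellman residual of the current linear value estimate.
    delta = r + gamma * (phi_s2 @ w) - psi_sa @ w
    # Dual ascent: shift occupancy mass toward pairs with large residual.
    theta += eta * delta * psi_sa
    # Primal descent on the value parameters.
    w -= eta * delta * (gamma * phi_s2 - psi_sa)
```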

Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such embedding models with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interplay between embedding learning and logical inference. Moreover, they focused only on hard rules, which hold without exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are to be predicted iteratively, and 3) soft rules of various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries the rules to obtain soft labels for unlabeled triples, and integrates the newly labeled triples to update the embedding model. Through this iterative procedure, the knowledge embodied in logic rules can be better transferred into the learned embeddings. We evaluate RUGE on link prediction over Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainty, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://github.com/iieir-km/RUGE.
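
A minimal sketch of the iterative soft-labeling loop, assuming a toy DistMult-style scorer; the scoring function, rule grounding, and loss below are simplified stand-ins, not RUGE's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ent, n_rel, dim, lr = 50, 4, 16, 0.05
E = rng.normal(scale=0.1, size=(n_ent, dim))   # entity embeddings
R = rng.normal(scale=0.1, size=(n_rel, dim))   # relation embeddings

def score(h, r, t):
    # DistMult-style triple score squashed to (0, 1).
    return 1.0 / (1.0 + np.exp(-(E[h] * R[r] * E[t]).sum()))

def sgd_step(h, r, t, label):
    # One cross-entropy gradient step pushing score(h, r, t) toward `label`.
    g = score(h, r, t) - label
    gh, gr, gt = g * R[r] * E[t], g * E[h] * E[t], g * E[h] * R[r]
    E[h] -= lr * gh
    R[r] -= lr * gr
    E[t] -= lr * gt

labeled = [(0, 0, 1, 1.0), (1, 0, 2, 1.0)]   # observed triples, hard label 1
rules = [(0, 1, 0.9)]                        # soft rule r0(x,y) => r1(x,y), conf 0.9

for epoch in range(100):
    # 1) Fit the directly observed, hard-labeled triples.
    for h, r, t, y in labeled:
        sgd_step(h, r, t, y)
    # 2) Query the rules: premise score times rule confidence yields a soft
    #    label for the unlabeled conclusion triple, which is then fitted too.
    for r_body, r_head, conf in rules:
        for h, _, t, _ in labeled:          # ground rules over observed pairs
            sgd_step(h, r_head, t, conf * score(h, r_body, t))
```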

* To appear in AAAI 2018

We introduce a new paradigm of learning for reasoning, understanding, and prediction, as well as the scaffolding network that implements this paradigm. The scaffolding network embodies an incremental learning approach, formulated as a teacher-student architecture, to teach machines how to understand text and reason about it. The key to our computational scaffolding approach is the interaction between the teacher and the student through sequential questioning. The student observes each sentence in the text incrementally and uses an attention-based neural network to discover and register the key information in relation to its current memory. Meanwhile, the teacher asks questions about the observed text, and the student network is rewarded for correctly answering them. The entire network is updated continually using reinforcement learning. Our experimental results on synthetic and real datasets show that the scaffolding network not only outperforms state-of-the-art methods but also learns to reason in a scalable way, even with little human-generated input.

* 11 pages + Abstract + 3 figures

Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. \emph{Offline} evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences, but it is very challenging due to their "partial-label" nature. Common practice is to create a simulator of the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult, and it usually introduces modeling bias. In this paper, we introduce a \emph{replay} methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method provides provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from the Yahoo! Front Page conform well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms demonstrate the accuracy and effectiveness of our offline evaluation method.
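
The replay idea is concrete enough to sketch: stream logged (context, arm, reward) events collected under uniformly random arm selection, and whenever the candidate algorithm picks the logged arm, reveal the reward and let the algorithm update; otherwise skip the event. The `select`/`update` interface below is an assumption for illustration.

```python
import numpy as np

def replay_evaluate(policy, logged_events):
    """Offline replay evaluation. `logged_events` is an iterable of
    (context, logged_arm, reward) triples collected under uniformly random
    arm selection; `policy` exposes select(context) and
    update(context, arm, reward). Both interfaces are illustrative."""
    total, matched = 0.0, 0
    for context, logged_arm, reward in logged_events:
        if policy.select(context) == logged_arm:
            # Reveal the reward only when the choices agree.
            policy.update(context, logged_arm, reward)
            total += reward
            matched += 1
    return total / max(matched, 1)   # estimated per-trial reward

class RandomPolicy:
    """Trivial baseline policy used to exercise the evaluator."""
    def __init__(self, n_arms, seed=0):
        self.n_arms, self.rng = n_arms, np.random.default_rng(seed)
    def select(self, context):
        return self.rng.integers(self.n_arms)
    def update(self, context, arm, reward):
        pass

rng = np.random.default_rng(1)
log = [(rng.normal(size=5), rng.integers(3), float(rng.random() < 0.1))
       for _ in range(10_000)]
print(replay_evaluate(RandomPolicy(3), log))
```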

* 10 pages, 7 figures, revised from the published version at the WSDM 2011 conference

Knowledge Base (KB) completion, which aims to determine missing relations between entities, has attracted increasing attention in recent years. Most existing methods either focus on the positional relationship between an entity pair and a single relation (1-hop path) in semantic space, or concentrate on the joint probability of random walks over multi-hop paths among entities. However, they do not fully consider the intrinsic relationships among all the links between entities. Observing that the single relation and the multi-hop paths between the same entity pair generally carry shared or similar semantic information, this paper proposes a novel method that captures the shared features between them as the basis for inferring missing relations. To capture the shared features jointly, we develop Hierarchical Attention Networks (HANs) to automatically encode the inputs into low-dimensional vectors, and exploit two partially parameter-shared components, one for feature-source discrimination and the other for determining missing relations. By jointly training the entire model with Adversarial Training (AT), our method minimizes the classification error for missing relations while ensuring that the source of the shared features is difficult to discriminate. The AT mechanism encourages the model to extract features that are both discriminative for missing-relation prediction and shareable between single relations and multi-hop paths. We extensively evaluate our method on several large-scale KBs for relation completion. Experimental results show that our method consistently outperforms the baseline approaches. In addition, the hierarchical attention mechanism and the feature extractor in our model can be well interpreted and reused in related downstream tasks.
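
One standard way to implement this kind of adversarial feature-source training is a gradient reversal layer; the PyTorch sketch below shows the wiring under that assumption, with toy network shapes, and is not necessarily the AT scheme the paper actually uses.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam on the
    way back, so the shared extractor learns to FOOL the source classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

feature_net = nn.Sequential(nn.Linear(64, 32), nn.ReLU())  # shared extractor
relation_clf = nn.Linear(32, 10)   # predicts the missing relation
source_clf = nn.Linear(32, 2)      # discriminates: single relation vs. path

x = torch.randn(8, 64)             # toy encoded inputs
rel_y = torch.randint(10, (8,))    # relation labels
src_y = torch.randint(2, (8,))     # feature-source labels
feats = feature_net(x)
loss = (nn.functional.cross_entropy(relation_clf(feats), rel_y)
        + nn.functional.cross_entropy(source_clf(GradReverse.apply(feats, 1.0)), src_y))
loss.backward()   # one joint step: predictive loss + reversed source loss
```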

Many real-world problems can be represented as graph-based learning problems. In this paper, we propose a novel framework for learning spatial and attentional convolutional neural networks on arbitrary graphs. Unlike previous convolutional neural networks on graphs, we first design a motif-matching guided subgraph normalization method to capture neighborhood information. We then implement self-attentional layers that learn the varying importance of different subgraphs for graph classification. Analogous to image-based attentional convolutional networks that operate on locally connected and weighted regions of the input, we also extend graph normalization from a one-dimensional node sequence to a two-dimensional node grid by leveraging motif matching, and design self-attentional layers that avoid any costly operation depending on prior knowledge of the graph structure. Our results on both bioinformatics and social network datasets show that we can significantly improve on graph classification benchmarks over traditional graph kernels and existing deep models.

Understanding diffusion in social networks is an important task. However, this task is challenging because (1) the network structure is usually hidden, with only observations of events such as "post" or "repost" associated with each node, and (2) the interactions between nodes encompass multiple distinct patterns, which in turn affect the diffusion patterns. For instance, social interactions seldom develop on a single channel, and multiple relationships can bind pairs of people due to their various common interests. Most previous work considers only one of these two challenges, which is unrealistic. In this paper, we study the problem of \emph{inferring a multiplex network} in social networks. We propose the Multiplex Diffusion Model (MDM), which combines a multivariate marked Hawkes process with a topic model to infer the multiplex structure of a social network. An MCMC-based algorithm is developed to infer the latent multiplex structure and to estimate the node-related parameters. We evaluate our model on both synthetic and real-world datasets. The results show that our model is more effective at uncovering the multiplex network structure.
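
The multivariate Hawkes building block is standard and easy to sketch: below is the conditional intensity with exponential kernels, with a comment on where the multiplex layers would enter. Parameter values are illustrative.

```python
import numpy as np

def intensity(t, node, events, mu, alpha, beta=1.0):
    """Conditional intensity of `node` at time t in a multivariate Hawkes
    process with exponential kernels: lambda_i(t) = mu_i +
    sum over past events (t_j, j) of alpha[j, i] * exp(-beta * (t - t_j)).
    In a multiplex extension, each layer k would carry its own alpha_k,
    with per-event layer assignments inferred (e.g., by MCMC)."""
    lam = mu[node]
    for t_j, src in events:
        if t_j < t:
            lam += alpha[src, node] * np.exp(-beta * (t - t_j))
    return lam

mu = np.array([0.1, 0.2, 0.05])           # base rates for 3 nodes
alpha = np.array([[0.0, 0.4, 0.1],        # alpha[j, i]: excitation j -> i
                  [0.2, 0.0, 0.3],
                  [0.0, 0.1, 0.0]])
events = [(0.5, 0), (1.2, 1), (2.0, 0)]   # (timestamp, source node)
print([intensity(3.0, i, events, mu, alpha) for i in range(3)])
```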

Developing agents to engage in complex goal-oriented dialogues is challenging partly because the main learning signals are very sparse in long conversations. In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. First, given successful example dialogues, we propose the Subgoal Discovery Network (SDN) to divide a complex goal-oriented task into a set of simpler subgoals in an unsupervised fashion. We then use these subgoals to learn a multi-level policy by hierarchical reinforcement learning. We demonstrate our method by building a dialogue agent for the composite task of travel planning. Experiments with simulated and real users show that our approach performs competitively against a state-of-the-art method that requires human-defined subgoals. Moreover, we show that the learned subgoals are often human comprehensible.

* 11 pages, 6 figures, EMNLP 2018

We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning. NLMs exploit the power of both neural networks, as function approximators, and logic programming, as a symbolic processor for objects with properties, relations, logic connectives, and quantifiers. After being trained on small-scale tasks (such as sorting short arrays), NLMs can recover lifted rules and generalize to large-scale tasks (such as sorting longer arrays). In our experiments, NLMs achieve perfect generalization in a number of tasks, from relational reasoning on family trees and general graphs to decision making, including sorting arrays, finding shortest paths, and playing the blocks world. Most of these tasks are hard to accomplish with neural networks or inductive logic programming alone.
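
A rough sketch of the tensor wiring such an architecture needs, assuming predicates are represented as {0,1}-valued tensors: broadcasting lifts unary predicates to binary ones, and a max over an object slot acts as a soft existential quantifier. The real model stacks many such layers with shared MLPs and slot permutations; everything here is illustrative.

```python
import numpy as np

# Toy predicate tensors over n objects (batch dimension omitted):
# unary predicates U[n, c1] and binary predicates B[n, n, c2], values in {0, 1}.
rng = np.random.default_rng(0)
n, c1, c2 = 4, 3, 2
U = (rng.random((n, c1)) > 0.5).astype(float)
B = (rng.random((n, n, c2)) > 0.5).astype(float)

# Expand: lift unary predicates to binary by broadcasting over a new object
# slot, roughly "p(x) gives p'(x, y) for every y".
U_as_binary = np.broadcast_to(U[:, None, :], (n, n, c1))

# Reduce: project binary predicates down to unary with a max over one slot,
# a soft version of the existential quantifier "exists y. q(x, y)".
B_as_unary = B.max(axis=1)

# A layer would concatenate same-arity tensors and apply a small MLP, shared
# across objects, on the channel dimension; a tanh-linear map stands in here.
W = rng.normal(size=(c1 + c2, 5))
new_unary = np.tanh(np.concatenate([U, B_as_unary], axis=-1) @ W)
print(U_as_binary.shape, new_unary.shape)   # (4, 4, 3) (4, 5)
```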

* ICLR 2019. Project page: https://sites.google.com/view/neural-logic-machines

Network representation learning, an approach to learning low-dimensional representations of vertices, has attracted considerable research attention recently and has proven extremely useful in many machine learning tasks over large graphs. Most existing methods focus on learning structural representations of vertices in a static network, but cannot guarantee an accurate and efficient embedding in a dynamic network scenario. To address this issue, we present an efficient incremental skip-gram algorithm with negative sampling for dynamic network embedding, and provide a set of theoretical analyses to characterize its performance guarantees. Specifically, we first partition a dynamic network over time into an updated part, covering the addition and deletion of links and vertices, and a retained part. We then factorize the objective function of network embedding into terms over the added, vanished, and retained parts of the network, and derive a new stochastic gradient-based method, guided by this partition, to update the node and parameter vectors. The proposed algorithm provably yields an objective function value whose difference from that of the original objective is bounded. We perform extensive experiments on multi-label classification and link prediction tasks over multiple real-world large network datasets. The results demonstrate the correctness of the theoretical analysis and the practical usefulness of dynamic network embedding: our proposal significantly reduces training time while preserving comparable performance, achieving up to a 22x speedup.
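
A crude caricature of the incremental idea, assuming a plain skip-gram-with-negative-sampling objective: only vertices incident to added or vanished links are touched, and the retained part of the embedding is left as-is. The update rules and hyperparameters are placeholders and carry none of the paper's guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, lr, k_neg = 100, 32, 0.025, 5
emb_in = rng.normal(scale=0.1, size=(n_nodes, dim))    # node vectors
emb_out = np.zeros((n_nodes, dim))                     # context vectors

def sgns_step(u, v, label):
    """One skip-gram-with-negative-sampling gradient step on pair (u, v)."""
    p = 1.0 / (1.0 + np.exp(-emb_in[u] @ emb_out[v]))
    g = p - label
    gu, gv = g * emb_out[v], g * emb_in[u]
    emb_in[u] -= lr * gu
    emb_out[v] -= lr * gv

def incremental_update(added_edges, removed_edges):
    """Touch only vertices incident to changed edges; the vectors of the
    retained part of the network are left untouched."""
    for u, v in added_edges:
        sgns_step(u, v, 1.0)                           # new positive pair
        for neg in rng.integers(n_nodes, size=k_neg):  # fresh negatives
            sgns_step(u, neg, 0.0)
    for u, v in removed_edges:
        sgns_step(u, v, 0.0)                           # unlearn vanished pair

incremental_update(added_edges=[(1, 2), (2, 3)], removed_edges=[(4, 5)])
```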

* Accepted by Science China Information Sciences. arXiv admin note: text overlap with arXiv:1811.05932 by other authors

CNNs, RNNs, GCNs, and CapsNets have delivered significant advances in representation learning and are widely used in various text mining tasks, such as large-scale multi-label text classification. However, most existing deep models for multi-label text classification capture either the non-consecutive, long-distance semantics or the sequential semantics, and how to model both coherently is less studied. In addition, most existing methods treat output labels as independent, ignoring the hierarchical relations among them and thereby losing useful semantic information. In this paper, we propose a novel hierarchical taxonomy-aware and attentional graph capsule recurrent CNN framework for large-scale multi-label text classification. Specifically, we first model each document as a word-order-preserving graph-of-words and normalize it into a corresponding words-matrix representation, which preserves the non-consecutive, long-distance, and local sequential semantics. The words-matrix is then fed into the proposed attentional graph capsule recurrent CNNs to learn semantic features more effectively. To leverage the hierarchical relations among the class labels, we propose a hierarchical taxonomy embedding method to learn their representations, and define a novel weighted margin loss by incorporating label-representation similarity. Extensive evaluations on three datasets show that our model significantly improves the performance of large-scale multi-label text classification compared with state-of-the-art approaches.
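
The abstract does not spell out the weighted margin loss, so the following is only a guess at its general shape: a pairwise margin whose size shrinks with the taxonomy-embedding similarity of the two labels, so that confusing nearby labels is penalized less. All names and the exact weighting are assumptions.

```python
import numpy as np

def weighted_margin_loss(scores, pos, label_emb, base_margin=1.0):
    """Illustrative weighted margin loss: the required margin between the
    true label `pos` and a negative label j shrinks with the similarity of
    their taxonomy embeddings."""
    sims = label_emb @ label_emb[pos] / (
        np.linalg.norm(label_emb, axis=1) * np.linalg.norm(label_emb[pos]) + 1e-9)
    loss = 0.0
    for j in range(len(scores)):
        if j == pos:
            continue
        margin = base_margin * (1.0 - sims[j])   # similar labels, smaller margin
        loss += max(0.0, margin - (scores[pos] - scores[j]))
    return loss

rng = np.random.default_rng(0)
scores, label_emb = rng.normal(size=6), rng.normal(size=(6, 8))
print(weighted_margin_loss(scores, pos=2, label_emb=label_emb))
```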

We study adversarial attacks that manipulate the reward signals to control the actions chosen by a stochastic multi-armed bandit algorithm. We propose the first attack against two popular bandit algorithms: $\epsilon$-greedy and UCB, \emph{without} knowledge of the mean rewards. The attacker is able to spend only logarithmic effort, multiplied by a problem-specific parameter that becomes smaller as the bandit problem gets easier to attack. The result means the attacker can easily hijack the behavior of the bandit algorithm to promote or obstruct certain actions, say, a particular medical treatment. As bandits are seeing increasingly wide use in practice, our study exposes a significant security threat.
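
A self-contained simulation of the attack idea against $\epsilon$-greedy: whenever the learner pulls a non-target arm, the attacker perturbs the observed reward just enough to keep that arm's empirical mean below the target's mean minus a small margin. The fixed margin here replaces the paper's confidence-bound-based one, so this is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, eps, target = 3, 5000, 0.1, 2
true_means = np.array([0.9, 0.8, 0.2])   # the target arm is the WORST arm

counts, sums = np.zeros(K), np.zeros(K)
attack_cost = 0.0

for t in range(T):
    # The epsilon-greedy learner picks an arm.
    if rng.random() < eps or counts.min() == 0:
        arm = int(rng.integers(K))
    else:
        arm = int(np.argmax(sums / counts))
    r = true_means[arm] + 0.1 * rng.normal()   # pre-attack reward
    if arm != target and counts[target] > 0:
        # Keep the pulled arm's empirical mean below the target's mean minus
        # a fixed 0.05 margin, paying only the perturbation as attack cost.
        target_mean = sums[target] / counts[target]
        ceiling = (counts[arm] + 1) * (target_mean - 0.05) - sums[arm]
        if r > ceiling:
            attack_cost += r - ceiling
            r = ceiling
    counts[arm] += 1
    sums[arm] += r

print(f"target pulls: {counts[target]:.0f}/{T}, attack cost: {attack_cost:.1f}")
```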

* accepted to NIPS

We study offline data poisoning attacks in contextual bandits, a class of reinforcement learning problems with important applications in online recommendation and adaptive medical treatment, among others. We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector. The target arm and target contextual vector are both chosen by the attacker. That is, the attacker can hijack the behavior of a contextual bandit. We also investigate the feasibility and the side effects of such attacks, and identify future directions for defense. Experiments on both synthetic and real-world data demonstrate the efficiency of the attack algorithm.
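
Under the additional assumption that the bandit algorithm fits per-arm ridge regressions (as LinUCB does), the ridge estimate is affine in the rewards, so forcing the target arm at the target context becomes a small quadratic program. The `cvxpy` sketch below illustrates this special case of the general framework; all data and names are synthetic.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
d, n, K, lam, margin = 5, 200, 3, 1.0, 0.1
X = [rng.normal(size=(n, d)) for _ in range(K)]   # logged contexts, per arm
y = [Xa @ rng.normal(size=d) + 0.1 * rng.normal(size=n) for Xa in X]
x_star, target = rng.normal(size=d), 0            # attacker's chosen pair

# Ridge estimates are affine in the rewards: theta_a = M_a (y_a + delta_a),
# with M_a = (X_a^T X_a + lam I)^{-1} X_a^T, so the predicted rewards at
# x_star are affine in the poisoning variables delta_a.
M = [np.linalg.solve(Xa.T @ Xa + lam * np.eye(d), Xa.T) for Xa in X]
coef = [x_star @ Ma for Ma in M]                  # (n,) affine coefficients
delta = [cp.Variable(n) for _ in range(K)]
pred = [coef[a] @ (y[a] + delta[a]) for a in range(K)]

# Force the target arm to look best at x_star, at minimal l2 poisoning cost.
constraints = [pred[target] >= pred[a] + margin
               for a in range(K) if a != target]
prob = cp.Problem(cp.Minimize(sum(cp.sum_squares(dl) for dl in delta)),
                  constraints)
prob.solve()
print("poisoning l2 cost:", prob.value)
```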

* GameSec 2018