Research papers and code for "Warren He":
We address the relative paucity of empirical testing of learning algorithms (of any type) by introducing a new public-domain Modular, Optimal Learning Testing Environment (MOLTE) for Bayesian ranking and selection problems, stochastic bandits, and sequential experimental design problems. The Matlab-based simulator allows the comparison of a number of learning policies (represented as a series of .m modules) in the context of a wide range of problems (each represented in its own .m module), which makes it easy to add new algorithms and new test problems. State-of-the-art policies and various problem classes are provided in the package. The choice of problems and policies is guided through a spreadsheet-based interface. Different graphical metrics are included. MOLTE is designed to be compatible with parallel computing to scale up from a local desktop to clusters and clouds. We offer MOLTE as an easy-to-use tool for the research community that will make it possible to perform much more comprehensive testing, spanning a broader selection of algorithms and test problems. We demonstrate the capabilities of MOLTE through a series of comparisons of policies on a starter library of test problems. We also address the problem of tuning and constructing priors, which has been largely overlooked in the optimal learning literature. We envision MOLTE as a modest spur that gives researchers an easy environment in which to study interesting questions involved in optimal learning.

Motivation: Entropy measurements on hierarchical structures have been used in methods for information retrieval and natural language modeling. Here we explore their application to semantic similarity. By finding shared ontology terms, semantic similarity can be established between annotated genes. A common procedure for establishing semantic similarity is to calculate the descriptiveness (information content) of ontology terms and use these values to determine the similarity of annotations. Most often, information content is calculated for an ontology term by analyzing its frequency in an annotation corpus. The inherent problems in using these values to model functional similarity motivate our work. Summary: We present a novel calculation for establishing the entropy of a DAG-based ontology, which can be used in an alternative method for establishing the information content of its terms. We also compare our IC metric to two others using semantic and sequence similarity.

* in ISMB Bio-Ontologies, 2012
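The frequency-based information content that the abstract above describes as the common procedure (not the entropy-based calculation the paper proposes) can be sketched as follows. This is a minimal illustration, assuming the usual convention IC(t) = -log p(t), where p(t) is the share of corpus annotations attached to term t or any of its descendants. The toy ontology, the term names, and the function names are hypothetical.

```python
import math


def ancestors(term, parents):
    """Return all ancestors of `term` in a DAG given as a child -> parents map."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(parents, annotation_counts):
    """Frequency-based IC: -log of the propagated annotation frequency."""
    propagated = {}
    for term, count in annotation_counts.items():
        # An annotation to a term also counts for every ancestor, once each,
        # even when the DAG offers several paths to that ancestor.
        for t in {term} | ancestors(term, parents):
            propagated[t] = propagated.get(t, 0) + count
    total = sum(annotation_counts.values())
    return {t: -math.log(n / total) for t, n in propagated.items() if n > 0}


# Hypothetical toy ontology: "c" has two parents, so it is a DAG, not a tree.
toy_parents = {"a": ["root"], "b": ["root"], "c": ["a", "b"]}
toy_counts = {"a": 2, "b": 3, "c": 5}
toy_ic = information_content(toy_parents, toy_counts)
```

In this toy corpus the root covers every annotation and so carries no information, while the more specific term "c" scores higher than its parent "a".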
We consider the problem of estimating the expected value of information (the knowledge gradient) for Bayesian learning problems where the belief model is nonlinear in the parameters. Our goal is to maximize some metric, while simultaneously learning the unknown parameters of the nonlinear belief model, by guiding a sequential experimentation process which is expensive. We overcome the problem of computing the expected value of an experiment, which is computationally intractable, by using a sampled approximation, which helps to guide experiments but does not provide an accurate estimate of the unknown parameters. We then introduce a resampling process which allows the sampled model to adapt to new information, exploiting past experiments. We show theoretically that the method converges asymptotically to the true parameters, while simultaneously maximizing our metric. We show empirically that the process exhibits rapid convergence, yielding good results with a very small number of experiments.

A treatment regime is a function that maps individual patient information to a recommended treatment, hence explicitly incorporating the heterogeneity in need for treatment across individuals. Patient responses are dichotomous and can be predicted through an unknown relationship that depends on the patient information and the selected treatment. The goal is to find the treatments that lead to the best patient responses on average. Each experiment is expensive, forcing us to learn the most from each experiment. We adopt a Bayesian approach both to incorporate possible prior information and to update our treatment regime continuously as information accrues, with the potential to allow smaller yet more informative trials and for patients to receive better treatment. By formulating the problem as a contextual bandit, we introduce a knowledge gradient policy to guide the treatment assignment by maximizing the expected value of information, for which an approximation method is used to overcome computational challenges. We provide a detailed study, on a real-world knee replacement dataset, of how to make sequential medical decisions under uncertainty to reduce health care costs. We use clustering and LASSO to deal with the intrinsic sparsity in health datasets. We show experimentally that even though the problem is sparse, through careful selection of physicians (versus picking them at random), we can significantly improve the success rates.

We consider sequential decision problems in which we adaptively choose one of finitely many alternatives and observe a stochastic reward. We offer a new perspective of interpreting Bayesian ranking and selection problems as adaptive stochastic multi-set maximization problems and derive the first finite-time bound of the knowledge-gradient policy for adaptive submodular objective functions. In addition, we introduce the concept of prior-optimality and provide another insight into the performance of the knowledge gradient policy based on the submodular assumption on the value of information. We demonstrate submodularity for the two-alternative case and provide other conditions for more general problems, bringing out the issue and importance of submodularity in learning problems. Empirical experiments are conducted to further illustrate the finite time behavior of the knowledge gradient policy.
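For readers unfamiliar with the knowledge-gradient policy named in the abstract above, the sketch below shows its well-known closed form for ranking and selection. It assumes independent normal beliefs and normally distributed measurement noise with known variance, which the abstract does not specify. The function names are hypothetical, and at least two alternatives are required.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(means, variances, noise_variance):
    """Expected one-step gain in the best posterior mean, per alternative.

    Assumes independent normal beliefs N(means[x], variances[x]) and
    measurement noise N(0, noise_variance).
    """
    values = []
    for x, (mu, var) in enumerate(zip(means, variances)):
        # Standard deviation of the change in the posterior mean of x
        # caused by one more measurement of x.
        sigma_tilde = var / math.sqrt(var + noise_variance)
        if sigma_tilde == 0.0:
            values.append(0.0)  # nothing left to learn about x
            continue
        best_other = max(m for i, m in enumerate(means) if i != x)
        zeta = -abs(mu - best_other) / sigma_tilde
        values.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return values


def kg_policy(means, variances, noise_variance):
    """Index of the alternative with the largest knowledge-gradient value."""
    values = knowledge_gradient(means, variances, noise_variance)
    return max(range(len(values)), key=values.__getitem__)
```

When all means are equal, the policy measures the alternative with the largest prior variance, since that measurement is the most likely to change which alternative looks best.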

XNMR is a system designed to explore the results of combining the well-founded semantics system XSB with the stable-models evaluator SMODELS. Its main goal is to work as a tool for fast and interactive exploration of knowledge bases.

* 2 pages; no figures; NMR2000 Systems Description
Traditional reinforcement learning agents learn from experience, past or present, gained through interaction with their environment. Our approach synthesizes experience, without requiring an agent to interact with its environment, by asking the policy directly, "Are there situations X, Y, and Z such that in these situations you would select actions A, B, and C?" In this paper we present Introspection Learning, an algorithm that allows such questions to be asked of neural network policies. Introspection Learning is agnostic to the choice of reinforcement learning algorithm, and the states returned may be used as an indicator of the health of the policy or to shape the policy in a myriad of ways. We demonstrate the usefulness of this algorithm both in the context of speeding up training and improving robustness with respect to safety constraints.

* 8 pages. Submitted to 2019 AAAI Spring Symposium on Verification of Neural Networks
Abductive reasoning generates explanatory hypotheses for new observations using prior knowledge. This paper investigates the use of forgetting, also known as uniform interpolation, to perform ABox abduction in description logic (ALC) ontologies. Non-abducibles are specified by a forgetting signature which can contain concept, but not role, symbols. The resulting hypotheses are semantically minimal and each consists of a set of disjuncts. These disjuncts are each independent explanations, and are not redundant with respect to the background ontology or the other disjuncts, representing a form of hypothesis space. The observations and hypotheses handled by the method can contain both atomic and complex ALC concepts, excluding role assertions, and are not restricted to Horn clauses. Two approaches to redundancy elimination are explored for practical use: full and approximate. Using a prototype implementation, experiments were performed over a corpus of real-world ontologies to investigate the practicality of both approaches across several settings.

* Long version of a paper accepted for publication in the proceedings of AAAI 2019
In this paper, we introduce the syndrome loss, an alternative loss function for neural error-correcting decoders based on a relaxation of the syndrome. The syndrome loss penalizes the decoder for producing outputs that do not correspond to valid codewords. We show that training with the syndrome loss yields decoders with consistently lower frame error rate for a number of short block codes, at little additional cost during training and no additional cost during inference. The proposed method does not depend on knowledge of the transmitted codeword, making it a promising tool for online adaptation to changing channel conditions.

* Accepted to Asilomar 2018 - special session on "Machine Learning for Wireless Systems"
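To make the idea of a syndrome and its relaxation concrete, the sketch below computes the hard syndrome of a word under a (7,4) Hamming parity-check matrix, together with one possible differentiable surrogate. The surrogate (a tanh-product soft parity, with positive log-likelihood ratios meaning bit 0) is an assumption chosen for illustration and is not claimed to be the relaxation used in the paper. All names are hypothetical.

```python
import math

# Parity-check matrix of the (7,4) Hamming code; column j (1-indexed)
# is the binary representation of j, least significant bit in the first row.
HAMMING_H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def hard_syndrome(bits, H):
    """Syndrome H * bits (mod 2); all zeros exactly when bits is a codeword."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def soft_syndrome_loss(llrs, H):
    """Illustrative relaxation, not the paper's definition.

    Each check contributes (1 - prod tanh(llr / 2)) / 2, which is near 0
    when the check is confidently satisfied and near 1 when it is
    confidently violated. No transmitted codeword is needed.
    """
    total = 0.0
    for row in H:
        soft_parity = 1.0
        for h, llr in zip(row, llrs):
            if h:
                soft_parity *= math.tanh(llr / 2.0)
        total += (1.0 - soft_parity) / 2.0
    return total / len(H)
```

Because the loss depends only on the decoder output and the parity-check matrix, it can be evaluated without knowing what was transmitted, which is the property the abstract highlights.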
We consider the problem of sequentially making decisions that are rewarded by "successes" and "failures" which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success in either offline (training) or online (testing) phases. Our problem is motivated by real-world applications where observations are time-consuming and/or expensive. We develop a knowledge gradient policy using an online Bayesian linear classifier to guide the experiment by maximizing the expected value of information of labeling each alternative. We provide a finite-time analysis of the estimated error and show that the maximum likelihood estimator produced by the KG policy is consistent and asymptotically normal. We also show that the knowledge gradient policy is asymptotically optimal in an offline setting. This work further extends the knowledge gradient to the setting of contextual bandits. We report the results of a series of experiments that demonstrate its efficiency.

* arXiv admin note: text overlap with arXiv:1510.02354
Recently, it was shown that if multiplicative weights are assigned to the edges of a Tanner graph used in belief propagation decoding, it is possible to use deep learning techniques to find values for the weights which improve the error-correction performance of the decoder. Unfortunately, this approach requires many multiplications, which are generally expensive operations. In this paper, we suggest a more hardware-friendly approach in which offset min-sum decoding is augmented with learnable offset parameters. Our method uses no multiplications and has a parameter count less than half that of the multiplicative algorithm. This both speeds up training and provides a feasible path to hardware architectures. After describing our method, we compare the performance of the two neural decoding algorithms and show that our method achieves error-correction performance within 0.1 dB of the multiplicative approach and as much as 1 dB better than traditional belief propagation for the codes under consideration.

* Published as a conference paper at the 2017 International Symposium on Information Theory (ISIT)
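The check-node step of offset min-sum decoding mentioned in the abstract above can be sketched as follows. This shows the standard offset min-sum update with a single scalar offset. How the paper parameterizes and trains its learnable offsets is not stated in the abstract, so the `offset` argument here is only a stand-in for such a parameter, and the function name is hypothetical.

```python
def offset_min_sum_check_update(incoming, offset):
    """Outgoing check-to-variable messages for one check node.

    For each edge, the outgoing sign is the product of the signs of the
    other incoming messages, and the outgoing magnitude is the smallest
    other magnitude reduced by `offset` and clipped at zero. Only
    comparisons, subtractions, and sign flips are used.
    """
    outgoing = []
    for i in range(len(incoming)):
        others = [m for j, m in enumerate(incoming) if j != i]
        sign = 1.0
        for m in others:
            if m < 0:
                sign = -sign
        magnitude = max(min(abs(m) for m in others) - offset, 0.0)
        outgoing.append(sign * magnitude)
    return outgoing


# Hypothetical incoming log-likelihood ratios on a degree-3 check node.
example_out = offset_min_sum_check_update([2.0, -0.5, 1.5], 0.3)
```

The update needs at least two incoming messages per check node. In a learned decoder the offset would typically become a trainable value, for example one per edge or per iteration, but that choice is an assumption here.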
This document describes the contributions of the 2016 Applications of Logic Programming Workshop (AppLP), which was held on October 17 and associated with the International Conference on Logic Programming (ICLP) in Flushing, New York City.

* David S. Warren and Yanhong A. Liu (Editors). 33 pages. Including summaries by Christopher Kane and abstracts or position papers by M. Aref, J. Rosenwald, I. Cervesato, E.S.L. Lam, M. Balduccini, J. Lobo, A. Russo, E. Lupu, N. Leone, F. Ricca, G. Gupta, K. Marple, E. Salazar, Z. Chen, A. Sobhi, S. Srirangapalli, C.R. Ramakrishnan, N. Bjørner, N.P. Lopes, A. Rybalchenko, and P. Tarau
Visual recognition and vision-based retrieval of objects from large databases are tasks with a wide spectrum of potential applications. In this paper we propose a novel method for recognition from video sequences, suitable for retrieval from databases acquired in highly unconstrained conditions, e.g., using a mobile consumer-level device such as a phone. On the lowest level, we represent each sequence as a 3D mesh of densely packed local appearance descriptors. While image plane geometry is captured implicitly by a large overlap of neighbouring regions from which the descriptors are extracted, 3D information is extracted by means of a descriptor transition table, learnt from a single sequence for each known gallery object. These allow us to connect local descriptors along the 3rd dimension (which corresponds to viewpoint changes), thus resulting in a set of variable-length Markov chains for each video. The matching of two sets of such chains is formulated as a statistical hypothesis test, whereby a subset of each is chosen to maximize the likelihood that the corresponding video sequences show the same object. The effectiveness of the proposed algorithm is empirically evaluated on the Amsterdam Library of Object Images and a new, highly challenging video data set acquired using a mobile phone. On both data sets our method is shown to be successful in recognition in the presence of background clutter and large viewpoint changes.

* 2016
We consider sequential decision-making problems in a binary classification scenario in which the learner takes an active role in repeatedly selecting samples from the action pool and receives the binary label of the selected alternatives. Our problem is motivated by applications where observations are time-consuming and/or expensive, resulting in small samples. The goal is to identify the best alternative with the highest response. We use Bayesian logistic regression to predict the response of each alternative. By formulating the problem as a Markov decision process, we develop a knowledge-gradient type policy to guide the experiment by maximizing the expected value of information of labeling each alternative and provide a finite-time analysis of the estimated error. Experiments on benchmark UCI datasets demonstrate the effectiveness of the proposed method.

Online approximation of an optimal station-keeping strategy for a fully actuated six degrees-of-freedom autonomous underwater vehicle is considered. The developed controller is an approximation of the solution to a two-player zero-sum game where the controller is the minimizing player and an external disturbance is the maximizing player. The solution is approximated using a reinforcement learning-based actor-critic framework. The result guarantees uniformly ultimately bounded (UUB) convergence of the states and UUB convergence of the approximated policies to the optimal policies without the requirement of persistence of excitation.

* 6 pages
We develop recursive, data-driven, stochastic subgradient methods for optimizing a new, versatile, and application-driven class of convex risk measures, termed here as mean-semideviations, strictly generalizing the well-known and popular mean-upper-semideviation. We introduce the MESSAGEp algorithm, which is an efficient compositional subgradient procedure for iteratively solving convex mean-semideviation risk-averse problems to optimality. We analyze the asymptotic behavior of the MESSAGEp algorithm under a flexible and structure-exploiting set of problem assumptions. In particular: 1) Under appropriate stepsize rules, we establish pathwise convergence of the MESSAGEp algorithm in a strong technical sense, confirming its asymptotic consistency. 2) Assuming a strongly convex cost, we show that, for fixed semideviation order $p>1$ and for $\epsilon\in\left[0,1\right)$, the MESSAGEp algorithm achieves a squared-${\cal L}_{2}$ solution suboptimality rate of the order of ${\cal O}(n^{-\left(1-\epsilon\right)/2})$ iterations, where, for $\epsilon>0$, pathwise convergence is simultaneously guaranteed. This result establishes a rate of order arbitrarily close to ${\cal O}(n^{-1/2})$, while ensuring strongly stable pathwise operation. For $p\equiv1$, the rate order improves to ${\cal O}(n^{-2/3})$, which also suffices for pathwise convergence, and matches previous results. 3) Likewise, in the general case of a convex cost, we show that, for any $\epsilon\in\left[0,1\right)$, the MESSAGEp algorithm with iterate smoothing achieves an ${\cal L}_{1}$ objective suboptimality rate of the order of ${\cal O}(n^{-\left(1-\epsilon\right)/\left(4\bf{1}_{\left\{ p>1\right\} }+4\right)})$ iterations. This result provides maximal rates of ${\cal O}(n^{-1/4})$, if $p\equiv1$, and ${\cal O}(n^{-1/8})$, if $p>1$, matching the state of the art, as well.

* 90 pages, 3 figures. Update: Substantial revision of the technical content, with an additional fully detailed analysis in regard to the rate of convergence of the MESSAGEp algorithm. NOTE: Please open in browser to see the math in the abstract!
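For orientation, the classical mean-upper-semideviation risk measure that the abstract above says is generalized can be written in its standard textbook form as follows. The symbols $Z$ (random cost) and $c$ (risk-aversion weight) are notation introduced here; the exact definition of the paper's broader mean-semideviation class is not given in the abstract and is not reproduced.

$$\rho(Z) = \mathbb{E}[Z] + c\,\Big(\mathbb{E}\big[\big(Z-\mathbb{E}[Z]\big)_{+}^{p}\big]\Big)^{1/p}, \qquad c\in[0,1],\quad p\ge 1,$$

where $(a)_{+}=\max\{a,0\}$, so only outcomes above the mean cost are penalized. Setting $c=0$ recovers the risk-neutral expectation.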
Consider a sample of $n$ points taken i.i.d. from a submanifold $\Sigma$ of Euclidean space. We show that there is a way to estimate the Ricci curvature of $\Sigma$ with respect to the induced metric from the sample. Our method is grounded in the notions of Carré du Champ for diffusion semi-groups, the theory of Empirical processes and local Principal Component Analysis.

* 47 pages
Understanding driving behaviors is essential for improving the safety and mobility of our transportation systems. Data is usually collected via simulator-based studies or naturalistic driving studies. Those techniques allow for understanding relations between demographics, road conditions, and safety. On the other hand, they are very costly and time-consuming. Thanks to the ubiquity of smartphones, we have an opportunity to substantially complement more traditional data collection techniques with data extracted from phone sensors, such as GPS, accelerometer, gyroscope, and camera. We developed statistical models that provided insight into driver behavior in the San Francisco metro area based on tens of thousands of driver logs. We used novel data sources to support our work. We used cell phone sensor data drawn from five hundred drivers in San Francisco to understand the speed of traffic across the city as well as the maneuvers of drivers in different areas. Specifically, we clustered drivers based on their driving behavior. We looked at driver norms by street and flagged driving behaviors that deviated from the norm.

In this paper, we consider a finite-horizon Markov decision process (MDP) for which the objective at each stage is to minimize a quantile-based risk measure (QBRM) of the sequence of future costs; we call the overall objective a dynamic quantile-based risk measure (DQBRM). In particular, we consider optimizing dynamic risk measures where the one-step risk measures are QBRMs, a class of risk measures that includes the popular value at risk (VaR) and the conditional value at risk (CVaR). Although there is considerable theoretical development of risk-averse MDPs in the literature, the computational challenges have not been explored as thoroughly. We propose data-driven and simulation-based approximate dynamic programming (ADP) algorithms to solve the risk-averse sequential decision problem. We address the issue of inefficient sampling for risk applications in simulated settings and present a procedure, based on importance sampling, to direct samples toward the "risky region" as the ADP algorithm progresses. Finally, we show numerical results of our algorithms in the context of an application involving risk-averse bidding for energy storage.

* 39 pages, 7 figures
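The two quantile-based risk measures named in the abstract above, VaR and CVaR, can be estimated from simulated costs as sketched below. This is a plain empirical estimator, assuming costs where larger is worse and a level `alpha` strictly between 0 and 1. It uses the Rockafellar-Uryasev representation of CVaR and is not the paper's ADP or importance-sampling procedure. The function names are hypothetical.

```python
import math


def value_at_risk(costs, alpha):
    """Empirical alpha-quantile: smallest sample value v such that at
    least a fraction alpha of the samples are <= v."""
    ordered = sorted(costs)
    index = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[index]


def conditional_value_at_risk(costs, alpha):
    """Empirical CVaR via VaR + E[(cost - VaR)_+] / (1 - alpha)."""
    var = value_at_risk(costs, alpha)
    mean_excess = sum(max(c - var, 0.0) for c in costs) / len(costs)
    return var + mean_excess / (1.0 - alpha)


# Hypothetical sample: costs 1, 2, ..., 100 with alpha = 0.9.
sample_costs = list(range(1, 101))
sample_var = value_at_risk(sample_costs, 0.9)
sample_cvar = conditional_value_at_risk(sample_costs, 0.9)
```

With few samples, hardly any observations fall beyond the quantile, so the CVaR estimate is noisy. This is the sampling inefficiency that motivates directing samples toward the "risky region".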
Wheeled planetary rovers such as the Mars Exploration Rovers (MERs) and Mars Science Laboratory (MSL) have provided unprecedented, detailed images of the Mars surface. However, these rovers are large and costly, as they need to carry sophisticated instruments and science laboratories. We propose the development of low-cost planetary rovers that are the size and shape of cantaloupes and that can be deployed from a larger rover. The rover, named SphereX, is 2 kg in mass, spherical, and holonomic, and contains a hopping mechanism to jump over rugged terrain. A small low-cost rover complements a larger rover, particularly to traverse rugged terrain or roll down a canyon, cliff, or crater to obtain images and science data. While it may be a one-way journey for these small robots, they could be used tactically to obtain high-reward science data. The robot is equipped with a pair of stereo cameras to perform visual navigation and has room for a science payload. In this paper, we analyze the design and development of a laboratory prototype. The results show a promising pathway towards development of a field system.

* 10 pages, 16 figures in Proceedings of the IEEE Aerospace Conference 2017