Models, code, and papers for "Nicholas C":

##### Social Cohesion in Autonomous Driving

Aug 27, 2018
Nicholas C. Landolfi, Anca D. Dragan

Autonomous cars can perform poorly for many reasons. They may have perception issues, incorrect dynamics models, be unaware of obscure rules of human traffic systems, or follow certain rules too conservatively. Regardless of the exact failure mode of the car, often human drivers around the car are behaving correctly. For example, even if the car does not know that it should pull over when an ambulance races by, other humans on the road will know and will pull over. We propose to make socially cohesive cars that leverage the behavior of nearby human drivers to act in ways that are safer and more socially acceptable. The simple intuition behind our algorithm is that if all the humans are consistently behaving in a particular way, then the autonomous car probably should too. We analyze the performance of our algorithm in a variety of scenarios and conduct a user study to assess people's attitudes towards socially cohesive cars. We find that people are surprisingly tolerant of mistakes that cohesive cars might make in order to get the benefits of driving in a car with a safer, or even just more socially acceptable behavior.

##### A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning

Jul 15, 2019
Nicholas C. Landolfi, Garrett Thomas, Tengyu Ma

The aim of multi-task reinforcement learning is two-fold: (1) efficiently learn by training against multiple tasks and (2) quickly adapt, using limited samples, to a variety of new tasks. In this work, the tasks correspond to reward functions for environments with the same (or similar) dynamical models. We propose to learn a dynamical model during the training process and use this model to perform sample-efficient adaptation to new tasks at test time. We use significantly fewer samples by performing policy optimization only in a "virtual" environment whose transitions are given by our learned dynamical model. Our algorithm sequentially trains against several tasks. Upon encountering a new task, we first warm-up a policy on our learned dynamical model, which requires no new samples from the environment. We then adapt the dynamical model with samples from this policy in the real environment. We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy over MAML, a state-of-the-art meta-learning algorithm, on these tasks.

* 13 pages, 3 figures
##### Exploring Conditioning for Generative Music Systems with Human-Interpretable Controls

Aug 04, 2019
Nicholas Meade, Nicholas Barreyre, Scott C. Lowe, Sageev Oore

Performance RNN is a machine-learning system designed primarily for the generation of solo piano performances using an event-based (rather than audio) representation. More specifically, Performance RNN is a long short-term memory (LSTM) based recurrent neural network that models polyphonic music with expressive timing and dynamics (Oore et al., 2018). The neural network uses a simple language model based on the Musical Instrument Digital Interface (MIDI) file format. Performance RNN is trained on the e-Piano Junior Competition Dataset (International Piano e-Competition, 2018), a collection of solo piano performances by expert pianists. As an artistic tool, one of the limitations of the original model has been the lack of useable controls. The standard form of Performance RNN can generate interesting pieces, but little control is provided over what specifically is generated. This paper explores a set of conditioning-based controls used to influence the generation process.

##### Variational Boosting: Iteratively Refining Posterior Approximations

Feb 19, 2017
Andrew C. Miller, Nicholas Foti, Ryan P. Adams

We propose a black-box variational inference method to approximate intractable distributions with an increasingly rich approximating class. Our method, termed variational boosting, iteratively refines an existing variational approximation by solving a sequence of optimization problems, allowing the practitioner to trade computation time for accuracy. We show how to expand the variational approximating class by incorporating additional covariance structure and by introducing new components to form a mixture. We apply variational boosting to synthetic and real statistical models, and show that resulting posterior inferences compare favorably to existing posterior approximation algorithms in both accuracy and efficiency.

* 25 pages, 9 figures, 2 tables
##### Filtered Fictitious Play for Perturbed Observation Potential Games and Decentralised POMDPs

Feb 14, 2012
Archie C. Chapman, Simon A. Williamson, Nicholas R. Jennings

Potential games and decentralised partially observable MDPs (Dec-POMDPs) are two commonly used models of multi-agent interaction, for static optimisation and sequential decisionmaking settings, respectively. In this paper we introduce filtered fictitious play for solving repeated potential games in which each player's observations of others' actions are perturbed by random noise, and use this algorithm to construct an online learning method for solving Dec-POMDPs. Specifically, we prove that noise in observations prevents standard fictitious play from converging to Nash equilibrium in potential games, which also makes fictitious play impractical for solving Dec-POMDPs. To combat this, we derive filtered fictitious play, and provide conditions under which it converges to a Nash equilibrium in potential games with noisy observations. We then use filtered fictitious play to construct a solver for Dec-POMDPs, and demonstrate our new algorithm's performance in a box pushing problem. Our results show that we consistently outperform the state-of-the-art Dec-POMDP solver by an average of 100% across the range of noise in the observation function.

##### Learning Reward Functions by Integrating Human Demonstrations and Preferences

Jun 21, 2019
Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh

Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert demonstrations, and preference-based learning, which iteratively queries the user for her preferences between trajectories. In robotics however, IRL often struggles because it is difficult to get high-quality demonstrations; conversely, preference-based learning is very inefficient since it attempts to learn a continuous, high-dimensional function from binary feedback. We propose a new framework for reward learning, DemPref, that uses both demonstrations and preference queries to learn a reward function. Specifically, we (1) use the demonstrations to learn a coarse prior over the space of reward functions, to reduce the effective size of the space from which queries are generated; and (2) use the demonstrations to ground the (active) query generation process, to improve the quality of the generated queries. Our method alleviates the efficiency issues faced by standard preference-based learning methods and does not exclusively depend on (possibly low-quality) demonstrations. In numerical experiments, we find that DemPref is significantly more efficient than a standard active preference-based learning method. In a user study, we compare our method to a standard IRL method; we find that users rated the robot trained with DemPref as being more successful at learning their desired behavior, and preferred to use the DemPref system (over IRL) to train the robot.

##### An Unethical Optimization Principle

If an artificial intelligence aims to maximise risk-adjusted return, then under mild conditions it is disproportionately likely to pick an unethical strategy unless the objective function allows sufficiently for this risk. Even if the proportion ${\eta}$ of available unethical strategies is small, the probability ${p_U}$ of picking an unethical strategy can become large; indeed unless returns are fat-tailed ${p_U}$ tends to unity as the strategy space becomes large. We define an Unethical Odds Ratio Upsilon (${\Upsilon}$) that allows us to calculate ${p_U}$ from ${\eta}$, and we derive a simple formula for the limit of ${\Upsilon}$ as the strategy space becomes large. We give an algorithm for estimating ${\Upsilon}$ and ${p_U}$ in finite cases and discuss how to deal with infinite strategy spaces. We show how this principle can be used to help detect unethical strategies and to estimate ${\eta}$. Finally we sketch some policy implications of this work.

##### Atari-fying the Vehicle Routing Problem with Stochastic Service Requests

We present a new general approach to modeling research problems as Atari-like videogames to make them amenable to recent groundbreaking solution methods from the deep reinforcement learning community. The approach is flexible, applicable to a wide range of problems. We demonstrate its application on a well known vehicle routing problem. Our preliminary results on this problem, though not transformative, show signs of success and suggest that Atari-fication may be a useful modeling approach for researchers studying problems involving sequential decision making under uncertainty.

* 11 pages, 4 figures

Optimization with noisy gradients has become ubiquitous in statistics and machine learning. Reparameterization gradients, or gradient estimates computed via the "reparameterization trick," represent a class of noisy gradients often used in Monte Carlo variational inference (MCVI). However, when these gradient estimators are too noisy, the optimization procedure can be slow or fail to converge. One way to reduce noise is to use more samples for the gradient estimate, but this can be computationally expensive. Instead, we view the noisy gradient as a random variable, and form an inexpensive approximation of the generating procedure for the gradient sample. This approximation has high correlation with the noisy gradient by construction, making it a useful control variate for variance reduction. We demonstrate our approach on non-conjugate multi-level hierarchical models and a Bayesian neural net where we observed gradient variance reductions of multiple orders of magnitude (20-2,000x).

##### Streaming Variational Bayes

We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data---a case where SVI may be applied---and in the streaming setting, where SVI does not apply.

* 25 pages, 3 figures, 1 table
##### Asking Easy Questions: A User-Friendly Approach to Active Reward Learning

Robots can learn the right reward function by querying a human expert. Existing approaches attempt to choose questions where the robot is most uncertain about the human's response; however, they do not consider how easy it will be for the human to answer! In this paper we explore an information gain formulation for optimally selecting questions that naturally account for the human's ability to answer. Our approach identifies questions that optimize the trade-off between robot and human uncertainty, and determines when these questions become redundant or costly. Simulations and a user study show our method not only produces easy questions, but also ultimately results in faster reward learning.

* Proceedings of the 3rd Conference on Robot Learning (CoRL), October 2019
##### The Cultural Evolution of National Constitutions

We explore how ideas from infectious disease and genetics can be used to uncover patterns of cultural inheritance and innovation in a corpus of 591 national constitutions spanning 1789 - 2008. Legal "Ideas" are encoded as "topics" - words statistically linked in documents - derived from topic modeling the corpus of constitutions. Using these topics we derive a diffusion network for borrowing from ancestral constitutions back to the US Constitution of 1789 and reveal that constitutions are complex cultural recombinants. We find systematic variation in patterns of borrowing from ancestral texts and "biological"-like behavior in patterns of inheritance with the distribution of "offspring" arising through a bounded preferential-attachment process. This process leads to a small number of highly innovative (influential) constitutions some of which have yet to have been identified as so in the current literature. Our findings thus shed new light on the critical nodes of the constitution-making network. The constitutional network structure reflects periods of intense constitution creation, and systematic patterns of variation in constitutional life-span and temporal influence.

* 38 pages with supplemental information, 13 figures, 2 tables; Accepted for publication the Journal of the Association for Information Science and Technology
##### Automated Planning in Repeated Adversarial Games

Game theory's prescriptive power typically relies on full rationality and/or self-play interactions. In contrast, this work sets aside these fundamental premises and focuses instead on heterogeneous autonomous interactions between two or more agents. Specifically, we introduce a new and concise representation for repeated adversarial (constant-sum) games that highlight the necessary features that enable an automated planing agent to reason about how to score above the game's Nash equilibrium, when facing heterogeneous adversaries. To this end, we present TeamUP, a model-based RL algorithm designed for learning and planning such an abstraction. In essence, it is somewhat similar to R-max with a cleverly engineered reward shaping that treats exploration as an adversarial optimization problem. In practice, it attempts to find an ally with which to tacitly collude (in more than two-player games) and then collaborates on a joint plan of actions that can consistently score a high utility in adversarial repeated games. We use the inaugural Lemonade Stand Game Tournament to demonstrate the effectiveness of our approach, and find that TeamUP is the best performing agent, demoting the Tournament's actual winning strategy into second place. In our experimental analysis, we show hat our strategy successfully and consistently builds collaborations with many different heterogeneous (and sometimes very sophisticated) adversaries.

* Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)
##### Robot Vision: Calibration of Wide-Angle Lens Cameras Using Collinearity Condition and K-Nearest Neighbour Regression

Visual perception is regularly used by humans and robots for navigation. By either implicitly or explicitly mapping the environment, ego-motion can be determined and a path of actions can be planned. The process of mapping and navigation are delicately intertwined; therefore, improving one can often lead to an improvement of the other. Both processes are sensitive to the interior orientation parameters of the camera system and mathematically modelling these systematic errors can often improve the precision and accuracy of the overall solution. This paper presents an automatic camera calibration method suitable for any lens, without having prior knowledge about the sensor. Statistical inference is performed to map the environment and localize the camera simultaneously. K-nearest neighbour regression is used to model the geometric distortions of the images. A normal-angle lens Nikon camera and wide-angle lens GoPro camera were calibrated using the proposed method, as well as the conventional bundle adjustment with self-calibration method (for comparison). Results showed that the mapping error was reduced from an average of 14.9 mm to 1.2 mm (i.e. a 92% improvement) and 66.6 mm to 1.5 mm (i.e. a 98% improvement) using the proposed method for the Nikon and GoPro cameras, respectively. In contrast, the conventional approach achieved an average 3D error of 0.9 mm (i.e. 94% improvement) and 3.3 mm (i.e. 95% improvement) for the Nikon and GoPro cameras, respectively. Thus, the proposed method performs well irrespective of the lens/sensor used: it yields results that are comparable to the conventional approach for normal-angle lens cameras, and it has the additional benefit of improving calibration results for wide-angle lens cameras.

* The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-1, 2018, pp. 93-99
* ISPRS TC I Mid-term Symposium "Innovative Sensing - From Sensors to Methods and Applications", 10-12 October 2018. Karlsruhe, Germany
##### High-Fidelity State-of-Charge Estimation of Li-Ion Batteries Using Machine Learning

This paper proposes a way to augment the existing machine learning algorithm applied to state-of-charge estimation by introducing a form of pulse injection to the running battery cells. It is believed that the information contained in the pulse responses can be interpreted by a machine learning algorithm whereas other techniques are difficult to decode due to the nonlinearity. The sensitivity analysis of the amplitude of the current pulse is given through simulation, allowing the researchers to select the appropriate current level with respect to the desired accuracy improvement. A multi-layer feedforward neural networks is trained to acquire the nonlinear relationship between the pulse train and the ground-truth SoC. The experimental data is trained and the results are shown to be promising with less than 2\% SoC estimation error using layer sizes in the range of 10 - 10,000 trained in 0 - 1 million epochs. The testing procedure specifically designed for the proposed technique is explained and provided. The implementation of the proposed strategy is also discussed. The detailed system layout to perform the augmented SoC estimation integrated in the existing active balancing hardware has also been given.

* 8 pages, 12 figures
##### Learning Bodily and Temporal Attention in Protective Movement Behavior Detection

For people with chronic pain (CP), the assessment of protective behavior during physical functioning is essential to understand their subjective pain-related experiences (e.g., fear and anxiety toward pain and injury) and how they deal with such experiences (avoidance or reliance on specific body joints), with the ultimate goal of guiding intervention. Advances in deep learning (DL) can enable the development of such intervention. Using the EmoPain MoCap dataset, we investigate how attention-based DL architectures can be used to improve the detection of protective behavior by capturing the most informative biomechanical cues characterizing specific movements and the strategies used to execute them to cope with pain-related experience. We propose an end-to-end neural network architecture based on attention mechanism, named BodyAttentionNet (BANet). BANet is designed to learn temporal and body-joint regions that are informative to the detection of protective behavior. The approach can consider the variety of ways people execute one movement (including healthy people) and it is independent of the type of movement analyzed. We also explore variants of this architecture to understand the contribution of both temporal and bodily attention mechanisms. Through extensive experiments with other state-of-the-art machine learning techniques used with motion capture data, we show a statistically significant improvement achieved by combining the two attention mechanisms. In addition, the BANet architecture requires a much lower number of parameters than the state-of-the-art ones for comparable if not higher performances.

* 7 pages, 3 figures, 2 tables, submitted to ACII 2019
##### Automatic Detection of Protective Behavior in Chronic Pain Physical Rehabilitation: A Recurrent Neural Network Approach

In chronic pain physical rehabilitation, physiotherapists adapt movement to current performance of patients especially based on the expression of protective behavior, gradually exposing them to feared but harmless and essential everyday movements. As physical rehabilitation moves outside the clinic, physical rehabilitation technology needs to automatically detect such behaviors so as to provide similar personalized support. In this paper, we investigate the use of a Long Short-Term Memory (LSTM) network, which we call Protect-LSTM, to detect events of protective behavior, based on motion capture and electromyography data of healthy people and people with chronic low back pain engaged in five everyday movements. Differently from previous work on the same dataset, we aim to continuously detect protective behavior within a movement rather than overall estimate the presence of such behavior. The Protect-LSTM reaches best average F1 score of 0.815 with leave-one-subject-out (LOSO) validation, using low level features, better than other algorithms. Performances increase for some movements when modelled separately (mean F1 scores: bending=0.77, standing on one leg=0.81, sit-to-stand=0.72, stand-to-sit=0.83, reaching forward=0.67). These results reach excellent level of agreement with the average ratings of physiotherapists. As such, the results show clear potential for in-home technology supported affect-based personalized physical rehabilitation.

* 25 pages, 9 figures, 7 tables, Submitted to IMWUT
##### Protecting Privacy of Users in Brain-Computer Interface Applications

Machine learning (ML) is revolutionizing research and industry. Many ML applications rely on the use of large amounts of personal data for training and inference. Among the most intimate exploited data sources is electroencephalogram (EEG) data, a kind of data that is so rich with information that application developers can easily gain knowledge beyond the professed scope from unprotected EEG signals, including passwords, ATM PINs, and other intimate data. The challenge we address is how to engage in meaningful ML with EEG data while protecting the privacy of users. Hence, we propose cryptographic protocols based on Secure Multiparty Computation (SMC) to perform linear regression over EEG signals from many users in a fully privacy-preserving (PP) fashion, i.e.~such that each individual's EEG signals are not revealed to anyone else. To illustrate the potential of our secure framework, we show how it allows estimating the drowsiness of drivers from their EEG signals as would be possible in the unencrypted case, and at a very reasonable computational cost. Our solution is the first application of commodity-based SMC to EEG data, as well as the largest documented experiment of secret sharing based SMC in general, namely with 15 players involved in all the computations.