Models, code, and papers for "David A":
Continual learning consists of algorithms that learn from a stream of data/tasks continuously and adaptively thought time, enabling the incremental development of ever more complex knowledge and skills. The lack of consensus in evaluating continual learning algorithms and the almost exclusive focus on forgetting motivate us to propose a more comprehensive set of implementation independent metrics accounting for several factors we believe have practical implications worth considering in the deployment of real AI systems that learn continually: accuracy or performance over time, backward and forward knowledge transfer, memory overhead as well as computational efficiency. Drawing inspiration from the standard Multi-Attribute Value Theory (MAVT) we further propose to fuse these metrics into a single score for ranking purposes and we evaluate our proposal with five continual learning strategies on the iCIFAR-100 continual learning benchmark.
One of the most prominent challenges in clustering is "the user's dilemma," which is the problem of selecting an appropriate clustering algorithm for a specific task. A formal approach for addressing this problem relies on the identification of succinct, user-friendly properties that formally capture when certain clustering methods are preferred over others. Until now these properties focused on advantages of classical Linkage-Based algorithms, failing to identify when other clustering paradigms, such as popular center-based methods, are preferable. We present surprisingly simple new properties that delineate the differences between common clustering paradigms, which clearly and formally demonstrates advantages of center-based approaches for some applications. These properties address how sensitive algorithms are to changes in element frequencies, which we capture in a generalized setting where every element is associated with a real-valued weight.
We carefully study how well minimizing convex surrogate loss functions, corresponds to minimizing the misclassification error rate for the problem of binary classification with linear predictors. In particular, we show that amongst all convex surrogate losses, the hinge loss gives essentially the best possible bound, of all convex loss functions, for the misclassification error rate of the resulting linear predictor in terms of the best possible margin error rate. We also provide lower bounds for specific convex surrogates that show how different commonly used losses qualitatively differ from each other.
We define a novel, basic, unsupervised learning problem - learning the lowest density homogeneous hyperplane separator of an unknown probability distribution. This task is relevant to several problems in machine learning, such as semi-supervised learning and clustering stability. We investigate the question of existence of a universally consistent algorithm for this problem. We propose two natural learning paradigms and prove that, on input unlabeled random samples generated by any member of a rich family of distributions, they are guaranteed to converge to the optimal separator for that distribution. We complement this result by showing that no learning algorithm for our task can achieve uniform learning rates (that are independent of the data generating distribution).
Continual learning (CL) is a particular machine learning paradigm where the data distribution and learning objective changes through time, or where all the training data and objective criteria are never available at once. The evolution of the learning process is modeled by a sequence of learning experiences where the goal is to be able to learn new skills all along the sequence without forgetting what has been previously learned. Continual learning also aims at the same time at optimizing the memory, the computation power and the speed during the learning process. An important challenge for machine learning is not necessarily finding solutions that work in the real world but rather finding stable algorithms that can learn in real world. Hence, the ideal approach would be tackling the real world in a embodied platform: an autonomous agent. Continual learning would then be effective in an autonomous agent or robot, which would learn autonomously through time about the external world, and incrementally develop a set of complex skills and knowledge. Robotic agents have to learn to adapt and interact with their environment using a continuous stream of observations. Some recent approaches aim at tackling continual learning for robotics, but most recent papers on continual learning only experiment approaches in simulation or with static datasets. Unfortunately, the evaluation of those algorithms does not provide insights on whether their solutions may help continual learning in the context of robotics. This paper aims at reviewing the existing state of the art of continual learning, summarizing existing benchmarks and metrics, and proposing a framework for presenting and evaluating both robotics and non robotics approaches in a way that makes transfer between both fields easier.
Traditional approaches to stereo visual SLAM rely on point features to estimate the camera trajectory and build a map of the environment. In low-textured environments, though, it is often difficult to find a sufficient number of reliable point features and, as a consequence, the performance of such algorithms degrades. This paper proposes PL-SLAM, a stereo visual SLAM system that combines both points and line segments to work robustly in a wider variety of scenarios, particularly in those where point features are scarce or not well-distributed in the image. PL-SLAM leverages both points and segments at all the instances of the process: visual odometry, keyframe selection, bundle adjustment, etc. We contribute also with a loop closure procedure through a novel bag-of-words approach that exploits the combined descriptive power of the two kinds of features. Additionally, the resulting map is richer and more diverse in 3D elements, which can be exploited to infer valuable, high-level scene structures like planes, empty spaces, ground plane, etc. (not addressed in this work). Our proposal has been tested with several popular datasets (such as KITTI and EuRoC), and is compared to state of the art methods like ORB-SLAM, revealing a more robust performance in most of the experiments, while still running in real-time. An open source version of the PL-SLAM C++ code will be released for the benefit of the community.
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.
To probe the largest public-domain email database for indicators of fraud, we apply machine learning and accomplish four investigative tasks. First, we identify persons of interest (POI), using financial records and email, and report a peak accuracy of 95.7%. Secondly, we find any publicly exposed personally identifiable information (PII) and discover 50,000 previously unreported instances. Thirdly, we automatically flag legally responsive emails as scored by human experts in the California electricity blackout lawsuit, and find a peak 99% accuracy. Finally, we track three years of primary topics and sentiment across over 10,000 unique people before, during and after the onset of the corporate crisis. Where possible, we compare accuracy against execution times for 51 algorithms and report human-interpretable business rules that can scale to vast datasets.
Understanding the pathways whereby an intervention has an effect on an outcome is a common scientific goal. A rich body of literature provides various decompositions of the total intervention effect into pathway specific effects. Interventional direct and indirect effects provide one such decomposition. Existing estimators of these effects are based on parametric models with confidence interval estimation facilitated via the nonparametric bootstrap. We provide theory that allows for more flexible, possibly machine learning-based, estimation techniques to be considered. In particular, we establish weak convergence results that facilitate the construction of closed-form confidence intervals and hypothesis tests. Finally, we demonstrate multiple robustness properties of the proposed estimators. Simulations show that inference based on large-sample theory has adequate small-sample performance. Our work thus provides a means of leveraging modern statistical learning techniques in estimation of interventional mediation effects.
We introduce a general learning framework for private machine learning based on randomised response. Our assumption is that all actors are potentially adversarial and as such we trust only to release a single noisy version of an individual's datapoint. We discuss a general approach that forms a consistent way to estimate the true underlying machine learning model and demonstrate this in the case of logistic regression.
This work describes an outlier detection procedure (named "OutlierTree") loosely based on the GritBot software developed by RuleQuest research, which works by evaluating and following supervised decision tree splits on variables, in whose branches 1-d confidence intervals are constructed for the target variable and potential outliers flagged according to these confidence intervals. Under this logic, it's possible to produce human-readable explanations for why a given value of a variable in an observation can be considered as outlier, by considering the decision tree branch conditions along with general distribution statistics among the non-outlier observations that fell into the same branch, which can then be contrasted against the value which lies outside the CI. The supervised splits help to ensure that the generated conditions are not spurious, but rather related to the target variable and having logical breakpoints.
We introduce a Gaussian process-based model for handling of non-stationarity. The warping is achieved non-parametrically, through imposing a prior on the relative change of distance between subsequent observation inputs. The model allows the use of general gradient optimization algorithms for training and incurs only a small computational overhead on training and prediction. The model finds its applications in forecasting in non-stationary time series with either gradually varying volatility, presence of change points, or a combination thereof. We evaluate the model on synthetic and real-world time series data comparing against both baseline and known state-of-the-art approaches and show that the model exhibits state-of-the-art forecasting performance at a lower implementation and computation cost.
This work briefly explores the possibility of approximating spatial distance (alternatively, similarity) between data points using the Isolation Forest method envisioned for outlier detection. The logic is similar to that of isolation: the more similar or closer two points are, the more random splits it will take to separate them. The separation depth between two points can be standardized in the same way as the isolation depth, transforming it into a distance metric that is limited in range, centered, and in compliance with the axioms of distance. This metric presents some desirable properties such as being invariant to the scales of variables or being able to account for non-linear relationships between variables, which other metrics such as Euclidean or Mahalanobis distance do not. Extensions to the Isolation Forest method are also proposed for handling categorical variables and missing values, resulting in a more generalizable and robust metric.
This work proposes a non-iterative strategy for missing value imputations which is guided by similarity between observations, but instead of explicitly determining distances or nearest neighbors, it assigns observations to overlapping buckets through recursive semi-random hyperplane cuts, in which weighted averages are determined as imputations for each variable. The quality of these imputations is oftentimes not as good as that of chained equations, but the proposed technique is much faster, non-iterative, can make imputations on new data without re-calculating anything, and scales easily to large and high-dimensional datasets, providing a significant boost over simple mean/median imputation in regression and classification metrics with imputed values when other methods are not feasible.
Normative expert systems have not become commonplace because they have been difficult to build and use. Over the past decade, however, researchers have developed the influence diagram, a graphical representation of a decision maker's beliefs, alternatives, and preferences that serves as the knowledge base of a normative expert system. Most people who have seen the representation find it intuitive and easy to use. Consequently, the influence diagram has overcome significantly the barriers to constructing normative expert systems. Nevertheless, building influence diagrams is not practical for extremely large and complex domains. In this book, I address the difficulties associated with the construction of the probabilistic portion of an influence diagram, called a knowledge map, belief network, or Bayesian network. I introduce two representations that facilitate the generation of large knowledge maps. In particular, I introduce the similarity network, a tool for building the network structure of a knowledge map, and the partition, a tool for assessing the probabilities associated with a knowledge map. I then use these representations to build Pathfinder, a large normative expert system for the diagnosis of lymph-node diseases (the domain contains over 60 diseases and over 100 disease findings). In an early version of the system, I encoded the knowledge of the expert using an erroneous assumption that all disease findings were independent, given each disease. When the expert and I attempted to build a more accurate knowledge map for the domain that would capture the dependencies among the disease findings, we failed. Using a similarity network, however, we built the knowledge-map structure for the entire domain in approximately 40 hours. Furthermore, the partition representation reduced the number of probability assessments required by the expert from 75,000 to 14,000.
Recent state-of-the-art video generation systems employ Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs) to produce novel videos. However, VAE models typically produce blurry outputs when faced with sub-optimal conditioning of the input, and GANs are known to be unstable for large output sizes. In addition, the output videos of these models are difficult to evaluate, partly because the GAN loss function is not an accurate measure of convergence. In this work, we propose using a state-of-the-art neural flow generator called Glow to generate videos conditioned on a textual label, one frame at a time. Neural flow models are more stable than standard GANs, as they only optimize a single cross entropy loss function, which is monotonic and avoids the circular convergence issues of the GAN minimax objective. In addition, we also show how to condition Glow on external context, while still preserving the invertible nature of each "flow" layer. Finally, we evaluate the proposed Glow model by calculating cross entropy on a held-out validation set of videos, in order to compare multiple versions of the proposed model via an ablation study. We show generated videos and discuss future improvements.
Dense urban traffic environments can produce situations where accurate prediction and dynamic models are insufficient for successful autonomous vehicle motion planning. We investigate how an autonomous agent can safely negotiate with other traffic participants, enabling the agent to handle potential deadlocks. Specifically we consider merges where the gap between cars is smaller than the size of the ego vehicle. We propose a game theoretic framework capable of generating and responding to interactive behaviors. Our main contribution is to show how game-tree decision making can be executed by an autonomous vehicle, including approximations and reasoning that make the tree-search computationally tractable. Additionally, to test our model we develop a stochastic rule-based traffic agent capable of generating interactive behaviors that can be used as a benchmark for simulating traffic participants in a crowded merge setting.
Where early work on dialogue in Computational Linguistics put much emphasis on dialogue structure and its relation to the mental states of the dialogue participants (e.g., Allen 1979, Grosz & Sidner 1986), current work mostly reduces dialogue to the task of producing at any one time a next utterance; e.g. in neural chatbot or Visual Dialogue settings. As a methodological decision, this is sound: Even the longest journey is a sequence of steps. It becomes detrimental, however, when the tasks and datasets from which dialogue behaviour is to be learned are tailored too much to this framing of the problem. In this short note, we describe a family of settings which still allow to keep dialogues simple, but add a constraint that makes participants care about reaching mutual understanding. In such agreement games, there is a secondary, but explicit goal besides the task level goal, and that is to reach mutual understanding about whether the task level goal has been reached. As we argue, this naturally triggers meta-semantic interaction and mutual engagement, and hence leads to richer data from which to induce models.