Research papers and code for "Tim Oates":
We propose an off-line approach to explicitly encode temporal patterns spatially as different types of images, namely, Gramian Angular Fields and Markov Transition Fields. This enables the use of techniques from computer vision for feature learning and classification. We used Tiled Convolutional Neural Networks to learn high-level features from individual GAF, MTF, and GAF-MTF images on 12 benchmark time series datasets and two real spatial-temporal trajectory datasets. The classification results of our approach are competitive with state-of-the-art approaches on both types of data. An analysis of the features and weights learned by the CNNs explains why the approach works.

* Submitted to JCSS. Preliminary versions appeared in an AAAI 2015 workshop and IJCAI 2016 [arXiv:1506.00327]
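
To make the encoding concrete, here is a minimal sketch (my own illustration, not the authors' code) of the Gramian Angular Summation Field: the series is rescaled to [-1, 1], mapped to angles via arccos, and entry (i, j) of the image is cos(phi_i + phi_j).

```python
import numpy as np

def gasf(x):
    """Return the GASF image of a 1-D, non-constant series x."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale to [-1, 1]
    phi = np.arccos(x)                                # angular encoding
    # GASF(i, j) = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

image = gasf(np.sin(np.linspace(0, 4 * np.pi, 64)))
print(image.shape)  # (64, 64)
```
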
Inspired by recent successes of deep learning in computer vision, we propose a novel framework for encoding time series as different types of images, namely, Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for time series classification and imputation. We used Tiled Convolutional Neural Networks (tiled CNNs) on 20 standard datasets to learn high-level features from the individual and compound GASF-GADF-MTF images. Our approaches achieve highly competitive results when compared to nine of the current best time series classification approaches. Inspired by the bijection property of GASF on 0/1 rescaled data, we train Denoising Auto-encoders (DA) on the GASF images of four standard and one synthesized compound dataset. The imputation MSE on test data is reduced by 12.18%-48.02% when compared to using the raw data. An analysis of the features and weights learned via tiled CNNs and DAs explains why the approaches work.

* Accepted by IJCAI-2015 ML track
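
As a companion to the GASF sketch above, a rough illustration (assumptions, not the paper's implementation) of the Markov Transition Field: quantize the series into bins, estimate the first-order transition matrix between bins, and spread those transition probabilities over all pairs of time indices.

```python
import numpy as np

def mtf(x, n_bins=8):
    x = np.asarray(x, dtype=float)
    # assign each point to a quantile bin
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)
    # first-order transition counts between consecutive bins
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)   # row-normalize
    # MTF(i, j) = probability of moving from bin(x_i) to bin(x_j)
    return W[q[:, None], q[None, :]]

print(mtf(np.random.randn(64)).shape)  # (64, 64)
```
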
The problem of Learning from Demonstration is targeted at learning to perform tasks based on observed examples. One approach to Learning from Demonstration is Inverse Reinforcement Learning, in which actions are observed to infer rewards. This work combines a feature-based state evaluation approach to Inverse Reinforcement Learning with neuroevolution, a paradigm for modifying neural networks based on their performance on a given task. Neural networks are used to learn from a demonstrated expert policy and are evolved to generate a policy similar to the demonstration. The algorithm is discussed and evaluated against competitive feature-based Inverse Reinforcement Learning approaches. At the cost of execution time, neural networks allow for non-linear combinations of features in state evaluations. These valuations may correspond to state value or state reward. This yields better correspondence to observed examples than linear combinations do. This work also extends existing work on Bayesian Non-Parametric Feature Construction for Inverse Reinforcement Learning by using non-linear combinations of intermediate data to improve performance. The algorithm is observed to be specifically suitable for linearly solvable non-deterministic Markov Decision Processes in which multiple rewards are sparsely scattered in state space. A conclusive performance hierarchy between evaluated algorithms is presented.

* 12 pages, 15 figures
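
A toy sketch of the central idea, using a simple (1+1)-style hill climber as the neuroevolution component (the paper's evolutionary algorithm and demonstration format will differ): a small neural network scores states from their features, and its weights are perturbed to maximize agreement with the demonstrated actions.

```python
import numpy as np

def state_value(weights, features):
    """Non-linear state evaluation: one hidden tanh layer over state features."""
    W1, W2 = weights
    return float(np.tanh(features @ W1) @ W2)

def agreement(weights, demos, feature_fn):
    """Fraction of demonstrations where the greedy choice matches the expert's."""
    hits = 0
    for chosen, candidates in demos:          # candidates: states reachable in one step
        values = [state_value(weights, feature_fn(s)) for s in candidates]
        hits += int(np.argmax(values) == chosen)
    return hits / len(demos)

def evolve(demos, feature_fn, n_features, hidden=16, generations=500, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best = (rng.normal(size=(n_features, hidden)), rng.normal(size=hidden))
    best_fit = agreement(best, demos, feature_fn)
    for _ in range(generations):
        candidate = tuple(w + sigma * rng.normal(size=w.shape) for w in best)
        fit = agreement(candidate, demos, feature_fn)
        if fit >= best_fit:                   # keep the better (or equal) network
            best, best_fit = candidate, fit
    return best, best_fit
```
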
The recent proliferation of publicly available graph-structured data has sparked an interest in machine learning algorithms for graph data. Since most traditional machine learning algorithms assume data to be tabular, embedding algorithms for mapping graph data to real-valued vector spaces have become an active area of research. Existing graph embedding approaches are based purely on structural information and ignore any semantic information from the underlying domain. In this paper, we demonstrate that semantic information can play a useful role in computing graph embeddings. Specifically, we present a framework for devising embedding strategies aware of domain-specific interpretations of graph nodes and edges, and use knowledge of downstream machine learning tasks to identify relevant graph substructures. Using two real-life domains, we show that our framework yields embeddings that are simple to implement and yet achieve equal or greater accuracy in machine learning tasks compared to domain-independent approaches.

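One possible instantiation of domain-aware embeddings, offered only as an illustration (the paper's framework is more general): bias DeepWalk-style random walks toward edge types known to matter for the downstream task, then learn node vectors with word2vec over the walks.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def biased_walks(G, walk_len=20, walks_per_node=10, preferred_edge_type="works_for"):
    """Random walks that prefer edges of a task-relevant type (a hypothetical label)."""
    walks = []
    for start in G.nodes:
        for _ in range(walks_per_node):
            walk, node = [str(start)], start
            for _ in range(walk_len):
                nbrs = list(G.neighbors(node))
                if not nbrs:
                    break
                weights = [3.0 if G[node][n].get("type") == preferred_edge_type else 1.0
                           for n in nbrs]
                node = random.choices(nbrs, weights=weights)[0]
                walk.append(str(node))
            walks.append(walk)
    return walks

# model = Word2Vec(biased_walks(G), vector_size=64, window=5, min_count=0, sg=1)
```
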
Convolutional Neural Networks have been highly successful in performing a host of computer vision tasks such as object recognition, object detection, image segmentation and texture synthesis. In 2015, Gatys et al. [7] showed how the style of a painter can be extracted from an image of the painting and applied to another normal photograph, thus recreating the photo in the style of the painter. The method has been successfully applied to a wide range of images and has since spawned multiple applications and mobile apps. In this paper, the neural style transfer algorithm is applied to fashion so as to synthesize new custom clothes. We construct an approach to personalize and generate new custom clothes based on a user's preference and by learning the user's fashion choices from a limited set of clothes from their closet. The approach is evaluated by analyzing the generated images of clothes and how well they align with the user's fashion style.

* ML4Fashion 2017
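
For reference, a condensed sketch of the Gatys-style objective that the paper builds on: content is matched on raw feature maps and style on Gram matrices of feature maps. The layer names and loss weights below are illustrative assumptions, not the authors' settings.

```python
import torch

def gram(features):
    """Gram matrix of a (C, H, W) feature map."""
    c, h, w = features.shape
    f = features.view(c, h * w)
    return f @ f.t() / (c * h * w)

def style_transfer_loss(gen_feats, content_feats, style_feats, alpha=1.0, beta=1e3):
    # content loss on one mid-level layer, style loss over several layers
    content_loss = torch.mean((gen_feats["conv4_2"] - content_feats["conv4_2"]) ** 2)
    style_loss = sum(
        torch.mean((gram(gen_feats[l]) - gram(style_feats[l])) ** 2)
        for l in ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
    )
    return alpha * content_loss + beta * style_loss
```
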
Traditional approaches to building a large scale knowledge graph have usually relied on extracting information (entities, their properties, and relations between them) from unstructured text (e.g. DBpedia). Recent advances in Convolutional Neural Networks (CNN) allow us to shift our focus to learning entities and relations from images, as they build robust models that require little or no pre-processing of the images. In this paper, we present an approach to identify and extract spatial relations (e.g., "the girl is standing behind the table") from images using CNNs. Our research addresses two specific challenges: providing insight into how spatial relations are learned by the network and which parts of the image are used to predict these relations. We use the pre-trained network VGGNet to extract features from an image and train a Multi-layer Perceptron (MLP) on a set of synthetic images and the SUN09 dataset to extract spatial relations. The MLP predicts spatial relations without a bounding box around the objects or the space in the image depicting the relation. To understand how the spatial relations are represented in the network, a heatmap is overlaid on the image to show the regions that are deemed important by the network. Also, we analyze the MLP to show the relationship between the activation of consistent groups of nodes and the prediction of a spatial relation. We show how the loss of these groups affects the network's ability to identify relations.

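A minimal sketch of the described pipeline, with assumed layer sizes and relation labels: a frozen pre-trained VGG backbone produces image features and a small MLP predicts the spatial relation with no bounding boxes.

```python
import torch
import torch.nn as nn
from torchvision import models

# frozen pre-trained backbone used purely as a feature extractor
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad = False

relation_mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 512), nn.ReLU(),
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 4),            # e.g. behind / in front of / left of / right of
)

def predict_relation(image_batch):            # (N, 3, 224, 224), ImageNet-normalized
    with torch.no_grad():
        feats = vgg(image_batch)              # (N, 512, 7, 7)
    return relation_mlp(feats).argmax(dim=1)
```
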
We propose a simple but strong baseline for time series classification from scratch with deep neural networks. Our proposed baseline models are pure end-to-end, without any heavy preprocessing of the raw data or feature crafting. The proposed Fully Convolutional Network (FCN) achieves premium performance compared to other state-of-the-art approaches, and our exploration of very deep neural networks with the ResNet structure is also competitive. The global average pooling in our convolutional model enables the exploitation of the Class Activation Map (CAM) to identify the regions in the raw data that contribute to specific labels. Our models provide a simple choice for real-world applications and a good starting point for future research. An overall analysis is provided to discuss the generalization capability of our models, learned features, network structures and the classification semantics.

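A sketch of an FCN of the kind described, with the commonly used 128-256-128 filter counts treated as assumptions: global average pooling feeds a linear classifier whose weights also produce a Class Activation Map over the time axis.

```python
import torch
import torch.nn as nn

class FCN(nn.Module):
    def __init__(self, in_channels, n_classes):
        super().__init__()
        def block(c_in, c_out, k):
            return nn.Sequential(nn.Conv1d(c_in, c_out, k, padding=k // 2),
                                 nn.BatchNorm1d(c_out), nn.ReLU())
        self.body = nn.Sequential(block(in_channels, 128, 8),
                                  block(128, 256, 5),
                                  block(256, 128, 3))
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):                      # x: (N, C, T)
        feats = self.body(x)                   # (N, 128, T')
        logits = self.fc(feats.mean(dim=2))    # global average pooling
        # CAM: weight each channel's activation over time by the class weights
        cam = torch.einsum("kc,nct->nkt", self.fc.weight, feats)
        return logits, cam
```
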
This paper proposes a set of new error criteria and learning approaches, Adaptive Normalized Risk-Averting Training (ANRAT), to attack the non-convex optimization problem in training deep neural networks (DNNs). Theoretically, we demonstrate its effectiveness with respect to global and local convexity, lower-bounded by the standard $L_p$-norm error. By analyzing the gradient with respect to the convexity index $\lambda$, we explain why learning $\lambda$ adaptively via gradient descent works. In practice, we show how this method improves training of deep neural networks to solve visual recognition tasks on the MNIST and CIFAR-10 datasets. Without using pretraining or other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard ConvNets + MSE/cross entropy. Performance on deep/shallow multilayer perceptrons and Denoising Auto-encoders is also explored. ANRAT can be combined with other quasi-Newton training methods, innovative network variants, regularization techniques and other specific tricks in DNNs. Other than unsupervised pretraining, it provides a new perspective to address the non-convex optimization problem in DNNs.

* AAAI 2016, 0.39%-0.4% error rate on MNIST with a single 32-32-256-10 ConvNet, code available at https://github.com/cauchyturing/ANRAE
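
A heavily simplified sketch of a risk-averting error with a learnable convexity index $\lambda$, in the spirit of ANRAT; the exact normalized criterion and its gradient scaling in the paper differ.

```python
import math
import torch
import torch.nn as nn

class RiskAvertingLoss(nn.Module):
    def __init__(self, init_lambda=1.0):
        super().__init__()
        # the convexity index lambda is itself a trainable parameter
        self.log_lambda = nn.Parameter(torch.log(torch.tensor(float(init_lambda))))

    def forward(self, pred, target):           # pred, target: (N, d)
        lam = self.log_lambda.exp()
        err = ((pred - target) ** 2).flatten(1).sum(dim=1)   # per-sample squared L2 error
        # (1/lambda) * log mean exp(lambda * err): larger lambda expands the convex
        # region of the loss surface; lambda -> 0 recovers the plain mean error
        return (torch.logsumexp(lam * err, dim=0) - math.log(err.numel())) / lam
```
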
We generalized a modified exponentialized estimator by pushing the robust-optimal (RO) index $\lambda$ to $-\infty$ to achieve robustness to outliers by optimizing a quasi-Minimin function. The robustness is realized and controlled adaptively by the RO index without any predefined threshold. Optimality is guaranteed by expansion of the convexity region in the Hessian matrix, which largely avoids local optima. Detailed quantitative analyses of both robustness and optimality are provided. The results of the proposed experiments on fitting tasks for three noisy non-convex functions and the digit recognition task on the MNIST dataset support the conclusions.

* This paper has been withdrawn by the authors due to some errors and confusions in terminology
As the use of cloud computing continues to rise, controlling cost becomes increasingly important. Yet there is evidence that 30%-45% of cloud spend is wasted. Existing tools for cloud provisioning typically rely on highly trained human experts to specify what to monitor, thresholds for triggering action, and actions. In this paper we explore the use of reinforcement learning (RL) to acquire policies to balance performance and spend, allowing humans to specify what they want as opposed to how to do it, minimizing the need for cloud expertise. Empirical results with tabular, deep, and dueling double deep Q-learning with the CloudSim simulator show the utility of RL and the relative merits of the approaches. We also demonstrate effective policy transfer learning from an extremely simple simulator to CloudSim, with the next step being transfer from CloudSim to an Amazon Web Services physical environment.

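A tabular Q-learning sketch for the provisioning setting, with an invented state encoding (load bucket x running VM count), an invented action set (remove, keep, or add a VM), and an invented cost-versus-performance reward; the paper's deep and dueling double variants replace the table with networks.

```python
import numpy as np

N_LOAD_BUCKETS, MAX_VMS, N_ACTIONS = 10, 20, 3     # actions: -1, 0, +1 VM
Q = np.zeros((N_LOAD_BUCKETS, MAX_VMS + 1, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def reward(latency, n_vms, sla=0.2, vm_cost=0.05):
    # penalize SLA violations and pay for every running VM
    return -(max(latency - sla, 0.0) * 10.0 + n_vms * vm_cost)

def choose_action(state):                          # state: (load_bucket, n_vms)
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(Q[state].argmax())

def q_update(state, action, r, next_state):
    """One backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * Q[next_state].max()
    Q[state + (action,)] += alpha * (target - Q[state + (action,)])
```
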
Ubiquitous bio-sensing for personalized health monitoring is slowly becoming a reality with the increasing availability of small, diverse, robust, high fidelity sensors. This oncoming flood of data begs the question of how we will extract useful information from it. In this paper we explore the use of a variety of representations and machine learning algorithms applied to the task of seizure detection in high resolution, multichannel EEG data. We explore classification accuracy, computational complexity and memory requirements with a view toward understanding which approaches are most suitable for such tasks as the number of people involved and the amount of data they produce grows to be quite large. In particular, we show that layered learning approaches such as Deep Belief Networks excel along these dimensions.

* Old draft of AAAI paper, AAAI Spring Symposium Series. 2014
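
A rough stand-in for the layered feature learning discussed (not the paper's implementation): stack two RBMs as unsupervised feature extractors over EEG feature vectors and train a simple classifier on top.

```python
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

dbn_like = Pipeline([
    ("scale", MinMaxScaler()),                       # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# dbn_like.fit(X_train, y_train); dbn_like.score(X_test, y_test)
```
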
We explore the use of neural networks trained with dropout in predicting epileptic seizures from electroencephalographic data (scalp EEG). The input to the neural network is a 126-feature vector containing 9 features for each of the 14 EEG channels, obtained over 1-second, non-overlapping windows. The models in our experiments achieved high sensitivity and specificity on patient records not used in the training process. This is demonstrated using leave-one-out cross-validation across patient records, where we hold out one patient's record as the test set, train on all other patients' records, and repeat this procedure for every patient in the database.

* This paper has been withdrawn by the authors due to an error discovered in the experiments
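
A sketch of the leave-one-patient-out protocol described, assuming a matrix X of 126-dimensional window features, labels y, and per-row patient ids; scikit-learn's MLP stands in for the dropout-trained network used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def leave_one_patient_out(X, y, patient_ids):
    scores = {}
    for pid in np.unique(patient_ids):
        test = patient_ids == pid
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
        clf.fit(X[~test], y[~test])            # train on all other patients
        scores[pid] = clf.score(X[test], y[test])
    return scores
```
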
Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Using a deictic representation is believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naïve propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen learning performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.

* Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)
The use of artificial intelligence in clinical care to improve decision support systems is increasing. This is not surprising since, by its very nature, the practice of medicine consists of making decisions based on observations from different systems both inside and outside the human body. In this paper, we combine three general systems (ICU, diabetes, and comorbidities) and use them to make patient clinical predictions. We use an artificial intelligence approach to show that we can improve mortality prediction of hospitalized diabetic patients. We do this by utilizing a machine learning approach to select clinical input features that are more likely to predict mortality. We then use these features to create a hybrid mortality prediction model and compare our results to non-artificial intelligence models. For simplicity, we limit our input features to patient comorbidities and features derived from a well-known mortality measure, the Sequential Organ Failure Assessment (SOFA).

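A rough sketch of the feature-selection step described: rank candidate clinical inputs (comorbidities plus SOFA-derived features) by importance and fit the mortality model on the top-k. Column names and model choices are placeholders, not the paper's pipeline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def select_and_fit(df: pd.DataFrame, k=10):
    # "mortality" is an assumed label column; the rest are candidate features
    X, y = df.drop(columns=["mortality"]), df["mortality"]
    ranker = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    top = X.columns[ranker.feature_importances_.argsort()[::-1][:k]]
    # hybrid model: a simple classifier trained on the selected features only
    model = LogisticRegression(max_iter=1000).fit(X[top], y)
    return top.tolist(), model
```
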
Evolving cybersecurity threats are a persistent challenge for system administrators and security experts as new malware is continually released. Attackers may look for vulnerabilities in commercial products or execute sophisticated reconnaissance campaigns to understand a target's network and gather information on security products like firewalls and intrusion detection / prevention systems (network or host-based). Many new attacks tend to be modifications of existing ones. In such a scenario, rule-based systems fail to detect the attack, even though there are only minor differences in conditions / attributes between the rules that identify the new and existing attacks. To detect these differences, the IDS must be able to isolate the subset of conditions that are true and predict the likely conditions (different from the original) that must be observed. In this paper, we propose a probabilistic abductive reasoning approach that augments an existing rule-based IDS (Snort [29]) to detect these evolved attacks by (a) predicting rule conditions that are likely to occur (based on existing rules) and (b) generating new Snort rules when provided with a seed rule (i.e., a starting rule), to reduce the burden on experts to constantly update them. We demonstrate the effectiveness of the approach by generating new rules from the Snort 2012 rule set and testing them on the MACCDC 2012 dataset [6].

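A simplified illustration (not the authors' system) of the abductive step: given the subset of rule conditions observed, rank existing rules by how many of their conditions hold and surface the unmet conditions as the ones an evolved attack most likely changed. The rule ids and conditions below are made up.

```python
def rank_candidate_rules(observed, rules):
    """observed: set of condition strings; rules: {rule_id: set of conditions}."""
    ranked = []
    for rule_id, conditions in rules.items():
        matched = conditions & observed
        # third element: conditions to predict / mutate when generating a new rule
        ranked.append((len(matched) / len(conditions), rule_id, conditions - matched))
    return sorted(ranked, reverse=True)        # highest partial match first

rules = {"sid:1001": {"tcp", "dport:80", 'content:"cmd.exe"'},
         "sid:1002": {"tcp", "dport:443", 'content:"/etc/passwd"'}}
print(rank_candidate_rules({"tcp", "dport:80"}, rules)[0])
```
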
In autonomous embedded systems, it is often vital to reduce the number of actions taken in the real world and the energy required to learn a policy. Training reinforcement learning agents from high dimensional image representations can be very expensive and time consuming. Autoencoders are deep neural networks used to compress high dimensional data such as pixelated images into small latent representations. This compression model is vital to efficiently learn policies, especially when learning on embedded systems. We have implemented this model on the NVIDIA Jetson TX2 embedded GPU, and evaluated the power consumption, throughput, and energy consumption of the autoencoders for various CPU/GPU core combinations, frequencies, and model parameters. Additionally, we examine the reconstructions generated by the autoencoder to analyze the quality of the compressed representation and the performance of the reinforcement learning agent. Finally, we present an assessment of the viability of training these models on embedded systems and their usefulness in developing autonomous policies. Using autoencoders, we were able to achieve 4-5 $\times$ improved performance compared to a baseline RL agent with a convolutional feature extractor, while using less than 2W of power.

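A minimal convolutional autoencoder of the kind described, where the encoder compresses an 84x84 observation into a small latent vector for the RL agent; layer sizes are assumptions, not the deployed model.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 84 -> 42
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 42 -> 21
            nn.Flatten(), nn.Linear(64 * 21 * 21, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 21 * 21), nn.Unflatten(1, (64, 21, 21)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):            # x: (N, 3, 84, 84)
        z = self.encoder(x)          # latent state fed to the RL policy
        return self.decoder(z), z
```
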
There is renewed interest in formulating integration as an inference problem, motivated by obtaining a full distribution over numerical error that can be propagated through subsequent computation. Current methods, such as Bayesian Quadrature, demonstrate impressive empirical performance but lack theoretical analysis. An important challenge is to reconcile these probabilistic integrators with rigorous convergence guarantees. In this paper, we present the first probabilistic integrator that admits such theoretical treatment, called Frank-Wolfe Bayesian Quadrature (FWBQ). Under FWBQ, convergence to the true value of the integral is shown to be exponential and posterior contraction rates are proven to be superexponential. In simulations, FWBQ is competitive with state-of-the-art methods and outperforms alternatives based on Frank-Wolfe optimisation. Our approach is applied to successfully quantify numerical error in the solution to a challenging model choice problem in cellular biology.

* Advances in Neural Information Processing Systems 28, 1162--1170, 2015
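
For orientation, a plain Bayesian Quadrature sketch without the Frank-Wolfe point selection the paper contributes: with a Gaussian kernel and a standard normal integration measure the kernel mean is available in closed form, and the estimate is a weighted sum of integrand evaluations.

```python
import numpy as np

def bq_estimate(f, x, lengthscale=1.0, jitter=1e-10):
    """x: (n,) design points; returns the posterior mean estimate of E[f(X)], X ~ N(0, 1)."""
    l2 = lengthscale ** 2
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / l2) + jitter * np.eye(len(x))
    # closed-form kernel mean embedding of N(0, 1) under the Gaussian kernel
    z = np.sqrt(l2 / (l2 + 1.0)) * np.exp(-0.5 * x ** 2 / (l2 + 1.0))
    weights = np.linalg.solve(K, z)
    return weights @ f(x)

print(bq_estimate(lambda t: t ** 2, np.linspace(-3, 3, 15)))  # close to E[X^2] = 1
```
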
This paper considers the problem of estimating the structure of multiple related directed acyclic graph (DAG) models. Building on recent developments in exact estimation of DAGs using integer linear programming (ILP), we present an ILP approach for joint estimation over multiple DAGs that does not require the vertices in each DAG to share a common ordering. Furthermore, we also allow for (potentially unknown) dependency structure between the DAGs. Results are presented on both simulated data and fMRI data obtained from multiple subjects.

* Revised version - 12/11/14
The sigma-point filters, such as the UKF, which exploit numerical quadrature to obtain an additional order of accuracy in the moment transformation step, are popular alternatives to the ubiquitous EKF. The classical quadrature rules used in the sigma-point filters are motivated via polynomial approximation of the integrand; however, in the applied context these assumptions cannot always be justified. As a result, quadrature error can introduce bias into estimated moments, for which there is no compensatory mechanism in the classical sigma-point filters. This can in turn lead to estimates and predictions that are poorly calibrated. In this article, we investigate the Bayes-Sard quadrature method in the context of sigma-point filters, which enables uncertainty due to quadrature error to be formalised within a probabilistic model. Our first contribution is to derive the well-known classical quadratures as special cases of the Bayes-Sard quadrature method. Then a general-purpose moment transform is developed and utilised in the design of novel sigma-point filters, so that uncertainty due to quadrature error is explicitly quantified. Numerical experiments on a challenging tracking example with misspecified initial conditions show that the additional uncertainty quantification built into our method leads to better-calibrated state estimates with improved RMSE.

* 13 pages, 4 figures
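
For context, a sketch of the classical unscented transform that the paper recovers as a special case of Bayes-Sard quadrature: sigma points are pushed through the nonlinearity and recombined with fixed weights, with no account taken of quadrature error.

```python
import numpy as np

def unscented_transform(f, mean, cov, kappa=1.0):
    """Classical unscented transform; f maps R^n -> R^m and returns a 1-D array."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)
    sigma = np.vstack([mean, mean + L.T, mean - L.T])          # 2n + 1 sigma points
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    y = np.array([f(s) for s in sigma])
    y_mean = w @ y
    y_cov = (w[:, None] * (y - y_mean)).T @ (y - y_mean)
    return y_mean, y_cov
```
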
The dendritic cell algorithm is an immune-inspired technique for processing time-dependent data. Here we propose it as a possible solution for a robotic classification problem. The dendritic cell algorithm is implemented on a real robot and an investigation is performed into the effects of varying the migration threshold median for the cell population. The algorithm performs well on a classification task with very little tuning. Ways of extending the implementation to allow it to be used as a classifier within the field of robotic security are suggested.

* Proceedings of the 6th International Conference on Artificial Immune Systems (ICARIS2007), Lecture Notes in Computer Science 4628, Santos, Brazil, p204-215, 2007
* 12 pages, 4 figures
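
A stripped-down illustration of the dendritic cell algorithm's core loop, with placeholder signal weights and thresholds (the robot implementation in the paper uses the full signal set and a tuned migration threshold median).

```python
import random

class DendriticCell:
    def __init__(self, threshold):
        self.threshold = threshold
        self.csm = self.danger = self.safe = 0.0

    def sample(self, danger_signal, safe_signal):
        self.csm += danger_signal + safe_signal     # costimulation drives migration
        self.danger += danger_signal
        self.safe += safe_signal
        return self.csm >= self.threshold           # True -> the cell migrates

    def context(self):
        return "mature" if self.danger > self.safe else "semi-mature"

def anomaly_score(signal_stream, n_cells=10, median_threshold=5.0):
    """signal_stream: iterable of (danger, safe) signal pairs over time."""
    cells = [DendriticCell(random.uniform(0.5, 1.5) * median_threshold)
             for _ in range(n_cells)]
    mature = total = 0
    for danger, safe in signal_stream:
        for cell in list(cells):
            if cell.sample(danger, safe):
                total += 1
                mature += cell.context() == "mature"
                cells.remove(cell)                  # replace the migrated cell
                cells.append(DendriticCell(random.uniform(0.5, 1.5) * median_threshold))
    # MCAV-style score: fraction of migrated cells that ended in the mature context
    return mature / total if total else 0.0
```
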