We propose to use boosted regression trees as a way to compute human-interpretable solutions to reinforcement learning problems. Boosting combines several regression trees to improve their accuracy without significantly reducing their inherent interpretability. Prior work has focused independently on reinforcement learning and on interpretable machine learning, but there has been little progress in interpretable reinforcement learning. Our experimental results show that boosted regression trees compute solutions that are interpretable and that match the quality of leading reinforcement learning methods.
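
As a rough illustration of how boosting combines several small regression trees, the following sketch fits depth-one trees (stumps) to the residuals of a one-dimensional regression problem. It does not reproduce the paper's reinforcement learning setup; the data, the function names, and the learning rate are assumptions chosen only for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single split on xs that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Fit an additive model: a constant plus n_trees shrunken stumps."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # Each new stump is fitted to what the current model still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (l if x <= t else r) for t, l, r in stumps)


# Hypothetical toy data, not taken from the paper.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.0, 0.0, 1.0, 1.0, 4.0, 4.0]
model = boost(xs, ys)
mse_boost = sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
mse_base = sum((model[0] - y) ** 2 for y in ys) / len(ys)
```

Each stump is a single readable rule of the form "if x <= threshold, add this value, otherwise add that one", which is the sense in which the combined model can stay inspectable.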

We explore several oversampling techniques for an imbalanced multi-label classification problem, a setting often encountered when developing models for Computer-Aided Diagnosis (CADx) systems. While most CADx systems aim to optimize classifiers for overall accuracy without considering the relative distribution of each class, we look into using synthetic sampling to increase per-class performance when predicting the degree of malignancy. Using low-level image features and a random forest classifier, we show that synthetic oversampling techniques increase the sensitivity of the minority classes by an average of 7.22 percentage points, with as much as a 19.88 percentage point increase in sensitivity for a particular minority class. Furthermore, the analysis of low-level image feature distributions for the synthetic nodules reveals that these nodules can provide insights on how to preprocess image data for better classification performance or how to supplement the original datasets when more data acquisition is feasible.
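
To make the idea of synthetic oversampling concrete, here is a minimal sketch that creates new minority-class samples by interpolating between existing ones, in the style of SMOTE. The abstract does not say which oversampling techniques were used, so the interpolation scheme, the function name, and the toy feature vectors are assumptions for illustration only.

```python
import random


def synthetic_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point between the base sample and its neighbour.
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-dimensional feature vectors for a minority class.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = synthetic_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real samples, it stays inside the range of the observed minority features, which is one reason the feature distributions of synthetic samples can be compared directly with those of the original data.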

* 5 pages, 3 figures, 4 Tables, KDD MLMH'18 Workshop
Many industries are now investing heavily in data science and automation to replace manual tasks and/or to help with decision making, especially in the realm of leveraging computer vision to automate many monitoring, inspection, and surveillance tasks. This has resulted in the emergence of the 'data scientist' who is conversant in statistical thinking, machine learning (ML), computer vision, and computer programming. However, as ML becomes more accessible to the general public and more aspects of ML become automated, applications leveraging computer vision are increasingly being created by non-experts with less opportunity for regulatory oversight. This points to the overall need for more educated responsibility for these lay-users of usable ML tools in order to mitigate potentially unethical ramifications. In this paper, we undertake a SWOT analysis to study the strengths, weaknesses, opportunities, and threats of building usable ML tools for mass adoption for important areas leveraging ML such as computer vision. The paper proposes a set of data science literacy criteria for educating and supporting lay-users in the responsible development and deployment of ML applications.

* 4 pages
CleverHans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure. This technical report is structured as follows. Section 1 provides an overview of adversarial examples in machine learning and of the CleverHans software. Section 2 presents the core functionalities of the library: namely the attacks based on adversarial examples and defenses to improve the robustness of machine learning models to these attacks. Section 3 describes how to report benchmark results using the library. Section 4 describes the versioning system.
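
For readers unfamiliar with adversarial example construction, the sketch below applies the fast gradient sign method to a tiny logistic regression model. It is written in plain Python and does not use the CleverHans API; the model weights, the input, and the perturbation size `eps` are hypothetical values chosen for illustration.

```python
import math


def predict(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def fgsm(w, b, x, y, eps):
    """Shift each input feature by eps in the direction that increases the loss."""
    p = predict(w, b, x)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input, with true label y = 1.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 1.0], 1
x_adv = fgsm(w, b, x, y, eps=0.5)
```

In this toy setting the model assigns the original input to class 1, while the perturbed input falls on the other side of the decision boundary. A weaker attack implementation (for example, one with a wrong gradient sign) would leave the prediction unchanged and make the model look more robust than it is, which is the comparability problem that standardized implementations address.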

* Technical report for https://github.com/tensorflow/cleverhans
This report describes eighteen projects that explored how commercial cloud computing services can be utilized for scientific computation at national laboratories. These demonstrations ranged from deploying proprietary software in a cloud environment to leveraging established cloud-based analytics workflows for processing scientific datasets. By and large, the projects were successful and collectively they suggest that cloud computing can be a valuable computational resource for scientific computation at national laboratories.

Particle Swarm Optimization (PSO) is a nature-inspired meta-heuristic for solving continuous optimization problems. In the literature, the potential of the particles of a swarm has been used to show that a slightly modified PSO guarantees convergence to local optima. Here we show that under specific circumstances the unmodified PSO, even with swarm parameters known (from the literature) to be good, almost surely does not converge to a local optimum. This undesirable phenomenon is called stagnation. For this purpose, the particles' potential in each dimension is analyzed mathematically. Additionally, some reasonable assumptions on the behavior of the particles' potential are made. Depending on the objective function and, interestingly, the number of particles, the potential in some dimensions may decrease much faster than in other dimensions. Therefore, these dimensions lose relevance, i.e., the contribution of their entries to the decisions about attractor updates becomes insignificant and, with positive probability, they never regain relevance. If Brownian Motion is assumed to be an approximation of the time-dependent drop of potential, practical, i.e., large, values for this probability are calculated. Finally, on chosen multidimensional polynomials of degree two, experiments are provided showing that the required circumstances occur quite frequently. Furthermore, experiments are provided showing that the described stagnation phenomenon occurs even when the very simple sphere function is processed. Consequently, unmodified PSO does not converge to any local optimum of the chosen functions for the tested parameter settings.
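
For reference, a minimal sketch of the unmodified PSO update on the sphere function is given below. The parameter values (`chi`, `c1`, `c2`) are a commonly cited setting from the PSO literature and are an assumption here, as are the swarm size, the initialization ranges, and the iteration count; the abstract does not state the exact configuration tested, and the sketch does not compute the potential analyzed in the paper.

```python
import random


def sphere(x):
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=5, iterations=200,
        chi=0.72984, c1=1.496172, c2=1.496172, seed=0):
    """Unmodified PSO: no velocity clamping, restarts, or forced moves."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    local_best = [p[:] for p in pos]            # local attractors
    global_best = min(local_best, key=objective)[:]  # global attractor
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r, s = rng.random(), rng.random()
                # Velocity is pulled toward both attractors, dimension by dimension.
                vel[i][d] = (chi * vel[i][d]
                             + c1 * r * (local_best[i][d] - pos[i][d])
                             + c2 * s * (global_best[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            # Attractors are updated only when the objective value improves.
            if objective(pos[i]) < objective(local_best[i]):
                local_best[i] = pos[i][:]
                if objective(local_best[i]) < objective(global_best):
                    global_best = local_best[i][:]
    return global_best


best = pso(sphere)
```

The attractor update compares whole objective values, so a dimension whose entries contribute very little to that value has almost no influence on whether an attractor moves. This is the mechanism by which a dimension can lose relevance in the sense described above.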

* Full version of poster at Genetic and Evolutionary Computation Conference (GECCO) 15