Models, code, and papers for "Travis J":

The Ant Swarm Neuro-Evolution Procedure for Optimizing Recurrent Networks

Sep 27, 2019
AbdElRahman A. ElSaid, Alexander G. Ororbia, Travis J. Desell

Hand-crafting effective and efficient structures for recurrent neural networks (RNNs) is a difficult, expensive, and time-consuming process. To address this challenge, we propose a novel neuro-evolution algorithm based on ant colony optimization (ACO), called ant swarm neuro-evolution (ASNE), for directly optimizing RNN topologies. The procedure selects from multiple modern recurrent cell types such as Delta-RNN, GRU, LSTM, MGU and UGRNN cells, as well as recurrent connections which may span multiple layers and/or steps of time. In order to introduce an inductive bias that encourages the formation of sparser synaptic connectivity patterns, we investigate several variations of the core algorithm. We do so primarily by formulating different functions that drive the underlying pheromone simulation process (which mimic L1 and L2 regularization in standard machine learning) as well as by introducing ant agents with specialized roles (inspired by how real ant colonies operate), i.e., explorer ants that construct the initial feed forward structure and social ants which select nodes from the feed forward connections to subsequently craft recurrent memory structures. We also incorporate a Lamarckian strategy for weight initialization which reduces the number of backpropagation epochs required to locally train candidate RNNs, speeding up the neuro-evolution process. Our results demonstrate that the sparser RNNs evolved by ASNE significantly outperform traditional one and two layer architectures consisting of modern memory cells, as well as the well-known NEAT algorithm. Furthermore, we improve upon prior state-of-the-art results on the time series dataset utilized in our experiments.
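The abstract does not give the pheromone functions themselves, so the following is only a hypothetical sketch of how a pheromone decay rule could mimic L1 or L2 regularization. The names `decay_l1`, `decay_l2`, `pick_edge`, and the parameters `lam` and `floor` are illustrative assumptions, not the authors' implementation.

```python
import random


def decay_l1(tau, lam, floor=0.0):
    """L1-style decay (assumed form): subtract a constant amount,
    mirroring the constant gradient of the penalty lam * |tau|."""
    return max(floor, tau - lam)


def decay_l2(tau, lam, floor=0.0):
    """L2-style decay (assumed form): shrink proportionally,
    mirroring the gradient lam * tau of the penalty (lam / 2) * tau**2."""
    return max(floor, tau * (1.0 - lam))


def pick_edge(pheromones, rng):
    """An ant picks one edge with probability proportional to its pheromone.
    `pheromones` maps a hypothetical edge label to a non-negative value."""
    edges = list(pheromones)
    weights = [pheromones[e] for e in edges]
    return rng.choices(edges, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    pheromones = {"in->h1": 1.0, "in->h2": 1.0, "h1->out": 1.0}
    # Apply the L1-style rule to every edge; rarely reinforced edges
    # would reach the floor after a few such steps.
    pheromones = {e: decay_l1(t, 0.25) for e, t in pheromones.items()}
    print(pick_edge(pheromones, rng), pheromones)
```

Under this reading, the L1-style rule drives weakly reinforced edges to exactly the floor value, whereas the L2-style rule only shrinks them toward it, which is one way the sparsity bias described above could arise.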

* 15 pages, 22 pages appendix 

An Empirical Exploration of Deep Recurrent Connections and Memory Cells Using Neuro-Evolution

Sep 27, 2019
Travis J. Desell, AbdElRahman A. ElSaid, Alexander G. Ororbia

Neuro-evolution and neural architecture search algorithms have gained increasing interest due to the challenges involved in designing optimal artificial neural networks (ANNs). While these algorithms have been shown to possess the potential to outperform the best human crafted architectures, a less common use of them is as a tool for analysis of ANN structural components and connectivity structures. In this work, we focus on this particular use-case to develop a rigorous examination and comparison framework for analyzing recurrent neural networks (RNNs) applied to time series prediction using the novel neuro-evolutionary process known as Evolutionary eXploration of Augmenting Memory Models (EXAMM). Specifically, we use our EXAMM-based analysis to investigate the capabilities of recurrent memory cells and the generalization ability afforded by various complex recurrent connectivity patterns that span one or more steps in time, i.e., deep recurrent connections. EXAMM, in this study, was used to train over 10.56 million RNNs in 5,280 repeated experiments with varying components. While many modern, often hand-crafted RNNs rely on complex memory cells (which have internal recurrent connections that only span a single time step) operating under the assumption that these sufficiently latch information and handle long term dependencies, our results show that networks evolved with deep recurrent connections perform significantly better than those without. More importantly, in some cases, the best performing RNNs consisted of only simple neurons and deep time skip connections, without any memory cells. These results strongly suggest that utilizing deep time skip connections in RNNs for time series data prediction deserves further, dedicated study, and they also demonstrate the potential of neuro-evolution as a means to better study, understand, and train effective RNNs.

* 14 pages 

Multi-Objective Optimization for Size and Resilience of Spiking Neural Networks

Feb 04, 2020
Mihaela Dimovska, Travis Johnston, Catherine D. Schuman, J. Parker Mitchell, Thomas E. Potok

Inspired by the connectivity mechanisms in the brain, neuromorphic computing architectures model Spiking Neural Networks (SNNs) in silicon. As such, neuromorphic architectures are designed and developed with the goal of having small, low power chips that can perform control and machine learning tasks. However, the power consumption of the developed hardware can greatly depend on the size of the network that is being evaluated on the chip. Furthermore, the accuracy of a trained SNN that is evaluated on chip can change due to voltage and current variations in the hardware that perturb the learned weights of the network. While efforts are made on the hardware side to minimize those perturbations, a software based strategy to make the deployed networks more resilient can help further alleviate that issue. In this work, we study Spiking Neural Networks in two neuromorphic architecture implementations with the goal of decreasing their size, while at the same time increasing their resiliency to hardware faults. We leverage an evolutionary algorithm to train the SNNs and propose a multiobjective fitness function to optimize the size and resiliency of the SNN. We demonstrate that this strategy leads to well-performing, small-sized networks that are more resilient to hardware faults.

* Will appear in proceedings of 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). IEEE Catalog Number: CFP19G31-USB ISBN: 978-1-7281-3884-8 pg. 431-438 

A Unified Framework for Sparse Relaxed Regularized Regression: SR3

Sep 13, 2018
Peng Zheng, Travis Askham, Steven L. Brunton, J. Nathan Kutz, Aleksandr Y. Aravkin

Regularized regression problems are ubiquitous in statistical modeling, signal processing, and machine learning. Sparse regression in particular has been instrumental in scientific model discovery, including compressed sensing applications, variable selection, and high-dimensional analysis. We propose a broad framework for sparse relaxed regularized regression, called SR3. The key idea is to solve a relaxation of the regularized problem, which has three advantages over the state-of-the-art: (1) solutions of the relaxed problem are superior with respect to errors, false positives, and conditioning, (2) relaxation allows extremely fast algorithms for both convex and nonconvex formulations, and (3) the methods apply to composite regularizers such as total variation (TV) and its nonconvex variants. We demonstrate the advantages of SR3 (computational efficiency, higher accuracy, faster convergence rates, greater flexibility) across a range of regularized regression problems with synthetic and real data, including applications in compressed sensing, LASSO, matrix completion, TV regularization, and group sparsity. To promote reproducible research, we also provide a companion Matlab package that implements these examples.
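The abstract states the key idea (solve a relaxation of the regularized problem) without formulas. A sketch of one common way to write such a relaxation, together with the alternating updates it leads to, is given below. The symbols $A$, $b$, $C$, $R$, $w$, $\lambda$, and $\kappa$ are assumed notation introduced here for illustration, and the updates assume $A^\top A + \kappa C^\top C$ is invertible.

```latex
% Assumed notation: A (data matrix), b (observations), R (regularizer),
% C (linear map, e.g. the identity or a difference operator for TV),
% w (auxiliary variable), lambda > 0, kappa > 0.
\begin{align*}
\text{regularized problem:}\quad
  & \min_{x}\ \tfrac{1}{2}\|Ax-b\|_2^2 + \lambda R(Cx) \\
\text{relaxed problem:}\quad
  & \min_{x,w}\ \tfrac{1}{2}\|Ax-b\|_2^2 + \lambda R(w)
    + \tfrac{\kappa}{2}\|Cx-w\|_2^2 \\
\text{$x$-update:}\quad
  & x^{k+1} = (A^\top A + \kappa C^\top C)^{-1}
    (A^\top b + \kappa C^\top w^{k}) \\
\text{$w$-update:}\quad
  & w^{k+1} = \operatorname{prox}_{(\lambda/\kappa) R}(C x^{k+1})
\end{align*}
```

The $x$-update follows from setting the gradient of the relaxed objective in $x$ to zero, and the $w$-update is the proximal operator of $R$ scaled by $\lambda/\kappa$. For $R = \|\cdot\|_1$ that operator is elementwise soft-thresholding at level $\lambda/\kappa$.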

* 15 pages, 12 figures 

High Resolution Medical Image Analysis with Spatial Partitioning

Sep 12, 2019
Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song

Medical images, such as 3D computerized tomography (CT) scans and pathology images, have hundreds of millions or billions of voxels/pixels. It is infeasible to train CNN models directly on such high resolution images, because neural activations of a single image do not fit in the memory of a single GPU/TPU, and naive data and model parallelism approaches do not work. Existing image analysis approaches alleviate this problem by cropping or down-sampling input images, which leads to complicated implementation and sub-optimal performance due to information loss. In this paper, we implement spatial partitioning, which internally distributes the input and output of convolutional layers across GPUs/TPUs. Our implementation is based on the Mesh-TensorFlow framework and the computation distribution is transparent to end users. With this technique, we train a 3D Unet on up to 512 by 512 by 512 resolution data. To the best of our knowledge, this is the first work for handling such high resolution images end-to-end.

Lung Cancer Detection using Co-learning from Chest CT Images and Clinical Demographics

Feb 21, 2019
Jiachen Wang, Riqiang Gao, Yuankai Huo, Shunxing Bao, Yunxi Xiong, Sanja L. Antic, Travis J. Osterman, Pierre P. Massion, Bennett A. Landman

Early detection of lung cancer is essential in reducing mortality. Recent studies have demonstrated the clinical utility of low-dose computed tomography (CT) to detect lung cancer among individuals selected based on very limited clinical information. However, this strategy yields high false positive rates, which can lead to unnecessary and potentially harmful procedures. To address such challenges, we established a pipeline that co-learns from detailed clinical demographics and 3D CT images. Toward this end, we leveraged data from the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions (MCL), which focuses on early detection of lung cancer. A 3D attention-based deep convolutional neural net (DCNN) is proposed to identify lung cancer from the chest CT scan without prior anatomical location of the suspicious nodule. To improve upon the non-invasive discrimination between benign and malignant, we applied a random forest classifier to a dataset integrating clinical information with imaging data. The results show that the AUC obtained from clinical demographics alone was 0.635 while the attention network alone reached an accuracy of 0.687. In contrast, when applying our proposed pipeline integrating clinical and imaging variables, we reached an AUC of 0.787 on the testing dataset. The proposed network both efficiently captures anatomical information for classification and generates attention maps that explain the features that drive performance.

* SPIE Medical Image, oral presentation 

Exascale Deep Learning to Accelerate Cancer Research

Sep 26, 2019
Robert M. Patton, J. Travis Johnston, Steven R. Young, Catherine D. Schuman, Thomas E. Potok, Derek C. Rose, Seung-Hwan Lim, Junghoon Chae, Le Hou, Shahira Abousamra, Dimitris Samaras, Joel Saltz

Deep learning, through the use of neural networks, has demonstrated remarkable ability to automate many routine tasks when presented with sufficient data for training. The neural network architecture (e.g. number of layers, types of layers, connections between layers, etc.) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. The trend for neural network architectures, especially those trained on ImageNet, has been to grow ever deeper and more complex. The result has been ever increasing accuracy on benchmark datasets with the cost of increased computational demands. In this paper we demonstrate that neural network architectures can be automatically generated, tailored for a specific application, with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL--an HPC-enabled software stack for neural architecture search--we generate a neural network with comparable accuracy to state-of-the-art networks on a cancer pathology dataset that is also $16\times$ faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected.

* Submitted to IEEE Big Data 

Accelerating the Evolution of Convolutional Neural Networks with Node-Level Mutations and Epigenetic Weight Initialization

Nov 17, 2018
Travis Desell

This paper examines three generic strategies for improving the performance of neuro-evolution techniques aimed at evolving convolutional neural networks (CNNs). These were implemented as part of the Evolutionary eXploration of Augmenting Convolutional Topologies (EXACT) algorithm. EXACT evolves arbitrary convolutional neural networks (CNNs) with goals of better discovering and understanding new effective architectures of CNNs for machine learning tasks and to potentially automate the process of network design and selection. The strategies examined are node-level mutation operations, epigenetic weight initialization and pooling connections. Results were gathered over the period of a month using a volunteer computing project, where over 225,000 CNNs were trained and evaluated across 16 different EXACT searches. The node mutation operations were shown to dramatically improve evolution rates over traditional edge mutation operations (as used by the NEAT algorithm), and epigenetic weight initialization was shown to further increase the accuracy and generalizability of the trained CNNs. As a negative but interesting result, allowing for pooling connections was shown to degrade the evolution progress. The best trained CNNs reached 99.46% accuracy on the MNIST test data in under 13,500 CNN evaluations -- accuracy comparable with some of the best human designed CNNs.

* arXiv admin note: text overlap with arXiv:1703.05422 

Large Scale Evolution of Convolutional Neural Networks Using Volunteer Computing

Mar 15, 2017
Travis Desell

This work presents a new algorithm called evolutionary exploration of augmenting convolutional topologies (EXACT), which is capable of evolving the structure of convolutional neural networks (CNNs). EXACT is in part modeled after the neuroevolution of augmenting topologies (NEAT) algorithm, with notable exceptions to allow it to scale to large scale distributed computing environments and evolve networks with convolutional filters. In addition to multithreaded and MPI versions, EXACT has been implemented as part of a BOINC volunteer computing project, allowing large scale evolution. During a period of two months, over 4,500 volunteered computers on the Citizen Science Grid trained over 120,000 CNNs and evolved networks reaching 98.32% test data accuracy on the MNIST handwritten digits dataset. These results are even stronger as the backpropagation strategy used to train the CNNs was fairly rudimentary (ReLU units, L2 regularization and Nesterov momentum) and these were initial test runs done without refinement of the backpropagation hyperparameters. Further, the EXACT evolutionary strategy is independent of the method used to train the CNNs, so they could be further improved by advanced techniques like elastic distortions, pretraining and dropout. The evolved networks are also quite interesting, showing "organic" structures and significant differences from standard human designed architectures.

* 17 pages, 13 figures. Submitted to the 2017 Genetic and Evolutionary Computation Conference (GECCO 2017) 

Quantum Computing based Hybrid Solution Strategies for Large-scale Discrete-Continuous Optimization Problems

Oct 29, 2019
Akshay Ajagekar, Travis Humble, Fengqi You

Quantum computing (QC) has gained popularity due to its unique capabilities that are quite different from those of classical computers in terms of speed and methods of operations. This paper proposes hybrid models and methods that effectively leverage the complementary strengths of deterministic algorithms and QC techniques to overcome combinatorial complexity for solving large-scale mixed-integer programming problems. Four applications, namely the molecular conformation problem, job-shop scheduling problem, manufacturing cell formation problem, and the vehicle routing problem, are specifically addressed. Large-scale instances of these application problems across multiple scales ranging from molecular design to logistics optimization are computationally challenging for deterministic optimization algorithms on classical computers. To address the computational challenges, hybrid QC-based algorithms are proposed and extensive computational experimental results are presented to demonstrate their applicability and efficiency. The proposed QC-based solution strategies enjoy high computational efficiency in terms of solution quality and computation time, by utilizing the unique features of both classical and quantum computers.

Bridging the Knowledge Gap: Enhancing Question Answering with World and Domain Knowledge

Oct 16, 2019
Travis R. Goodwin, Dina Demner-Fushman

In this paper we present OSCAR (Ontology-based Semantic Composition Augmented Regularization), a method for injecting task-agnostic knowledge from an ontology or knowledge graph into a neural network during pretraining. We evaluated the impact of including OSCAR when pretraining BERT with Wikipedia articles by measuring the performance when fine-tuning on two question answering tasks involving world knowledge and causal reasoning and one requiring domain (healthcare) knowledge. We obtained 33.3%, 18.6%, and 4% improved accuracy compared to pretraining BERT without OSCAR, with new state-of-the-art results on two of the tasks.

* 6 pages, 5 figures, 2 tables 

Heterogeneous Robot Teams for Informative Sampling

Jun 17, 2019
Travis Manderson, Sandeep Manjanna, Gregory Dudek

In this paper we present a cooperative multi-robot strategy to adaptively explore and sample environments that are unfavorable for humans. We propose a methodology for a team of heterogeneous robots to collaborate on information-based planning for applications such as sampling thermal imagery in a wildfire-affected site (to assist with detecting spot fires and areas of residual fire, fire mapping, and monitoring fire progression) or coral reef monitoring and survey in the marine domain. We use a Gabor-filter-based texture classifier on aerial images from an Unmanned Aerial Vehicle (UAV) to segment the region of interest into classes. A policy-gradient-based path planner is used on the texture-classified aerial image to plan a path for the Unmanned Ground Vehicle (UGV). The UGV then uses a local planner to reach the goals set by the global planner while avoiding obstacles. The UGV also learns the labels for the segmented classes as drivable and non-drivable using the feedback from the performance while reaching the planned waypoints. We evaluated the building blocks of our approach and present the results with application of these strategies to different domains.

* 6 pages, 6 figures, 2019 Workshop on Informative Path Planning and Adaptive Sampling at Robotics Science and Systems 

BodyDigitizer: An Open Source Photogrammetry-based 3D Body Scanner

Oct 28, 2017
Travis Gesslein, Daniel Scherer, Jens Grubert

With the rising popularity of Augmented and Virtual Reality, there is a need for representing humans as virtual avatars in various application domains ranging from remote telepresence and games to medical applications. Besides explicitly modelling 3D avatars, sensing approaches that create person-specific avatars are becoming popular. However, affordable solutions typically suffer from a low visual quality and professional solutions are often too expensive to be deployed in nonprofit projects. We present an open-source project, BodyDigitizer, which aims at providing both build instructions and configuration software for a high-resolution photogrammetry-based 3D body scanner. Our system encompasses up to 96 Raspberry Pi cameras, active LED lighting, a sturdy frame construction and open-source configuration software. We demonstrate the applicability of the body scanner in a nonprofit Mixed Reality health project. The detailed build instructions and software are available at

* changed template, minor modifications for camera ready version 

Data-Driven Clustering via Parameterized Lloyd's Families

Sep 19, 2018
Maria-Florina Balcan, Travis Dick, Colin White

Clustering points in metric spaces is a long-studied area of research. Clustering has seen a multitude of work both theoretically, in understanding the approximation guarantees possible for many objective functions such as k-median and k-means clustering, and experimentally, in finding the fastest algorithms and seeding procedures for Lloyd's algorithm. The performance of a given clustering algorithm depends on the specific application at hand, and this may not be known up front. For example, a "typical instance" may vary depending on the application, and different clustering heuristics perform differently depending on the instance. In this paper, we define an infinite family of algorithms generalizing Lloyd's algorithm, with one parameter controlling the initialization procedure, and another parameter controlling the local search procedure. This family of algorithms includes the celebrated k-means++ algorithm, as well as the classic farthest-first traversal algorithm. We design efficient learning algorithms which receive samples from an application-specific distribution over clustering instances and learn a near-optimal clustering algorithm from the class. We show the best parameters vary significantly across datasets such as MNIST, CIFAR, and mixtures of Gaussians. Our learned algorithms never perform worse than k-means++, and on some datasets we see significant improvements.
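One way a single initialization parameter can cover both k-means++ and farthest-first traversal is to sample each new center with probability proportional to its distance to the nearest chosen center raised to a power. The sketch below illustrates that idea only. The function name `d_alpha_seeding`, the parameter name `alpha`, and the handling of already chosen points are assumptions made for this example, and the local search parameter is not modeled.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d_alpha_seeding(points, k, alpha, rng=None):
    """Pick k initial centers from `points` (assumes alpha >= 0 and
    k <= number of distinct points).

    alpha = 0        -> uniform sampling over not-yet-chosen points
    alpha = 2        -> distance-squared sampling, as in k-means++
    alpha = math.inf -> farthest-first traversal (deterministic after
                        the first, uniformly random, center)
    """
    rng = rng or random.Random()
    n = len(points)
    chosen = [rng.randrange(n)]  # first center: uniform at random
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.sqrt(sq_dist(p, points[chosen[0]])) for p in points]
    while len(chosen) < k:
        if math.isinf(alpha):
            idx = max(range(n), key=lambda i: nearest[i])
        else:
            # Already chosen points get weight 0 so they are not redrawn.
            weights = [0.0 if i in chosen else nearest[i] ** alpha
                       for i in range(n)]
            idx = rng.choices(range(n), weights=weights, k=1)[0]
        chosen.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.sqrt(sq_dist(p, points[idx])))
    return [points[i] for i in chosen]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(d_alpha_seeding(pts, 2, 2.0, random.Random(0)))
    print(d_alpha_seeding(pts, 2, math.inf, random.Random(0)))
```

Learning the parameter then amounts to choosing the `alpha` that gives the best clustering cost on instances sampled from the application at hand, which is the data-driven step the abstract describes.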

Feature Generation for Robust Semantic Role Labeling

Feb 22, 2017
Travis Wolfe, Mark Dredze, Benjamin Van Durme

Hand-engineered feature sets are a well understood method for creating robust NLP models, but they require a lot of expertise and effort to create. In this work we describe how to automatically generate rich feature sets from simple units called featlets, requiring less engineering. Using information gain to guide the generation process, we train models which rival the state of the art on two standard Semantic Role Labeling datasets with almost no task or linguistic insight.

Random Smoothing Might be Unable to Certify $\ell_\infty$ Robustness for High-Dimensional Images

Mar 05, 2020
Avrim Blum, Travis Dick, Naren Manoj, Hongyang Zhang

We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the $\ell_p$ ball of radius $\epsilon$ when $p>2$. Although random smoothing has been well understood for the $\ell_2$ case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of $p>2$. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the $\ell_\infty$ threat model. In this work, we show that any noise distribution $\mathcal{D}$ over $\mathbb{R}^d$ that provides $\ell_p$ robustness for all base classifiers with $p>2$ must satisfy $\mathbb{E}\eta_i^2=\Omega(d^{1-2/p}\epsilon^2(1-\delta)/\delta^2)$ for 99% of the features (pixels) of vector $\eta\sim\mathcal{D}$, where $\epsilon$ is the robust radius and $\delta$ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in $[0,255]$, the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.
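To see why the bound bites for the $\ell_\infty$ threat model, it helps to take the limit $p\to\infty$ and plug in a concrete dimension. The worked example below is an order-of-magnitude illustration only: the input size $d = 3\cdot 224\cdot 224$ is an assumed ImageNet-style resolution, the gap $\delta = 1/2$ is an arbitrary choice, and the constant hidden in $\Omega(\cdot)$ is ignored.

```latex
% Illustration only: d and delta are assumed values, and the
% constant hidden in Omega(.) is dropped.
\begin{align*}
p \to \infty:\quad
  & d^{\,1-2/p} \to d
  \quad\Longrightarrow\quad
  \mathbb{E}\,\eta_i^2
    = \Omega\!\left(\frac{d\,\epsilon^2(1-\delta)}{\delta^2}\right), \\
\text{so}\quad
  & \sqrt{\mathbb{E}\,\eta_i^2}\ \gtrsim\
    \sqrt{d}\;\epsilon\;\frac{\sqrt{1-\delta}}{\delta}, \\
d = 3\cdot 224\cdot 224 = 150528:\quad
  & \sqrt{d}\approx 388, \qquad
    \delta = \tfrac12:\ \frac{\sqrt{1-\delta}}{\delta}\approx 1.41 .
\end{align*}
```

Under these assumptions, a robust radius of a single intensity level ($\epsilon = 1$ on the $[0,255]$ scale) already asks for per-pixel noise whose root-mean-square size is on the order of $388 \times 1.41 \approx 550$, which is larger than the full pixel range.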

* 20 pages, 2 figures; Code is available at 
