Models, code, and papers for "David Bau":

Dissecting Pruned Neural Networks

Jun 29, 2019
Jonathan Frankle, David Bau

Pruning is a standard technique for removing unnecessary structure from a neural network to reduce its storage footprint, computational demands, or energy consumption. Pruning can reduce the parameter counts of many state-of-the-art neural networks by an order of magnitude without compromising accuracy, meaning these networks contain a vast amount of unnecessary structure. In this paper, we study the relationship between pruning and interpretability. Namely, we consider the effect of removing unnecessary structure on the number of hidden units that learn disentangled representations of human-recognizable concepts as identified by network dissection. We aim to evaluate how the interpretability of pruned neural networks changes as they are compressed. We find that pruning has no detrimental effect on this measure of interpretability until so few parameters remain that accuracy begins to drop. ResNet-50 models trained on ImageNet maintain the same number of interpretable concepts and units until more than 90% of parameters have been pruned.
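
To make the setup concrete, here is a minimal sketch of global magnitude pruning, the standard family of techniques the abstract refers to. The 90% sparsity level echoes the ResNet-50 result above; the use of torchvision and the exact masking scheme are illustrative assumptions, not the paper's protocol.

```python
import torch
from torchvision import models

def global_magnitude_prune(model, sparsity=0.9):
    """Zero out the smallest-magnitude conv weights across the whole model."""
    weights = [m.weight.data for m in model.modules()
               if isinstance(m, torch.nn.Conv2d)]
    scores = torch.cat([w.abs().flatten() for w in weights])
    k = max(1, int(sparsity * scores.numel()))
    threshold = scores.kthvalue(k).values            # global magnitude cutoff
    for w in weights:
        w.mul_((w.abs() > threshold).float())        # apply the pruning mask

model = models.resnet50()        # untrained here; load weights as needed
global_magnitude_prune(model, sparsity=0.9)
```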


Interpreting Deep Visual Representations via Network Dissection

Jun 26, 2018
Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba

The success of recent deep convolutional neural networks (CNNs) depends on learning hidden representations that can summarize the important factors of variation behind the data. However, CNNs are often criticized as being black boxes that lack interpretability, since they have millions of unexplained model parameters. In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations. The proposed method quantifies the interpretability of CNN representations by evaluating the alignment between individual hidden units and a set of visual semantic concepts. By identifying the best alignments, units are given human-interpretable labels across a range of objects, parts, scenes, textures, materials, and colors. The method reveals that deep representations are more transparent and interpretable than expected: we find that representations are significantly more interpretable than they would be under a random, equivalently powerful basis. We apply the method to interpret and compare the latent representations of various network architectures trained to solve different supervised and self-supervised training tasks. We then examine factors affecting network interpretability such as the number of training iterations, regularizations, different initializations, and the network depth and width. Finally, we show that the interpreted units can be used to provide explicit explanations of a prediction given by a CNN for an image. Our results highlight that interpretability is an important property of deep neural networks that provides new insights into their hierarchical structure.

* B. Zhou and D. Bau contributed equally to this work. 15 pages, 27 figures 
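
The alignment score at the heart of Network Dissection is an intersection-over-union between a unit's thresholded activation map and a concept's segmentation mask. Below is a hedged numpy sketch of that score; the quantile-based threshold is a simplification of the paper's per-unit threshold, and the array shapes are illustrative.

```python
import numpy as np

def unit_concept_iou(activations, concept_mask, quantile=0.995):
    """IoU between a unit's thresholded activations and a concept mask.

    activations:  (N, H, W) upsampled activations of one unit over N images
    concept_mask: (N, H, W) boolean segmentation for one concept
    """
    threshold = np.quantile(activations, quantile)   # per-unit threshold
    active = activations > threshold
    intersection = np.logical_and(active, concept_mask).sum()
    union = np.logical_or(active, concept_mask).sum()
    return intersection / max(union, 1)

# A unit is then labeled with the concept whose IoU is highest,
# provided it exceeds a fixed cutoff.
```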

Revisiting the Importance of Individual Units in CNNs via Ablation

Jun 07, 2018
Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba

We revisit the importance of individual units in Convolutional Neural Networks (CNNs) for visual recognition. By conducting unit-ablation experiments on CNNs trained on large-scale image datasets, we demonstrate that, though ablating any individual unit does not hurt overall classification accuracy, it does lead to significant damage to the accuracy of specific classes. This result shows that an individual unit is specialized to encode information relevant to a subset of classes. We compute the correlation between the accuracy drop under unit ablation and various attributes of an individual unit, such as class selectivity and weight L1 norm. We confirm that unit attributes such as class selectivity are a poor predictor of impact on overall accuracy, as found previously in recent work \cite{morcos2018importance}. However, our results show that class selectivity, along with other attributes, is a good predictor of the importance of one unit to individual classes. We evaluate the impact of random rotation, batch normalization, and dropout on the importance of units to specific classes. Our results show that units with high selectivity play an important role in network classification power at the individual class level. Understanding and interpreting the behavior of these units is necessary and meaningful.
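
A unit-ablation experiment of the kind described can be sketched with a forward hook that zeroes one channel. The layer path, unit index, and `evaluate_per_class` below are hypothetical placeholders for a user-supplied model and evaluation loop.

```python
import torch

def ablate_unit(layer, unit_index):
    """Return a hook handle that zeroes one channel of a layer's output."""
    def hook(module, inputs, output):
        output[:, unit_index] = 0.0   # silence the unit for every input
        return output
    return layer.register_forward_hook(hook)

# Hypothetical usage: compare per-class accuracy with and without unit 42.
# handle = ablate_unit(model.layer4[2].conv3, unit_index=42)
# ablated_acc = evaluate_per_class(model, val_loader)   # user-supplied
# handle.remove()
```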


Network Dissection: Quantifying Interpretability of Deep Visual Representations

Apr 19, 2017
David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units; we then apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.

* First two authors contributed equally. Oral presentation at CVPR 2017 

Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning

Jun 04, 2018
Leilani H. Gilpin, David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter, Lalana Kagal

There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insights into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions in some level of detail. These explanations are important to ensure algorithmic fairness, identify potential bias/problems in the training data, and to ensure that the algorithms perform as expected. However, the explanations produced by these systems are neither standardized nor systematically assessed. In an effort to create best practices and identify open challenges, we provide our definition of explainability and show how it can be used to classify existing literature. We discuss why current approaches to explanatory methods, especially for deep neural networks, are insufficient. Finally, based on our survey, we conclude with suggested future research directions for explanatory artificial intelligence.

* Edited author email 

Seeing What a GAN Cannot Generate

Oct 24, 2019
David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, Antonio Torralba

Despite the success of Generative Adversarial Networks (GANs), mode collapse remains a serious issue during GAN training. To date, little work has focused on understanding and quantifying which modes have been dropped by a model. In this work, we visualize mode collapse at both the distribution level and the instance level. First, we deploy a semantic segmentation network to compare the distribution of segmented objects in the generated images with the target distribution in the training set. Differences in statistics reveal object classes that are omitted by a GAN. Second, given the identified omitted object classes, we visualize the GAN's omissions directly. In particular, we compare specific differences between individual photos and their approximate inversions by a GAN. To this end, we relax the problem of inversion and solve the tractable problem of inverting a GAN layer instead of the entire generator. Finally, we use this framework to analyze several recent GANs trained on multiple datasets and identify their typical failure cases.

* ICCV 2019 oral; http://ganseeing.csail.mit.edu/ 
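
The distribution-level comparison can be sketched as per-class pixel frequencies over segmentation maps; `segment` and the class count below are hypothetical stand-ins for a semantic segmentation network and its label set.

```python
import numpy as np

def segmentation_class_stats(seg_maps, num_classes):
    """Fraction of pixels per semantic class over integer segmentation maps."""
    counts = np.bincount(np.concatenate([s.ravel() for s in seg_maps]),
                         minlength=num_classes)
    return counts / counts.sum()

# Classes far more frequent in real photos than in samples are candidates
# for dropped modes:
# real = segmentation_class_stats(segment(real_images), num_classes=35)
# fake = segmentation_class_stats(segment(generated_images), num_classes=35)
# dropped = np.argsort(real - fake)[::-1]
```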

Visualizing and Understanding Generative Adversarial Networks (Extended Abstract)

Jan 29, 2019
David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba

Generative Adversarial Networks (GANs) have achieved impressive results for many real-world applications. As an active research topic, many GAN variants have emerged with improvements in sample quality and training stability. However, visualization and understanding of GANs are largely missing. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to concepts with a segmentation-based network dissection method. We quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. Finally, we examine the contextual relationship between these units and their surroundings by inserting the discovered concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in the scene. We will open source our interactive tools to help researchers and practitioners better understand their models.

* In AAAI-19 workshop on Network Interpretability for Deep Learning arXiv admin note: substantial text overlap with arXiv:1811.10597 

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Dec 08, 2018
David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba

Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level. We first identify a group of interpretable units that are closely related to object concepts using a segmentation-based network dissection method. Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output. We examine the contextual relationship between these units and their surroundings by inserting the discovered object concepts into new images. We show several practical applications enabled by our framework, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a scene. We provide open source interpretation tools to help researchers and practitioners better understand their GAN models.

* 18 pages, 19 figures 
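
The intervention step, forcing a set of units at one generator layer to a constant and checking whether an object class disappears from (or appears in) the output, can be sketched with a forward hook. The generator structure, unit indices, and latent `z` below are placeholders, not the paper's models.

```python
import torch

def intervene(layer, units, value=0.0):
    """Clamp selected channels of a generator layer to a constant value.

    Zeroing units associated with a concept tends to remove it from the
    output; setting them high can insert it (the dissection experiment).
    """
    def hook(module, inputs, output):
        output[:, units] = value
        return output
    return layer.register_forward_hook(hook)

# handle = intervene(generator.blocks[4], units=[12, 77, 301], value=0.0)
# images = generator(z)        # generator and z assumed to exist
# handle.remove()
```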

Does Baum-Welch Re-estimation Help Taggers?

Oct 24, 1994
David Elworthy

In part-of-speech tagging with a Hidden Markov Model, a statistical model is used to assign grammatical categories to words in a text. Early work in the field relied on a corpus which had been tagged by a human annotator to train the model. More recently, Cutting {\it et al.} (1992) suggest that training can be achieved with a minimal lexicon and a limited amount of {\em a priori} information about probabilities, by using Baum-Welch re-estimation to automatically refine the model. In this paper, I report two experiments designed to determine how much manual training information is needed. The first experiment suggests that initial biasing of either lexical or transition probabilities is essential to achieve good accuracy. The second experiment reveals that there are three distinct patterns of Baum-Welch re-estimation. In two of the patterns, the re-estimation ultimately reduces the accuracy of the tagging rather than improving it. The pattern which is applicable can be predicted from the quality of the initial model and the similarity between the tagged training corpus (if any) and the corpus to be tagged. Heuristics for deciding how to use re-estimation in an effective manner are given. The conclusions are broadly in agreement with those of Merialdo (1994), but give greater detail about the contributions of different parts of the model.

* Uses aclap.sty. Appeared in ANLP 94 
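
For readers unfamiliar with the algorithm, here is a compact, hedged numpy sketch of one Baum-Welch re-estimation step for a discrete HMM (scaled forward-backward followed by the usual M-step). A tagger would iterate this over an untagged corpus, starting from the biased initial model the abstract discusses.

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One re-estimation step for a discrete HMM.

    obs: int array of observation indices, shape (T,)
    pi:  initial state probabilities (S,);  A: transitions (S, S)
    B:   emission probabilities (S, V)
    """
    T, S = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, S)), np.zeros((T, S)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
    for t in range(1, T):                      # scaled forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):             # scaled backward pass
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)  # state posteriors
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :])
    xi /= xi.sum(axis=(1, 2), keepdims=True)   # transition posteriors
    new_A = xi.sum(0) / gamma[:-1].sum(0)[:, None]
    new_B = np.stack([gamma[obs == v].sum(0)
                      for v in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(0)[:, None]
    return gamma[0], new_A, new_B              # re-estimated pi, A, B
```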

Open Ended Intelligence: The individuation of Intelligent Agents

Jun 12, 2015
David Weinbaum, Viktoras Veitas

Artificial General Intelligence is a field of research aiming to distill the principles of intelligence that operate independently of a specific problem domain or a predefined context and utilize these principles in order to synthesize systems capable of performing any intellectual task a human being is capable of and eventually go beyond that. While "narrow" artificial intelligence which focuses on solving specific problems such as speech recognition, text comprehension, visual pattern recognition, robotic motion, etc. has shown quite a few impressive breakthroughs lately, understanding general intelligence remains elusive. In this paper we offer a novel theoretical approach to understanding general intelligence. We start with a brief introduction of the current conceptual approach. Our critique exposes a number of serious limitations that are traced back to the ontological roots of the concept of intelligence. We then propose a paradigm shift from intelligence perceived as a competence of individual agents defined in relation to an a priori given problem domain or a goal, to intelligence perceived as a formative process of self-organization by which intelligent agents are individuated. We call this process open-ended intelligence. Open-ended intelligence is developed as an abstraction of the process of cognitive development so its application can be extended to general agents and systems. We introduce and discuss three facets of the idea: the philosophical concept of individuation, sense-making and the individuation of general cognitive agents. We further show how open-ended intelligence can be framed in terms of a distributed, self-organizing network of interacting elements and how such process is scalable. The framework highlights an important relation between coordination and intelligence and a new understanding of values. We conclude with a number of questions for future research.

* Preprint; 35 pages, 2 figures; Keywords: intelligence, cognition, individuation, assemblage, self-organization, sense-making, coordination, enaction; en-US proofreading 

Cognitive Development of the Web

May 16, 2015
Viktoras Veitas, David Weinbaum

The sociotechnological system is a system constituted of human individuals and their artifacts: technological artifacts, institutions, conceptual and representational systems, worldviews, knowledge systems, culture and the whole biosphere as an evolutionary niche. In our view the sociotechnological system as a super-organism is shaped and determined both by the characteristics of the agents involved and the characteristics emergent in their interactions at multiple scales. Our approach to sociotechnological dynamics will maintain a balance between perspectives: the individual and the collective. Accordingly, we analyze dynamics of the Web as a sociotechnological system made of people, computers and digital artifacts (Web pages, databases, search engines, etc.). Making sense of the sociotechnological system while being part of it, is also a constant interplay between pragmatic and value based approaches. The first focuses on the actualities of the system, while the second highlights the observer's projections. In our attempt to model sociotechnological dynamics and envision its future, we take special care to make explicit our values as part of the analysis. In sociotechnological systems with a high degree of reflexivity (coupling between the perception of the system and the system's behavior), highlighting values is of critical importance. In this essay, we choose to see the future evolution of the web as facilitating a basic value, that is, continuous open-ended intelligence expansion. By that we mean that we see intelligence expansion as the determinant of the 'greater good' and 'well being' of both individuals and collectives at all scales. Our working definition of intelligence here is the progressive process of sense-making of self, other, environment and universe. Intelligence expansion, therefore, means an increasing ability of sense-making.

* Working paper, 22 pages, 2 figures 

Asynchronous Stochastic Approximation with Differential Inclusions

Dec 10, 2011
Steven Perkins, David S. Leslie

The asymptotic pseudo-trajectory approach to stochastic approximation of Benaim, Hofbauer and Sorin is extended for asynchronous stochastic approximations with a set-valued mean field. The asynchronicity of the process is incorporated into the mean field to produce convergence results which remain similar to those of an equivalent synchronous process. In addition, this allows many of the restrictive assumptions previously associated with asynchronous stochastic approximation to be removed. The framework is extended for a coupled asynchronous stochastic approximation process with set-valued mean fields. Two-timescale arguments are used here in a similar manner to the original work in this area by Borkar. The applicability of this approach is demonstrated through learning in a Markov decision process.

* 41 pages 
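
In generic notation (not necessarily the paper's), an asynchronous stochastic approximation with a set-valued mean field updates only the coordinates active at step n. A hedged sketch of the standard form:

```latex
x_{n+1}(i) \;=\; x_n(i) \;+\; \gamma(n,i)\,\mathbb{1}\{i \in Y_n\}\,
\bigl(f_i + M_{n+1}(i)\bigr), \qquad f \in F(x_n),
```

where $Y_n$ is the set of coordinates updated at step $n$ and $M_{n+1}$ is a noise term. The abstract's point is that the relative update rates of the coordinates can be folded into the set-valued mean field $F$, so the interpolated process can be analyzed as an asymptotic pseudo-trajectory of a differential inclusion, as in the synchronous theory of Benaim, Hofbauer and Sorin.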

Implementation and Comparison of Solution Methods for Decision Processes with Non-Markovian Rewards

Oct 19, 2012
Charles Gretton, David Price, Sylvie Thiebaux

This paper examines a number of solution methods for decision processes with non-Markovian rewards (NMRDPs). They all exploit a temporal logic specification of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to well-known MDP solution methods. They differ, however, in the representation of the target MDP and the class of MDP solution methods to which they are suited. As a result, they adopt different temporal logics and different translations. Unfortunately, neither implementations of these methods nor experimental (let alone comparative) results have ever been reported. This paper is the first step towards filling this gap. We describe an integrated system for solving NMRDPs which implements these methods and several variants under a common interface; we use it to compare the various approaches and identify the problem features favoring one over the other.

* Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003) 
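
The common core of these translations can be sketched abstractly: pair each base state with the state of a finite automaton that tracks the temporal reward formula, so that the reward becomes Markovian on the product space. Everything below is a hypothetical illustration; `base_step`, `automaton_step`, and `automaton_reward` are assumed to be user-supplied.

```python
def product_step(base_step, automaton_step, automaton_reward):
    """Build a Markovian step function over (base_state, automaton_state).

    base_step(s, a)        -> next base state of the NMRDP
    automaton_step(q, s')  -> advance the reward-formula tracker
    automaton_reward(q')   -> reward, now a function of the product state
    """
    def step(state, action):
        base, q = state
        next_base = base_step(base, action)
        next_q = automaton_step(q, next_base)
        return (next_base, next_q), automaton_reward(next_q)
    return step
```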

Design of conversational humanoid robot based on hardware independent gesture generation

May 21, 2019
Katsushi Ikeuchi, David Baumert, Shunsuke Kudoh, Masaru Takizawa

With an increasing need for elderly and disability care, there is an increasing opportunity for intelligent and mobile devices such as robots to provide care and support solutions. In order to naturally assist and interact with humans, a robot must possess effective conversational capabilities. Gestures accompanying spoken sentences are an important factor in human-to-human conversational communication. Humanoid robots must also use gestures if they are to be capable of the rich interactions implied and afforded by their humanlike appearance. However, present systems for gesture generation do not dynamically provide realistic physical gestures that are naturally understood by humans. A method for humanoid robots to generate gestures along with spoken sentences is proposed herein. We emphasize that our gesture-generating architecture can be applied to any type of humanoid robot through the use of Labanotation, which is an existing system for notating human dance movements. Labanotation's gesture symbols can be computationally transformed to be compatible across a range of robots with differing physical characteristics. This paper describes a solution as an integrated system for conversational robots whose speech and gestures can supplement each other in human-robot interaction.

* 7 pages, 8 figures 
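
The hardware-independence claim can be illustrated with a toy sketch: Labanotation symbols name body-relative directions and levels, and each robot supplies its own table mapping symbols to joint angles. The symbols, joint names, and angle values below are invented for illustration.

```python
# Per-robot table (values purely illustrative).
ROBOT_RIGHT_ARM = {
    ("right_arm", "forward", "middle"): {"shoulder_pitch": 0.0, "elbow": 0.2},
    ("right_arm", "side", "high"):      {"shoulder_pitch": -1.0, "elbow": 0.0},
}

def symbols_to_keyframes(score, robot_table):
    """Translate a timed Labanotation score into joint-angle keyframes."""
    return [(t, robot_table[symbol]) for t, symbol in score]

score = [(0.0, ("right_arm", "forward", "middle")),
         (1.5, ("right_arm", "side", "high"))]
print(symbols_to_keyframes(score, ROBOT_RIGHT_ARM))
```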

The Infinite Latent Events Model

May 09, 2012
David Wingate, Noah Goodman, Daniel Roy, Joshua Tenenbaum

We present the Infinite Latent Events Model, a nonparametric hierarchical Bayesian distribution over infinite dimensional Dynamic Bayesian Networks with binary state representations and noisy-OR-like transitions. The distribution can be used to learn structure in discrete timeseries data by simultaneously inferring a set of latent events, which events fired at each timestep, and how those events are causally linked. We illustrate the model on a sound factorization task, a network topology identification task, and a video game task.

* Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009) 
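
Setting the nonparametric machinery aside, the noisy-OR-like transition the abstract mentions can be sketched directly; the weights and leak probability below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_or_step(x, W, leak=0.01):
    """Sample the next binary event vector under noisy-OR transitions.

    x:    current binary event vector, shape (D,)
    W:    W[i, j] = probability that active event i fires event j next step
    leak: spontaneous firing probability
    """
    stay_off = (1 - leak) * np.prod(
        np.where(x[:, None] == 1, 1 - W, 1.0), axis=0)
    return (rng.random(len(x)) < 1 - stay_off).astype(int)

x = np.array([1, 0, 1, 0])
W = rng.uniform(0, 0.5, size=(4, 4))
print(noisy_or_step(x, W))
```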

New Confidence Measures for Statistical Machine Translation

Feb 06, 2009
Sylvain Raybaud, Caroline Lavecchia, David Langlois, Kamel Smaïli

A confidence measure is able to estimate the reliability of a hypothesis provided by a machine translation system. The problem of confidence measures can be seen as a process of testing: we want to decide whether the most probable sequence of words provided by the machine translation system is correct or not. In the following we describe several original word-level confidence measures for machine translation, based on mutual information, an n-gram language model, and a lexical-features language model. We evaluate how well they perform individually or together, and show that using a combination of confidence measures based on mutual information yields a classification error rate as low as 25.1% with an F-measure of 0.708.

* International Conference On Agents and Artificial Intelligence - ICAART 09 (2009) 
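
One of the simplest measures described, scoring each word with a language model and thresholding, can be sketched as follows. `bigram_logprob` and the threshold are hypothetical stand-ins; the measures evaluated in the paper also draw on mutual information and lexical features.

```python
def lm_confidence(words, bigram_logprob, threshold=-6.0):
    """Flag words whose language-model log-probability clears a threshold."""
    flags, prev = [], "<s>"
    for w in words:
        score = bigram_logprob(prev, w)
        flags.append((w, score, score >= threshold))  # True = judged correct
        prev = w
    return flags

# Toy scorer for illustration only.
table = {("<s>", "the"): -1.0, ("the", "cat"): -2.5}
print(lm_confidence(["the", "cat", "barked"],
                    lambda p, w: table.get((p, w), -9.0)))
```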

Refactoring Neural Networks for Verification

Aug 06, 2019
David Shriver, Dong Xu, Sebastian Elbaum, Matthew B. Dwyer

Deep neural networks (DNNs) are growing in capability and applicability. Their effectiveness has led to their use in safety critical and autonomous systems, yet there is a dearth of cost-effective methods available for reasoning about the behavior of a DNN. In this paper, we seek to expand the applicability and scalability of existing DNN verification techniques through DNN refactoring. A DNN refactoring defines (a) the transformation of the DNN's architecture, i.e., the number and size of its layers, and (b) the distillation of the learned relationships between the input features and function outputs of the original to train the transformed network. Unlike with traditional code refactoring, DNN refactoring does not guarantee functional equivalence of the two networks, but rather it aims to preserve the accuracy of the original network while producing a simpler network that is amenable to more efficient property verification. We present an automated framework for DNN refactoring, and demonstrate its potential effectiveness through three case studies on networks used in autonomous systems.
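
The distillation half of DNN refactoring can be sketched as a standard teacher-student loop: the refactored (smaller) network is trained to match the original's outputs rather than ground-truth labels. The optimizer, loss, and training budget below are assumptions, not the paper's framework.

```python
import torch
import torch.nn as nn

def distill(original, refactored, loader, epochs=10, lr=1e-3):
    """Train `refactored` to mimic `original` on the inputs in `loader`."""
    opt = torch.optim.Adam(refactored.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    original.eval()
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                target = original(x)            # teacher outputs, not labels
            loss = loss_fn(refactored(x), target)
            opt.zero_grad(); loss.backward(); opt.step()
    return refactored
```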


Unsupervised Learning of Latent Physical Properties Using Perception-Prediction Networks

Jul 25, 2018
David Zheng, Vinson Luo, Jiajun Wu, Joshua B. Tenenbaum

We propose a framework for the completely unsupervised learning of latent object properties from their interactions: the perception-prediction network (PPN). Consisting of a perception module that extracts representations of latent object properties and a prediction module that uses those extracted properties to simulate system dynamics, the PPN can be trained in an end-to-end fashion purely from samples of object dynamics. The representations of latent object properties learned by PPNs not only are sufficient to accurately simulate the dynamics of systems comprised of previously unseen objects, but also can be translated directly into human-interpretable properties (e.g., mass, coefficient of restitution) in an entirely unsupervised manner. Crucially, PPNs also generalize to novel scenarios: their gradient-based training can be applied to many dynamical systems and their graph-based structure functions over systems comprised of different numbers of objects. Our results demonstrate the efficacy of graph-based neural architectures in object-centric inference and prediction tasks, and our model has the potential to discover relevant object properties in systems that are not yet well understood.

* UAI 2018 (oral) 
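
The perception-prediction split can be sketched as two modules trained end-to-end: an encoder that compresses an observed trajectory into a per-object property vector, and a dynamics network that conditions on it. The module choices below (a GRU encoder, an MLP predictor) are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PerceptionPredictionSketch(nn.Module):
    def __init__(self, obs_dim, prop_dim, hidden=64):
        super().__init__()
        # Perception: infer latent properties from an observed trajectory.
        self.perceive = nn.GRU(obs_dim, prop_dim, batch_first=True)
        # Prediction: roll the state forward conditioned on those properties.
        self.predict = nn.Sequential(
            nn.Linear(obs_dim + prop_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim))

    def forward(self, history, state):
        _, props = self.perceive(history)          # (1, B, prop_dim)
        inp = torch.cat([state, props.squeeze(0)], dim=-1)
        return self.predict(inp)                   # predicted next state
```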

Server assisted distributed cooperative localization over unreliable communication links

Dec 24, 2017
Solmaz S. Kia, Jonathan Hechtbauer, David Gogokhiya, Sonia Martinez

This paper considers the problem of cooperative localization (CL) using inter-robot measurements for a group of networked robots with limited on-board resources. We propose a novel recursive algorithm in which each robot localizes itself in a global coordinate frame by local dead reckoning, and opportunistically corrects its pose estimate whenever it receives a relative measurement update message from a server. The computation and storage cost per robot in terms of the size of the team is of order O(1), and the robots are only required to transmit information when they are involved in a relative measurement. The server also only needs to compute and transmit update messages when it receives an inter-robot measurement. We show that under perfect communication, our algorithm is an alternative but exact implementation of a joint CL for the entire team via Extended Kalman Filter (EKF). Perfect communication, however, is not a hard requirement. In fact, we show that our algorithm is intrinsically robust with respect to communication failures, with formal guarantees that the updated estimates of the robots receiving the update message are of minimum variance in a first-order approximate sense at that given timestep. We demonstrate the performance of the algorithm in simulation and experiments.

* The title has changed from "A partially decentralized EKF scheme for cooperative localization over unreliable communication links" to "Server assisted distributed cooperative localization over unreliable communication links". The presentation of the paper has been revised, a new example has been added, and experimental results have been added 

Federated Learning for Keyword Spotting

Oct 31, 2018
David Leroy, Alice Coucke, Thibaut Lavril, Thibault Gisselbrecht, Joseph Dureau

We propose a practical approach based on federated learning to solve out-of-domain issues with continuously running embedded speech-based models such as wake word detectors. We conduct an extensive empirical study of the federated averaging algorithm for the "Hey Snips" wake word, based on a crowdsourced dataset that mimics a federation of wake word users. We empirically demonstrate that using an adaptive averaging strategy inspired by Adam in place of standard weighted model averaging greatly reduces the number of communication rounds required to reach our target performance. The associated upstream communication costs per user are estimated at 8 MB, which is reasonable in the context of smart home voice assistants. Additionally, the dataset used for these experiments is being open sourced with the aim of fostering further transparent research in the application of federated learning to speech data.
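
A hedged sketch of the adaptive averaging idea: treat the mean client update as a pseudo-gradient and apply an Adam-style step on the server instead of plain weighted averaging. The hyperparameters and flat-vector weight representation are assumptions.

```python
import numpy as np

class AdamServer:
    """Server-side adaptive averaging for federated rounds (illustrative)."""

    def __init__(self, weights, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
        self.w, self.lr, self.b1, self.b2, self.eps = weights, lr, b1, b2, eps
        self.m, self.v = np.zeros_like(weights), np.zeros_like(weights)
        self.t = 0

    def round(self, client_weights):
        # The mean client update plays the role of an ascent direction.
        delta = np.mean([cw - self.w for cw in client_weights], axis=0)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * delta
        self.v = self.b2 * self.v + (1 - self.b2) * delta ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        self.w = self.w + self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return self.w
```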

