Models, code, and papers for "Philip Q":

Detecting Cancer Metastases on Gigapixel Pathology Images

Mar 08, 2017
Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E. Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venugopalan, Aleksei Timofeev, Philip Q. Nelson, Greg S. Corrado, Jason D. Hipp, Lily Peng, Martin C. Stumpe

Each year, the treatment decisions for more than 230,000 breast cancer patients in the U.S. hinge on whether the cancer has metastasized away from the breast. Metastasis detection is currently performed by pathologists reviewing large expanses of biological tissues. This process is labor intensive and error-prone. We present a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels. Our method leverages a convolutional neural network (CNN) architecture and obtains state-of-the-art results on the Camelyon16 dataset in the challenging lesion-level tumor detection task. At 8 false positives per image, we detect 92.4% of the tumors, relative to 82.7% by the previous best automated approach. For comparison, a human pathologist attempting exhaustive search achieved 73.2% sensitivity. We achieve image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides. In addition, we discover that two slides in the Camelyon16 training set were erroneously labeled normal. Our approach could considerably reduce false negative rates in metastasis detection.

* Fig. 1: normal and tumor patches were accidentally reversed; now fixed. Minor grammatical corrections in the appendix, section "Image Color Normalization" 
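The lesion-level result above (92.4% of tumors detected at 8 false positives per image) is a sensitivity read off at a fixed average false-positive rate. The sketch below illustrates how such a number can be computed. It is a simplified illustration, not the official Camelyon16 evaluation code. It assumes each detection has already been matched to a ground-truth lesion or marked as a false positive, and it ignores tied scores. The function name `sensitivity_at_fp_rate` is hypothetical.

```python
def sensitivity_at_fp_rate(detections, num_lesions, num_slides, max_fp_per_slide):
    """Lesion-level sensitivity at a fixed average number of false positives per slide.

    detections: iterable of (score, lesion_id) pairs, where lesion_id is None for a
    false positive. Matching detections to lesions is assumed to be done already.
    """
    allowed_fp = max_fp_per_slide * num_slides
    hit_lesions = set()
    false_positives = 0
    # Lower the score threshold one detection at a time, highest score first.
    for _score, lesion_id in sorted(detections, key=lambda d: d[0], reverse=True):
        if lesion_id is None:
            false_positives += 1
            if false_positives > allowed_fp:
                break  # this threshold would admit too many false positives
        else:
            hit_lesions.add(lesion_id)  # repeated hits on one lesion count once
    return len(hit_lesions) / num_lesions


detections = [(0.9, "a"), (0.8, None), (0.7, "b"), (0.6, None), (0.5, "c"), (0.4, "a")]
print(sensitivity_at_fp_rate(detections, num_lesions=4, num_slides=1, max_fp_per_slide=1))  # 0.5
```

Sweeping `max_fp_per_slide` over several values traces out the full FROC-style curve; the paper's headline figure corresponds to reading that curve at 8.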

Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

Sep 07, 2019
Wenjie Shi, Shiji Song, Cheng Wu, C. L. Philip Chen

This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Unlike existing policy gradient methods, which employ a single actor-critic and cannot achieve satisfactory tracking accuracy or stable learning, our proposed algorithm achieves high tracking accuracy and stable learning for AUVs by applying a hybrid actors-critics architecture, in which multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, for the critics, an updating rule based on the expected absolute Bellman error selects the worst critic to be updated at each time step. Next, to compute the loss function with a more accurate target value for the chosen critic, we develop Pseudo Q-learning for continuous action spaces, which replaces the greedy policy in Q-learning with a sub-greedy policy, and we propose Multi Pseudo Q-learning (MPQ) to reduce overestimation of the action-value function and to stabilize learning. For the actors, the deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but bad updates. Moreover, a qualitative stability analysis of the learning is given. The effectiveness and generality of the proposed MPQ-based Deterministic Policy Gradient (MPQ-DPG) algorithm are verified on an AUV with two different reference trajectories. The results demonstrate high tracking accuracy and stable learning with MPQ-DPG, and they also show that increasing the number of actors and critics further improves performance.

* IEEE Transactions on Neural Networks and Learning Systems 
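Two mechanisms in this abstract are concrete enough to sketch: choosing which critic to update by its absolute Bellman error, and defining the final policy as the average of all actors. The NumPy sketch below shows only those two pieces. The function names are hypothetical. The expected error is approximated by a batch mean. The Pseudo Q-learning target is treated as a given input rather than implemented.

```python
import numpy as np


def select_critic_to_update(critic_values, targets):
    """Pick the critic with the largest mean absolute Bellman error on a batch.

    critic_values: array (n_critics, batch) of Q_i(s, a) predictions.
    targets: array (batch,) of target values (their construction is assumed given).
    """
    mean_abs_error = np.abs(critic_values - targets[None, :]).mean(axis=1)
    return int(np.argmax(mean_abs_error))


def averaged_policy(actors, state):
    """Final deterministic policy: the element-wise mean of all actors' actions."""
    return np.mean([actor(state) for actor in actors], axis=0)


critic_values = np.array([[1.0, 1.0], [0.0, 3.0], [1.5, 1.5]])
targets = np.array([1.0, 1.0])
# Mean absolute errors are 0.0, 1.5 and 0.5, so critic 1 is the "worst" one.
print(select_critic_to_update(critic_values, targets))

actors = [lambda s: np.array([0.2, -1.0]), lambda s: np.array([0.6, 1.0])]
print(averaged_policy(actors, state=None))
```

Averaging the actors' outputs means that one actor taking a large, poor update shifts the final action by only a fraction of that step.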

Resource Allocation for a Wireless Coexistence Management System Based on Reinforcement Learning

May 24, 2018
Philip Soeffker, Dimitri Block, Nico Wiebusch, Uwe Meier

In industrial environments, an increasing number of wireless devices operate in license-free bands. As a consequence, mutual interference between wireless systems may degrade the state of coexistence. Therefore, a central coexistence management system is needed that allocates conflict-free resources to wireless systems. To ensure conflict-free resource utilization, it is useful to predict the prospective medium utilization before resources are allocated. This paper presents a self-learning concept based on reinforcement learning. A simulation-based evaluation of reinforcement learning agents based on neural networks, namely deep Q-networks and double deep Q-networks, was carried out for exemplary and practically relevant coexistence scenarios. The evaluation of the double deep Q-network showed that a prediction accuracy of at least 98% can be reached in all investigated scenarios.

* Submitted to the 23rd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA 2018) 
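The abstract compares deep Q-networks with double deep Q-networks but does not describe how the coexistence state and actions are encoded. The sketch below therefore shows only the standard difference between the two learning targets. A DQN lets the target network both select and evaluate the next action. A double DQN selects the action with the online network and evaluates it with the target network, which reduces overestimation. The array shapes and function names are illustrative assumptions.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma):
    """Standard DQN: the target network both selects and evaluates the next action."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    best_actions = next_q_online.argmax(axis=1)
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated


rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])  # the second transition is terminal
next_q_online = np.array([[1.0, 2.0], [3.0, 0.0]])
next_q_target = np.array([[5.0, 1.0], [2.0, 4.0]])
print(dqn_targets(rewards, next_q_target, dones, gamma=0.5))                        # [3.5 0. ]
print(double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.5))  # [1.5 0. ]
```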

Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control

Mar 18, 2020
Christian Schroeder de Witt, Bei Peng, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Böhmer, Shimon Whiteson

Deep multi-agent reinforcement learning (MARL) holds the promise of automating many real-world cooperative robotic manipulation and transportation tasks. Nevertheless, decentralised cooperative robotic control has received less attention from the deep reinforcement learning community, as compared to single-agent robotics and multi-agent games with discrete actions. To address this gap, this paper introduces Multi-Agent Mujoco, an easily extensible multi-agent benchmark suite for robotic control in continuous action spaces. The benchmark tasks are diverse and admit easily configurable partially observable settings. Inspired by the success of single-agent continuous value-based algorithms in robotic control, we also introduce COMIX, a novel extension to a common discrete action multi-agent $Q$-learning algorithm. We show that COMIX significantly outperforms state-of-the-art MADDPG on a partially observable variant of a popular particle environment and matches or surpasses it on Multi-Agent Mujoco. Thanks to this new benchmark suite and method, we can now pose an interesting question: what is the key to performance in such settings, the use of value-based methods instead of policy gradients, or the factorisation of the joint $Q$-function? To answer this question, we propose a second new method, FacMADDPG, which factors MADDPG's critic. Experimental results on Multi-Agent Mujoco suggest that factorisation is the key to performance.
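The question the abstract poses hinges on factorising the joint Q-function. The abstract does not name the underlying discrete-action algorithm or describe its mixer, so the sketch below makes an assumption: a QMIX-style monotonic factorisation, reduced to a single layer for brevity. It combines per-agent utilities using state-conditioned weights that are forced to be non-negative, so raising any agent's utility can never lower the joint value. A purely additive factorisation is the special case of unit weights and zero bias. The function name and the hypernetwork parameters are hypothetical.

```python
import numpy as np


def monotonic_mix(agent_qs, state, w_hyper, b_hyper):
    """Combine per-agent utilities into Q_tot with state-dependent non-negative weights.

    agent_qs: (n_agents,) utilities Q_i; state: (state_dim,) global state;
    w_hyper: (state_dim, n_agents) and b_hyper: (state_dim,) are hypothetical
    hypernetwork parameters (a single linear layer here, for brevity).
    """
    weights = np.abs(state @ w_hyper)  # abs keeps dQ_tot/dQ_i >= 0 (monotonicity)
    bias = state @ b_hyper             # the bias may have any sign
    return float(weights @ agent_qs + bias)


state = np.array([1.0, -1.0])
w_hyper = np.array([[1.0, 2.0], [3.0, -1.0]])
b_hyper = np.array([0.5, 0.5])
print(monotonic_mix(np.array([1.0, 1.0]), state, w_hyper, b_hyper))  # weights [2, 3], bias 0 -> 5.0
print(monotonic_mix(np.array([2.0, 1.0]), state, w_hyper, b_hyper))  # 7.0: raising Q_1 raised Q_tot
```

Because of monotonicity, each agent can pick its own greedy action and the result is also greedy for the mixed value. This is what makes decentralised execution possible under such a factorisation.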

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

May 21, 2018
Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly with problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.

* Camera-ready version, International Conference on Machine Learning 2017; updated to fix print-breaking image 
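Both fixes can be sketched in a few lines of plain Python. The abstract says only that the fingerprint disambiguates the age of replayed data. The fingerprint used below (the training iteration and exploration rate at collection time) is therefore an assumption chosen for illustration. The importance weight assumes that the other agents' action probabilities were stored with each transition. It is the ratio of their joint probability under the current policies to that under the policies that collected the data. The function names are hypothetical.

```python
def add_fingerprint(observation, iteration, epsilon):
    """Append a fingerprint describing when the sample was collected to an observation."""
    return list(observation) + [float(iteration), float(epsilon)]


def importance_weight(others_probs_now, others_probs_at_collection):
    """Ratio of the other agents' joint action probability now vs. when the data was stored."""
    weight = 1.0
    for p_now, p_then in zip(others_probs_now, others_probs_at_collection):
        weight *= p_now / p_then
    return weight


print(add_fingerprint([0.1, 0.2], iteration=100, epsilon=0.05))  # [0.1, 0.2, 100.0, 0.05]
print(importance_weight([0.5, 0.2], [0.25, 0.4]))                # 2.0 * 0.5 = 1.0
```

Transitions whose other-agent behaviour has since become unlikely receive small weights, which is how obsolete data decays. The fingerprint instead lets the value function learn how the other agents' behaviour depends on when the data was collected.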

A Road Map to Strong Intelligence

Feb 20, 2020
Philip Paquette

I wrote this paper because technology can really improve people's lives. With it, we can live longer in a healthy body, save time through increased efficiency and automation, and make better decisions. To get to the next level, we need to start looking at intelligence from a much broader perspective, and promote international interdisciplinary collaborations. Section 1 of this paper delves into sociology and social psychology to explain that the mechanisms underlying intelligence are inherently social. Section 2 proposes a method to classify intelligence, and describes the differences between weak and strong intelligence. Section 3 examines the Chinese Room argument from a different perspective. It demonstrates that a Turing-complete machine cannot have strong intelligence, and considers the modifications necessary for a computer to be intelligent and have understanding. Section 4 argues that the existential risk caused by the technological explosion of a single agent should not be of serious concern. Section 5 looks at the AI control problem and argues that it is impossible to build a super-intelligent machine that will do what its creators want. By using insights from biology, it also proposes a solution to the control problem. Section 6 discusses some of the implications of strong intelligence. Section 7 lists the main challenges with deep learning, and asserts that radical changes will be required to reach strong intelligence. Section 8 examines a neuroscience framework that could help explain how a cortical column works. Section 9 lays out the broad strokes of a road map towards strong intelligence. Finally, Section 10 analyzes the impacts and the challenges of greater intelligence.

Improved Image Augmentation for Convolutional Neural Networks by Copyout and CopyPairing

Sep 22, 2019
Philip May

Image augmentation is a widely used technique to improve the performance of convolutional neural networks (CNNs). Common augmentations include shifting, cropping, flipping, shearing, and rotating images, but there are also more advanced techniques such as Cutout and SamplePairing. In this work we present two improvements on the state-of-the-art Cutout and SamplePairing techniques. Our new method, called Copyout, takes a square patch of another random training image and copies it onto a random location of each image used for training. The second technique, called CopyPairing, combines Copyout and SamplePairing for further augmentation and even better performance. We run several experiments with these augmentation techniques on the CIFAR-10 dataset to evaluate and compare them under different configurations. In our experiments, Copyout reduces the test error rate by 8.18% compared with Cutout and 4.27% compared with SamplePairing. CopyPairing reduces the test error rate by 11.97% compared with Cutout and 8.21% compared with SamplePairing. Copyout and CopyPairing implementations are available at

* 8 pages, 5 figures 
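Copyout is described precisely enough to sketch. Where Cutout blanks a square region, Copyout fills that region with a square patch taken from another training image. The NumPy sketch below is a minimal illustration, not the author's released implementation. It assumes that all images share the same size and that the patch lies fully inside the image. The abstract does not say how the source location is chosen, so drawing it independently at random is also an assumption. The function name `copyout` and its parameters are hypothetical.

```python
import numpy as np


def copyout(image, other_images, patch_size, rng):
    """Copy a random square patch of another training image onto a random location.

    Assumes all images share the same height and width and the patch fits inside them.
    The source location is drawn at random here; this detail is an assumption.
    """
    h, w = image.shape[:2]
    source = other_images[rng.integers(len(other_images))]
    sy, sx = rng.integers(0, h - patch_size + 1), rng.integers(0, w - patch_size + 1)
    dy, dx = rng.integers(0, h - patch_size + 1), rng.integers(0, w - patch_size + 1)
    out = image.copy()  # leave the original training image untouched
    out[dy:dy + patch_size, dx:dx + patch_size] = source[sy:sy + patch_size, sx:sx + patch_size]
    return out


rng = np.random.default_rng(0)
image = np.zeros((32, 32, 3), dtype=np.float32)
others = [np.ones((32, 32, 3), dtype=np.float32)]
augmented = copyout(image, others, patch_size=8, rng=rng)
print(augmented.sum())  # 192.0: one 8x8x3 patch of ones was copied in
```

As in the paper's setup, the label of the augmented image stays that of the original `image`. Only the pasted region carries content from the other image.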

An Architecture for Deep, Hierarchical Generative Models

Dec 08, 2016
Philip Bachman

We present an architecture which lets us train deep, directed generative models with many layers of latent variables. We include deterministic paths between all latent variables and the generated output, and provide a richer set of connections between computations for inference and generation, which enables more effective communication of information throughout the model during training. To improve performance on natural images, we incorporate a lightweight autoregressive model in the reconstruction distribution. These techniques permit end-to-end training of models with 10+ layers of latent variables. Experiments show that our approach achieves state-of-the-art performance on standard image modelling benchmarks, can expose latent class structure in the absence of label information, and can provide convincing imputations of occluded regions in natural images.

* Published in NIPS 2016 

Translating near-synonyms: Possibilities and preferences in the interlingua

Nov 02, 1998
Philip Edmonds

This paper argues that an interlingual representation must explicitly represent some parts of the meaning of a situation as possibilities (or preferences), not as necessary or definite components of meaning (or constraints). Possibilities enable the analysis and generation of nuance, something required for faithful translation. Furthermore, the representation of the meaning of words, especially of near-synonyms, is crucial, because it specifies which nuances words can convey in which contexts.

* Proceedings of the AMTA/SIG-IL Second Workshop on Interlinguas, October 1998 
* 8 pages, LaTeX2e, 1 eps figure, uses colacl.sty, epsfig.sty, avm.sty, times.sty 

Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

Aug 07, 1998
Philip Resnik

Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention.

* Proceedings of AMTA-98 
* LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty. An Appendix at contains test data 

Disambiguating Noun Groupings with Respect to WordNet Senses

Nov 29, 1995
Philip Resnik

Word groupings useful for language processing tasks are increasingly available, as thesauri appear on-line and as distributional word clustering techniques improve. However, for many tasks, one is interested in relationships among word senses, not words. This paper presents a method for automatic sense disambiguation of nouns appearing within sets of related nouns: the kind of data one finds in on-line thesauri, or as the output of distributional clustering algorithms. Disambiguation is performed with respect to WordNet senses, which are fairly fine-grained; however, the method also permits the assignment of higher-level WordNet categories rather than sense labels. The method is illustrated primarily by example, though results of a more rigorous evaluation are also presented.

* Proceedings of the 3rd Workshop on Very Large Corpora, MIT, 30 June 1995 
* LaTeX, 16 pages, uses breakcites.sty, authdate.sty 

Dr. Tux: A Question Answering System for Ubuntu users

Aug 25, 2018
Bijil Abraham Philip, Manas Jog, Apurv Milind Upasani

Various forums and question answering (Q&A) sites are available online that allow Ubuntu users to find results similar to their queries. However, searching for a result is often time consuming as it requires the user to find a specific problem instance relevant to his/her query from a large set of questions. In this paper, we present an automated question answering system for Ubuntu users called Dr. Tux that is designed to answer users' queries by selecting the most similar question from an online database. The prototype was implemented in Python and uses NLTK and CoreNLP tools for Natural Language Processing. The data for the prototype was taken from the AskUbuntu website, which contains about 150k questions. The results obtained from the manual evaluation of the prototype were promising while also presenting some interesting opportunities for improvement.
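The core step described here is retrieval: given a query, return the most similar stored question. The authors built their prototype with NLTK and CoreNLP. The sketch below instead uses a plain TF-IDF and cosine-similarity ranking written in standard-library Python, purely to illustrate the retrieval step. It is not the authors' pipeline. The tokenizer, the smoothed IDF formula, and all function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(tokens, idf):
    """Term frequency times smoothed inverse document frequency, as a sparse dict."""
    return {t: n * idf[t] for t, n in Counter(tokens).items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, questions):
    """Return (index, score) of the stored question closest to the query."""
    docs = [tokenize(q) for q in questions]
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log((1 + len(docs)) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [tfidf(doc, idf) for doc in docs]
    query_vec = tfidf(tokenize(query), idf)  # query terms unseen in the database are dropped
    scores = [cosine(query_vec, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


questions = [
    "How do I install Python on Ubuntu?",
    "How to change the desktop wallpaper",
    "Wifi not working after update",
]
print(most_similar("install python3 ubuntu", questions))
```

For a database of about 150k questions, the document vectors would normally be precomputed once rather than rebuilt for every query as they are here.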

Region Adaptive Graph Fourier Transform for 3D Point Clouds

Mar 04, 2020
Eduardo Pavez, Benjamin Girault, Antonio Ortega, Philip A. Chou

We introduce the Region Adaptive Graph Fourier Transform (RA-GFT) for compression of 3D point cloud attributes. We assume the points are organized by a family of nested partitions represented by a tree. The RA-GFT is a multiresolution transform, formed by combining spatially localized block transforms. At each resolution level, attributes are processed in clusters by a set of block transforms. Each block transform produces a single approximation (DC) coefficient, and various detail (AC) coefficients. The DC coefficients are promoted up the tree to the next (lower resolution) level, where the process can be repeated until reaching the root. Since clusters may have different numbers of points, each block transform must incorporate the relative importance of each coefficient. For this, we introduce the $\mathbf{Q}$-normalized graph Laplacian, and propose using its eigenvectors as the block transform. The RA-GFT outperforms the Region Adaptive Haar Transform (RAHT) by up to 2.5 dB, with a small complexity overhead.
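A single block transform can be sketched in NumPy. This sketch assumes that the Q-normalized Laplacian is Q^{-1/2} L Q^{-1/2}. Here L is the combinatorial Laplacian of one cluster's graph, and Q is a diagonal matrix of positive per-point weights, for instance how many points each input coefficient summarizes. The abstract does not spell out this definition, so treat it as an assumption. Since L·1 = 0, the vector Q^{1/2}·1 is an eigenvector with eigenvalue 0. For a connected cluster, the DC coefficient is therefore the weighted sum Σ q_i x_i / sqrt(Σ q_i). The function name is hypothetical.

```python
import numpy as np


def q_normalized_block_transform(adjacency, q, x):
    """Block transform of one cluster using eigenvectors of Q^{-1/2} L Q^{-1/2}.

    adjacency: symmetric (n, n) non-negative edge weights of a connected cluster graph;
    q: (n,) positive per-point weights; x: (n,) attribute values.
    Returns eigenvalues, eigenvectors u, and transform coefficients u^T Q^{1/2} x.
    """
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    q_inv_sqrt = 1.0 / np.sqrt(q)
    lq = q_inv_sqrt[:, None] * laplacian * q_inv_sqrt[None, :]  # symmetric by construction
    eigvals, u = np.linalg.eigh(lq)  # ascending, so column 0 is the DC (eigenvalue 0) vector
    coeffs = u.T @ (np.sqrt(q) * x)
    return eigvals, u, coeffs


adjacency = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
q = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 2.0, 3.0])
eigvals, u, coeffs = q_normalized_block_transform(adjacency, q, x)
# The DC coefficient matches sum(q * x) / sqrt(sum(q)) up to the sign chosen by eigh.
print(abs(coeffs[0]), (q * x).sum() / np.sqrt(q.sum()))
# The transform is invertible: x = Q^{-1/2} u coeffs.
print(np.allclose((u @ coeffs) / np.sqrt(q), x))
```

When all weights are equal, this reduces to the ordinary graph Fourier transform of the cluster.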

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Mar 27, 2020
Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.

* AISTATS 2020 
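With a constant step size, the iterates of an algorithm like TD(0) keep fluctuating instead of converging to a point. The analysis instead concerns their stationary distribution. The toy simulation below illustrates two of the stated results: the stationary mean equals the true value, and the distribution concentrates as the step size shrinks. The setup is an assumption chosen so that the true value is known in closed form. It consists of a single state that always transitions to itself, with i.i.d. Gaussian rewards, so the true value is reward_mean / (1 - gamma). The function name is hypothetical.

```python
import numpy as np


def td0_iterates(reward_mean, reward_std, gamma, alpha, steps, seed=0):
    """Constant step-size TD(0) on a single state that always transitions to itself.

    Rewards are i.i.d. Gaussian; the true value is reward_mean / (1 - gamma).
    Returns the sequence of value estimates, which forms a Markov chain.
    """
    rng = np.random.default_rng(seed)
    rewards = rng.normal(reward_mean, reward_std, size=steps).tolist()
    v = 0.0
    history = np.empty(steps)
    for t, r in enumerate(rewards):
        v += alpha * (r + gamma * v - v)  # TD(0) update toward r + gamma * v
        history[t] = v
    return history


burn_in = 20_000  # discard early iterates that still remember the initial value 0
for alpha in (0.1, 0.01):
    tail = td0_iterates(1.0, 1.0, gamma=0.9, alpha=alpha, steps=200_000)[burn_in:]
    # True value is 1.0 / (1 - 0.9) = 10; the spread should shrink with alpha.
    print(f"alpha={alpha}: mean={tail.mean():.2f}, std={tail.std():.3f}")
```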

Learn to Interpret Atari Agents

Dec 29, 2018
Zhao Yang, Song Bai, Li Zhang, Philip H. S. Torr

Deep Reinforcement Learning (DeepRL) models surpass human-level performance in a multitude of tasks. Standing in stark contrast to the stellar performance is the obscure nature of the learned policies. The direct mapping from states to actions makes it hard to interpret the rationale behind the decision making of agents. In contrast to previous a-posteriori methods of visualising DeepRL policies, we propose an end-to-end trainable framework based on Rainbow, a representative Deep Q-Network (DQN) agent. Our method automatically detects important regions in the input domain, which enables characterization of general strategy and explanation for non-intuitive behaviors. Hence, we call it Region Sensitive Rainbow (RS-Rainbow). RS-Rainbow utilises a simple yet effective mechanism to incorporate innate visualisation ability into the learning model, not only improving the interpretability, but enabling the agent to leverage enhanced state representations for improved performance. Without extra supervision, specialised feature detectors focusing on distinct aspects of gameplay can be learned. Extensive experiments on the challenging platform of Atari 2600 demonstrate the superiority of RS-Rainbow. In particular, our agent achieves state-of-the-art performance with just 25% of the training frames, without massive large-scale parallel training.

Increasing the Action Gap: New Operators for Reinforcement Learning

Dec 15, 2015
Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.

* Bellemare, Marc G., Ostrovski, G., Guez, A., Thomas, Philip S., and Munos, Remi. Increasing the Action Gap: New Operators for Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2016 
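In its usual formulation, which is assumed here, the consistent Bellman operator bootstraps from Q(x, a) rather than max_b Q(x, b) whenever the next state x' equals the current state x. Advantage learning subtracts α(V(x) - Q(x, a)) from the ordinary Bellman backup, where V(x) = max_b Q(x, b). The sketch below iterates all three operators to their fixed points on a one-state, two-action toy MDP. The MDP is an assumption chosen so the fixed points can be checked by hand. All three operators agree on the greedy action, but the action gap grows. The function names and α = 0.5 are illustrative choices.

```python
import numpy as np


def bellman(q, r, p, gamma):
    """Bellman optimality operator. q, r: (S, A); p: (S, A, S) transition probabilities."""
    return r + gamma * p @ q.max(axis=1)


def consistent_bellman(q, r, p, gamma):
    """Like `bellman`, but a self-transition x' = x bootstraps from Q(x, a), not max_b Q(x, b)."""
    v = q.max(axis=1)
    n_states = q.shape[0]
    p_stay = p[np.arange(n_states), :, np.arange(n_states)]  # P(x | x, a), shape (S, A)
    return r + gamma * (p @ v - p_stay * v[:, None] + p_stay * q)


def advantage_learning(q, r, p, gamma, alpha=0.5):
    """Bellman backup minus alpha times the action's gap to the best action, V(x) - Q(x, a)."""
    return bellman(q, r, p, gamma) - alpha * (q.max(axis=1, keepdims=True) - q)


def fixed_point(operator, r, p, gamma, iters=2000):
    q = np.zeros_like(r)
    for _ in range(iters):
        q = operator(q, r, p, gamma)
    return q


# One state, two actions, both of which return to the same state.
r = np.array([[1.0, 0.5]])
p = np.ones((1, 2, 1))
# Analytic fixed points: Bellman [10, 9.5], consistent [10, 5], advantage learning [10, 9].
for op in (bellman, consistent_bellman, advantage_learning):
    print(op.__name__, fixed_point(op, r, p, gamma=0.9).round(3))
```

The action gap grows from 0.5 under the Bellman operator to 1 under advantage learning and 5 under the consistent operator. A larger gap makes the greedy choice more robust to small errors in the estimated Q-values.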

Playing Doom with SLAM-Augmented Deep Reinforcement Learning

Dec 01, 2016
Shehroze Bhatti, Alban Desmaison, Ondrej Miksik, Nantas Nardelli, N. Siddharth, Philip H. S. Torr

A number of recent approaches to policy learning in 2D game domains have been successful going directly from raw input images to actions. However, when employed in complex 3D environments, they typically suffer from challenges related to partial observability, combinatorial exploration spaces, path planning, and a scarcity of rewarding scenarios. Inspired by prior work in human cognition that indicates how humans employ a variety of semantic concepts and abstractions (object categories, localisation, etc.) to reason about the world, we build an agent-model that incorporates such abstractions into its policy-learning framework. We augment the raw image input to a Deep Q-Learning Network (DQN) by adding details of objects and structural elements encountered, along with the agent's localisation. The different components are automatically extracted and composed into a topological representation using on-the-fly object detection and 3D-scene reconstruction. We evaluate the efficacy of our approach in Doom, a 3D first-person combat game that exhibits many of the challenges discussed above, and show that our augmented framework consistently learns better, more effective policies.

Back to the Future for Dialogue Research: A Position Paper

Dec 04, 2018
Philip R Cohen

This short position paper is intended to provide a critique of current approaches to dialogue, as well as a roadmap for collaborative dialogue research. It is unapologetically opinionated, but informed by 40 years of dialogue research. No attempt is made to be comprehensive. The paper will discuss current research into building so-called "chatbots", slot-filling dialogue systems, and plan-based dialogue systems. For further discussion of some of these issues, please see (Allen et al., in press).

* AAAI Workshop 2019, Deep Dial 
