Models, code, and papers for "Edward Li":

Nonsmooth Frank-Wolfe using Uniform Affine Approximations

Mar 20, 2018
Edward Cheung, Yuying Li

Frank-Wolfe (FW) methods have gained significant interest in the machine learning community due to their ability to efficiently solve large problems that admit a sparse structure (e.g., sparse vectors and low-rank matrices). However, the performance of existing FW methods hinges on the quality of the linear approximation. This typically restricts FW to smooth functions for which the approximation quality, indicated by a global curvature measure, is reasonably good. In this paper, we propose a modified FW algorithm amenable to nonsmooth functions by optimizing the approximation quality over all affine approximations, given a neighborhood of interest. We analyze theoretical properties of the proposed algorithm and demonstrate that it overcomes many issues associated with existing methods in the context of nonsmooth low-rank matrix estimation.
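
For context, a minimal sketch of the classical (smooth) Frank-Wolfe iteration that the paper modifies is given below, here for a least-squares objective over an l1 ball; the objective, radius, and step-size rule are illustrative assumptions, and the nonsmooth uniform-affine-approximation variant itself is not reproduced.

```python
import numpy as np

def frank_wolfe_l1(A, b, tau, iters=200):
    """Classical Frank-Wolfe for min 0.5*||Ax - b||^2 s.t. ||x||_1 <= tau.

    Illustrative of the smooth setting that the paper generalises; the
    nonsmooth, uniform-affine-approximation variant is not implemented here."""
    m, n = A.shape
    x = np.zeros(n)
    for k in range(iters):
        grad = A.T @ (A @ x - b)            # gradient of the smooth objective
        i = np.argmax(np.abs(grad))         # linear minimisation oracle on the l1 ball:
        s = np.zeros(n)                     # the optimum is a signed, scaled coordinate vertex
        s[i] = -tau * np.sign(grad[i])
        gamma = 2.0 / (k + 2.0)             # standard open-loop step size
        x = (1 - gamma) * x + gamma * s     # convex combination keeps x feasible
    return x

# toy usage
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
b = rng.standard_normal(50)
x_hat = frank_wolfe_l1(A, b, tau=5.0)
```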


Projection Free Rank-Drop Steps

Jul 04, 2017
Edward Cheung, Yuying Li

The Frank-Wolfe (FW) algorithm has been widely used for solving nuclear norm constrained problems, since it does not require projections. However, FW often yields high-rank intermediate iterates, which can be very expensive in time and space for large problems. To address this issue, we propose a rank-drop method for nuclear norm constrained problems. The goal is to generate descent steps that lead to rank decreases, maintaining low-rank solutions throughout the algorithm. Moreover, the optimization problems are constrained to ensure that the rank-drop step is also feasible and can be readily incorporated into a projection-free minimization method, e.g., Frank-Wolfe. We demonstrate that by incorporating rank-drop steps into the Frank-Wolfe algorithm, the rank of the solution is greatly reduced compared to the original Frank-Wolfe or its common variants.
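
For context, the sketch below shows why Frank-Wolfe is projection-free on a nuclear-norm ball: the linear minimisation oracle only needs the leading singular pair of the gradient, so each iterate is a rank-one update. The matrix-completion objective and radius are assumptions, and the paper's rank-drop steps are not implemented here.

```python
import numpy as np

def fw_nuclear_norm(M_obs, mask, delta, iters=100):
    """Frank-Wolfe for min 0.5*||mask*(X - M_obs)||_F^2 s.t. ||X||_* <= delta.

    The LMO only requires the leading singular vectors of the gradient
    (a rank-one atom), not an SVD-based projection. The paper's rank-drop
    steps are not implemented in this sketch."""
    X = np.zeros_like(M_obs)
    for k in range(iters):
        G = mask * (X - M_obs)                 # gradient, supported on observed entries
        u, s, vt = np.linalg.svd(G)            # top singular pair; a power method would also do
        S = -delta * np.outer(u[:, 0], vt[0])  # rank-one vertex of the nuclear-norm ball
        gamma = 2.0 / (k + 2.0)
        X = (1 - gamma) * X + gamma * S        # each iterate adds at most one to the rank
    return X

# toy usage: complete a 30x20 matrix observed on 40% of its entries
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))   # rank-2 ground truth
mask = (rng.random(M.shape) < 0.4).astype(float)
X_hat = fw_nuclear_norm(M * mask, mask, delta=20.0)
```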


Slip Detection with Combined Tactile and Visual Information

Feb 27, 2018
Jianhua Li, Siyuan Dong, Edward Adelson

Slip detection plays a vital role in robotic manipulation and has long been a challenging problem in the robotics community. In this paper, we propose a new method based on a deep neural network (DNN) to detect slip. The training data are acquired with a GelSight tactile sensor and a camera mounted on a gripper while a robot arm grasps and lifts 94 daily objects with different grasping forces and grasping positions. The DNN is trained to classify whether a slip occurred or not. To evaluate the performance of the DNN, we test 10 unseen objects in 152 grasps. A detection accuracy as high as 88.03% is achieved. It is anticipated that the accuracy can be further improved with a larger dataset. This method helps robots make stable grasps and can be widely applied to automatic force control, grasping strategy selection, and fine manipulation.

* International Conference on Robotics and Automation (ICRA) 2018 
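
The abstract does not give the network architecture, so the following is only a hedged sketch of a two-stream binary classifier over paired tactile and camera frames; the layer sizes, input resolution, and late-fusion design are assumptions.

```python
import torch
import torch.nn as nn

class SlipClassifier(nn.Module):
    """Toy two-stream CNN: one branch for GelSight tactile frames, one for
    camera frames, fused into a binary slip / no-slip prediction.
    Layer sizes and the late-fusion design are illustrative assumptions."""

    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.tactile = branch()
        self.visual = branch()
        self.head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, tactile_img, visual_img):
        feats = torch.cat([self.tactile(tactile_img), self.visual(visual_img)], dim=1)
        return self.head(feats)  # logits for {no slip, slip}

# usage: batch of 8 paired frames at an assumed 64x64 resolution
model = SlipClassifier()
logits = model(torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64))
```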

Identifying Domain Adjacent Instances for Semantic Parsers

Aug 26, 2018
James Ferguson, Janara Christensen, Edward Li, Edgar Gonzàlez

When the semantics of a sentence are not representable in a semantic parser's output schema, parsing will inevitably fail. Detection of these instances is commonly treated as an out-of-domain classification problem. However, there is also a more subtle scenario in which the test data is drawn from the same domain. In addition to formalizing this problem of domain-adjacency, we present a comparison of various baselines that could be used to solve it. We also propose a new simple sentence representation that emphasizes words which are unexpected. This approach improves the performance of a downstream semantic parser run on in-domain and domain-adjacent instances.

* EMNLP 2018 Camera Ready 
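
As a rough illustration of emphasising unexpected words, the hedged sketch below weights each word's embedding by its negative log probability under a background corpus before averaging; the weighting scheme and embeddings are assumptions, not the paper's exact representation.

```python
import numpy as np
from collections import Counter

def surprise_weighted_embedding(sentence, embeddings, background_counts, total):
    """Average word vectors weighted by -log p(word) under a background corpus,
    so rare / unexpected words dominate the representation. A hedged sketch,
    not the paper's exact formulation."""
    vecs, weights = [], []
    for w in sentence.lower().split():
        if w not in embeddings:
            continue
        p = (background_counts.get(w, 0) + 1) / (total + len(background_counts))  # add-one smoothing
        vecs.append(embeddings[w])
        weights.append(-np.log(p))
    if not vecs:
        return np.zeros(next(iter(embeddings.values())).shape)
    weights = np.asarray(weights)
    return (np.asarray(vecs) * weights[:, None]).sum(axis=0) / weights.sum()

# toy usage with random 50-d embeddings and a tiny background corpus
rng = np.random.default_rng(0)
background = Counter("book a flight to boston book a table for two".split())
emb = {w: rng.standard_normal(50) for w in set("book a flight to boston quasar".split())}
vec = surprise_weighted_embedding("book a flight past the quasar", emb, background, sum(background.values()))
```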

Sparse Reconstruction of Compressive Sensing MRI using Cross-Domain Stochastically Fully Connected Conditional Random Fields

Dec 25, 2015
Edward Li, Farzad Khalvati, Mohammad Javad Shafiee, Masoom A. Haider, Alexander Wong

Magnetic Resonance Imaging (MRI) is a crucial medical imaging technology for the screening and diagnosis of frequently occurring cancers. However, long MRI acquisition times can degrade image quality due to patient motion and cause considerable patient discomfort. Reducing acquisition time alleviates patient discomfort and, as a result, reduces motion artifacts from the acquisition process. Compressive sensing strategies, when applied to MRI, have been demonstrated to significantly decrease acquisition times by sparsely sampling the k-space during the acquisition process. However, such a strategy requires advanced reconstruction algorithms to produce high-quality and reliable images from compressive sensing MRI. This paper proposes a new reconstruction approach based on cross-domain stochastically fully connected conditional random fields (CD-SFCRF) for compressive sensing MRI. The CD-SFCRF introduces constraints in both the k-space and spatial domains within a stochastically fully connected graphical model to produce improved MRI reconstruction. Experimental results using T2-weighted (T2w) imaging and diffusion-weighted imaging (DWI) of the prostate show strong performance in preserving fine details and tissue structures in the reconstructed images when compared to other tested methods, even at low sampling rates.

* 9 pages 
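
The CD-SFCRF model itself is not reproduced here; the sketch below only illustrates the compressive sensing setup it starts from, i.e. retaining a random subset of k-space samples and forming a zero-filled reconstruction that a method such as the CD-SFCRF would then refine. The sampling rate and test image are assumptions.

```python
import numpy as np

def zero_filled_recon(image, sampling_rate=0.3, seed=0):
    """Simulate compressive sensing MRI: keep a random fraction of k-space
    samples and reconstruct by inverse FFT of the zero-filled spectrum.
    This is only the baseline input that reconstruction models (e.g. the
    CD-SFCRF in the paper) would improve upon."""
    rng = np.random.default_rng(seed)
    kspace = np.fft.fft2(image)
    mask = rng.random(image.shape) < sampling_rate   # random sparse sampling pattern
    undersampled = kspace * mask                     # unmeasured locations stay zero
    return np.abs(np.fft.ifft2(undersampled)), mask

# toy usage on a random 'phantom'
phantom = np.random.default_rng(1).random((128, 128))
recon, mask = zero_filled_recon(phantom, sampling_rate=0.25)
```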

GelSlim: A High-Resolution, Compact, Robust, and Calibrated Tactile-sensing Finger

May 15, 2018
Elliott Donlon, Siyuan Dong, Melody Liu, Jianhua Li, Edward Adelson, Alberto Rodriguez

This work describes the development of a high-resolution tactile-sensing finger for robot grasping. This finger, inspired by previous GelSight sensing techniques, features an integration that is slimmer and more robust, with more homogeneous output, than previous vision-based tactile sensors. To achieve a compact integration, we redesign the optical path from illumination source to camera by combining light guides and an arrangement of mirror reflections. We parameterize the optical path with geometric design variables and describe the tradeoffs between the finger thickness, the depth of field of the camera, and the size of the tactile sensing area. The sensor sustains the wear from continuous use -- and abuse -- in grasping tasks by combining tougher materials for the compliant soft gel, a textured fabric skin, a structurally rigid body, and a calibration process that maintains homogeneous illumination and contrast of the tactile images during use. Finally, we evaluate the sensor's durability along four metrics that track the signal quality during more than 3000 grasping experiments.

* RA-L Pre-print. 8 pages 
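
To make the stated tradeoff concrete, the sketch below evaluates a standard thin-lens depth-of-field approximation at a few camera-to-gel distances; the focal length, f-number, and circle of confusion are illustrative assumptions, not GelSlim design values.

```python
def depth_of_field(focal_mm, f_number, subject_mm, coc_mm=0.01):
    """Thin-lens depth-of-field approximation (all lengths in mm).
    Used only to illustrate the thickness / depth-of-field / sensing-area
    tradeoff discussed above; parameter values are assumptions."""
    H = focal_mm ** 2 / (f_number * coc_mm) + focal_mm          # hyperfocal distance
    near = subject_mm * (H - focal_mm) / (H + subject_mm - 2 * focal_mm)
    far = subject_mm * (H - focal_mm) / (H - subject_mm) if subject_mm < H else float("inf")
    return far - near

# a shorter optical path (thinner finger) shrinks the usable depth of field
for distance in (20.0, 35.0, 50.0):
    print(f"subject at {distance:4.0f} mm -> DoF ~ {depth_of_field(2.0, 2.8, distance):.2f} mm")
```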

Compositional Imitation Learning: Explaining and executing one task at a time

Dec 04, 2018
Thomas Kipf, Yujia Li, Hanjun Dai, Vinicius Zambaldi, Edward Grefenstette, Pushmeet Kohli, Peter Battaglia

We introduce a framework for Compositional Imitation Learning and Execution (CompILE) of hierarchically-structured behavior. CompILE learns reusable, variable-length segments of behavior from demonstration data using a novel unsupervised, fully-differentiable sequence segmentation module. These learned behaviors can then be re-composed and executed to perform new tasks. At training time, CompILE auto-encodes observed behavior into a sequence of latent codes, each corresponding to a variable-length segment in the input sequence. Once trained, our model generalizes to sequences of longer length and from environment instances not seen during training. We evaluate our model in a challenging 2D multi-task environment and show that CompILE can find correct task boundaries and event encodings in an unsupervised manner without requiring annotated demonstration data. Latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We found that our agent could learn given only sparse rewards, where agents without task-specific policies struggle.

* Presented at the Learning by Instruction (LBI) Workshop at NeurIPS 2018 
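
CompILE's differentiable boundary inference is too involved to reproduce here; the heavily simplified sketch below only auto-encodes already-segmented sub-trajectories into per-segment latent codes with a recurrent encoder-decoder. All dimensions are assumptions.

```python
import torch
import torch.nn as nn

class SegmentAutoEncoder(nn.Module):
    """Heavily simplified sketch: encode each (already segmented) behaviour
    segment of (state, action) steps into a single latent code and decode it
    back into actions. CompILE's differentiable, unsupervised boundary
    inference -- its main contribution -- is deliberately omitted."""

    def __init__(self, obs_dim=8, n_actions=4, latent_dim=16, hidden=32):
        super().__init__()
        self.encoder = nn.GRU(obs_dim + n_actions, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent_dim)
        self.decoder = nn.GRU(obs_dim + latent_dim, hidden, batch_first=True)
        self.policy = nn.Linear(hidden, n_actions)   # per-step action logits

    def forward(self, obs, actions_onehot):
        _, h = self.encoder(torch.cat([obs, actions_onehot], dim=-1))
        z = self.to_latent(h[-1])                                  # one latent code per segment
        z_rep = z.unsqueeze(1).expand(-1, obs.size(1), -1)         # condition every step on z
        out, _ = self.decoder(torch.cat([obs, z_rep], dim=-1))
        return self.policy(out), z

# toy usage: batch of 5 segments, each 12 steps long
model = SegmentAutoEncoder()
obs = torch.randn(5, 12, 8)
acts = nn.functional.one_hot(torch.randint(0, 4, (5, 12)), 4).float()
logits, z = model(obs, acts)
```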

Privacy-Preserving Action Recognition for Smart Hospitals using Low-Resolution Depth Images

Nov 25, 2018
Edward Chou, Matthew Tan, Cherry Zou, Michelle Guo, Albert Haque, Arnold Milstein, Li Fei-Fei

Computer-vision hospital systems can greatly assist healthcare workers and improve medical facility treatment, but often face patient resistance due to the perceived intrusiveness and violation of privacy associated with visual surveillance. We downsample video frames to extremely low resolutions to degrade private information in surveillance videos. We measure the amount of activity-recognition information retained in low-resolution depth images, and also apply a privately trained DCSCN super-resolution model to enhance the utility of our images. We implement our techniques in two real healthcare surveillance scenarios, hand-hygiene compliance and ICU activity logging, and show that our privacy-preserving techniques retain enough information for realistic healthcare tasks.

* Machine Learning for Health (ML4H) Workshop at NeurIPS 2018, arXiv:1811.07216 
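
As a hedged sketch of the core privacy mechanism, the snippet below block-averages a depth frame down to an extremely low resolution; the target resolution is an assumption, and neither the activity-recognition models nor the DCSCN super-resolution step are shown.

```python
import numpy as np

def privacy_downsample(depth_frame, low_res=(16, 16)):
    """Degrade identifying detail by block-averaging a depth frame down to a
    tiny resolution (e.g. 16x16). The activity-recognition models and the
    privately trained super-resolution step from the paper are not shown."""
    h, w = depth_frame.shape
    lh, lw = low_res
    # block-average pooling; crops so the frame divides evenly into blocks
    blocks = depth_frame[: h - h % lh, : w - w % lw].reshape(lh, h // lh, lw, w // lw)
    return blocks.mean(axis=(1, 3))

# toy usage on a random 240x320 'depth frame'
frame = np.random.default_rng(0).random((240, 320))
low = privacy_downsample(frame)          # 16x16, heavily anonymised
```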

Graph Convolutional Transformer: Learning the Graphical Structure of Electronic Health Records

Jun 28, 2019
Edward Choi, Zhen Xu, Yujia Li, Michael W. Dusenberry, Gerardo Flores, Yuan Xue, Andrew M. Dai

Effective modeling of electronic health records (EHR) is rapidly becoming an important topic in both academia and industry. A recent study showed that utilizing the graphical structure underlying EHR data (e.g. relationship between diagnoses and treatments) improves the performance of prediction tasks such as heart failure diagnosis prediction. However, EHR data do not always contain complete structure information. Moreover, when it comes to claims data, structure information is completely unavailable to begin with. Under such circumstances, can we still do better than just treating EHR data as a flat-structured bag-of-features? In this paper, we study the possibility of utilizing the implicit structure of EHR by using the Transformer for prediction tasks on EHR data. Specifically, we argue that the Transformer is a suitable model to learn the hidden EHR structure, and propose the Graph Convolutional Transformer, which uses data statistics to guide the structure learning process. Our model empirically demonstrated superior prediction performance to previous approaches on both synthetic data and publicly available EHR data on encounter-based prediction tasks such as graph reconstruction and readmission prediction, indicating that it can serve as an effective general-purpose representation learning algorithm for EHR data.
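
A minimal sketch of the central idea, self-attention whose scores are biased toward a prior derived from data statistics (e.g. conditional co-occurrence of medical codes), is shown below; the mixing rule, dimensions, and form of the prior are assumptions, and the full Graph Convolutional Transformer is not reproduced.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prior_guided_attention(X, Wq, Wk, Wv, prior, alpha=0.5):
    """Single-head self-attention over EHR feature embeddings X (n_codes x d),
    with attention scores mixed with a data-statistics prior (n_codes x n_codes,
    rows summing to 1), e.g. conditional probabilities between diagnoses and
    treatments. The mixing rule and alpha are assumptions, not the paper's
    exact formulation."""
    d = Wq.shape[1]
    scores = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))   # learned attention
    attn = alpha * prior + (1 - alpha) * scores            # bias toward known structure
    return attn @ (X @ Wv)

# toy usage: 6 codes, 8-dim embeddings, uniform prior
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
prior = np.full((6, 6), 1 / 6)
out = prior_guided_attention(X, Wq, Wk, Wv, prior)
```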


Progressively Growing Generative Adversarial Networks for High Resolution Semantic Segmentation of Satellite Images

Feb 12, 2019
Edward Collier, Kate Duffy, Sangram Ganguly, Geri Madanguit, Subodh Kalia, Gayaka Shreekant, Ramakrishna Nemani, Andrew Michaelis, Shuang Li, Auroop Ganguly, Supratik Mukhopadhyay

Machine learning has proven to be useful in classification and segmentation of images. In this paper, we evaluate a training methodology for pixel-wise segmentation on high resolution satellite images using progressive growing of generative adversarial networks. We apply our model to segmenting building rooftops and compare these results to conventional methods for rooftop segmentation. We present our findings using the SpaceNet version 2 dataset. Progressive GAN training achieved a test accuracy of 93% compared to 89% for traditional GAN training.

* Accepted to and presented at DMESS 2018 as part of IEEE ICDM 2018 
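
The progressive GAN training pipeline is too large to sketch here; as a hedged illustration of how segmentation quality is scored, the snippet below computes pixel-wise accuracy and intersection-over-union for binary rooftop masks. It is a generic evaluation sketch, not the paper's metric code.

```python
import numpy as np

def pixel_accuracy_and_iou(pred_mask, true_mask):
    """Pixel-wise accuracy and intersection-over-union for binary rooftop
    masks. A generic segmentation-evaluation sketch."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    accuracy = (pred == true).mean()
    union = (pred | true).sum()
    iou = (pred & true).sum() / union if union else 1.0
    return accuracy, iou

# toy usage on random 256x256 masks
rng = np.random.default_rng(0)
acc, iou = pixel_accuracy_and_iou(rng.random((256, 256)) > 0.5, rng.random((256, 256)) > 0.5)
```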

Comparing Rule-Based and Deep Learning Models for Patient Phenotyping

Mar 25, 2017
Sebastian Gehrmann, Franck Dernoncourt, Yeran Li, Eric T. Carlson, Joy T. Wu, Jonathan Welt, John Foote Jr., Edward T. Moseley, David W. Grant, Patrick D. Tyler, Leo Anthony Celi

Objective: We investigate whether deep learning techniques for natural language processing (NLP) can be used efficiently for patient phenotyping. Patient phenotyping is a classification task for determining whether a patient has a medical condition, and is a crucial part of secondary analysis of healthcare data. We assess the performance of deep learning algorithms and compare them with classical NLP approaches. Materials and Methods: We compare convolutional neural networks (CNNs), n-gram models, and approaches based on cTAKES that extract pre-defined medical concepts from clinical notes and use them to predict patient phenotypes. The performance is tested on 10 different phenotyping tasks using 1,610 discharge summaries extracted from the MIMIC-III database. Results: CNNs outperform other phenotyping algorithms in all 10 tasks. The average F1-score of our model is 76 (PPV of 83, and sensitivity of 71), up to 37 points higher than alternative approaches. We additionally assess the interpretability of our model by presenting a method that extracts the most salient phrases for a particular prediction. Conclusion: We show that NLP methods based on deep learning improve the performance of patient phenotyping. Our CNN-based algorithm automatically learns the phrases associated with each patient phenotype. As such, it reduces the annotation complexity for clinical domain experts, who are normally required to develop task-specific annotation rules and identify relevant phrases. Our method is strong in terms of both accuracy and interpretability, which indicates that deep learning is an effective approach to patient phenotyping based on clinicians' notes.
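
As context, here is a hedged sketch of a convolutional text classifier of the kind described: embed tokens, convolve over n-gram windows, max-pool, and predict a single phenotype. The hyperparameters and vocabulary size are assumptions rather than the paper's configuration; the max-pooled filter positions are also what make the salient-phrase extraction mentioned above natural.

```python
import torch
import torch.nn as nn

class PhenotypeCNN(nn.Module):
    """Toy CNN text classifier for a single patient phenotype: embed tokens,
    convolve over n-gram windows, max-pool, classify. Hyperparameters are
    illustrative assumptions, not the paper's configuration."""

    def __init__(self, vocab_size=20000, emb_dim=100, n_filters=64, widths=(2, 3, 4)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(nn.Conv1d(emb_dim, n_filters, w) for w in widths)
        self.out = nn.Linear(n_filters * len(widths), 2)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)       # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))       # phenotype present / absent logits

# usage on a batch of 4 discharge summaries truncated to 500 tokens
model = PhenotypeCNN()
logits = model(torch.randint(0, 20000, (4, 500)))
```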


Relational Deep Reinforcement Learning

Jun 28, 2018
Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia

We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
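
A minimal sketch of the relational core, multi-head self-attention over a set of entity vectors extracted from a scene, is shown below using PyTorch's built-in attention module; the entity dimensionality and head count are assumptions, and the surrounding perception and policy networks are omitted.

```python
import torch
import torch.nn as nn

class RelationalBlock(nn.Module):
    """One round of relational reasoning: entities attend to each other via
    multi-head self-attention, followed by a shared MLP and residual update.
    Dimensions and head count are illustrative assumptions."""

    def __init__(self, entity_dim=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(entity_dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(entity_dim, entity_dim), nn.ReLU(),
                                 nn.Linear(entity_dim, entity_dim))
        self.norm = nn.LayerNorm(entity_dim)

    def forward(self, entities):                 # (batch, n_entities, entity_dim)
        attended, _ = self.attn(entities, entities, entities)
        return self.norm(entities + self.mlp(attended))

# toy usage: 16 entities per scene (e.g. objects detected in Box-World)
block = RelationalBlock()
updated = block(torch.randn(2, 16, 64))
```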


Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Nov 28, 2017
Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis

The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today's massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples more than 20 times faster than real time, and is deployed online by Google Assistant, including serving multiple English and Japanese voices.
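
The heart of Probability Density Distillation is a KL term between the student's output distribution and the trained teacher's, estimated from samples the student itself generates. Below is a hedged sketch of that Monte Carlo KL estimate with stand-in Gaussian distributions; the WaveNet and inverse-autoregressive-flow architectures are not shown, and the toy shapes are assumptions.

```python
import torch
from torch.distributions import Normal

def distillation_kl(student_mu, student_sigma, teacher_mu, teacher_sigma, n_samples=8):
    """Monte Carlo estimate of KL(student || teacher) per timestep:
    E_{x ~ student}[log p_student(x) - log p_teacher(x)].
    In Parallel WaveNet the student is an inverse-autoregressive flow and the
    teacher a trained WaveNet; here both are stand-in Gaussians."""
    student = Normal(student_mu, student_sigma)
    teacher = Normal(teacher_mu, teacher_sigma)
    x = student.rsample((n_samples,))        # reparameterised, so gradients reach the student
    kl = (student.log_prob(x) - teacher.log_prob(x)).mean(dim=0)
    return kl.mean()

# toy usage: a batch of 4 waveforms, 100 timesteps each
mu_s, sig_s = torch.zeros(4, 100, requires_grad=True), torch.ones(4, 100)
mu_t, sig_t = torch.full((4, 100), 0.3), torch.full((4, 100), 1.2)
loss = distillation_kl(mu_s, sig_s, mu_t, sig_t)
loss.backward()
```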


A Step Toward Quantifying Independently Reproducible Machine Learning Research

Sep 14, 2019
Edward Raff

What makes a paper independently reproducible? Debates on reproducibility center around intuition or assumptions but lack empirical results. Our field focuses on releasing code, which is important, but is not sufficient for determining reproducibility. We take the first step toward a quantifiable answer by manually attempting to implement 255 papers published from 1984 until 2017, recording features of each paper, and performing statistical analysis of the results. For each paper, we did not look at the authors' code, if released, in order to prevent bias toward discrepancies between code and paper.

* to appear in Proc. Neural Information Processing Systems (NeurIPS), 2019 

Practical Calculation of Gittins Indices for Multi-armed Bandits

Sep 11, 2019
James Edwards

Gittins indices provide an optimal solution to the classical multi-armed bandit problem. An obstacle to their use has been the common perception that their computation is very difficult. This paper demonstrates an accessible general methodology for calculating Gittins indices for the multi-armed bandit, with a detailed study of the cases of Bernoulli and Gaussian rewards. With accompanying easy-to-use open source software, this work removes computation as a barrier to using Gittins indices in these commonly found settings.
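
As a hedged sketch of one accessible route to a Bernoulli-reward Gittins index, the snippet below calibrates against a retirement reward via bisection, using finite-horizon dynamic programming over Beta posterior states; the discount factor, horizon, and tolerance are assumptions, and this is not the accompanying open source software mentioned above.

```python
from functools import lru_cache

def gittins_index_bernoulli(a, b, gamma=0.9, horizon=100, tol=1e-4):
    """Gittins index for a Bernoulli arm with Beta(a, b) posterior, by
    calibration: bisect on a retirement reward lam until the decision-maker is
    indifferent between retiring (lam forever) and continuing to play the arm
    optimally. A finite horizon approximates the infinite one."""

    def continue_value(lam):
        retire = lam / (1.0 - gamma)

        @lru_cache(maxsize=None)
        def V(a_, b_, n):
            if n == 0:
                return retire                      # beyond the horizon, just retire
            p = a_ / (a_ + b_)
            cont = p * (1.0 + gamma * V(a_ + 1, b_, n - 1)) + (1.0 - p) * gamma * V(a_, b_ + 1, n - 1)
            return max(retire, cont)

        p = a / (a + b)
        return p * (1.0 + gamma * V(a + 1, b, horizon)) + (1.0 - p) * gamma * V(a, b + 1, horizon)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if continue_value(lam) > lam / (1.0 - gamma):
            lo = lam                               # still worth playing: index is larger
        else:
            hi = lam
    return 0.5 * (lo + hi)

# e.g. a fresh arm with a uniform Beta(1, 1) prior
print(gittins_index_bernoulli(1, 1))
```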


Growing and Retaining AI Talent for the United States Government

Sep 27, 2018
Edward Raff

Artificial Intelligence and Machine Learning have become transformative for a number of industries, and the resulting need for AI talent is increasing demand for individuals with these skills. This continues to exacerbate the difficulty of acquiring and retaining talent for the United States Federal Government, both for its direct employees and for the companies that support it. We take the position that by focusing on growing and retaining current talent through a number of cultural changes, the government can work to remediate this problem today.

* Presented at AAAI FSS-18: Artificial Intelligence in Government and Public Sector, Arlington, Virginia, USA 

YouTube-8M Video Understanding Challenge Approach and Applications

Jun 26, 2017
Edward Chen

This paper introduces the YouTube-8M Video Understanding Challenge hosted as a Kaggle competition and describes my approach to experimenting with various models. For each of my experiments, I provide the score result as well as possible improvements to be made. Towards the end of the paper, I discuss the various ensemble learning techniques that I applied to the dataset, which significantly boosted my overall competition score. Finally, I discuss the exciting future of video understanding research and the many applications that such research could significantly improve.

* YouTube-8M Workshop submission, 8 pages 
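
The abstract does not detail the ensembling techniques, so the sketch below shows only a generic weighted average of per-class probabilities from several models; the weights and array shapes are assumptions.

```python
import numpy as np

def ensemble_predictions(prob_list, weights=None):
    """Weighted average of per-class probability matrices from several models
    (each of shape n_videos x n_classes). A generic ensembling illustration,
    not the competition entry's actual techniques."""
    probs = np.stack(prob_list)                       # (n_models, n_videos, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list)) / len(prob_list)
    weights = np.asarray(weights, dtype=float)
    blended = np.tensordot(weights, probs, axes=1)    # weighted sum over models
    return blended / weights.sum()

# toy usage: three models, 5 videos, 10 candidate labels
rng = np.random.default_rng(0)
models = [rng.dirichlet(np.ones(10), size=5) for _ in range(3)]
blend = ensemble_predictions(models, weights=[0.5, 0.3, 0.2])
```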

Limits to rover miniaturisation and their implications for solar system exploration

Jun 16, 2017
Stephen Edwards

Semiautonomous rover scaling is examined for exploration throughout the solar system. Communication to a relay orbiter is a major constraint, due to power requirements and decreasing antenna gain at small sizes. Analysis time also scales adversely for power-hungry Raman spectroscopy or surface abrasion, and for low-photon-count gamma ray spectroscopy. Rovers at the 1 cm scale with MER-like capability should be possible, and could operate on solar power out to 40 AU. At 2 cm, rovers powered by current RTGs are feasible and are useful in the outer solar system. Such rovers will return images, autonomously navigate from interest point to interest point, and autonomously deploy instruments. Analyses include APX and gamma ray spectroscopy for elemental composition, and Raman and IR spectroscopy for molecular and/or mineralogical composition. They can be built with current technology; the component sizes are similar to those of watch movements. Low temperatures are not an obstacle. Very high temperatures, e.g. 390 to 480 C at the surface of Venus, will require more work on high-temperature electronics, but current work suggests that dense ICs for Venusian conditions should be feasible in the near future. Millimetre-scale rovers, designed primarily for imaging, are feasible at inner solar system solar flux levels. If deployed in large numbers, manufacturing developments would be needed to allow mechanised assembly of multiple MEMS systems. Miniature RTGs that degas to space could provide power in cold vacuum environments, as could advanced betavoltaic systems, though much work would be required to develop the latter.
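
To make the antenna-gain constraint concrete, the sketch below evaluates the standard aperture-antenna gain relation G = eta * (pi * D / lambda)^2 for a few rover-scale dish diameters; the Ka-band wavelength and efficiency are illustrative assumptions, not figures from the paper.

```python
import math

def antenna_gain_dbi(diameter_m, wavelength_m, efficiency=0.6):
    """Aperture antenna gain G = eta * (pi * D / lambda)^2, in dBi.
    Illustrates how sharply gain falls at small apertures; note the formula is
    only a rough guide once D approaches the wavelength."""
    gain = efficiency * (math.pi * diameter_m / wavelength_m) ** 2
    return 10.0 * math.log10(gain)

wavelength = 0.0094  # ~32 GHz Ka-band (an assumed relay frequency)
for d_cm in (10.0, 2.0, 1.0):
    print(f"{d_cm:4.1f} cm aperture -> {antenna_gain_dbi(d_cm / 100.0, wavelength):5.1f} dBi")
```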


Category-Theoretic Quantitative Compositional Distributional Models of Natural Language Semantics

Nov 06, 2013
Edward Grefenstette

This thesis is about the problem of compositionality in distributional semantics. Distributional semantics presupposes that the meanings of words are a function of their occurrences in textual contexts. It models words as distributions over these contexts and represents them as vectors in high dimensional spaces. The problem of compositionality for such models concerns itself with how to produce representations for larger units of text by composing the representations of smaller units of text. This thesis focuses on a particular approach to this compositionality problem, namely using the categorical framework developed by Coecke, Sadrzadeh, and Clark, which combines syntactic analysis formalisms with distributional semantic representations of meaning to produce syntactically motivated composition operations. This thesis shows how this approach can be theoretically extended and practically implemented to produce concrete compositional distributional models of natural language semantics. It furthermore demonstrates that such models can perform on par with, or better than, other competing approaches in the field of natural language processing. There are three principal contributions to computational linguistics in this thesis. The first is to extend the DisCoCat framework on the syntactic front and semantic front, incorporating a number of syntactic analysis formalisms and providing learning procedures allowing for the generation of concrete compositional distributional models. The second contribution is to evaluate the models developed from the procedures presented here, showing that they outperform other compositional distributional models present in the literature. The third contribution is to show how using category theory to solve linguistic problems forms a sound basis for research, illustrated by examples of work on this topic, that also suggest directions for future research.

* DPhil Thesis, University of Oxford, Submitted and accepted in 2013 
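
As a toy illustration of the compositional approach described, the snippet below represents an adjective as a matrix acting on a noun vector and a transitive verb as a third-order tensor contracted with subject and object vectors; the dimensions and random 'meanings' are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # toy semantic space dimension

# nouns live in R^d; an adjective is a matrix (R^d -> R^d);
# a transitive verb is a 3rd-order tensor contracted with subject and object.
dog, ball = rng.standard_normal(d), rng.standard_normal(d)
red = rng.standard_normal((d, d))
chases = rng.standard_normal((d, d, d))

red_ball = red @ ball                              # adjective-noun composition
s1 = np.einsum("ijk,j,k->i", chases, dog, ball)    # "dog chases ball"
s2 = np.einsum("ijk,j,k->i", chases, ball, dog)    # "ball chases dog"

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos(s1, s2))                                 # composed sentence meanings can be compared
```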
