Models, code, and papers for "Caroline Chan":

Everybody Dance Now

Aug 22, 2018
Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros

This paper presents a simple method for "do as I do" motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We pose this problem as a per-frame image-to-image translation with spatio-temporal smoothing. Using pose detections as an intermediate representation between source and target, we learn a mapping from pose images to a target subject's appearance. We adapt this setup for temporally coherent video generation including realistic face synthesis. Our video demo can be found at .

Learning Individual Styles of Conversational Gesture

Jun 10, 2019
Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik

Human speech is often accompanied by hand and arm gestures. Given audio speech input, we generate plausible gestures to go along with the sound. Specifically, we perform cross-modal translation from "in-the-wild" monologue speech of a single speaker to their hand and arm motion. We train on unlabeled videos for which we only have noisy pseudo ground truth from an automatic pose detection system. Our proposed model significantly outperforms baseline methods in a quantitative comparison. To support research toward obtaining a computational understanding of the relationship between gesture and speech, we release a large video dataset of person-specific gestures. The project website with video, code and data can be found at .

* CVPR 2019 

Impact of Inference Accelerators on hardware selection

Oct 07, 2019
Dibyajyoti Pati, Caroline Favart, Purujit Bahl, Vivek Soni, Yun-chan Tsai, Michael Potter, Jiahui Guan, Xiaomeng Dong, V. Ratna Saripalli

As opportunities for AI-assisted healthcare grow steadily, model deployment faces challenges due to the specific characteristics of the industry. The configuration choice for a production device can impact model performance while influencing operational costs. Moreover, in healthcare some situations might require fast, but not real time, inference. We study different configurations and conduct a cost-performance analysis to determine the optimized hardware for the deployment of a model subject to healthcare domain constraints. We observe that a naive performance comparison may not lead to an optimal configuration selection. In fact, given realistic domain constraints, CPU execution might be preferable to GPU accelerators. Hence, defining beforehand precise expectations for model deployment is crucial.
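
The selection logic described in this abstract can be illustrated with a minimal sketch. The configuration names, latencies, and hourly costs below are made-up placeholders, not measurements from the paper, and the helper `cheapest_within_budget` is our own name. The point is only that the fastest device is not necessarily the cheapest one that still satisfies a "fast, but not real time" latency budget.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str             # hypothetical device label
    latency_s: float      # seconds per inference (placeholder value)
    cost_per_hour: float  # operating cost in arbitrary units (placeholder value)

def cheapest_within_budget(configs, max_latency_s):
    """Return the lowest-cost config whose latency meets the budget, or None."""
    feasible = [c for c in configs if c.latency_s <= max_latency_s]
    return min(feasible, key=lambda c: c.cost_per_hour) if feasible else None

# Placeholder numbers chosen purely for illustration.
CONFIGS = [
    Config("cpu-small", latency_s=1.8, cost_per_hour=0.10),
    Config("cpu-large", latency_s=0.9, cost_per_hour=0.40),
    Config("gpu", latency_s=0.05, cost_per_hour=3.00),
]

# Naive comparison: pick the fastest device regardless of cost.
fastest = min(CONFIGS, key=lambda c: c.latency_s)
# Constraint-aware comparison: cheapest device that meets a 2-second budget.
chosen = cheapest_within_budget(CONFIGS, max_latency_s=2.0)
print(fastest.name, chosen.name)
```

With these placeholder values the naive comparison picks the GPU, while the constraint-aware choice is a CPU configuration; tightening the latency budget shifts the choice back toward the accelerator.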

FastEstimator: A Deep Learning Library for Fast Prototyping and Productization

Oct 07, 2019
Xiaomeng Dong, Junpyo Hong, Hsi-Ming Chang, Michael Potter, Aritra Chowdhury, Purujit Bahl, Vivek Soni, Yun-Chan Tsai, Rajesh Tamada, Gaurav Kumar, Caroline Favart, V. Ratna Saripalli, Gopal Avinash

As the complexity of state-of-the-art deep learning models increases by the month, implementation, interpretation, and traceability become ever-more-burdensome challenges for AI practitioners around the world. Several AI frameworks have risen in an effort to stem this tide, but the steady advance of the field has begun to test the bounds of their flexibility, expressiveness, and ease of use. To address these concerns, we introduce a radically flexible high-level open source deep learning framework for both research and industry. We introduce FastEstimator.

Overparameterized Neural Networks Can Implement Associative Memory

Sep 26, 2019
Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

Identifying computational mechanisms for memorization and retrieval is a long-standing problem at the intersection of machine learning and neuroscience. In this work, we demonstrate empirically that overparameterized deep neural networks trained using standard optimization methods provide a mechanism for memorization and retrieval of real-valued data. In particular, we show that overparameterized autoencoders store training examples as attractors, and thus, can be viewed as implementations of associative memory with the retrieval mechanism given by iterating the map. We study this phenomenon under a variety of common architectures and optimization methods and construct a network that can recall 500 real-valued images without any apparent spurious attractor states. Lastly, we demonstrate how the same mechanism allows encoding sequences, including movies and audio, instead of individual examples. Interestingly, this appears to provide an even more efficient mechanism for storage and retrieval than autoencoding single instances.

Downsampling leads to Image Memorization in Convolutional Autoencoders

Oct 16, 2018
Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

Memorization of data in deep neural networks has become a subject of significant research interest. In this paper, we link memorization of images in deep convolutional autoencoders to downsampling through strided convolution. To analyze this mechanism in a simpler setting, we train linear convolutional autoencoders and show that linear combinations of training data are stored as eigenvectors in the linear operator corresponding to the network when downsampling is used. On the other hand, networks without downsampling do not memorize training data. We provide further evidence that the same effect happens in nonlinear networks. Moreover, downsampling in nonlinear networks causes the model to not only memorize linear combinations of images, but individual training images. Since convolutional autoencoder components are building blocks of deep convolutional networks, we envision that our findings will shed light on the important phenomenon of memorization in over-parameterized deep networks.
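
A simplified way to see how a linear operator can store training data as eigenvectors is the NumPy sketch below. It is not the paper's convolutional architecture: it assumes a fully connected linear map and uses the minimum-norm solution of A X = X, namely A = X X^+ (a standard linear-algebra fact), as a stand-in for a trained linear autoencoder. The data are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 3                      # input dimension, number of training examples
X = rng.standard_normal((d, n))   # columns are (hypothetical) training examples

# Minimum-norm linear map that reproduces every training example: A = X X^+.
A = X @ np.linalg.pinv(X)

# Each training example is an eigenvector of A with eigenvalue 1.
reconstruction_error = np.max(np.abs(A @ X - X))

# A is an orthogonal projection onto span(X): n eigenvalues are 1, the rest 0.
eigenvalues = np.sort(np.linalg.eigvalsh((A + A.T) / 2))[::-1]

# Applying A to an arbitrary input returns a linear combination of training data.
x = rng.standard_normal(d)
coeffs = np.linalg.pinv(X) @ x
in_span_error = np.max(np.abs(A @ x - X @ coeffs))

print(reconstruction_error, eigenvalues.round(6), in_span_error)
```

In this toy setting every output lies in the span of the training examples, which mirrors the abstract's statement that linear combinations of training data are stored in the operator; the role of downsampling itself is not modelled here.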

Size of Interventional Markov Equivalence Classes in Random DAG Models

Mar 05, 2019
Dmitriy Katz, Karthikeyan Shanmugam, Chandler Squires, Caroline Uhler

Directed acyclic graph (DAG) models are popular for capturing causal relationships. From observational and interventional data, a DAG model can only be determined up to its interventional Markov equivalence class (I-MEC). We investigate the size of MECs for random DAG models generated by uniformly sampling and ordering an Erdős-Rényi graph. For constant density, we show that the expected log observational MEC size asymptotically (in the number of vertices) approaches a constant. We characterize I-MEC size in a similar fashion in the above settings with high precision. We show that the asymptotic expected number of interventions required to fully identify a DAG is a constant. These results are obtained by exploiting Meek rules and coupling arguments to provide sharp upper and lower bounds on the asymptotic quantities, which are then calculated numerically up to high precision. Our results have important consequences for experimental design of interventions and the development of algorithms for causal inference.

* 19 pages, 5 figures. Accepted to AISTATS 2019 
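
To make the notion of a Markov equivalence class concrete, the sketch below enumerates all DAGs on a few labelled vertices and groups them using the classical criterion that two DAGs are observationally Markov equivalent exactly when they share the same skeleton and the same v-structures. It is a brute-force illustration for tiny graphs, not the paper's analysis of random DAG models, and the function names are our own.

```python
from collections import defaultdict
from itertools import combinations, product

def is_acyclic(n, edges):
    """Kahn-style check: repeatedly remove vertices with no incoming edge."""
    remaining, edges = set(range(n)), set(edges)
    while remaining:
        sources = [v for v in remaining if not any(b == v for _, b in edges)]
        if not sources:
            return False  # every remaining vertex has a parent, so there is a cycle
        remaining -= set(sources)
        edges = {(a, b) for a, b in edges if a in remaining and b in remaining}
    return True

def all_dags(n):
    """Yield every DAG on n labelled vertices as a frozenset of directed edges."""
    pairs = list(combinations(range(n), 2))
    # Each unordered pair is absent (0), oriented a->b (1), or oriented b->a (2).
    for choice in product(range(3), repeat=len(pairs)):
        edges = frozenset(
            (a, b) if c == 1 else (b, a)
            for (a, b), c in zip(pairs, choice) if c
        )
        if is_acyclic(n, edges):
            yield edges

def mec_key(edges):
    """Skeleton plus v-structures, which together identify the equivalence class."""
    skeleton = frozenset(frozenset(e) for e in edges)
    parents = defaultdict(set)
    for a, b in edges:
        parents[b].add(a)
    v_structures = frozenset(
        (i, k, j)
        for k, ps in parents.items()
        for i, j in combinations(sorted(ps), 2)
        if frozenset((i, j)) not in skeleton  # parents i and j must be non-adjacent
    )
    return skeleton, v_structures

def mec_sizes(n):
    """Sorted list of equivalence-class sizes over all DAGs on n vertices."""
    classes = defaultdict(list)
    for dag in all_dags(n):
        classes[mec_key(dag)].append(dag)
    return sorted(len(members) for members in classes.values())

sizes = mec_sizes(3)
print(len(sizes), sum(sizes))  # number of classes, number of DAGs
```

For three vertices this groups the 25 DAGs into 11 classes; the enumeration grows super-exponentially with the number of vertices, which is why asymptotic results such as those in the abstract are useful.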

Ordering-Based Causal Structure Learning in the Presence of Latent Variables

Oct 20, 2019
Daniel Irving Bernstein, Basil Saeed, Chandler Squires, Caroline Uhler

We consider the task of learning a causal graph in the presence of latent confounders given i.i.d. samples from the model. While current algorithms for causal structure discovery in the presence of latent confounders are constraint-based, we here propose a score-based approach. We prove that under assumptions weaker than faithfulness, any sparsest independence map (IMAP) of the distribution belongs to the Markov equivalence class of the true model. This motivates the Sparsest Poset formulation - that posets can be mapped to minimal IMAPs of the true model such that the sparsest of these IMAPs is Markov equivalent to the true model. Motivated by this result, we propose a greedy algorithm over the space of posets for causal structure discovery in the presence of latent confounders and compare its performance to the current state-of-the-art algorithms FCI and FCI+ on synthetic data.

Predicting the Voltage Distribution for Low Voltage Networks using Deep Learning

Jun 19, 2019
Maizura Mokhtar, Valentin Robu, David Flynn, Ciaran Higgins, Jim Whyte, Caroline Loughran, Fiona Fulton

The energy landscape for Low-Voltage (LV) networks is beginning to change as a result of the increased penetration of renewables and/or the predicted increase in electric vehicles charging at home. The previously passive 'fit-and-forget' approach to LV network management will be inefficient in ensuring effective operation. A more adaptive approach is required that includes the prediction of risk and capacity of the circuits. Many of the proposed methods require full observability of the networks, motivating the installation of smart meters and advanced metering infrastructure in many countries. However, the expectation of 'perfect data' is unrealistic in operational reality. Smart meter (SM) roll-out can have its issues, which may result in a low likelihood of full SM coverage for all LV networks. This, together with privacy requirements that limit the availability of high-granularity demand power data, has resulted in the low uptake of many of the presented methods. To address this issue, a Deep Learning Neural Network is proposed to predict the voltage distribution with partial SM coverage. The results show that SM measurements from key locations are sufficient for effective prediction of voltage distribution.

* 9th IEEE International Conference on Innovative Smart Grid Technologies (IEEE ISGT Europe 2019) 

Attention-Gated Networks for Improving Ultrasound Scan Plane Detection

Apr 15, 2018
Jo Schlemper, Ozan Oktay, Liang Chen, Jacqueline Matthew, Caroline Knight, Bernhard Kainz, Ben Glocker, Daniel Rueckert

In this work, we apply an attention-gated network to real-time automated scan plane detection for fetal ultrasound screening. Scan plane detection in fetal ultrasound is a challenging problem due to the poor image quality resulting in low interpretability for both clinicians and automated algorithms. To solve this, we propose incorporating self-gated soft-attention mechanisms. A soft-attention mechanism generates a gating signal that is end-to-end trainable, which allows the network to contextualise local information useful for prediction. The proposed attention mechanism is generic and it can be easily incorporated into any existing classification architecture, while only requiring a few additional parameters. We show that, when the base network has a high capacity, the incorporated attention mechanism can provide efficient object localisation while improving the overall performance. When the base network has a low capacity, the method greatly outperforms the baseline approach and significantly reduces false positives. Lastly, the generated attention maps allow us to understand the model's reasoning process, which can also be used for weakly supervised object localisation.

* Submitted to MIDL2018 (OpenReview: 
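
As a rough illustration of how a soft-attention gating signal can reweight local features, here is a minimal NumPy sketch. The abstract does not give the formula, so the additive form used here, alpha = sigmoid(psi · relu(W_x x + W_g g)), is an assumption, and the weights are random rather than trained. All names (`attention_gate`, `W_x`, `W_g`, `psi`) are hypothetical.

```python
import numpy as np

def attention_gate(features, gating, W_x, W_g, psi):
    """Additive soft-attention gate (assumed form, not taken from the paper).

    features: (H, W, C) local feature map
    gating:   (G,) global gating vector
    returns:  gated features (H, W, C) and an attention map (H, W) with values in [0, 1]
    """
    # Combine local features with the broadcast gating vector, then apply ReLU.
    hidden = np.maximum(features @ W_x + gating @ W_g, 0.0)   # (H, W, K)
    # One scalar attention coefficient per spatial location.
    alpha = 1.0 / (1.0 + np.exp(-(hidden @ psi)))             # (H, W)
    # Scale every channel at a location by that location's coefficient.
    return features * alpha[..., None], alpha

rng = np.random.default_rng(0)
H, W, C, G, K = 4, 4, 8, 16, 5
features = rng.standard_normal((H, W, C))
gating = rng.standard_normal(G)
W_x = rng.standard_normal((C, K))
W_g = rng.standard_normal((G, K))
psi = rng.standard_normal(K)

gated, alpha = attention_gate(features, gating, W_x, W_g, psi)
print(gated.shape, alpha.shape)
```

Because every operation is differentiable, a gate of this kind could be trained end to end with the rest of a network, and the `alpha` map is the sort of quantity one would visualise as an attention map.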

Deep Learning for Estimating Synaptic Health of Primary Neuronal Cell Culture

Aug 29, 2019
Andrey Kormilitzin, Xinyu Yang, William H. Stone, Caroline Woffindale, Francesca Nicholls, Elena Ribe, Alejo Nevado-Holgado, Noel Buckley

Understanding the morphological changes of primary neuronal cells induced by chemical compounds is essential for drug discovery. Using the data from a single high-throughput imaging assay, a classification model for predicting the biological activity of candidate compounds was introduced. The image recognition model, which is based on a deep convolutional neural network (CNN) architecture with residual connections, achieved an accuracy of 99.6% on a binary classification task of distinguishing untreated rodent primary neuronal cells from those treated with Amyloid-β(25-35).

* 11 pages, 5 figures 

Fast Multiple Landmark Localisation Using a Patch-based Iterative Network

Oct 07, 2018
Yuanwei Li, Amir Alansary, Juan J. Cerrolaza, Bishesh Khanal, Matthew Sinclair, Jacqueline Matthew, Chandni Gupta, Caroline Knight, Bernhard Kainz, Daniel Rueckert

We propose a new Patch-based Iterative Network (PIN) for fast and accurate landmark localisation in 3D medical volumes. PIN utilises a Convolutional Neural Network (CNN) to learn the spatial relationship between an image patch and anatomical landmark positions. During inference, patches are repeatedly passed to the CNN until the estimated landmark position converges to the true landmark location. PIN is computationally efficient since the inference stage only selectively samples a small number of patches in an iterative fashion rather than a dense sampling at every location in the volume. Our approach adopts a multi-task learning framework that combines regression and classification to improve localisation accuracy. We extend PIN to localise multiple landmarks by using principal component analysis, which models the global anatomical relationships between landmarks. We have evaluated PIN using 72 3D ultrasound images from fetal screening examinations. PIN achieves quantitatively an average landmark localisation error of 5.59mm and a runtime of 0.44s to predict 10 landmarks per volume. Qualitatively, anatomical 2D standard scan planes derived from the predicted landmark locations are visually similar to the clinical ground truth. Source code is publicly available at

* LNCS 11070 (2018) 563-571 
* 8 pages, 4 figures, Accepted for MICCAI 2018 
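
The iterative inference loop described in this abstract can be sketched as follows. This is only an illustration of the control flow: the trained CNN is replaced by a hypothetical stand-in, `toy_predictor`, which moves a fixed fraction of the way toward a known target so that the loop can run. The function names, tolerance, and iteration cap are assumptions, not values from the paper.

```python
import numpy as np

def localise(volume_shape, predict_offset, start, max_iters=50, tol=0.5):
    """Iteratively refine a landmark estimate until the predicted update is small."""
    position = np.asarray(start, dtype=float)
    upper = np.asarray(volume_shape, dtype=float) - 1.0
    for step in range(1, max_iters + 1):
        offset = predict_offset(position)    # stands in for CNN(patch at position)
        position = np.clip(position + offset, 0.0, upper)  # stay inside the volume
        if np.linalg.norm(offset) < tol:     # converged: update below tolerance
            break
    return position, step

# Hypothetical stand-in for the trained network: moves 60% of the way to a known target.
TARGET = np.array([40.0, 25.0, 10.0])

def toy_predictor(position):
    return 0.6 * (TARGET - position)

estimate, steps = localise((64, 64, 64), toy_predictor, start=(32, 32, 32))
print(estimate, steps)
```

Only one patch location is evaluated per iteration, which reflects the abstract's point that selective iterative sampling avoids dense evaluation at every location in the volume.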

Intraoperative Organ Motion Models with an Ensemble of Conditional Generative Adversarial Networks

Sep 05, 2017
Yipeng Hu, Eli Gibson, Tom Vercauteren, Hashim U. Ahmed, Mark Emberton, Caroline M. Moore, J. Alison Noble, Dean C. Barratt

In this paper, we describe how a patient-specific, ultrasound-probe-induced prostate motion model can be directly generated from a single preoperative MR image. Our motion model allows for sampling from the conditional distribution of dense displacement fields, is encoded by a generative neural network conditioned on a medical image, and accepts random noise as additional input. The generative network is trained by a minimax optimisation with a second discriminative neural network, tasked to distinguish generated samples from training motion data. In this work, we propose that 1) jointly optimising a third conditioning neural network that pre-processes the input image, can effectively extract patient-specific features for conditioning; and 2) combining multiple generative models trained separately with heuristically pre-disjointed training data sets can adequately mitigate the problem of mode collapse. Trained with diagnostic T2-weighted MR images from 143 real patients and 73,216 3D dense displacement fields from finite element simulations of intraoperative prostate motion due to transrectal ultrasound probe pressure, the proposed models produced physically-plausible patient-specific motion of prostate glands. The ability to capture biomechanically simulated motion was evaluated using two errors representing generalisability and specificity of the model. The median values, calculated from a 10-fold cross-validation, were 2.8+/-0.3 mm and 1.7+/-0.1 mm, respectively. We conclude that the introduced approach demonstrates the feasibility of applying state-of-the-art machine learning algorithms to generate organ motion models from patient images, and shows significant promise for future research.

* Accepted to MICCAI 2017 

Adversarial Deformation Regularization for Training Image Registration Neural Networks

May 27, 2018
Yipeng Hu, Eli Gibson, Nooshin Ghavami, Ester Bonmati, Caroline M. Moore, Mark Emberton, Tom Vercauteren, J. Alison Noble, Dean C. Barratt

We describe an adversarial learning approach to constrain convolutional neural network training for image registration, replacing heuristic smoothness measures of displacement fields often used in these tasks. Using minimally-invasive prostate cancer intervention as an example application, we demonstrate the feasibility of utilizing biomechanical simulations to regularize a weakly-supervised anatomical-label-driven registration network for aligning pre-procedural magnetic resonance (MR) and 3D intra-procedural transrectal ultrasound (TRUS) images. A discriminator network is optimized to distinguish the registration-predicted displacement fields from the motion data simulated by finite element analysis. During training, the registration network simultaneously aims to maximize similarity between anatomical labels that drives image alignment and to minimize an adversarial generator loss that measures divergence between the predicted- and simulated deformation. The end-to-end trained network enables efficient and fully-automated registration that only requires an MR and TRUS image pair as input, without anatomical labels or simulated data during inference. 108 pairs of labelled MR and TRUS images from 76 prostate cancer patients and 71,500 nonlinear finite-element simulations from 143 different patients were used for this study. We show that, with only gland segmentation as training labels, the proposed method can help predict physically plausible deformation without any other smoothness penalty. Based on cross-validation experiments using 834 pairs of independent validation landmarks, the proposed adversarial-regularized registration achieved a target registration error of 6.3 mm that is significantly lower than those from several other regularization methods.

* Accepted to MICCAI 2018 

Standard Plane Detection in 3D Fetal Ultrasound Using an Iterative Transformation Network

Oct 07, 2018
Yuanwei Li, Bishesh Khanal, Benjamin Hou, Amir Alansary, Juan J. Cerrolaza, Matthew Sinclair, Jacqueline Matthew, Chandni Gupta, Caroline Knight, Bernhard Kainz, Daniel Rueckert

Standard scan plane detection in fetal brain ultrasound (US) forms a crucial step in the assessment of fetal development. In clinical settings, this is done by manually manoeuvring a 2D probe to the desired scan plane. With the advent of 3D US, the entire fetal brain volume containing these standard planes can be easily acquired. However, manual standard plane identification in 3D volume is labour-intensive and requires expert knowledge of fetal anatomy. We propose a new Iterative Transformation Network (ITN) for the automatic detection of standard planes in 3D volumes. ITN uses a convolutional neural network to learn the relationship between a 2D plane image and the transformation parameters required to move that plane towards the location/orientation of the standard plane in the 3D volume. During inference, the current plane image is passed iteratively to the network until it converges to the standard plane location. We explore the effect of using different transformation representations as regression outputs of ITN. Under a multi-task learning framework, we introduce additional classification probability outputs to the network to act as confidence measures for the regressed transformation parameters in order to further improve the localisation accuracy. When evaluated on 72 US volumes of fetal brain, our method achieves an error of 3.83mm/12.7 degrees and 3.80mm/12.6 degrees for the transventricular and transcerebellar planes respectively and takes 0.46s per plane. Source code is publicly available at

* LNCS 11070 (2018) 392-400 
* 8 pages, 2 figures, accepted for MICCAI 2018; Added link to source code 
