Models, code, and papers for "Caroline Chan":

This paper presents a simple method for "do as I do" motion transfer: given a source video of a person dancing we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We pose this problem as a per-frame image-to-image translation with spatio-temporal smoothing. Using pose detections as an intermediate representation between source and target, we learn a mapping from pose images to a target subject's appearance. We adapt this setup for temporally coherent video generation including realistic face synthesis. Our video demo can be found at https://youtu.be/PCBTZh41Ris .

Human speech is often accompanied by hand and arm gestures. Given audio speech input, we generate plausible gestures to go along with the sound. Specifically, we perform cross-modal translation from "in-the-wild'' monologue speech of a single speaker to their hand and arm motion. We train on unlabeled videos for which we only have noisy pseudo ground truth from an automatic pose detection system. Our proposed model significantly outperforms baseline methods in a quantitative comparison. To support research toward obtaining a computational understanding of the relationship between gesture and speech, we release a large video dataset of person-specific gestures. The project website with video, code and data can be found at http://people.eecs.berkeley.edu/~shiry/speech2gesture .

As opportunities for AI-assisted healthcare grow steadily, model deployment faces challenges due to the specific characteristics of the industry. The configuration choice for a production device can impact model performance while influencing operational costs. Moreover, in healthcare some situations might require fast, but not real time, inference. We study different configurations and conduct a cost-performance analysis to determine the optimized hardware for the deployment of a model subject to healthcare domain constraints. We observe that a naive performance comparison may not lead to an optimal configuration selection. In fact, given realistic domain constraints, CPU execution might be preferable to GPU accelerators. Hence, defining beforehand precise expectations for model deployment is crucial.

As the complexity of state-of-the-art deep learning models increases by the month, implementation, interpretation, and traceability become ever-more-burdensome challenges for AI practitioners around the world. Several AI frameworks have risen in an effort to stem this tide, but the steady advance of the field has begun to test the bounds of their flexibility, expressiveness, and ease of use. To address these concerns, we introduce a radically flexible high-level open source deep learning framework for both research and industry. We introduce FastEstimator.

Identifying computational mechanisms for memorization and retrieval is a long-standing problem at the intersection of machine learning and neuroscience. In this work, we demonstrate empirically that overparameterized deep neural networks trained using standard optimization methods provide a mechanism for memorization and retrieval of real-valued data. In particular, we show that overparameterized autoencoders store training examples as attractors, and thus, can be viewed as implementations of associative memory with the retrieval mechanism given by iterating the map. We study this phenomenon under a variety of common architectures and optimization methods and construct a network that can recall 500 real-valued images without any apparent spurious attractor states. Lastly, we demonstrate how the same mechanism allows encoding sequences, including movies and audio, instead of individual examples. Interestingly, this appears to provide an even more efficient mechanism for storage and retrieval than autoencoding single instances.

Memorization of data in deep neural networks has become a subject of significant research interest. In this paper, we link memorization of images in deep convolutional autoencoders to downsampling through strided convolution. To analyze this mechanism in a simpler setting, we train linear convolutional autoencoders and show that linear combinations of training data are stored as eigenvectors in the linear operator corresponding to the network when downsampling is used. On the other hand, networks without downsampling do not memorize training data. We provide further evidence that the same effect happens in nonlinear networks. Moreover, downsampling in nonlinear networks causes the model to not only memorize linear combinations of images, but individual training images. Since convolutional autoencoder components are building blocks of deep convolutional networks, we envision that our findings will shed light on the important phenomenon of memorization in over-parameterized deep networks.

Directed acyclic graph (DAG) models are popular for capturing causal relationships. From observational and interventional data, a DAG model can only be determined up to its \emph{interventional Markov equivalence class} (I-MEC). We investigate the size of MECs for random DAG models generated by uniformly sampling and ordering an Erd\H{o}s-R\'{e}nyi graph. For constant density, we show that the expected $\log$ observational MEC size asymptotically (in the number of vertices) approaches a constant. We characterize I-MEC size in a similar fashion in the above settings with high precision. We show that the asymptotic expected number of interventions required to fully identify a DAG is a constant. These results are obtained by exploiting Meek rules and coupling arguments to provide sharp upper and lower bounds on the asymptotic quantities, which are then calculated numerically up to high precision. Our results have important consequences for experimental design of interventions and the development of algorithms for causal inference.

We consider the task of learning a causal graph in the presence of latent confounders given i.i.d.~samples from the model. While current algorithms for causal structure discovery in the presence of latent confounders are constraint-based, we here propose a score-based approach. We prove that under assumptions weaker than faithfulness, any sparsest independence map (IMAP) of the distribution belongs to the Markov equivalence class of the true model. This motivates the \emph{Sparsest Poset} formulation - that posets can be mapped to minimal IMAPs of the true model such that the sparsest of these IMAPs is Markov equivalent to the true model. Motivated by this result, we propose a greedy algorithm over the space of posets for causal structure discovery in the presence of latent confounders and compare its performance to the current state-of-the-art algorithms FCI and FCI+ on synthetic data.

The energy landscape for the Low-Voltage (LV) networks are beginning to change; changes resulted from the increase penetration of renewables and/or the predicted increase of electric vehicles charging at home. The previously passive `fit-and-forget' approach to LV network management will be inefficient to ensure its effective operations. A more adaptive approach is required that includes the prediction of risk and capacity of the circuits. Many of the proposed methods require full observability of the networks, motivating the installations of smart meters and advance metering infrastructure in many countries. However, the expectation of `perfect data' is unrealistic in operational reality. Smart meter (SM) roll-out can have its issues, which may resulted in low-likelihood of full SM coverage for all LV networks. This, together with privacy requirements that limit the availability of high granularity demand power data have resulted in the low uptake of many of the presented methods. To address this issue, Deep Learning Neural Network is proposed to predict the voltage distribution with partial SM coverage. The results show that SM measurements from key locations are sufficient for effective prediction of voltage distribution.

In this work, we apply an attention-gated network to real-time automated scan plane detection for fetal ultrasound screening. Scan plane detection in fetal ultrasound is a challenging problem due the poor image quality resulting in low interpretability for both clinicians and automated algorithms. To solve this, we propose incorporating self-gated soft-attention mechanisms. A soft-attention mechanism generates a gating signal that is end-to-end trainable, which allows the network to contextualise local information useful for prediction. The proposed attention mechanism is generic and it can be easily incorporated into any existing classification architectures, while only requiring a few additional parameters. We show that, when the base network has a high capacity, the incorporated attention mechanism can provide efficient object localisation while improving the overall performance. When the base network has a low capacity, the method greatly outperforms the baseline approach and significantly reduces false positives. Lastly, the generated attention maps allow us to understand the model's reasoning process, which can also be used for weakly supervised object localisation.

Understanding the morphological changes of primary neuronal cells induced by chemical compounds is essential for drug discovery. Using the data from a single high-throughput imaging assay, a classification model for predicting the biological activity of candidate compounds was introduced. The image recognition model which is based on deep convolutional neural network (CNN) architecture with residual connections achieved accuracy of 99.6$\%$ on a binary classification task of distinguishing untreated and treated rodent primary neuronal cells with Amyloid-$\beta_{(25-35)}$.

We propose a new Patch-based Iterative Network (PIN) for fast and accurate landmark localisation in 3D medical volumes. PIN utilises a Convolutional Neural Network (CNN) to learn the spatial relationship between an image patch and anatomical landmark positions. During inference, patches are repeatedly passed to the CNN until the estimated landmark position converges to the true landmark location. PIN is computationally efficient since the inference stage only selectively samples a small number of patches in an iterative fashion rather than a dense sampling at every location in the volume. Our approach adopts a multi-task learning framework that combines regression and classification to improve localisation accuracy. We extend PIN to localise multiple landmarks by using principal component analysis, which models the global anatomical relationships between landmarks. We have evaluated PIN using 72 3D ultrasound images from fetal screening examinations. PIN achieves quantitatively an average landmark localisation error of 5.59mm and a runtime of 0.44s to predict 10 landmarks per volume. Qualitatively, anatomical 2D standard scan planes derived from the predicted landmark locations are visually similar to the clinical ground truth. Source code is publicly available at https://github.com/yuanwei1989/landmark-detection.

In this paper, we describe how a patient-specific, ultrasound-probe-induced prostate motion model can be directly generated from a single preoperative MR image. Our motion model allows for sampling from the conditional distribution of dense displacement fields, is encoded by a generative neural network conditioned on a medical image, and accepts random noise as additional input. The generative network is trained by a minimax optimisation with a second discriminative neural network, tasked to distinguish generated samples from training motion data. In this work, we propose that 1) jointly optimising a third conditioning neural network that pre-processes the input image, can effectively extract patient-specific features for conditioning; and 2) combining multiple generative models trained separately with heuristically pre-disjointed training data sets can adequately mitigate the problem of mode collapse. Trained with diagnostic T2-weighted MR images from 143 real patients and 73,216 3D dense displacement fields from finite element simulations of intraoperative prostate motion due to transrectal ultrasound probe pressure, the proposed models produced physically-plausible patient-specific motion of prostate glands. The ability to capture biomechanically simulated motion was evaluated using two errors representing generalisability and specificity of the model. The median values, calculated from a 10-fold cross-validation, were 2.8+/-0.3 mm and 1.7+/-0.1 mm, respectively. We conclude that the introduced approach demonstrates the feasibility of applying state-of-the-art machine learning algorithms to generate organ motion models from patient images, and shows significant promise for future research.

We describe an adversarial learning approach to constrain convolutional neural network training for image registration, replacing heuristic smoothness measures of displacement fields often used in these tasks. Using minimally-invasive prostate cancer intervention as an example application, we demonstrate the feasibility of utilizing biomechanical simulations to regularize a weakly-supervised anatomical-label-driven registration network for aligning pre-procedural magnetic resonance (MR) and 3D intra-procedural transrectal ultrasound (TRUS) images. A discriminator network is optimized to distinguish the registration-predicted displacement fields from the motion data simulated by finite element analysis. During training, the registration network simultaneously aims to maximize similarity between anatomical labels that drives image alignment and to minimize an adversarial generator loss that measures divergence between the predicted- and simulated deformation. The end-to-end trained network enables efficient and fully-automated registration that only requires an MR and TRUS image pair as input, without anatomical labels or simulated data during inference. 108 pairs of labelled MR and TRUS images from 76 prostate cancer patients and 71,500 nonlinear finite-element simulations from 143 different patients were used for this study. We show that, with only gland segmentation as training labels, the proposed method can help predict physically plausible deformation without any other smoothness penalty. Based on cross-validation experiments using 834 pairs of independent validation landmarks, the proposed adversarial-regularized registration achieved a target registration error of 6.3 mm that is significantly lower than those from several other regularization methods.

Standard scan plane detection in fetal brain ultrasound (US) forms a crucial step in the assessment of fetal development. In clinical settings, this is done by manually manoeuvring a 2D probe to the desired scan plane. With the advent of 3D US, the entire fetal brain volume containing these standard planes can be easily acquired. However, manual standard plane identification in 3D volume is labour-intensive and requires expert knowledge of fetal anatomy. We propose a new Iterative Transformation Network (ITN) for the automatic detection of standard planes in 3D volumes. ITN uses a convolutional neural network to learn the relationship between a 2D plane image and the transformation parameters required to move that plane towards the location/orientation of the standard plane in the 3D volume. During inference, the current plane image is passed iteratively to the network until it converges to the standard plane location. We explore the effect of using different transformation representations as regression outputs of ITN. Under a multi-task learning framework, we introduce additional classification probability outputs to the network to act as confidence measures for the regressed transformation parameters in order to further improve the localisation accuracy. When evaluated on 72 US volumes of fetal brain, our method achieves an error of 3.83mm/12.7 degrees and 3.80mm/12.6 degrees for the transventricular and transcerebellar planes respectively and takes 0.46s per plane. Source code is publicly available at https://github.com/yuanwei1989/plane-detection.