Models, code, and papers for "Edward Y. Chang":

RelGAN: Multi-Domain Image-to-Image Translation via Relative Attributes

Aug 20, 2019
Po-Wei Wu, Yu-Jing Lin, Che-Han Chang, Edward Y. Chang, Shih-Wei Liao

Multi-domain image-to-image translation has gained increasing attention recently. Previous methods take an image and some target attributes as inputs and generate an output image with the desired attributes. However, such methods have two limitations. First, these methods assume binary-valued attributes and thus cannot yield satisfactory results for fine-grained control. Second, these methods require specifying the entire set of target attributes, even if most of the attributes would not be changed. To address these limitations, we propose RelGAN, a new method for multi-domain image-to-image translation. The key idea is to use relative attributes, which describe the desired change in selected attributes. Our method is capable of modifying images by changing particular attributes of interest in a continuous manner while preserving the other attributes. Experimental results demonstrate both the quantitative and qualitative effectiveness of our method on the tasks of facial attribute transfer and interpolation.
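As a rough illustration of the relative-attribute idea, the sketch below (PyTorch, with hypothetical module and variable names, not the authors' code) conditions a toy generator on the difference between target and original attribute vectors; scaling that difference gives continuous interpolation.

```python
import torch
import torch.nn as nn

class RelativeGenerator(nn.Module):
    """Toy generator conditioned on a relative-attribute vector (hypothetical)."""
    def __init__(self, img_channels=3, n_attrs=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + n_attrs, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, img_channels, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, x, rel_attrs):
        # Broadcast the relative-attribute vector over the spatial dims and concatenate.
        b, _, h, w = x.shape
        v = rel_attrs.view(b, -1, 1, 1).expand(b, rel_attrs.size(1), h, w)
        return self.net(torch.cat([x, v], dim=1))

# Relative attributes encode the desired *change*: target minus original.
orig_attrs = torch.tensor([[0., 1., 0., 0., 1.]])
target_attrs = torch.tensor([[1., 1., 0., 0., 0.]])
rel = target_attrs - orig_attrs

G = RelativeGenerator()
x = torch.randn(1, 3, 64, 64)
# Scaling the relative vector by alpha in [0, 1] gives continuous interpolation.
for alpha in (0.0, 0.5, 1.0):
    y = G(x, alpha * rel)
```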

* Accepted to ICCV 2019 

BRIEF: Backward Reduction of CNNs with Information Flow Analysis

Nov 01, 2018
Yu-Hsun Lin, Chun-Nan Chou, Edward Y. Chang

This paper proposes BRIEF, a backward reduction algorithm that explores compact CNN-model designs from the information flow perspective. The algorithm can remove substantial non-zero weighting parameters (redundant neural channels) from a network by considering its dynamic behavior, which traditional model-compaction techniques cannot achieve. With the aid of our proposed algorithm, we achieve significant model reduction on ResNet-34 at ImageNet scale (32.3% reduction), which is 3X better than the previous result (10.8%). Even for highly optimized models such as SqueezeNet and MobileNet, we achieve an additional 10.81% and 37.56% reduction, respectively, with negligible performance degradation.

* IEEE Artificial Intelligence and Virtual Reality (IEEE AIVR) 2018 

MBS: Macroblock Scaling for CNN Model Reduction

Sep 18, 2018
Yu-Hsun Lin, Chun-Nan Chou, Edward Y. Chang

We estimate the proper channel (width) scaling of Convolutional Neural Networks (CNNs) for model reduction. Unlike the traditional scaling method, which reduces every CNN channel width by the same factor, we address each CNN macroblock adaptively depending on its information redundancy, measured by our proposed effective flops. Our macroblock scaling (MBS) algorithm can be applied to various CNN architectures to reduce their model size. These applicable models range from compact CNNs such as MobileNet (25.53% reduction, ImageNet) and ShuffleNet (20.74% reduction, ImageNet) to ultra-deep ones such as ResNet-101 (51.67% reduction, ImageNet) and ResNet-1202 (72.71% reduction, CIFAR-10), with negligible accuracy degradation. MBS also achieves better reduction at a much lower cost than the state-of-the-art optimization-based method. MBS's simplicity and efficiency, its flexibility to work with any CNN model, and its scalability to models of any depth make it an attractive choice for CNN model size reduction.

* 8 pages 

BDA-PCH: Block-Diagonal Approximation of Positive-Curvature Hessian for Training Neural Networks

Feb 20, 2018
Sheng-Wei Chen, Chun-Nan Chou, Edward Y. Chang

We propose a block-diagonal approximation of the positive-curvature Hessian (BDA-PCH) matrix to measure curvature. Our proposed BDA-PCH matrix is memory efficient and can be applied to any fully-connected neural network whose activation and criterion functions are twice differentiable. In particular, the BDA-PCH matrix can handle non-convex criterion functions. We devise an efficient scheme that uses the conjugate gradient method to derive Newton directions in the mini-batch setting. Empirical studies show that our method outperforms competing second-order methods in convergence speed.
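The Newton directions mentioned in the abstract can be obtained matrix-free with conjugate gradients. Below is a minimal NumPy sketch of damped CG over Hessian-vector products on a toy block-diagonal quadratic; it illustrates only the CG step, not the paper's BDA-PCH construction.

```python
import numpy as np

def cg_newton_direction(hvp, grad, damping=1e-3, max_iters=50, tol=1e-8):
    """Solve (H + damping*I) d = -grad with conjugate gradients,
    using only Hessian-vector products (matrix-free)."""
    d = np.zeros_like(grad)
    r = -grad.copy()           # residual = b - A d with d = 0
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iters):
        Ap = hvp(p) + damping * p
        alpha = rs_old / (p @ Ap)
        d += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Toy quadratic: H is a block-diagonal PSD matrix, mimicking a per-layer approximation.
blocks = [np.array([[2., 0.3], [0.3, 1.]]), np.array([[1.5, 0.], [0., 0.5]])]
H = np.block([[blocks[0], np.zeros((2, 2))], [np.zeros((2, 2)), blocks[1]]])
g = np.array([1., -2., 0.5, 1.0])
step = cg_newton_direction(lambda v: H @ v, g)
```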

* Correct the author's name 

G2R Bound: A Generalization Bound for Supervised Learning from GAN-Synthetic Data

May 29, 2019
Fu-Chieh Chang, Hao-Jen Wang, Chun-Nan Chou, Edward Y. Chang

Performing supervised learning from data synthesized using Generative Adversarial Networks (GANs), dubbed GAN-synthetic data, has two important applications. First, GANs may generate more labeled training data, which may improve classification accuracy. Second, in scenarios where real data cannot be released outside certain premises for privacy and/or security reasons, using GAN-synthetic data for training is a plausible alternative. This paper proposes a generalization bound to guarantee the generalization capability of a classifier learning from GAN-synthetic data. This bound helps developers gauge the generalization gap between learning from synthetic data and testing on real data, and can therefore provide clues for improving generalization capability.


KG-GAN: Knowledge-Guided Generative Adversarial Networks

May 29, 2019
Che-Han Chang, Chun-Hsien Yu, Szu-Ying Chen, Edward Y. Chang

Generative adversarial networks (GANs) learn to mimic training data that represents the underlying true data distribution. However, GANs suffer when the training data lacks quantity or diversity and therefore cannot represent the underlying distribution well. To improve the performance of GANs trained on under-represented training data distributions, this paper proposes KG-GAN to fuse domain knowledge with the GAN framework. KG-GAN trains two generators; one learns from data while the other learns from knowledge. To achieve KG-GAN, domain knowledge is formulated as a constraint function to guide the learning of the second generator. We validate our framework on two tasks: fine-grained image generation and hair recoloring. Experimental results demonstrate the effectiveness of KG-GAN.
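A hedged sketch of how a two-generator objective with a knowledge constraint might look (PyTorch, hypothetical function and variable names, not the authors' formulation): the data generator receives the usual adversarial loss, while the knowledge-guided generator additionally pays a penalty from a constraint function that encodes domain knowledge.

```python
import torch

def kg_gan_losses(D, G_data, G_know, real, z, constraint_fn, lam=1.0):
    """Two generators: one learns from data only, the other is additionally
    guided by a domain-knowledge constraint on its outputs."""
    fake_data = G_data(z)
    fake_know = G_know(z)

    # Standard non-saturating GAN losses for both generators.
    g_data_loss = -torch.log(torch.sigmoid(D(fake_data)) + 1e-8).mean()
    g_know_loss = -torch.log(torch.sigmoid(D(fake_know)) + 1e-8).mean()

    # Domain knowledge enters as a penalty on the knowledge-guided generator.
    g_know_loss = g_know_loss + lam * constraint_fn(fake_know).mean()

    d_loss = (-torch.log(torch.sigmoid(D(real)) + 1e-8)
              - torch.log(1 - torch.sigmoid(D(fake_data.detach())) + 1e-8)
              - torch.log(1 - torch.sigmoid(D(fake_know.detach())) + 1e-8)).mean()
    return g_data_loss, g_know_loss, d_loss

# Tiny usage with stub modules (purely illustrative).
G_data = torch.nn.Linear(8, 2); G_know = torch.nn.Linear(8, 2); D = torch.nn.Linear(2, 1)
constraint = lambda x: (x.clamp(min=0) ** 2).sum(dim=1)   # e.g. penalize leaving a known range
z = torch.randn(16, 8); real = torch.randn(16, 2)
losses = kg_gan_losses(D, G_data, G_know, real, z, constraint)
```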

* Submitted to NeurIPS 2019. The supplementary material can be found at https://www.csie.ntu.edu.tw/~b00902029/KGGAN_Supp.pdf 

Effective Medical Test Suggestions Using Deep Reinforcement Learning

May 31, 2019
Yang-En Chen, Kai-Fu Tang, Yu-Shao Peng, Edward Y. Chang

Effective medical test suggestions help both patients and physicians conserve time and improve diagnostic accuracy. In this work, we show that an agent can learn to suggest effective medical tests. We formulate the problem as a stage-wise Markov decision process and propose a reinforcement learning method to train the agent. We introduce a new representation of a multiple-action policy, along with a training method for this representation. Furthermore, a new exploration scheme is proposed to accelerate the learning of disease distributions. Our experimental results demonstrate that the accuracy of disease diagnosis can be significantly improved with good medical test suggestions.


Representation Learning on Large and Small Data

Jul 25, 2017
Chun-Nan Chou, Chuen-Kai Shie, Fu-Chieh Chang, Jocelyn Chang, Edward Y. Chang

Deep learning owes its success to three key factors: scale of data, enhanced models to learn representations from data, and scale of computation. This book chapter presents the importance of the data-driven approach to learning good representations from both big data and small data. For big data, it is widely accepted in the research community that the more data the better for both representation and classification improvement. The questions are then how to learn representations from big data, and how to perform representation learning when data is scarce. We address the first question by presenting CNN model enhancements in the aspects of representation, optimization, and generalization. To address the small-data challenge, we show transfer representation learning to be effective. Transfer representation learning transfers the learned representation from a source domain where abundant training data is available to a target domain where training data is scarce. Transfer representation learning gave the OM and melanoma diagnosis modules of our XPRIZE Tricorder device (which finished $2^{nd}$ out of $310$ competing teams) a significant boost in diagnosis accuracy.
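As a generic illustration of transfer representation learning (not the chapter's exact pipeline), the sketch below reuses an ImageNet-pretrained CNN from torchvision and fine-tunes only a new classifier head on a small target task.

```python
import torch
import torch.nn as nn
from torchvision import models

# Reuse representations learned on a large source dataset (ImageNet) and
# fine-tune only a new classifier head on the scarce target data.
# Older torchvision versions use models.resnet18(pretrained=True) instead.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                        # freeze the transferred representation
model.fc = nn.Linear(model.fc.in_features, 2)      # new head, e.g. a 2-class diagnosis task

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ...train as usual on the small target dataset; optionally unfreeze layers later.
```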

* Book chapter 

Errata: Distant Supervision for Relation Extraction with Matrix Completion

Nov 17, 2014
Miao Fan, Deli Zhao, Qiang Zhou, Zhiyuan Liu, Thomas Fang Zheng, Edward Y. Chang

The essence of distantly supervised relation extraction is that it is an incomplete multi-label classification problem with sparse and noisy features. To tackle the sparsity and noise challenges, we propose solving the classification problem using matrix completion on a factorized matrix of minimized rank. We formulate relation classification as completing the unknown labels of testing items (entity pairs) in a sparse matrix that concatenates training and testing textual features with training labels. Our algorithmic framework is based on the assumption that the joint item-by-feature and item-by-label matrix has low rank. We apply two optimization models to recover the underlying low-rank matrix, leveraging the sparsity of the feature-label matrix. The matrix completion problem is then solved by the fixed point continuation (FPC) algorithm, which can find the global optimum. Experiments on two widely used datasets with different dimensions of textual features demonstrate that our low-rank matrix completion approach significantly outperforms the baseline and state-of-the-art methods.
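For intuition, here is a small NumPy sketch of the fixed point continuation idea in its standard form (a gradient step on the observed entries followed by singular-value soft-thresholding); it is a simplified stand-in for the paper's two optimization models.

```python
import numpy as np

def fpc_matrix_completion(M_obs, mask, mu=1.0, tau=1.0, iters=200):
    """Fixed-point continuation sketch for low-rank matrix completion:
    gradient step on the observed entries, then singular-value soft-thresholding."""
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        # Gradient step: only observed entries contribute to the residual.
        G = mask * (X - M_obs)
        Y = X - tau * G
        # Shrinkage step: soft-threshold the singular values (promotes low rank).
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        s = np.maximum(s - tau * mu, 0.0)
        X = (U * s) @ Vt
    return X

# Toy example: recover a rank-1 matrix from ~60% observed entries.
rng = np.random.default_rng(0)
M = np.outer(rng.normal(size=20), rng.normal(size=15))
mask = (rng.random(M.shape) < 0.6).astype(float)
X_hat = fpc_matrix_completion(mask * M, mask, mu=0.1)
```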


Distributed Training Large-Scale Deep Architectures

Aug 10, 2017
Shang-Xuan Zou, Chun-Yen Chen, Jui-Lin Wu, Chun-Nan Chou, Chia-Chin Tsao, Kuan-Chieh Tung, Ting-Wei Lin, Cheng-Lung Sung, Edward Y. Chang

Scale of data and scale of computation infrastructure together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we focus on employing a system approach to speed up large-scale training. Via lessons learned from our routine benchmarking effort, we first identify bottlenecks and overheads that hinder data parallelism. We then devise guidelines that help practitioners configure an effective system and fine-tune parameters to achieve the desired speedup. Specifically, we develop a procedure for setting the minibatch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components, such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines help effectively speed up large-scale deep learning training.


Modeling neural dynamics during speech production using a state space variational autoencoder

Jan 13, 2019
Pengfei Sun, David A. Moses, Edward Chang

Characterizing the neural encoding of behavior remains a challenging task in many research areas due in part to complex and noisy spatiotemporal dynamics of evoked brain activity. An important aspect of modeling these neural encodings involves separation of robust, behaviorally relevant signals from background activity, which often contains signals from irrelevant brain processes and decaying information from previous behavioral events. To achieve this separation, we develop a two-branch State Space Variational AutoEncoder (SSVAE) model to individually describe the instantaneous evoked foreground signals and the context-dependent background signals. We modeled the spontaneous speech-evoked brain dynamics using smoothed Gaussian mixture models. By applying the proposed SSVAE model to track ECoG dynamics in one participant over multiple hours, we find that the model can predict speech-related dynamics more accurately than other latent factor inference algorithms. Our results demonstrate that separately modeling the instantaneous speech-evoked and slow context-dependent brain dynamics can enhance tracking performance, which has important implications for the development of advanced neural encoding and decoding models in various neuroscience sub-disciplines.

* 5 pages 

Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks

Dec 31, 2018
Edward Kim, Zach Jensen, Alexander van Grootel, Kevin Huang, Matthew Staib, Sheshera Mysore, Haw-Shiuan Chang, Emma Strubell, Andrew McCallum, Stefanie Jegelka, Elsa Olivetti

Leveraging new data sources is a key step in accelerating the pace of materials design and discovery. To complement the strides in synthesis planning driven by historical, experimental, and computed data, we present an automated method for connecting scientific literature to synthesis insights. Starting from natural language text, we apply word embeddings from language models, which are fed into a named entity recognition model, upon which a conditional variational autoencoder is trained to generate syntheses for arbitrary materials. We show the potential of this technique by predicting precursors for two perovskite materials, using only training data published over a decade prior to their first reported syntheses. We demonstrate that the model learns representations of materials corresponding to synthesis-related properties, and that the model's behavior complements existing thermodynamic knowledge. Finally, we apply the model to perform synthesizability screening for proposed novel perovskite compounds.


Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction

Nov 02, 2017
Kristofer E. Bouchard, Alejandro F. Bujan, Farbod Roosta-Khorasani, Shashanka Ubaru, Prabhat, Antoine M. Snijders, Jian-Hua Mao, Edward F. Chang, Michael W. Mahoney, Sharmodeep Bhattacharyya

The increasing size and complexity of scientific data could dramatically enhance discovery and prediction for basic scientific applications. Realizing this potential, however, requires novel statistical analysis methods that are both interpretable and predictive. We introduce Union of Intersections (UoI), a flexible, modular, and scalable framework for enhanced model selection and estimation. Methods based on UoI perform model selection and model estimation through intersection and union operations, respectively. We show that UoI-based methods achieve low-variance and nearly unbiased estimation of a small number of interpretable features, while maintaining high-quality prediction accuracy. We perform extensive numerical investigation to evaluate a UoI algorithm ($UoI_{Lasso}$) on synthetic and real data. In doing so, we demonstrate the extraction of interpretable functional networks from human electrophysiology recordings as well as accurate prediction of phenotypes from genotype-phenotype data with reduced features. We also show (with the $UoI_{L1Logistic}$ and $UoI_{CUR}$ variants of the basic framework) improved prediction parsimony for classification and matrix factorization on several benchmark biomedical data sets. These results suggest that methods based on the UoI framework could improve interpretation and prediction in data-driven discovery across scientific fields.
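A greatly simplified sketch of the UoI idea for the Lasso case (scikit-learn, illustrative only, with hypothetical function names): selection intersects Lasso supports across bootstrap resamples, and estimation bags unregularized fits restricted to that support.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def uoi_lasso_sketch(X, y, lam=0.1, n_boot=20, seed=0):
    """Simplified UoI-style procedure: selection via intersection of bootstrap
    supports, estimation via bagged unregularized fits on the selected support."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Selection: keep only features chosen by Lasso in every bootstrap resample.
    support = np.ones(p, dtype=bool)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        coef = Lasso(alpha=lam).fit(X[idx], y[idx]).coef_
        support &= (np.abs(coef) > 1e-8)
    if not support.any():
        return support, np.zeros(p)

    # Estimation: average ordinary least-squares fits restricted to the support.
    beta = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        ols = LinearRegression().fit(X[idx][:, support], y[idx])
        beta[support] += ols.coef_ / n_boot
    return support, beta

# Toy data: 5 informative features out of 30.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
beta_true = np.zeros(30); beta_true[:5] = [3, -2, 1.5, 2.5, -1]
y = X @ beta_true + 0.1 * rng.normal(size=200)
support, beta_hat = uoi_lasso_sketch(X, y)
```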

* 42 pages; a conference version is in NIPS 2017 

Growing and Retaining AI Talent for the United States Government

Sep 27, 2018
Edward Raff

Artificial Intelligence and Machine Learning have become transformative for a number of industries, and as such, many industries' need for AI talent is increasing the demand for individuals with these skills. This continues to exacerbate the difficulty of acquiring and retaining talent for the United States Federal Government, both for its direct employees and for the companies that support it. We take the position that by focusing on growing and retaining current talent through a number of cultural changes, the government can work to remediate this problem today.

* Presented at AAAI FSS-18: Artificial Intelligence in Government and Public Sector, Arlington, Virginia, USA 

Detecting Human Interventions on the Landscape: KAZE Features, Poisson Point Processes, and a Construction Dataset

Mar 29, 2017
Edward Boyda, Colin McCormick, Dan Hammer

We present an algorithm capable of identifying a wide variety of human-induced change on the surface of the planet by analyzing matches between local features in time-sequenced remote sensing imagery. We evaluate feature sets, match protocols, and the statistical modeling of feature matches. With application of KAZE features, k-nearest-neighbor descriptor matching, and geometric proximity and bi-directional match consistency checks, average match rates increase more than two-fold over the previous standard. In testing our platform, we developed a small, labeled benchmark dataset expressing large-scale residential, industrial, and civic construction, along with null instances, in California between the years 2010 and 2012. On the benchmark set, our algorithm makes precise, accurate change proposals on two-thirds of scenes. Further, the detection threshold can be tuned so that all or almost all proposed detections are true positives.
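The matching pipeline described above can be approximated with OpenCV. The sketch below (illustrative, not the authors' platform, and omitting the geometric-proximity check) computes KAZE features, applies k-nearest-neighbor descriptor matching with a ratio test, and keeps only bi-directionally consistent matches.

```python
import cv2

def kaze_matches(img_a, img_b, ratio=0.75):
    """KAZE features + k-NN ratio test + bi-directional (mutual NN) consistency.
    img_a, img_b: grayscale uint8 images, e.g. two dates of the same scene."""
    kaze = cv2.KAZE_create()
    kp_a, des_a = kaze.detectAndCompute(img_a, None)
    kp_b, des_b = kaze.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # Best backward match for each descriptor in img_b, used for the mutual check.
    bwd_best = {pair[0].queryIdx: pair[0].trainIdx
                for pair in matcher.knnMatch(des_b, des_a, k=2) if pair}

    good = []
    for pair in matcher.knnMatch(des_a, des_b, k=2):
        if len(pair) < 2:
            continue
        m, n = pair
        # Lowe-style ratio test, then require the match to hold in both directions.
        if m.distance < ratio * n.distance and bwd_best.get(m.trainIdx) == m.queryIdx:
            good.append(m)
    return kp_a, kp_b, good
```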


Static Malware Detection & Subterfuge: Quantifying the Robustness of Machine Learning and Current Anti-Virus

Jun 12, 2018
William Fleshman, Edward Raff, Richard Zak, Mark McLean, Charles Nicholas

As machine-learning (ML) based systems for malware detection become more prevalent, it becomes necessary to quantify the benefits compared to the more traditional anti-virus (AV) systems widely used today. It is not practical to build an agreed upon test set to benchmark malware detection systems on pure classification performance. Instead we tackle the problem by creating a new testing methodology, where we evaluate the change in performance on a set of known benign & malicious files as adversarial modifications are performed. The change in performance combined with the evasion techniques then quantifies a system's robustness against that approach. Through these experiments we are able to show in a quantifiable way how purely ML based systems can be more robust than AV products at detecting malware that attempts evasion through modification, but may be slower to adapt in the face of significantly novel attacks.


Plug and play methods for magnetic resonance imaging

Mar 20, 2019
Rizwan Ahmad, Charles A. Bouman, Gregery T. Buzzard, Stanley Chan, Edward T. Reehorst, Philip Schniter

Magnetic Resonance Imaging (MRI) is a non-invasive diagnostic tool that provides excellent soft-tissue contrast without the use of ionizing radiation. But, compared to other clinical imaging modalities (e.g., CT or ultrasound), the data acquisition process for MRI is inherently slow. Furthermore, dynamic applications demand collecting a series of images in quick succession. As a result, reducing acquisition time and improving imaging quality for undersampled datasets have been active areas of research for the last two decades. The combination of parallel imaging and compressive sensing (CS) has been shown to benefit a wide range of MRI applications. More recently, deep learning techniques have been shown to outperform CS methods. Some of these techniques pose MRI reconstruction as a direct inversion problem and tackle it by training a deep neural network (DNN) to map from the measured Fourier samples to the final image. Considering that the forward model in MRI changes from one dataset to the next, such methods have to be either trained over a large and diverse corpus of data or limited to a specific application, and even then they cannot ensure data consistency. An alternative is to use "plug-and-play" (PnP) algorithms, which iterate image denoising with forward-model-based signal recovery. PnP algorithms are an excellent fit for compressive MRI because they decouple image modeling from the forward model, which can change significantly among different scans due to variations in the coil sensitivity maps, sampling patterns, and image resolution. Consequently, with PnP, state-of-the-art image-denoising techniques, such as those based on DNNs, can be directly exploited for compressive MRI image reconstruction. The objective of this article is two-fold: i) to review recent advances in plug-and-play methods, and ii) to discuss their application to compressive MRI image reconstruction.
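For intuition, a minimal proximal-gradient-style PnP sketch with a hypothetical plug-in denoiser: the loop alternates a data-consistency gradient step under the forward model with a denoising step that plays the role of the image prior. The operators and denoiser below are toy stand-ins, not tied to any particular MRI toolbox.

```python
import numpy as np

def pnp_reconstruct(y, A, AH, denoise, step=1.0, iters=50):
    """Plug-and-play sketch (PnP-ISTA flavor): alternate a gradient step that
    enforces consistency with the measured k-space samples y = A x, and a
    denoising step standing in for the image prior. A / AH are the forward
    operator and its adjoint; `denoise` is any plug-in denoiser (e.g. a CNN)."""
    x = AH(y)                          # zero-filled initial reconstruction
    for _ in range(iters):
        grad = AH(A(x) - y)            # data-consistency gradient
        x = denoise(x - step * grad)   # prior enforced by the denoiser
    return x

# Toy usage with an undersampled FFT forward model and a placeholder "denoiser".
mask = (np.random.default_rng(0).random((64, 64)) < 0.3)
A  = lambda x: mask * np.fft.fft2(x, norm="ortho")
AH = lambda k: np.real(np.fft.ifft2(mask * k, norm="ortho"))
denoise = lambda x: 0.5 * x + 0.5 * np.roll(x, 1, axis=0)   # placeholder smoother
x_true = np.zeros((64, 64)); x_true[24:40, 24:40] = 1.0
x_hat = pnp_reconstruct(A(x_true), A, AH, denoise)
```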


Efficient Discovery of Heterogeneous Treatment Effects in Randomized Experiments via Anomalous Pattern Detection

Jun 07, 2018
Edward McFowland III, Sriram Somanchi, Daniel B. Neill

In the recent literature on estimating heterogeneous treatment effects, each proposed method makes its own set of restrictive assumptions about the intervention's effects and which subpopulations to explicitly estimate. Moreover, the majority of the literature provides no mechanism to identify which subpopulations are the most affected--beyond manual inspection--and provides little guarantee on the correctness of the identified subpopulations. Therefore, we propose Treatment Effect Subset Scan (TESS), a new method for discovering which subpopulation in a randomized experiment is most significantly affected by a treatment. We frame this challenge as a pattern detection problem where we efficiently maximize a nonparametric scan statistic over subpopulations. Furthermore, we identify the subpopulation which experiences the largest distributional change as a result of the intervention, while making minimal assumptions about the intervention's effects or the underlying data generating process. In addition to the algorithm, we demonstrate that the asymptotic Type I and II error can be controlled, and provide sufficient conditions for detection consistency--i.e., exact identification of the affected subpopulation. Finally, we validate the efficacy of the method by discovering heterogeneous treatment effects in simulations and in real-world data from a well-known program evaluation study.


RetainVis: Visual Analytics with Interpretable and Interactive Recurrent Neural Networks on Electronic Medical Records

Oct 23, 2018
Bum Chul Kwon, Min-Je Choi, Joanne Taery Kim, Edward Choi, Young Bin Kim, Soonwook Kwon, Jimeng Sun, Jaegul Choo

We have recently seen many successful applications of recurrent neural networks (RNNs) on electronic medical records (EMRs), which contain histories of patients' diagnoses, medications, and other various events, in order to predict the current and future states of patients. Despite the strong performance of RNNs, it is often challenging for users to understand why the model makes a particular prediction. Such black-box nature of RNNs can impede its wide adoption in clinical practice. Furthermore, we have no established methods to interactively leverage users' domain expertise and prior knowledge as inputs for steering the model. Therefore, our design study aims to provide a visual analytics solution to increase interpretability and interactivity of RNNs via a joint effort of medical experts, artificial intelligence scientists, and visual analytics researchers. Following the iterative design process between the experts, we design, implement, and evaluate a visual analytics tool called RetainVis, which couples a newly improved, interpretable and interactive RNN-based model called RetainEX and visualizations for users' exploration of EMR data in the context of prediction tasks. Our study shows the effective use of RetainVis for gaining insights into how individual medical codes contribute to making risk predictions, using EMRs of patients with heart failure and cataract symptoms. Our study also demonstrates how we made substantial changes to the state-of-the-art RNN model called RETAIN in order to make use of temporal information and increase interactivity. This study will provide a useful guideline for researchers that aim to design an interpretable and interactive visual analytics tool for RNNs.

* Accepted at IEEE VIS 2018. To appear in IEEE Transactions on Visualization and Computer Graphics in January 2019 

Analyzing the Role of Model Uncertainty for Electronic Health Records

Jun 10, 2019
Michael W. Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew M. Dai

In medicine, both ethical and monetary costs of incorrect predictions can be significant, and the complexity of the problems often necessitates increasingly complex models. Recent work has shown that changing just the random seed is enough for otherwise well-tuned deep neural networks to vary in their individual predicted probabilities. In light of this, we investigate the role of model uncertainty methods in the medical domain. Using RNN ensembles and various Bayesian RNNs, we show that population-level metrics, such as AUC-PR, AUC-ROC, log-likelihood, and calibration error, do not capture model uncertainty. Meanwhile, the presence of significant variability in patient-specific predictions and optimal decisions motivates the need for capturing model uncertainty. Understanding the uncertainty for individual patients is an area with clear clinical impact, such as determining when a model decision is likely to be brittle. We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty compared to ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups.
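A small sketch of the kind of patient-level uncertainty the paper argues for (illustrative only, with the RNN ensemble members stubbed out as fixed probability vectors): aggregate metrics can look similar while per-patient disagreement across members varies widely.

```python
import numpy as np

def patient_uncertainty(member_probs):
    """Given per-member predicted probabilities of shape (n_members, n_patients),
    return the ensemble mean prediction and a per-patient uncertainty estimate
    (standard deviation across members)."""
    member_probs = np.asarray(member_probs)
    return member_probs.mean(axis=0), member_probs.std(axis=0)

# Toy ensemble of 5 "models" predicting risk for 4 patients.
probs = np.array([
    [0.91, 0.12, 0.55, 0.48],
    [0.88, 0.10, 0.20, 0.52],
    [0.93, 0.15, 0.80, 0.47],
    [0.90, 0.11, 0.35, 0.50],
    [0.89, 0.13, 0.65, 0.49],
])
mean, std = patient_uncertainty(probs)
# The third patient has high variance across members: the decision there is brittle
# even though the ensemble's population-level summaries look unremarkable.
```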

* Presented at the ICML 2019 Workshop on Uncertainty & Robustness in Deep Learning. Code to be open-sourced 
