Models, code, and papers for "Jing Cao":

MIDI-Sandwich2: RNN-based Hierarchical Multi-modal Fusion Generation VAE networks for multi-track symbolic music generation

Sep 08, 2019
Xia Liang, Junmin Wu, Jing Cao

Currently, almost all multi-track music generation models use Convolutional Neural Networks (CNNs) to build the generative model, while Recurrent Neural Network (RNN) based models have not been applied to this task. To address this problem, this paper proposes an RNN-based Hierarchical Multi-modal Fusion Generation Variational Autoencoder (VAE) network, MIDI-Sandwich2, for multi-track symbolic music generation. Inspired by VQ-VAE2, MIDI-Sandwich2 expands the dimension of the original hierarchical model by using multiple independent Binary Variational Autoencoder (BVAE) models, without sharing weights, to process the information of each track. Then, using multi-modal fusion, an upper layer named the Multi-modal Fusion Generation VAE (MFG-VAE) combines the latent vectors generated for the respective tracks, and its decoder performs a dimension-raising reconstruction that simulates the inverse of multi-modal fusion (i.e., multi-modal generation), thereby realizing RNN-based multi-track symbolic music generation. For the multi-track pianoroll format, we also improve the output binarization method of MuseGAN, which solves the problem that the refinement step of the original scheme is hard to differentiate and its gradients are difficult to propagate, making the generated songs more expressive. The model is validated on the Lakh Pianoroll Dataset (LPD). Compared to MuseGAN, MIDI-Sandwich2 not only generates harmonious multi-track music, but also achieves generation quality close to the state of the art. Moreover, when the VAE is used to reconstruct songs, the semi-generated songs reproduced by MIDI-Sandwich2 sound better than the fully auto-generated music from MuseGAN. Both the code and audio samples are open source at https://github.com/LiangHsia/MIDI-S2.
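
As a rough illustration of the hierarchy described above, the following sketch (in PyTorch) shows one GRU-based encoder per track feeding a fusion layer; all module sizes, names, and the fusion/split layers are illustrative assumptions, not the released MIDI-S2 code.

    import torch
    import torch.nn as nn

    # Hypothetical sketch of the two-level layout: one RNN encoder per track
    # (the per-track "BVAE" level, weights not shared), whose latent vectors
    # are concatenated and fused by an upper-level VAE (the MFG-VAE role).
    class TrackEncoder(nn.Module):
        def __init__(self, pitch_dim=128, latent_dim=32):
            super().__init__()
            self.rnn = nn.GRU(pitch_dim, 64, batch_first=True)
            self.mu = nn.Linear(64, latent_dim)
            self.logvar = nn.Linear(64, latent_dim)

        def forward(self, x):                     # x: (batch, time, pitch)
            _, h = self.rnn(x)
            h = h.squeeze(0)
            mu, logvar = self.mu(h), self.logvar(h)
            # reparameterization trick
            return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    class FusionVAE(nn.Module):
        def __init__(self, n_tracks=4, latent_dim=32):
            super().__init__()
            self.encoders = nn.ModuleList(TrackEncoder() for _ in range(n_tracks))
            self.fuse = nn.Linear(n_tracks * latent_dim, latent_dim)   # fusion
            self.split = nn.Linear(latent_dim, n_tracks * latent_dim)  # inverse

        def forward(self, tracks):                # list of per-track tensors
            zs = [enc(x) for enc, x in zip(self.encoders, tracks)]
            fused = self.fuse(torch.cat(zs, dim=-1))
            return self.split(fused)              # per-track latents to decode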


CariGANs: Unpaired Photo-to-Caricature Translation

Nov 02, 2018
Kaidi Cao, Jing Liao, Lu Yuan

Facial caricature is an art form that draws faces in an exaggerated way to convey humor or sarcasm. In this paper, we propose the first Generative Adversarial Network (GAN) for unpaired photo-to-caricature translation, which we call "CariGANs". It explicitly models geometric exaggeration and appearance stylization using two components: CariGeoGAN, which models only the geometry-to-geometry transformation from face photos to caricatures, and CariStyGAN, which transfers the style appearance from caricatures to face photos without any geometric deformation. In this way, a difficult cross-domain translation problem is decoupled into two easier tasks. A perceptual study shows that caricatures generated by our CariGANs are closer to hand-drawn ones, and at the same time better preserve the identity, compared to state-of-the-art methods. Moreover, our CariGANs allow users to control the degree of shape exaggeration and change the color/texture style by tuning the parameters or giving an example caricature.

* ACM Transactions on Graphics, Vol. 37, No. 6, Article 244. Publication date: November 2018 
* To appear at SIGGRAPH Asia 2018 

Lagged Exact Bayesian Online Changepoint Detection with Parameter Estimation

Oct 13, 2018
Michael Byrd, Linh Nghiem, Jing Cao

Identifying changes in the generative process of sequential data, known as changepoint detection, has become an increasingly important topic for a wide variety of fields. A recently developed approach, which we call EXact Online Bayesian Changepoint Detection (EXO), has shown reasonable results with efficient computation for real-time updates. The method is based on a forward recursive message-passing algorithm. However, the changepoints detected by this method can be unstable. We propose a new algorithm, Lagged EXact Online Bayesian Changepoint Detection (LEXO), that improves the accuracy and stability of the detection by incorporating $\ell$ time lags into the inference. The new algorithm adds a recursive backward step to the forward EXO and has computational complexity linear in the number of added lags. Estimation of the parameters associated with each regime is also developed. Simulation studies with three common changepoint models show that the changepoints detected by LEXO are much more stable and that parameter estimates from LEXO have considerably lower MSE than EXO. We illustrate the applicability of the methods with two real-world data examples comparing EXO and LEXO.
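
For orientation, here is a minimal sketch of the forward run-length recursion that EXO-style detectors build on (in the spirit of Adams and MacKay's Bayesian online changepoint detection); the hazard rate, function names, and the omitted lagged backward pass are assumptions for illustration, not the paper's exact algorithm.

    import numpy as np

    # One forward update of the run-length posterior. LEXO's contribution,
    # a recursive backward smoothing pass over the last l steps, is not
    # reproduced here.
    def bocpd_step(run_probs, pred_probs, hazard=1 / 100):
        """run_probs:  p(r_t | x_1:t), array of length t+1
        pred_probs: predictive likelihood of x_{t+1} under each run length"""
        growth = run_probs * pred_probs * (1 - hazard)   # run continues
        cp = np.sum(run_probs * pred_probs * hazard)     # changepoint: r = 0
        new = np.append(cp, growth)
        return new / new.sum()                           # normalize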


Nighttime Haze Removal with Illumination Correction

Jun 05, 2016
Jing Zhang, Yang Cao, Zengfu Wang

Haze removal is important for computational photography and computer vision applications. However, most existing dehazing methods are designed for daytime images and do not always work well at night. Unlike daytime imaging conditions, images captured under nighttime haze may suffer from non-uniform illumination due to artificial light sources, exhibiting low brightness/contrast and color distortion. In this paper, we present a new nighttime hazy imaging model that takes into account both the non-uniform illumination from artificial light sources and the scattering and attenuation effects of haze. Accordingly, we propose an efficient dehazing algorithm for nighttime hazy images, consisting of three sequential steps. i) It enhances the overall brightness by performing a gamma correction step after estimating the illumination from the original image. ii) It then achieves a color-balanced result by performing a color correction step after estimating the color characteristics of the incident light. iii) Finally, it removes the haze effect by applying the dark channel prior and estimating the point-wise environmental light based on the previous illumination-balanced result. Experimental results show that the proposed algorithm can achieve illumination-balanced and haze-free results with good color rendition.
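
A minimal sketch of step i), assuming a max-RGB illumination estimate and a fixed gamma; both choices are illustrative and not necessarily the paper's exact procedure.

    import numpy as np

    def gamma_correct(img, gamma=0.6, eps=1e-6):
        """img: float32 array in [0, 1], shape (H, W, 3)."""
        illumination = img.max(axis=2, keepdims=True)    # rough illumination map
        normalized = img / (illumination + eps)          # reflectance-like part
        balanced = normalized * (illumination ** gamma)  # brighten dark regions
        return np.clip(balanced, 0.0, 1.0)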

* 14 pages, 18 figures 

A Survey of Optimization Methods from a Machine Learning Perspective

Jun 17, 2019
Shiliang Sun, Zehui Cao, Han Zhu, Jing Zhao

Machine learning is developing rapidly, has achieved many theoretical breakthroughs, and is widely applied in various fields. Optimization, as an important part of machine learning, has attracted much attention from researchers. With the exponential growth of data and the increase of model complexity, optimization methods in machine learning face more and more challenges. Much work on solving optimization problems or improving optimization methods in machine learning has been proposed. A systematic retrospective and summary of optimization methods from the perspective of machine learning is of great significance, as it can offer guidance for the development of both optimization and machine learning research. In this paper, we first describe the optimization problems in machine learning. Then, we introduce the principles and progress of commonly used optimization methods. Next, we summarize the applications and developments of optimization methods in some popular machine learning fields. Finally, we explore and present some challenges and open problems for optimization in machine learning.
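
As a concrete example of the kind of first-order methods such a survey covers, the sketch below contrasts plain gradient descent with a momentum update on a simple quadratic; the step size and momentum coefficient are illustrative choices.

    import numpy as np

    def grad(w):                      # gradient of f(w) = 0.5 * ||w||^2
        return w

    w_gd = np.array([5.0, -3.0])
    w_mom, v = w_gd.copy(), np.zeros(2)
    lr, beta = 0.1, 0.9
    for _ in range(100):
        w_gd -= lr * grad(w_gd)                  # plain gradient descent
        v = beta * v + grad(w_mom)               # momentum accumulation
        w_mom -= lr * v
    print(w_gd, w_mom)                # both approach the optimum at the origin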


Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images

Aug 21, 2018
Jing Zhang, Yang Cao, Yang Wang, Chenglin Wen, Chang Wen Chen

Modeling statistical regularity plays an essential role in ill-posed image processing problems. Recently, deep learning based methods have been presented that implicitly learn a statistical representation of pixel distributions in natural images and leverage it as a constraint to facilitate subsequent tasks, such as color constancy and image dehazing. However, the existing CNN architecture is sensitive to the variability and diversity of pixel intensity within and between local regions, which may result in an inaccurate statistical representation. To address this problem, this paper presents a novel fully point-wise CNN architecture for modeling statistical regularities in natural images. Specifically, we propose to randomly shuffle the pixels in the original images and use the shuffled image as input, making the CNN focus on statistical properties. Moreover, since the pixels in the shuffled image are independently and identically distributed, we can replace all the large convolution kernels in the CNN with point-wise ($1*1$) convolution kernels while maintaining the representation ability. Experimental results on two applications, color constancy and image dehazing, demonstrate the superiority of our proposed network over existing architectures, i.e., using 1/10$\sim$1/100 of the network parameters and computational cost while achieving comparable performance.
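
A minimal sketch of the idea, assuming a PyTorch implementation: pixels are randomly permuted to discard spatial structure, and the network uses only $1*1$ kernels. Layer widths are illustrative assumptions.

    import torch
    import torch.nn as nn

    def shuffle_pixels(img):
        """img: (batch, channels, H, W) -> same shape, pixels permuted."""
        b, c, h, w = img.shape
        flat = img.reshape(b, c, h * w)
        perm = torch.randperm(h * w)              # destroy spatial structure
        return flat[:, :, perm].reshape(b, c, h, w)

    pointwise_net = nn.Sequential(                # every kernel is 1x1
        nn.Conv2d(3, 64, kernel_size=1), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),
        nn.Conv2d(64, 3, kernel_size=1),
    )
    out = pointwise_net(shuffle_pixels(torch.rand(2, 3, 32, 32)))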

* 9 pages, 7 figures. To appear in ACM MM 2018 

Fine-grained ECG Classification Based on Deep CNN and Online Decision Fusion

Jan 19, 2019
Jing Zhang, Jing Tian, Yang Cao, Yuxiang Yang, Xiaobin Xu, Chenglin Wen

Early recognition of abnormal rhythms in ECG signals is crucial for monitoring and diagnosing patients' cardiac conditions and increasing the success rate of treatment. Classifying abnormal rhythms into fine-grained categories is very challenging due to the broad taxonomy of rhythms, noise, and the lack of real-world data and annotations from a large number of patients. This paper presents a new ECG classification method based on Deep Convolutional Neural Networks (DCNN) and online decision fusion. Unlike previous methods that utilize hand-crafted features or learn features from the original signal domain, the proposed DCNN-based method learns features and classifiers from the time-frequency domain in an end-to-end manner. First, the ECG wave signal is transformed to the time-frequency domain using the Short-Time Fourier Transform. Next, specific DCNN models are trained on ECG samples of specific lengths. Finally, an online decision fusion method is proposed to fuse past and current decisions from different models into a more accurate one. Experimental results on both synthetic and real-world ECG datasets demonstrate the effectiveness and efficiency of the proposed method.
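
A minimal sketch of the first stage, assuming SciPy's STFT and a 360 Hz sampling rate; the window parameters are illustrative, not the paper's settings.

    import numpy as np
    from scipy.signal import stft

    fs = 360                                  # Hz, typical for MIT-BIH style data
    ecg = np.random.randn(10 * fs)            # placeholder 10-second segment
    freqs, times, Z = stft(ecg, fs=fs, nperseg=128, noverlap=64)
    spectrogram = np.log1p(np.abs(Z))         # log-magnitude "image" for a 2-D CNN
    print(spectrogram.shape)                  # (freq_bins, time_frames)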

* 12 pages, 10 figures 

Adversarial Objects Against LiDAR-Based Autonomous Driving Systems

Jul 11, 2019
Yulong Cao, Chaowei Xiao, Dawei Yang, Jing Fang, Ruigang Yang, Mingyan Liu, Bo Li

Deep neural networks (DNNs) are known to be vulnerable to adversarial examples, carefully crafted inputs with small perturbations that aim to induce arbitrarily incorrect predictions. Recent studies show that adversarial examples can pose a threat to real-world security-critical applications: a "physical adversarial Stop Sign" can be synthesized such that autonomous driving cars will misrecognize it as another sign (e.g., a speed limit sign). However, these image-space adversarial examples cannot easily alter the 3D scans of the LiDAR or radar widely equipped on autonomous vehicles. In this paper, we reveal potential vulnerabilities of LiDAR-based autonomous driving detection systems by proposing an optimization-based approach, LiDAR-Adv, to generate adversarial objects that can evade the LiDAR-based detection system under various conditions. We first show the vulnerabilities using a black-box evolution-based algorithm, and then explore how much a strong adversary can do using our gradient-based approach LiDAR-Adv. We test the generated adversarial objects on the Baidu Apollo autonomous driving platform and show that such physical systems are indeed vulnerable to the proposed attacks. We also 3D-print our adversarial objects and perform physical experiments to illustrate that such vulnerabilities exist in the real world. Please find more visualizations and results on the anonymous website: https://sites.google.com/view/lidar-adv.


An Ontology-Based Artificial Intelligence Model for Medicine Side-Effect Prediction: Taking Traditional Chinese Medicine as An Example

Sep 12, 2018
Zeheng Wang, Kun Lu, Jun Cao, Yuanzhe Yao, Liang Li, Runyu Liu, Zhiyuan Liu, Jing Yan

In this work, an ontology-based model for AI-assisted medicine side-effect (SE) prediction is developed, and its three main components, the drug model, the treatment model, and the AI-assisted prediction model, are presented. To validate the proposed model, an ANN is established and trained on two hundred and forty-two TCM prescriptions, gathered and classified from the most famous ancient TCM book, and more than one thousand SE reports, in which two ontology-based attributes, hot and cold, are introduced to evaluate whether a prescription will cause an SE or not. The results preliminarily reveal a relationship between the ontology-based attributes and the corresponding indicator that can be learned by AI for predicting SEs, suggesting that the proposed model has potential for AI-assisted SE prediction. It should be noted, however, that the proposed model depends heavily on sufficient clinical data, and hence much deeper exploration is needed to improve the accuracy of the prediction.


Unsupervised Cross-Domain Recognition by Identifying Compact Joint Subspaces

Sep 05, 2015
Yuewei Lin, Jing Chen, Yu Cao, Youjie Zhou, Lingfeng Zhang, Yuan Yan Tang, Song Wang

This paper introduces a new method to solve the cross-domain recognition problem. Unlike traditional domain adaptation methods, which rely on a global domain shift for all classes between the source and target domains, the proposed method is more flexible in capturing individual class variations across domains. By adopting a natural and widely used assumption, that data samples from the same class should lie on a low-dimensional subspace even if they come from different domains, the proposed method circumvents the limitation of the global domain shift and solves cross-domain recognition by finding compact joint subspaces of the source and target domains. Specifically, given labeled samples in the source domain, we construct a subspace for each class. Then we construct subspaces in the target domain, called anchor subspaces, by collecting unlabeled samples that are close to each other and highly likely to all fall into the same class. The corresponding class label is then assigned by minimizing a cost function that reflects the overlap and topological structure consistency between subspaces across the source and target domains, and within the anchor subspaces, respectively. We further combine the anchor subspaces with the corresponding source subspaces to construct the compact joint subspaces. Subsequently, one-vs-rest SVM classifiers are trained in the compact joint subspaces and applied to unlabeled data in the target domain. We evaluate the proposed method on two widely used datasets: an object recognition dataset for computer vision tasks and a sentiment classification dataset for natural language processing tasks. Comparative results demonstrate that the proposed method outperforms the baseline methods on both datasets.

* ICIP 2015 Top 10% paper 

Person re-identification based on Res2Net network

Oct 08, 2019
Zongjing Cao, Hyo Jong Lee

Person re-identification (re-ID) has been gaining in popularity in the research community owing to its numerous applications and growing importance in the surveillance industry. Person re-ID remains challenging due to significant intra-class variations across different cameras. In this paper, we propose a multi-task network that simultaneously computes the identification loss and the verification loss. Given a pair of input images, the network predicts the identities of the two input images and whether they belong to the same identity. To obtain deeper feature information about pedestrians, we propose to use the recent Res2Net network for feature extraction. Experiments on several large-scale person re-ID benchmark datasets demonstrate the effectiveness of our approach. For example, rank-1 accuracies are 82.67% (+0.51) and 92.93% (+0.21) on the DukeMTMC and Market-1501 datasets, respectively. The proposed method shows encouraging improvements compared with state-of-the-art methods.
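
A minimal sketch of such a multi-task head, assuming a shared backbone feature vector (e.g., from Res2Net); the layer sizes, the squared-difference verification input, and the class count are illustrative assumptions.

    import torch
    import torch.nn as nn

    class MultiTaskHead(nn.Module):
        def __init__(self, feat_dim=2048, num_ids=751):
            super().__init__()
            self.id_fc = nn.Linear(feat_dim, num_ids)       # identification
            self.verif_fc = nn.Linear(feat_dim, 2)          # same / different

        def forward(self, f1, f2):
            id_logits = self.id_fc(f1), self.id_fc(f2)      # identity per image
            verif_logits = self.verif_fc((f1 - f2).pow(2))  # pair comparison
            return id_logits, verif_logits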

* 6 pages. arXiv admin note: text overlap with arXiv:1711.09349, arXiv:1904.01169, arXiv:1611.05666 by other authors 

Deformable Stacked Structure for Named Entity Recognition

Sep 28, 2018
Shuyang Cao, Xipeng Qiu, Xuanjing Huang

Neural architectures for named entity recognition have achieved great success in the field of natural language processing. Currently, the dominant architecture consists of a bi-directional recurrent neural network (RNN) as the encoder and a conditional random field (CRF) as the decoder. In this paper, we propose a deformable stacked structure for named entity recognition, in which the connections between two adjacent layers are dynamically established. We evaluate the deformable stacked structure by adapting it to different layers. Our model achieves state-of-the-art performance on the OntoNotes dataset.


Learning Temporal Action Proposals With Fewer Labels

Oct 03, 2019
Jingwei Ji, Kaidi Cao, Juan Carlos Niebles

Temporal action proposals are a common module in action detection pipelines today. Most current methods for training action proposal modules rely on fully supervised approaches that require large amounts of annotated temporal action intervals in long video sequences. The large cost and effort in annotation that this entails motivate us to study the problem of training proposal modules with less supervision. In this work, we propose a semi-supervised learning algorithm specifically designed for training temporal action proposal networks. When only a small number of labels are available, our semi-supervised method generates significantly better proposals than the fully-supervised counterpart and other strong semi-supervised baselines. We validate our method on two challenging action detection video datasets, ActivityNet v1.3 and THUMOS14. We show that our semi-supervised approach consistently matches or outperforms the fully supervised state-of-the-art approaches.


Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning

Jul 25, 2019
Jingya Liu, Liangliang Cao, Oguz Akin, Yingli Tian

Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans. Although many deep learning-based algorithms have made great progress in improving the accuracy of nodule detection, the high false positive rate remains a challenging problem that limits automatic diagnosis in routine clinical practice. Moreover, CT scans collected from multiple manufacturers may affect the robustness of computer-aided diagnosis (CAD) due to differences in intensity scales and machine noise. In this paper, we propose a novel self-supervised learning assisted pulmonary nodule detection framework based on a 3D Feature Pyramid Network (3DFPN) that improves the sensitivity of nodule detection by employing multi-scale features to increase the resolution of nodules, as well as a parallel top-down path to transmit high-level semantic features to complement low-level general features. Furthermore, a High Sensitivity and Specificity (HS2) network is introduced to eliminate false positive nodule candidates by tracking appearance changes across continuous CT slices of each candidate on Location History Images (LHI). In addition, to improve the performance consistency of the proposed framework across data captured by different CT scanners without using additional annotations, an effective self-supervised learning scheme is applied to learn spatiotemporal features of CT scans from large-scale unlabeled data. The performance and robustness of our method are evaluated on several publicly available datasets with significant performance improvements. The proposed framework accurately detects pulmonary nodules with high sensitivity and specificity and achieves 90.6% sensitivity with 1/8 false positives per scan, outperforming the state-of-the-art result by 15.8% on the LUNA16 dataset.

* 15 pages, 8 figures, 5 tables, under review by Medical Image Analysis. arXiv admin note: substantial text overlap with arXiv:1906.03467 

3DFPN-HS$^2$: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection

Jun 11, 2019
Jingya Liu, Liangliang Cao, Oguz Akin, Yingli Tian

Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans. Although many deep learning-based algorithms have made great progress in improving the accuracy of nodule detection, the high false positive rate remains a challenging problem that limits automatic diagnosis in routine clinical practice. In this paper, we propose a novel pulmonary nodule detection framework based on a 3D Feature Pyramid Network (3DFPN) that improves the sensitivity of nodule detection by employing multi-scale features to increase the resolution of nodules, as well as a parallel top-down path to transmit high-level semantic features to complement low-level general features. Furthermore, a High Sensitivity and Specificity (HS$^2$) network is introduced to eliminate falsely detected nodule candidates by tracking appearance changes across continuous CT slices of each candidate. The proposed framework is evaluated on the public Lung Nodule Analysis (LUNA16) challenge dataset. Our method accurately detects lung nodules with high sensitivity and specificity, achieving $90.4\%$ sensitivity with 1/8 false positives per scan and outperforming the state-of-the-art result by $15.6\%$.

* 8 pages, 3 figures. Accepted to MICCAI 2019 

Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder

Jun 05, 2019
Shichen Cao, Jingjing Li, Kenric P. Nelson, Mark A. Kon

We present a coupled Variational Auto-Encoder (VAE) method that improves the accuracy and robustness of probabilistic inferences on represented data. The new method models the dependency between input feature vectors (images) and weighs outliers with a higher penalty by generalizing the original loss function to the coupled entropy function, using the principles of nonlinear statistical coupling. We evaluate the performance of the coupled VAE model on the MNIST dataset. Compared with the traditional VAE algorithm, the output images generated by the coupled VAE method are clearer and less blurry. The visualization of the input images embedded in the 2D latent variable space provides deeper insight into the structure of the new model with the coupled loss function: the latent variable has a smaller deviation, and the output values are generated from a more compact latent space. We analyze the histogram of the likelihoods of the input images using the generalized mean, which measures the model's accuracy as a function of the relative risk. The neutral accuracy, which is the geometric mean and is consistent with a measure of the Shannon cross-entropy, is improved. The robust accuracy, measured by the -2/3 generalized mean, is also improved. The decisive accuracy, measured by the arithmetic mean, is unchanged.
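
The three accuracies above are instances of the generalized (power) mean of the per-image likelihoods; a minimal sketch, where the function name and the sample values are illustrative:

    import numpy as np

    def generalized_mean(likelihoods, r):
        """Power mean of order r; r -> 0 gives the geometric mean."""
        likelihoods = np.asarray(likelihoods, dtype=float)
        if r == 0:                               # neutral (geometric) accuracy
            return np.exp(np.mean(np.log(likelihoods)))
        return np.mean(likelihoods ** r) ** (1.0 / r)

    probs = np.array([0.9, 0.8, 0.05, 0.95])
    for r in (-2 / 3, 0, 1):                     # robust, neutral, decisive
        print(r, generalized_mean(probs, r))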

* 10 pages, 11 figures 

Deep Triplet Quantization

Feb 01, 2019
Bin Liu, Yue Cao, Mingsheng Long, Jianmin Wang, Jingdong Wang

Deep hashing establishes efficient and effective image retrieval through end-to-end learning of deep representations and hash codes from similarity data. We present a compact coding solution, focusing on the deep learning-to-quantization approach, which has shown superior performance over hashing solutions for similarity retrieval. We propose Deep Triplet Quantization (DTQ), a novel approach to learning deep quantization models from similarity triplets. To enable more effective triplet training, we design a new triplet selection approach, Group Hard, that randomly selects hard triplets in each image group. To generate compact binary codes, we further apply a triplet quantization with weak orthogonality during triplet training. The quantization loss reduces codebook redundancy and enhances the quantizability of deep representations through back-propagation. Extensive experiments demonstrate that DTQ can generate high-quality and compact binary codes, yielding state-of-the-art image retrieval performance on three benchmark datasets: NUS-WIDE, CIFAR-10, and MS-COCO.
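
As a hedged illustration of hard-triplet mining in the spirit of Group Hard (the paper randomly selects hard triplets per image group; this sketch only shows the "hardest within a batch" ingredient and is not the paper's exact sampler):

    import torch

    def hard_triplets(emb, labels):
        """For each anchor, find the farthest positive and closest negative."""
        d = torch.cdist(emb, emb)                   # pairwise distances
        same = labels[:, None] == labels[None, :]
        pos_d = d.masked_fill(~same, float('-inf'))
        neg_d = d.masked_fill(same, float('inf'))
        hardest_pos = pos_d.argmax(dim=1)           # farthest same-label sample
        hardest_neg = neg_d.argmin(dim=1)           # closest different-label sample
        return hardest_pos, hardest_neg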

* Accepted by ACM Multimedia 2018 as oral paper 

A Boosting Method to Face Image Super-resolution

May 04, 2018
Shanjun Mao, Da Zhou, Yiping Zhang, Zhihong Zhang, Jingjing Cao

Recently, sparse representation has achieved great success in face image super-resolution. Conventional sparsity-based methods enforce sparse coding on face image patches, with representation fidelity measured by the $\ell_{2}$-norm. Such a sparse coding model regularizes all facial patches equally, ignoring the distinct natures of different facial patches for image reconstruction. In this paper, we propose a new weighted-patch super-resolution method based on AdaBoost. Specifically, in each iteration of the AdaBoost procedure, each facial patch is weighted automatically according to the model's performance on it, so as to highlight the patches that are more critical for improving reconstruction power in the next step. In this way, through the AdaBoost training procedure, we can focus more on the patches (face regions) with richer information. Experimental results on standard face databases show that our proposed method outperforms state-of-the-art methods in terms of both objective metrics and visual quality.
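
A minimal sketch of the AdaBoost-style patch re-weighting, assuming an exponential update on per-patch reconstruction errors; the error measure and learning rate are illustrative assumptions.

    import numpy as np

    def update_patch_weights(weights, errors, eta=1.0):
        """weights, errors: arrays of length n_patches; errors in [0, 1]."""
        weights = weights * np.exp(eta * errors)    # boost poorly fit patches
        return weights / weights.sum()              # keep a distribution

    w = np.full(4, 0.25)
    w = update_patch_weights(w, np.array([0.1, 0.7, 0.2, 0.05]))
    print(w)                                        # mass shifts to patch 2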

* 14 pages, 3 figures 

BIRNet: Brain Image Registration Using Dual-Supervised Fully Convolutional Networks

Feb 13, 2018
Jingfan Fan, Xiaohuan Cao, Pew-Thian Yap, Dinggang Shen

In this paper, we propose a deep learning approach for image registration by predicting deformation from image appearance. Since obtaining ground-truth deformation fields for training can be challenging, we design a fully convolutional network that is subject to dual-guidance: (1) Coarse guidance using deformation fields obtained by an existing registration method; and (2) Fine guidance using image similarity. The latter guidance helps avoid overly relying on the supervision from the training deformation fields, which could be inaccurate. For effective training, we further improve the deep convolutional network with gap filling, hierarchical loss, and multi-source strategies. Experiments on a variety of datasets show promising registration accuracy and efficiency compared with state-of-the-art methods.
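
A minimal sketch of the dual guidance described above, assuming mean-squared errors for both the coarse (deformation-field) and fine (image-similarity) terms and a simple weighted sum; the weighting and loss choices are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def dual_guidance_loss(pred_field, guide_field, warped, fixed, alpha=0.5):
        coarse = F.mse_loss(pred_field, guide_field)  # guidance deformation fields
        fine = F.mse_loss(warped, fixed)              # image similarity after warp
        return alpha * coarse + (1 - alpha) * fine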


Sparse Representation Based Augmented Multinomial Logistic Extreme Learning Machine with Weighted Composite Features for Spectral Spatial Hyperspectral Image Classification

Oct 14, 2017
Faxian Cao, Zhijing Yang, Jinchang Ren, Wing-Kuen Ling

Although the extreme learning machine (ELM) has been successfully applied to a number of pattern recognition problems, it fails to provide sufficiently good results in hyperspectral image (HSI) classification due to two main drawbacks. The first is the random weights and biases of ELM, which may lead to ill-posed problems. The second is the lack of spatial information for classification. To tackle these two problems, in this paper, we propose a new framework for ELM-based spectral-spatial classification of HSI, where probabilistic modelling with sparse representation and weighted composite features (WCF) are employed, respectively, to derive the optimized output weights and extract spatial features. First, the ELM is represented as a concave logarithmic likelihood function under statistical modelling using the maximum a posteriori (MAP) estimate. Second, sparse representation is applied to the Laplacian prior to efficiently determine a logarithmic posterior with a unique maximum, solving the ill-posed problem of ELM. Variable splitting and the augmented Lagrangian are subsequently used to further reduce the computational complexity of the proposed algorithm, and this has been proven to be a more efficient way to improve speed. Third, spatial information is extracted using the weighted composite features (WCFs) to construct the spectral-spatial classification framework. In addition, the lower bound of the proposed method is derived by a rigorous mathematical proof. Experimental results on two publicly available HSI data sets demonstrate that the proposed methodology outperforms ELM and a number of state-of-the-art approaches.
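
For orientation, a minimal sketch of the basic ELM that the framework starts from: random hidden weights, with output weights solved in closed form by regularized least squares. All sizes and the ridge term are illustrative assumptions; the paper's MAP/sparse-representation solver replaces this last step.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))                 # samples x spectral bands
    Y = np.eye(5)[rng.integers(0, 5, size=100)]    # one-hot class targets

    W = rng.normal(size=(20, 64))                  # random input weights
    b = rng.normal(size=64)                        # random bias
    H = np.tanh(X @ W + b)                         # hidden-layer output
    beta = np.linalg.solve(H.T @ H + 1e-3 * np.eye(64), H.T @ Y)  # output weights
    pred = (H @ beta).argmax(axis=1)               # class predictions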

* 16 pages, 6 figures and 4 tables 
