Models, code, and papers for "Mingming He":

Multi-path Convolutional Neural Networks for Complex Image Classification

Jun 22, 2015
Mingming Wang

Convolutional neural networks demonstrate high performance in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). Nevertheless, the published results report only the overall performance across all image classes; there is no further analysis of why certain images get worse results or how they could be improved. In this paper, we provide a deep performance analysis based on different types of images and point out the weaknesses of convolutional neural networks through experiments. We design a novel multi-path convolutional neural network, which feeds different versions of an image into separate paths to learn more comprehensive features. This model represents images better than the traditional single-path model. We obtain better classification results on a complex validation set, on both top-1 and top-5 scores, than the best ILSVRC 2013 classification model.
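
As a rough illustration of the multi-path idea, here is a minimal PyTorch sketch of a two-path network; the layer sizes and the choice of input versions (e.g., the original image plus a preprocessed copy) are assumptions for exposition, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiPathCNN(nn.Module):
    """Toy two-path CNN: each path sees a different version of the image,
    and the per-path features are fused before classification."""
    def __init__(self, num_classes=10):
        super().__init__()
        def make_path():
            return nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.path_a = make_path()   # e.g., fed the original image
        self.path_b = make_path()   # e.g., fed a preprocessed version
        self.classifier = nn.Linear(64 * 2, num_classes)

    def forward(self, img_a, img_b):
        feats = torch.cat([self.path_a(img_a), self.path_b(img_b)], dim=1)
        return self.classifier(feats)

model = MultiPathCNN()
x = torch.randn(2, 3, 224, 224)
logits = model(x, 1.0 - x)   # second path gets an inverted copy, purely for illustration
```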


Interpreting and Understanding Graph Convolutional Neural Network using Gradient-based Attribution Method

Apr 16, 2019
Shangsheng Xie, Mingming Lu

To address the difficulty convolutional neural networks (CNNs) have in processing non-grid relational data such as graphs, Kipf et al. proposed the graph convolutional neural network (GCN). The core idea of the GCN is to perform a two-fold information fusion for each node in a given graph during each iteration: fusion of graph structure information and fusion of node feature dimensions. Because of its capacity for combinatorial generalization, the GCN has been widely used in scene semantic relationship analysis, natural language processing, few-shot learning, etc. However, because this two-fold information fusion involves mathematically irreversible calculations, it is hard to explain the reasons behind the GCN's prediction for each node classification. Unfortunately, most existing attribution analysis methods concentrate on models like CNNs that process grid-like data, and they are difficult to apply to the GCN directly: whereas CNN inputs are independent of each other, GCN input data are correlated. As a result, existing attribution methods capture only the partial contribution of the central node's features to the GCN's final decision, while ignoring the contribution of the neighbor nodes' features to that decision. To this end, we propose a gradient-based attribution analysis method for the GCN, called the Node Attribution Method (NAM), which can obtain the contribution to the GCN output not only from the central node but also from its neighbor nodes. We also propose the Node Importance Visualization (NIV) method to visualize the central node and its neighbors based on the values of their contributions...
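
The paper's exact attribution formulas are not reproduced in this listing, but the flavor of the approach can be sketched with a one-layer GCN and a gradient-times-input attribution, which naturally assigns nonzero contributions both to the central node and to its neighbors. The toy graph, the shapes, and the gradient-times-input choice are assumptions of this sketch.

```python
import torch
import torch.nn as nn

# Toy one-layer GCN on 3 nodes: logits = A_hat @ X @ W, with A_hat the
# symmetrically normalized adjacency (self-loops included).
A = torch.tensor([[1., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 1.]])
D_inv_sqrt = torch.diag(A.sum(1).rsqrt())
A_hat = D_inv_sqrt @ A @ D_inv_sqrt
X = torch.randn(3, 4, requires_grad=True)    # node features
W = nn.Linear(4, 2, bias=False)              # 2 output classes

logits = A_hat @ W(X)                        # per-node class scores
logits[1, 0].backward()                      # prediction for central node 1, class 0

# Gradient x input: each row is a node's feature-wise contribution to the
# central node's prediction -- neighbors 0 and 2 receive attribution too.
attribution = (X.grad * X).detach()
print(attribution.sum(dim=1))                # scalar importance per node
```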

* 8 pages, 9 figures 

Connection Sensitive Attention U-NET for Accurate Retinal Vessel Segmentation

Mar 13, 2019
Ruirui Li, Mingming Li, Jiacheng Li

We develop a connection-sensitive attention U-Net (CSAU) for accurate retinal vessel segmentation. This method improves the recent attention U-Net for semantic segmentation with four key contributions: (1) a connection-sensitive loss that models structural properties to improve the accuracy of pixel-wise segmentation; (2) an attention gate with a novel neural network structure and a concatenating DOWN-Link to effectively learn better attention weights on fine vessels; (3) the integration of the connection-sensitive loss and the attention gate, which further improves accuracy on detailed vessels by additionally concatenating attention weights to features before the output; and (4) a connection-sensitive accuracy metric that reflects segmentation performance on boundaries and thin vessels. Our method improves on state-of-the-art vessel segmentation methods, which struggle in the presence of abnormalities, bifurcations, and microvasculature. The connection-sensitive loss integrates tightly with the proposed attention U-Net to (i) accurately segment retinal vessels and (ii) preserve the connectivity of thin vessels by modeling structural properties. Our method achieves the leading position among state-of-the-art methods on the DRIVE, STARE, and HRF datasets.
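
For context, below is a minimal sketch of the additive attention gate from the original attention U-Net, which CSAU builds on; CSAU's modified gate, the concatenating DOWN-Link, and the connection-sensitive loss are not reproduced here, and the channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate (attention U-Net style): a gating signal g from
    the decoder re-weights the skip-connection feature x from the encoder."""
    def __init__(self, ch_x, ch_g, ch_mid):
        super().__init__()
        self.wx = nn.Conv2d(ch_x, ch_mid, 1)
        self.wg = nn.Conv2d(ch_g, ch_mid, 1)
        self.psi = nn.Conv2d(ch_mid, 1, 1)

    def forward(self, x, g):
        # g is assumed already upsampled to x's spatial size
        alpha = torch.sigmoid(self.psi(torch.relu(self.wx(x) + self.wg(g))))
        return x * alpha, alpha        # gated skip feature and the attention map

gate = AttentionGate(ch_x=64, ch_g=128, ch_mid=32)
x, g = torch.randn(1, 64, 56, 56), torch.randn(1, 128, 56, 56)
gated, alpha = gate(x, g)
```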


Based on Graph-VAE Model to Predict Student's Score

Mar 08, 2019
Yang Zhang, Mingming Lu

The OECD has pointed out that the best way to keep students in school is to intervene as early as possible [1]. Using educational big data and deep learning to predict students' scores provides new resources and perspectives for early intervention. Previous forecasting schemes often require manual feature filtering and a large amount of prior and expert knowledge; deep learning can extract features automatically, without manual intervention, to achieve better predictive performance. In this paper, we use a graph neural network matrix-completion model (Graph-VAE) based on deep learning, which extracts features automatically without requiring a large amount of prior knowledge. Experiments show that our model outperforms traditional solutions on the student score dataset and better captures the correlations and differences between students and courses; when the encoded vectors are reduced in dimensionality and visualized, the resulting clusters are consistent with the clustering of the real data distribution. In addition, we use gradient-based attribution methods to analyze the key factors that influence performance prediction.


A lossless data hiding scheme in JPEG images with segment coding

Jan 31, 2019
Mingming Zhang, Quan Zhou, Yanlang Hu

In this paper, we propose a lossless data hiding scheme for JPEG images. After the quantized DCT transform, the coefficient distribution in the high frequencies is relatively sparse and the absolute values are small. To improve encoding efficiency, we put forward an encoding algorithm that searches for a high-frequency position as a termination point and recodes the coefficients beyond it, reserving spare space to embed secret data and appended data with no file expansion. The receiver can obtain the termination point through data analysis, extract the additional data, and recover the original JPEG image losslessly. Experimental results show that the proposed method has a larger capacity than state-of-the-art works.
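
A hedged sketch of the core search step: scan the zigzag-ordered quantized DCT coefficients of a block for a termination point beyond which every coefficient is near zero, so the tail can be recoded compactly to reserve embedding space. The function name and the `max_abs` threshold are illustrative; the paper's actual search and recoding rules differ in detail.

```python
def find_terminate_point(zigzag_coeffs, max_abs=1):
    """Return the earliest index t such that |coefficient| <= max_abs for every
    position from t onward. Illustrative only; not the paper's exact rule."""
    t = len(zigzag_coeffs)
    for i in range(len(zigzag_coeffs) - 1, -1, -1):
        if abs(zigzag_coeffs[i]) > max_abs:
            break
        t = i
    return t

block = [35, -7, 4, 0, 2, -1, 0, 1, 0, 0, -1] + [0] * 53   # 64 zigzag coefficients
t = find_terminate_point(block)
print(t, block[t:])   # tail of small coefficients available for recoding
```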

* 14 pages, 5 figures, 8 tables 

Localization for Ground Robots: On Manifold Representation, Integration, Re-Parameterization, and Optimization

Sep 25, 2019
Mingming Zhang, Xingxing Zuo, Yiming Chen, Mingyang Li

In this paper, we focus on localizing ground robots by probabilistically fusing measurements from wheel odometry and a monocular camera. For ground robots, wheel odometry is widely used in localization tasks, especially in planar-scene environments. However, since wheel odometry only provides 2D motion estimates, it is extremely challenging to use it for accurate full 6D pose (3D position and 3D rotation) estimation. Traditional approaches to 6D localization either approximate the sensor or motion models, at the cost of reduced accuracy, or rely on other sensors, e.g., an inertial measurement unit (IMU), to obtain the full 6D motion. By contrast, in this paper, we propose a novel probabilistic framework that is able to use wheel odometry measurements for high-precision 6D pose estimation, in which only the wheel odometry and a monocular camera are mandatory. Specifically, we propose novel methods for i) formulating a motion manifold with a parametric representation, ii) performing manifold-based 6D integration of the wheel odometry measurements, and iii) re-parameterizing the manifold equations periodically to reduce error. Finally, we propose a complete localization algorithm based on a manifold-assisted sliding-window estimator that fuses measurements from the wheel odometry, a monocular camera, and optionally an IMU. Through extensive simulated and real-world experiments, we show that the proposed algorithm outperforms a number of state-of-the-art vision-based localization algorithms by a significant margin, especially when deployed in large-scale complicated environments.
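
As a hedged illustration of what a parametric motion-manifold representation can look like (an assumption for exposition, not necessarily the paper's exact parameterization), the local driving surface can be modeled as a low-order polynomial that constrains the robot's 6D pose:

```latex
% Illustrative local parameterization of the motion manifold (assumed form):
% the terrain height is approximated by a quadratic surface in the plane.
\[
  z = \mathcal{M}(x, y) \approx
  c_0 + c_1 x + c_2 y + c_3 x^2 + c_4 x y + c_5 y^2 ,
\]
% Wheel-odometry integration then evolves $(x, y, \theta)$ on the surface,
% while roll and pitch follow from the surface gradient $\nabla \mathcal{M}$,
% yielding a full 6D pose from 2D odometry plus the manifold parameters $c_i$.
```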


Likelihood-Free Overcomplete ICA and Applications in Causal Discovery

Sep 05, 2019
Chenwei Ding, Mingming Gong, Kun Zhang, Dacheng Tao

Causal discovery has witnessed significant progress over the past decades. In particular, many recent causal discovery methods make use of independent, non-Gaussian noise to achieve identifiability of the causal models. The existence of hidden direct common causes, or confounders, generally makes causal discovery more difficult; whenever they are present, the corresponding causal discovery algorithms can be seen as extensions of overcomplete independent component analysis (OICA). However, existing OICA algorithms usually make strong parametric assumptions on the distribution of the independent components, which may be violated on real data, leading to sub-optimal or even wrong solutions. In addition, existing OICA algorithms rely on the Expectation Maximization (EM) procedure, which requires computationally expensive inference of the posterior distribution of the independent components. To tackle these problems, we present a Likelihood-Free Overcomplete ICA algorithm (LFOICA) that estimates the mixing matrix directly by back-propagation without any explicit assumptions on the density function of the independent components. Thanks to its computational efficiency, the proposed method makes a number of causal discovery procedures much more practically feasible. For illustrative purposes, we demonstrate the computational efficiency and efficacy of our method in two causal discovery tasks on both synthetic and real data.
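
A minimal sketch of the likelihood-free idea, under the assumption that the distribution match uses an MMD-style loss (an assumption of this sketch; the paper's training objective may differ in detail): per-component neural samplers generate the independent components, a learnable matrix mixes them, and everything is trained by back-propagation against the observed mixtures.

```python
import torch
import torch.nn as nn

def mmd(x, y, sigma=1.0):
    """Biased estimate of squared MMD with a Gaussian kernel."""
    def k(a, b):
        d = (a.unsqueeze(1) - b.unsqueeze(0)).pow(2).sum(-1)
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

n_src, n_obs = 4, 2                     # overcomplete: more sources than mixtures
gens = nn.ModuleList([nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
                      for _ in range(n_src)])    # one sampler per component
A = nn.Parameter(torch.randn(n_obs, n_src))      # mixing matrix, learned directly
opt = torch.optim.Adam(list(gens.parameters()) + [A], lr=1e-3)

x_obs = torch.randn(256, n_obs)         # stand-in for observed mixtures
for _ in range(100):
    noise = torch.randn(256, n_src, 1)
    s = torch.cat([g(noise[:, i]) for i, g in enumerate(gens)], dim=1)
    loss = mmd(s @ A.t(), x_obs)        # match generated to observed mixtures
    opt.zero_grad(); loss.backward(); opt.step()
```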

* 10 pages, 3 figures. Accepted by NeurIPS 2019 as spotlight 

Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models

May 26, 2019
Biwei Huang, Kun Zhang, Mingming Gong, Clark Glymour

In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify causal structure and that forecasting naturally benefits from learned causal knowledge. Specifically, we allow changes in both causal strengths and noise variances in the nonlinear state-space models, which, interestingly, renders both the causal structure and model parameters identifiable. Given the causal model, we treat forecasting as a problem in Bayesian inference in the causal model, which exploits the time-varying property of the data and adapts to new observations in a principled manner. Experimental results on synthetic and real-world data sets demonstrate the efficacy of the proposed methods.
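
A hedged sketch of the kind of state-space model described (the notation is assumed, not copied from the paper): time-varying causal strengths and noise variances in the latent transition are what render the structure and parameters identifiable, and forecasting then amounts to Bayesian filtering and prediction in this model.

```latex
% Assumed notation for a nonlinear state-space model with time-varying
% causal strengths B_t and noise variances Q_t:
\[
  \mathbf{x}_t = f\big(\mathbf{x}_{t-1};\, \mathbf{B}_t\big) + \mathbf{q}_t,
  \qquad \mathbf{q}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_t),
\]
\[
  \mathbf{y}_t = g(\mathbf{x}_t) + \mathbf{e}_t,
  \qquad \mathbf{e}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{R}),
\]
% where the nonzero pattern of B_t encodes the causal structure among the
% latent processes, and forecasting computes p(y_{t+1} | y_{1:t}).
```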


Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation

Apr 03, 2019
Shanshan Zhao, Huan Fu, Mingming Gong, Dacheng Tao

Supervised depth estimation has achieved high accuracy thanks to advanced deep network architectures. Since ground-truth depth labels are hard to obtain, recent methods try to learn depth estimation networks in an unsupervised way by exploiting unsupervised cues, which are effective but less reliable than true labels. An emerging way to resolve this dilemma is to transfer knowledge from synthetic images with ground-truth depth via domain adaptation techniques. However, these approaches overlook the specific geometric structure of the natural images in the target domain (i.e., real data), which is important for high-performing depth prediction. Motivated by this observation, we propose a geometry-aware symmetric domain adaptation framework (GASDA) that jointly exploits the labels in the synthetic data and the epipolar geometry in the real data. Moreover, by training two image-style translators and two depth estimators symmetrically in an end-to-end network, our model achieves better image style transfer and generates high-quality depth maps. The experimental results demonstrate the effectiveness of the proposed method and its performance comparable to the state of the art. Code will be publicly available at: https://github.com/sshan-zhao/GASDA.

* CVPR 2019 

Graph Hierarchical Convolutional Recurrent Neural Network (GHCRNN) for Vehicle Condition Prediction

Mar 12, 2019
Mingming Lu, Kunfang Zhang, Haiying Liu, Naixue Xiong

Predicting urban vehicle flow and speed can greatly facilitate people's travel and provide reasonable advice for the decision-making of relevant government departments. However, prediction is difficult because vehicle flow has spatial, temporal, and hierarchical structure and is affected by many factors such as weather. Most existing methods extract spatial structure information from the road network and time-series information from historical data. However, when extracting spatial features, these methods have high time and space complexity and incorporate a lot of noise, making them difficult to apply to large graphs; moreover, they only consider the influence of connected neighboring road nodes on the central node, ignoring a very important hierarchical relationship, namely the similarity between node features and road network structures. In response to these problems, this paper proposes the Graph Hierarchical Convolutional Recurrent Neural Network (GHCRNN) model. The model uses GCNs (Graph Convolutional Networks) to extract spatial features, GRUs (Gated Recurrent Units) to extract temporal features, and a learnable pooling to extract hierarchical information, eliminate redundant information, and reduce complexity. The model is validated on vehicle flow and speed data from Shenzhen and Los Angeles, where it effectively reduces time and memory consumption at comparable precision.
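
A minimal sketch of one GHCRNN-style step under assumed shapes (the paper's exact layer definitions are not reproduced): a GCN layer extracts spatial features, a learnable soft pooling coarsens the graph to capture hierarchy and shed redundancy, and a GRU tracks the temporal dynamics of the pooled features.

```python
import torch
import torch.nn as nn

class GHCRNNCell(nn.Module):
    """One illustrative step: GCN (spatial) -> soft pooling (hierarchy) -> GRU (temporal)."""
    def __init__(self, in_dim, hid_dim, n_nodes, n_clusters):
        super().__init__()
        self.gcn = nn.Linear(in_dim, hid_dim)        # weights of H' = A_hat X W
        self.assign = nn.Parameter(torch.randn(n_nodes, n_clusters))  # learnable pooling
        self.gru = nn.GRUCell(n_clusters * hid_dim, n_clusters * hid_dim)

    def forward(self, A_hat, X, h):
        spatial = torch.relu(A_hat @ self.gcn(X))    # (n_nodes, hid_dim)
        S = torch.softmax(self.assign, dim=1)        # soft cluster assignments
        pooled = (S.t() @ spatial).flatten()         # coarsened graph features
        return self.gru(pooled.unsqueeze(0), h)

n_nodes, in_dim, hid_dim, n_clusters = 20, 2, 8, 5
cell = GHCRNNCell(in_dim, hid_dim, n_nodes, n_clusters)
A_hat = torch.eye(n_nodes)                           # stand-in normalized adjacency
h = torch.zeros(1, n_clusters * hid_dim)
for t in range(12):                                  # e.g., 12 historical steps
    X_t = torch.randn(n_nodes, in_dim)               # flow/speed features at step t
    h = cell(A_hat, X_t, h)                          # h would feed a prediction head
```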


Learning with Biased Complementary Labels

Aug 08, 2018
Xiyu Yu, Tongliang Liu, Mingming Gong, Dacheng Tao

In this paper, we study the classification problem in which we have access to an easily obtainable surrogate for true labels, namely complementary labels, which specify classes that observations do \textbf{not} belong to. Let $Y$ and $\bar{Y}$ be the true and complementary labels, respectively. We first model the annotation of complementary labels via transition probabilities $P(\bar{Y}=i|Y=j), i\neq j\in\{1,\cdots,c\}$, where $c$ is the number of classes. Previous methods implicitly assume that $P(\bar{Y}=i|Y=j), \forall i\neq j$, are identical, which does not hold in practice because humans are biased toward their own experience. For example, as shown in Figure 1, if an annotator is more familiar with monkeys than prairie dogs when providing complementary labels for meerkats, she is more likely to employ "monkey" as a complementary label. We therefore reason that the transition probabilities will differ. In this paper, we propose a framework that contributes three main innovations to learning with \textbf{biased} complementary labels: (1) it estimates the transition probabilities without bias; (2) it provides a general method to modify traditional loss functions and extends standard deep neural network classifiers to learn with biased complementary labels; (3) it theoretically ensures that the classifier learned with complementary labels converges to the optimal one learned with true labels. Comprehensive experiments on several benchmark datasets validate the superiority of our method over current state-of-the-art methods.
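
The loss-modification idea can be sketched as a forward correction through the transition matrix. Here `Q[j, i]` stands for $P(\bar{Y}=i|Y=j)$; the uniform `Q` below is the unbiased special case, and the paper's point is precisely that `Q` should be estimated rather than assumed uniform. This sketch is illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def complementary_loss(logits, bar_y, Q):
    """Forward correction: push the softmax over true labels through Q to get
    a distribution over complementary labels, then match the observed bar_y."""
    p_true = F.softmax(logits, dim=1)      # model's P(Y | x)
    p_bar = p_true @ Q                     # implied P(bar_Y | x)
    return F.nll_loss(torch.log(p_bar + 1e-12), bar_y)

c = 5
Q = (torch.ones(c, c) - torch.eye(c)) / (c - 1)   # uniform (unbiased) special case
logits = torch.randn(8, c, requires_grad=True)
bar_y = torch.randint(0, c, (8,))                 # observed complementary labels
complementary_loss(logits, bar_y, Q).backward()
```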

* ECCV 2018 Oral 

MoE-SPNet: A Mixture-of-Experts Scene Parsing Network

Jun 19, 2018
Huan Fu, Mingming Gong, Chaohui Wang, Dacheng Tao

Scene parsing is an indispensable component of understanding the semantics within a scene. Traditional methods rely on handcrafted local features and probabilistic graphical models to incorporate local and global cues. Recently, methods based on fully convolutional neural networks have achieved new records in scene parsing. An important strategy common to these methods is the aggregation of hierarchical features produced by a deep convolutional neural network. However, typical algorithms aggregate hierarchical convolutional features via concatenation or linear combination, which cannot sufficiently exploit the diversity of contextual information in multi-scale features or the spatial inhomogeneity of a scene. In this paper, we propose a mixture-of-experts scene parsing network (MoE-SPNet) that incorporates a convolutional mixture-of-experts layer to assess the importance of features from different levels and at different spatial locations. In addition, we propose a variant of mixture-of-experts called the adaptive hierarchical feature aggregation (AHFA) mechanism, which can be incorporated into existing scene parsing networks that use skip-connections to fuse features layer-wise. In the proposed networks, the different levels of features at each spatial location are adaptively re-weighted according to the local structure and surrounding contextual information before aggregation. We demonstrate the effectiveness of the proposed methods on two scene parsing datasets, PASCAL VOC 2012 and SceneParse150, based on two baseline models, FCN-8s and DeepLab-ASPP.
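
A minimal sketch of a convolutional mixture-of-experts aggregation under assumed shapes (not the paper's exact MoE-SPNet layer): a 1x1-convolution gating network produces per-pixel softmax weights over the feature levels, so each spatial location adaptively re-weights the levels before fusion.

```python
import torch
import torch.nn as nn

class ConvMoEAggregation(nn.Module):
    """Per-pixel mixture-of-experts over multi-level features."""
    def __init__(self, n_levels, channels):
        super().__init__()
        self.gate = nn.Conv2d(n_levels * channels, n_levels, kernel_size=1)

    def forward(self, feats):              # list of (B, C, H, W), same size
        w = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)
        # w[:, i] is the spatial weight map for feature level i
        return sum(w[:, i:i + 1] * f for i, f in enumerate(feats))

feats = [torch.randn(1, 64, 32, 32) for _ in range(3)]   # three pyramid levels
agg = ConvMoEAggregation(n_levels=3, channels=64)
fused = agg(feats)                                        # (1, 64, 32, 32)
```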


A Compromise Principle in Deep Monocular Depth Estimation

Jun 12, 2018
Huan Fu, Mingming Gong, Chaohui Wang, Dacheng Tao

Monocular depth estimation, which plays a key role in understanding 3D scene geometry, is fundamentally an ill-posed problem. Existing methods based on deep convolutional neural networks (DCNNs) have examined this problem by learning convolutional networks to estimate continuous depth maps from monocular images. However, we find that training a network to predict a high spatial resolution continuous depth map often suffers from poor local solutions. In this paper, we hypothesize that achieving a compromise between spatial and depth resolutions can improve network training. Based on this "compromise principle", we propose a regression-classification cascaded network (RCCN), which consists of a regression branch predicting a low spatial resolution continuous depth map and a classification branch predicting a high spatial resolution discrete depth map. The two branches form a cascaded structure allowing the classification and regression branches to benefit from each other. By leveraging large-scale raw training datasets and some data augmentation strategies, our network achieves top or state-of-the-art results on the NYU Depth V2, KITTI, and Make3D benchmarks.


Neural Color Transfer between Images

Oct 02, 2017
Mingming He, Jing Liao, Lu Yuan, Pedro V. Sander

We propose a new algorithm for color transfer between images that have perceptually similar semantic structures. We aim to achieve a more accurate color transfer that leverages semantically meaningful dense correspondences between images. To accomplish this, our algorithm uses neural representations for matching. Additionally, the color transfer should be spatially variant and globally coherent; therefore, our algorithm optimizes a local linear model for color transfer that satisfies both local and global constraints. Our approach jointly optimizes matching and color transfer, adopting a coarse-to-fine strategy. The proposed method can be successfully extended from "one-to-one" to "one-to-many" color transfer; the latter further addresses the problem of mismatched elements in the input image. We validate our method by testing it on a large variety of image content.
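
A heavily simplified sketch of the model form: the paper optimizes spatially variant per-pixel parameters $a(p)$ and $b(p)$ under local and global constraints, whereas the version below fits a single global $(a, b)$ per channel by least squares over (assumed) corresponding pixels, purely to illustrate the linear color model.

```python
import numpy as np

def linear_color_transfer(src, ref):
    """Fit ref ~ a * src + b per channel by least squares; a global
    simplification of the paper's local linear model a(p) * src(p) + b(p)."""
    out = np.empty_like(src)
    for ch in range(3):
        a, b = np.polyfit(src[..., ch].ravel(), ref[..., ch].ravel(), deg=1)
        out[..., ch] = np.clip(a * src[..., ch] + b, 0.0, 1.0)
    return out

src = np.random.rand(64, 64, 3)           # source image in [0, 1]
ref = np.clip(0.8 * src + 0.1, 0, 1)      # stand-in for a matched reference
result = linear_color_transfer(src, ref)
```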


Explosion prediction of oil gas using SVM and Logistic Regression

Nov 08, 2012
Xiaofei Wang, Mingming Zhang, Liyong Shen, Suixiang Gao

The prevention of dangerous chemical accidents is a primary problem in industrial manufacturing, and among such accidents, oil-gas explosions play an important role. The essential task in explosion prevention is to estimate the explosion limit of a given oil gas. In this paper, Support Vector Machines (SVM) and Logistic Regression (LR) are used to predict the explosion of oil gas. LR yields an explicit probability formula for explosion and the explosive range of oil-gas concentrations as a function of the oxygen concentration, while SVM gives higher prediction accuracy. Furthermore, considering practical requirements, the effect of the penalty parameter on the distribution of the two types of errors is discussed.
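
A minimal scikit-learn sketch of this kind of pipeline, on synthetic stand-in data (the feature layout and thresholds are assumptions, not the paper's dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hypothetical features: [oil-gas concentration, oxygen concentration];
# label 1 = explosion, 0 = no explosion. Entirely synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = ((X[:, 0] > 0.2) & (X[:, 0] < 0.6) & (X[:, 1] > 0.3)).astype(int)

lr = LogisticRegression().fit(X, y)    # gives an explicit probability formula
svm = SVC(C=10.0).fit(X, y)            # penalty parameter C trades off the two error types

p = lr.predict_proba([[0.4, 0.5]])[0, 1]   # P(explosion) from the LR model
print(p, svm.predict([[0.4, 0.5]]))
```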

* 14 pages, 7 figures, 7 tables 

Twin Auxiliary Classifiers GAN

Jul 30, 2019
Mingming Gong, Yanwu Xu, Chunyuan Li, Kun Zhang, Kayhan Batmanghelich

Conditional generative models have enjoyed remarkable progress over the past few years. One popular conditional model is the Auxiliary Classifier GAN (AC-GAN), which generates highly discriminative images by extending the GAN loss with an auxiliary classifier. However, the diversity of the samples generated by AC-GAN tends to decrease as the number of classes increases, limiting its power on large-scale data. In this paper, we identify the source of this low-diversity issue theoretically and propose a practical solution to the problem. We show that the auxiliary classifier in AC-GAN imposes perfect separability, which is disadvantageous when the supports of the class distributions overlap significantly. To address the issue, we propose the Twin Auxiliary Classifiers Generative Adversarial Net (TAC-GAN), which benefits from a new player that interacts with the other players (the generator and the discriminator) in the GAN. Theoretically, we demonstrate that TAC-GAN can effectively minimize the divergence between the generated and real-data distributions. Extensive experimental results show that TAC-GAN successfully replicates the true data distributions on simulated data and significantly improves the diversity of class-conditional image generation on real datasets.


Program Classification Using Gated Graph Attention Neural Network for Online Programming Service

Mar 09, 2019
Mingming Lu, Dingwu Tan, Naixue Xiong, Zailiang Chen, Haifeng Li

Online programming services, such as GitHub, TopCoder, and EduCoder, have promoted a lot of social interaction among their users. However, this social interaction is rather limited and inefficient due to the rapid growth of source-code repositories, which are difficult to explore manually. The emergence of source-code mining provides a promising way to analyze these source codes, so that they become relatively easy to understand and share among service users. Among all source-code mining tasks, program classification lays a foundation for various tasks related to source-code understanding, because a machine cannot understand a computer program if it cannot classify the program correctly. Although numerous machine learning models, such as Natural Language Processing (NLP) based models and Abstract Syntax Tree (AST) based models, have been proposed to classify computer programs based on their source code, existing works do not fully characterize source code from the perspective of both syntactic and semantic information. To address this problem, we propose a Graph Neural Network (GNN) based model, which integrates data-flow and function-call information into the AST and applies an improved GNN model to the integrated graph, achieving state-of-the-art program classification accuracy. The experimental results show that the proposed model classifies programs with an accuracy of over 97%.

* 12 pages, 27 figures 
