Models, code, and papers for "Xiang Li":

Non-Local Context Encoder: Robust Biomedical Image Segmentation against Adversarial Attacks

Apr 27, 2019
Xiang He, Sibei Yang, Guanbin Li?, Haofeng Li, Huiyou Chang, Yizhou Yu

Recent progress in biomedical image segmentation based on deep convolutional neural networks (CNNs) has drawn much attention. However, its vulnerability towards adversarial samples cannot be overlooked. This paper is the first one that discovers that all the CNN-based state-of-the-art biomedical image segmentation models are sensitive to adversarial perturbations. This limits the deployment of these methods in safety-critical biomedical fields. In this paper, we discover that global spatial dependencies and global contextual information in a biomedical image can be exploited to defend against adversarial attacks. To this end, non-local context encoder (NLCE) is proposed to model short- and long range spatial dependencies and encode global contexts for strengthening feature activations by channel-wise attention. The NLCE modules enhance the robustness and accuracy of the non-local context encoding network (NLCEN), which learns robust enhanced pyramid feature representations with NLCE modules, and then integrates the information across different levels. Experiments on both lung and skin lesion segmentation datasets have demonstrated that NLCEN outperforms any other state-of-the-art biomedical image segmentation methods against adversarial attacks. In addition, NLCE modules can be applied to improve the robustness of other CNN-based biomedical image segmentation methods.

* Accepted by AAAI2019 as oral presentation 

  Click for Model/Code and Paper
Neural Image Compression and Explanation

Aug 09, 2019
Xiang Li, Shihao Ji

Explaining the prediction of deep neural networks (DNNs) and semantic image compression are two active research areas of deep learning with a numerous of applications in decision-critical systems, such as surveillance cameras, drones and self-driving cars, where interpretable decision is critical and storage/network bandwidth is limited. In this paper, we propose a novel end-to-end Neural Image Compression and Explanation (NICE) framework that learns to (1) explain the prediction of convolutional neural networks (CNNs), and (2) subsequently compress the input images for efficient storage or transmission. Specifically, NICE generates a sparse mask over an input image by attaching a stochastic binary gate to each pixel of the image, whose parameters are learned through the interaction with the CNN classifier to be explained. The generated mask is able to capture the saliency of each pixel measured by its influence to the final prediction of CNN; it can also be used to produce a mixed-resolution image, where important pixels maintain their original high resolution and insignificant background pixels are subsampled to a low resolution. The produced images achieve a high compression rate (e.g., about 0.6x of original image file size), while retaining a similar classification accuracy. Extensive experiments across multiple image classification benchmarks demonstrate the superior performance of NICE compared to the state-of-the-art methods in terms of explanation quality and image compression rate.

  Click for Model/Code and Paper
Disentangling Style and Content in Anime Illustrations

May 28, 2019
Sitao Xiang, Hao Li

Existing methods for AI-generated artworks still struggle with generating high-quality stylized content, where high-level semantics are preserved, or separating fine-grained styles from various artists. We propose a novel Generative Adversarial Disentanglement Network which can fully decompose complex anime illustrations into style and content. Training such model is challenging, since given a style, various content data may exist but not the other way round. In particular, we disentangle two complementary factors of variations, where one of the factors is labelled. Our approach is divided into two stages, one that encodes an input image into a style independent content, and one based on a dual-conditional generator. We demonstrate the ability to generate high-fidelity anime portraits with a fixed content and a large variety of styles from over a thousand artists, and vice versa, using a single end-to-end network and with applications in style transfer. We show this unique capability as well as superior output to the current state-of-the-art.

* Submitted to NeurIPS 2019 

  Click for Model/Code and Paper
Defense-VAE: A Fast and Accurate Defense against Adversarial Attacks

Dec 17, 2018
Xiang Li, Shihao Ji

Deep neural networks (DNNs) have been enormously successful across a variety of prediction tasks. However, recent research shows that DNNs are particularly vulnerable to adversarial attacks, which poses a serous threat to their applications in security-sensitive systems. In this paper, we propose a simple yet effective defense algorithm Defense-VAE that uses variational autoencoder (VAE) to purge adversarial perturbations from contaminated images. The proposed method is generic and can defend white-box and black-box attacks without the need of retraining the original CNN classifiers, and can further strengthen the defense by retraining CNN or end-to-end finetuning the whole pipeline. In addition, the proposed method is very efficient compared to the optimization-based alternatives, such as Defense-GAN, since no iterative optimization is needed for online prediction. Extensive experiments on MNIST, Fashion-MNIST, CelebA and CIFAR-10 demonstrate the superior defense accuracy of Defense-VAE compared to Defense-GAN, while being 50x faster than the latter. This makes Defense-VAE widely deployable in real-time security-sensitive systems. We plan to open source our implementation to facilitate the research in this area.

  Click for Model/Code and Paper
Anime Style Space Exploration Using Metric Learning and Generative Adversarial Networks

May 21, 2018
Sitao Xiang, Hao Li

Deep learning-based style transfer between images has recently become a popular area of research. A common way of encoding "style" is through a feature representation based on the Gram matrix of features extracted by some pre-trained neural network or some other form of feature statistics. Such a definition is based on an arbitrary human decision and may not best capture what a style really is. In trying to gain a better understanding of "style", we propose a metric learning-based method to explicitly encode the style of an artwork. In particular, our definition of style captures the differences between artists, as shown by classification performances, and such that the style representation can be interpreted, manipulated and visualized through style-conditioned image generation through a Generative Adversarial Network. We employ this method to explore the style space of anime portrait illustrations.

  Click for Model/Code and Paper
On the Effects of Batch and Weight Normalization in Generative Adversarial Networks

Dec 04, 2017
Sitao Xiang, Hao Li

Generative adversarial networks (GANs) are highly effective unsupervised learning frameworks that can generate very sharp data, even for data such as images with complex, highly multimodal distributions. However GANs are known to be very hard to train, suffering from problems such as mode collapse and disturbing visual artifacts. Batch normalization (BN) techniques have been introduced to address the training. Though BN accelerates the training in the beginning, our experiments show that the use of BN can be unstable and negatively impact the quality of the trained model. The evaluation of BN and numerous other recent schemes for improving GAN training is hindered by the lack of an effective objective quality measure for GAN models. To address these issues, we first introduce a weight normalization (WN) approach for GAN training that significantly improves the stability, efficiency and the quality of the generated samples. To allow a methodical evaluation, we introduce squared Euclidean reconstruction error on a test set as a new objective measure, to assess training performance in terms of speed, stability, and quality of generated samples. Our experiments with a standard DCGAN architecture on commonly used datasets (CelebA, LSUN bedroom, and CIFAR-10) indicate that training using WN is generally superior to BN for GANs, achieving 10% lower mean squared loss for reconstruction and significantly better qualitative results than BN. We further demonstrate the stability of WN on a 21-layer ResNet trained with the CelebA data set. The code for this paper is available at

* v3 rejected by NIPS 2017, updated and re-submitted to CVPR 2018. v4: add experiments with ResNet and like to new code 

  Click for Model/Code and Paper
Specializing Word Embeddings (for Parsing) by Information Bottleneck

Oct 01, 2019
Xiang Lisa Li, Jason Eisner

Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic information, resulting in state-of-the-art performance on various tasks. We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. We compress each word embedding to either a discrete tag or a continuous vector. In the discrete version, our automatically compressed tags form an alternative tag set: we show experimentally that our tags capture most of the information in traditional POS tag annotations, but our tag sequences can be parsed more accurately at the same level of tag granularity. In the continuous version, we show experimentally that moderately compressing the word embeddings by our method yields a more accurate parser in 8 of 9 languages, unlike simple dimensionality reduction.

* Accepted for publication at EMNLP 2019 

  Click for Model/Code and Paper
Invasiveness Prediction of Pulmonary Adenocarcinomas Using Deep Feature Fusion Networks

Sep 21, 2019
Xiang Li, Jiechao Ma, Hongwei Li

Early diagnosis of pathological invasiveness of pulmonary adenocarcinomas using computed tomography (CT) imaging would alter the course of treatment of adenocarcinomas and subsequently improve the prognosis. Most of the existing systems use either conventional radiomics features or deep-learning features alone to predict the invasiveness. In this study, we explore the fusion of the two kinds of features and claim that radiomics features can be complementary to deep-learning features. An effective deep feature fusion network is proposed to exploit the complementarity between the two kinds of features, which improves the invasiveness prediction results. We collected a private dataset that contains lung CT scans of 676 patients categorized into four invasiveness types from a collaborating hospital. Evaluations on this dataset demonstrate the effectiveness of our proposal.

  Click for Model/Code and Paper
Locality and Structure Regularized Low Rank Representation for Hyperspectral Image Classification

May 07, 2019
Qi Wang, Xiange He, Xuelong Li

Hyperspectral image (HSI) classification, which aims to assign an accurate label for hyperspectral pixels, has drawn great interest in recent years. Although low rank representation (LRR) has been used to classify HSI, its ability to segment each class from the whole HSI data has not been exploited fully yet. LRR has a good capacity to capture the underlying lowdimensional subspaces embedded in original data. However, there are still two drawbacks for LRR. First, LRR does not consider the local geometric structure within data, which makes the local correlation among neighboring data easily ignored. Second, the representation obtained by solving LRR is not discriminative enough to separate different data. In this paper, a novel locality and structure regularized low rank representation (LSLRR) model is proposed for HSI classification. To overcome the above limitations, we present locality constraint criterion (LCC) and structure preserving strategy (SPS) to improve the classical LRR. Specifically, we introduce a new distance metric, which combines both spatial and spectral features, to explore the local similarity of pixels. Thus, the global and local structures of HSI data can be exploited sufficiently. Besides, we propose a structure constraint to make the representation have a near block-diagonal structure. This helps to determine the final classification labels directly. Extensive experiments have been conducted on three popular HSI datasets. And the experimental results demonstrate that the proposed LSLRR outperforms other state-of-the-art methods.

* IEEE Transactions on Geoscience and Remote Sensing ( Volume: 57 , Issue: 2 , Feb. 2019 ) 
* 14 pages, 7 figures, TGRS2019 

  Click for Model/Code and Paper
Negotiation-based Human-Robot Collaboration via Augmented Reality

Sep 24, 2019
Kishan Chandan, Xiang Li, Shiqi Zhang

Effective human-robot collaboration (HRC) requires extensive communication among the human and robot teammates, because their actions can potentially produce conflicts, synergies, or both. In this paper, we develop an augmented reality-driven, negotiation-based (ARN) framework for HRC, where ARN supports planning-phase negotiations within human-robot teams through a novel AR-based interface. We have conducted experiments in an office environment, where multiple mobile robots work on delivery tasks. The robots could not complete the tasks on their own, but sometimes need help from their human teammate, rendering human-robot collaboration necessary. From the experimental results, we observe that ARN significantly reduced the human-robot team's task completion time in comparison to a non-AR baseline.

  Click for Model/Code and Paper
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks

May 25, 2019
Xiang Li, Xiaolin Hu, Jian Yang

The Convolutional Neural Networks (CNNs) generate the feature representation of complex objects by collecting hierarchical and different parts of semantic sub-features. These sub-features can usually be distributed in grouped form in the feature vector of each layer, representing various semantic entities. However, the activation of these sub-features is often spatially affected by similar patterns and noisy backgrounds, resulting in erroneous localization and identification. We propose a Spatial Group-wise Enhance (SGE) module that can adjust the importance of each sub-feature by generating an attention factor for each spatial location in each semantic group, so that every individual group can autonomously enhance its learnt expression and suppress possible noise. The attention factors are only guided by the similarities between the global and local feature descriptors inside each group, thus the design of SGE module is extremely lightweight with \emph{almost no extra parameters and calculations}. Despite being trained with only category supervisions, the SGE component is extremely effective in highlighting multiple active areas with various high-order semantics (such as the dog's eyes, nose, etc.). When integrated with popular CNN backbones, SGE can significantly boost the performance of image recognition tasks. Specifically, based on ResNet50 backbones, SGE achieves 1.2\% Top-1 accuracy improvement on the ImageNet benchmark and 1.0$\sim$2.0\% AP gain on the COCO benchmark across a wide range of detectors (Faster/Mask/Cascade RCNN and RetinaNet). Codes and pretrained models are available at

* Code available at: 

  Click for Model/Code and Paper
A Unified Framework for Regularized Reinforcement Learning

Mar 02, 2019
Xiang Li, Wenhao Yang, Zhihua Zhang

We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term. The extant entropy-regularized MDPs can be cast into our framework. Moreover, under our framework there are many regularization terms can bring multi-modality and sparsity which are potentially useful in reinforcement learning. In particular, we present sufficient and necessary conditions that induce a sparse optimal policy. We also conduct a full mathematical analysis of the proposed regularized MDPs, including the optimality condition, performance error and sparseness control. We provide a generic method to devise regularization forms and propose off-policy actor-critic algorithms in complex environment settings. We empirically analyze the statistical properties of optimal policies and compare the performance of different sparse regularization forms in discrete and continuous environments.

  Click for Model/Code and Paper
Do Subsampled Newton Methods Work for High-Dimensional Data?

Feb 13, 2019
Xiang Li, Shusen Wang, Zhihua Zhang

Subsampled Newton methods approximate Hessian matrices through subsampling techniques, alleviating the cost of forming Hessian matrices but using sufficient curvature information. However, previous results require $\Omega (d)$ samples to approximate Hessians, where $d$ is the dimension of data points, making it less practically feasible for high-dimensional data. The situation is deteriorated when $d$ is comparably as large as the number of data points $n$, which requires to take the whole dataset into account, making subsampling useless. This paper theoretically justifies the effectiveness of subsampled Newton methods on high dimensional data. Specifically, we prove only $\widetilde{\Theta}(d^\gamma_{\rm eff})$ samples are needed in the approximation of Hessian matrices, where $d^\gamma_{\rm eff}$ is the $\gamma$-ridge leverage and can be much smaller than $d$ as long as $n\gamma \gg 1$. Additionally, we extend this result so that subsampled Newton methods can work for high-dimensional data on both distributed optimization problems and non-smooth regularized problems.

  Click for Model/Code and Paper
Supervised Community Detection with Line Graph Neural Networks

Oct 25, 2018
Zhengdao Chen, Xiang Li, Joan Bruna

We study data-driven methods for community detection on graphs, an inverse problem that is typically solved in terms of the spectrum of certain operators or via posterior inference under certain probabilistic graphical models. Focusing on random graph families such as the stochastic block model, recent research has unified both approaches and identified both statistical and computational signal-to-noise detection thresholds. This graph inference task can be recast as a node-wise graph classification problem, and, as such, computational detection thresholds can be translated in terms of learning within appropriate models. We present a novel family of Graph Neural Networks (GNNs) and show that they can reach those detection thresholds in a purely data-driven manner without access to the underlying generative models, and even improve upon current computational thresholds in hard regimes. For that purpose, we propose to augment GNNs with the non-backtracking operator, defined on the line graph of edge adjacencies. We also perform the first analysis of optimization landscape on using GNNs to solve community detection problems, demonstrating that under certain simplifications and assumptions, the loss value at the local minima is close to the loss value at the global minimum/minima. Finally, the resulting model is also tested on real datasets, performing significantly better than previous models.

  Click for Model/Code and Paper
Triple Attention Mixed Link Network for Single Image Super Resolution

Oct 08, 2018
Xi Cheng, Xiang Li, Jian Yang

Single image super resolution is of great importance as a low-level computer vision task. Recent approaches with deep convolutional neural networks have achieved im-pressive performance. However, existing architectures have limitations due to the less sophisticated structure along with less strong representational power. In this work, to significantly enhance the feature representation, we proposed Triple Attention mixed link Network (TAN) which consists of 1) three different aspects (i.e., kernel, spatial and channel) of attention mechanisms and 2) fu-sion of both powerful residual and dense connections (i.e., mixed link). Specifically, the network with multi kernel learns multi hierarchical representations under different receptive fields. The output features are recalibrated by the effective kernel and channel attentions and feed into next layer partly residual and partly dense, which filters the information and enable the network to learn more powerful representations. The features finally pass through the spatial attention in the reconstruction network which generates a fusion of local and global information, let the network restore more details and improves the quality of reconstructed images. Thanks to the diverse feature recalibrations and the advanced information flow topology, our proposed model is strong enough to per-form against the state-of-the-art methods on the bench-mark evaluations.

  Click for Model/Code and Paper
On better training the infinite restricted Boltzmann machines

Oct 14, 2017
Xuan Peng, Xunzhang Gao, Xiang Li

The infinite restricted Boltzmann machine (iRBM) is an extension of the classic RBM. It enjoys a good property of automatically deciding the size of the hidden layer according to specific training data. With sufficient training, the iRBM can achieve a competitive performance with that of the classic RBM. However, the convergence of learning the iRBM is slow, due to the fact that the iRBM is sensitive to the ordering of its hidden units, the learned filters change slowly from the left-most hidden unit to right. To break this dependency between neighboring hidden units and speed up the convergence of training, a novel training strategy is proposed. The key idea of the proposed training strategy is randomly regrouping the hidden units before each gradient descent step. Potentially, a mixing of infinite many iRBMs with different permutations of the hidden units can be achieved by this learning method, which has a similar effect of preventing the model from over-fitting as the dropout. The original iRBM is also modified to be capable of carrying out discriminative training. To evaluate the impact of our method on convergence speed of learning and the model's generalization ability, several experiments have been performed on the binarized MNIST and CalTech101 Silhouettes datasets. Experimental results indicate that the proposed training strategy can greatly accelerate learning and enhance generalization ability of iRBMs.

* Submitted to Machine Learning 

  Click for Model/Code and Paper
Improved Representation Learning for Predicting Commonsense Ontologies

Aug 01, 2017
Xiang Li, Luke Vilnis, Andrew McCallum

Recent work in learning ontologies (hierarchical and partially-ordered structures) has leveraged the intrinsic geometry of spaces of learned representations to make predictions that automatically obey complex structural constraints. We explore two extensions of one such model, the order-embedding model for hierarchical relation learning, with an aim towards improved performance on text data for commonsense knowledge representation. Our first model jointly learns ordering relations and non-hierarchical knowledge in the form of raw text. Our second extension exploits the partial order structure of the training data to find long-distance triplet constraints among embeddings which are poorly enforced by the pairwise training procedure. We find that both incorporating free text and augmented training constraints improve over the original order-embedding model and other strong baselines.

* 4 pages, ICML 2017 DeepStruct Workshop 

  Click for Model/Code and Paper
Learning a Deep Embedding Model for Zero-Shot Learning

Apr 12, 2017
Li Zhang, Tao Xiang, Shaogang Gong

Zero-shot learning (ZSL) models rely on learning a joint embedding space where both textual/semantic description of object classes and visual representation of object images can be projected to for nearest neighbour search. Despite the success of deep neural networks that learn an end-to-end model between text and images in other vision problems such as image captioning, very few deep ZSL model exists and they show little advantage over ZSL models that utilise deep feature representations but do not learn an end-to-end embedding. In this paper we argue that the key to make deep ZSL models succeed is to choose the right embedding space. Instead of embedding into a semantic space or an intermediate space, we propose to use the visual space as the embedding space. This is because that in this space, the subsequent nearest neighbour search would suffer much less from the hubness problem and thus become more effective. This model design also provides a natural mechanism for multiple semantic modalities (e.g., attributes and sentence descriptions) to be fused and optimised jointly in an end-to-end manner. Extensive experiments on four benchmarks show that our model significantly outperforms the existing models.

* To appear in CVPR 2017 

  Click for Model/Code and Paper
Learning a Discriminative Null Space for Person Re-identification

Mar 07, 2016
Li Zhang, Tao Xiang, Shaogang Gong

Most existing person re-identification (re-id) methods focus on learning the optimal distance metrics across camera views. Typically a person's appearance is represented using features of thousands of dimensions, whilst only hundreds of training samples are available due to the difficulties in collecting matched training images. With the number of training samples much smaller than the feature dimension, the existing methods thus face the classic small sample size (SSS) problem and have to resort to dimensionality reduction techniques and/or matrix regularisation, which lead to loss of discriminative power. In this work, we propose to overcome the SSS problem in re-id distance metric learning by matching people in a discriminative null space of the training data. In this null space, images of the same person are collapsed into a single point thus minimising the within-class scatter to the extreme and maximising the relative between-class separation simultaneously. Importantly, it has a fixed dimension, a closed-form solution and is very efficient to compute. Extensive experiments carried out on five person re-identification benchmarks including VIPeR, PRID2011, CUHK01, CUHK03 and Market1501 show that such a simple approach beats the state-of-the-art alternatives, often by a big margin.

* accepted by CVPR2016 

  Click for Model/Code and Paper
Weighted Laplacian and Its Theoretical Applications

Nov 23, 2019
Shijie Xu, Jiayan Fang, Xiang-Yang Li

In this paper, we develop a novel weighted Laplacian method, which is partially inspired by the theory of graph Laplacian, to study recent popular graph problems, such as multilevel graph partitioning and balanced minimum cut problem, in a more convenient manner. Since the weighted Laplacian strategy inherits the virtues of spectral methods, graph algorithms designed using weighted Laplacian will necessarily possess more robust theoretical guarantees for algorithmic performances, comparing with those existing algorithms that are heuristically proposed. In order to illustrate its powerful utility both in theory and in practice, we also present two effective applications of our weighted Laplacian method to multilevel graph partitioning and balanced minimum cut problem, respectively. By means of variational methods and theory of partial differential equations (PDEs), we have established the equivalence relations among the weighted cut problem, balanced minimum cut problem and the initial clustering problem that arises in the middle stage of graph partitioning algorithms under a multilevel structure. These equivalence relations can indeed provide solid theoretical support for algorithms based on our proposed weighted Laplacian strategy. Moreover, from the perspective of the application to the balanced minimum cut problem, weighted Laplacian can make it possible for research of numerical solutions of PDEs to be a powerful tool for the algorithmic study of graph problems. Experimental results also indicate that the algorithm embedded with our strategy indeed outperforms other existing graph algorithms, especially in terms of accuracy, thus verifying the efficacy of the proposed weighted Laplacian.

  Click for Model/Code and Paper