Models, code, and papers for "Jing Huo":

Cross-Domain Adversarial Auto-Encoder

Apr 17, 2018
Haodi Hou, Jing Huo, Yang Gao

In this paper, we propose the Cross-Domain Adversarial Auto-Encoder (CDAAE) to address the problem of cross-domain image inference, generation and transformation. We make the assumption that images from different domains share the same latent code space for content, while having separate latent code space for style. The proposed framework can map cross-domain data to a latent code vector consisting of a content part and a style part. The latent code vector is matched with a prior distribution so that we can generate meaningful samples from any part of the prior space. Consequently, given a sample of one domain, our framework can generate various samples of the other domain with the same content of the input. This makes the proposed framework different from the current work of cross-domain transformation. Besides, the proposed framework can be trained with both labeled and unlabeled data, which makes it also suitable for domain adaptation. Experimental results on data sets SVHN, MNIST and CASIA show the proposed framework achieved visually appealing performance for image generation task. Besides, we also demonstrate the proposed method achieved superior results for domain adaptation. Code of our experiments is available in

* Under review as a conference paper of KDD 2018 

  Access Model/Code and Paper
Progressive Cross-camera Soft-label Learning for Semi-supervised Person Re-identification

Aug 15, 2019
Lei Qi, Lei Wang, Jing Huo, Yinghuan Shi, Yang Gao

In this paper, we focus on the semi-supervised person re-identification (Re-ID) case, which only has the intra-camera (within-camera) labels but not inter-camera (cross-camera) labels. In real-world applications, these intra-camera labels can be readily captured by tracking algorithms or few manual annotations, when compared with cross-camera labels. In this case, it is very difficult to explore the relationships between cross-camera persons in the training stage due to the lack of cross-camera label information. To deal with this issue, we propose a novel Progressive Cross-camera Soft-label Learning (PCSL) framework for the semi-supervised person Re-ID task, which can generate cross-camera soft-labels and utilize them to optimize the network. Concretely, we calculate an affinity matrix based on person-level features and adapt them to produce the similarities between cross-camera persons (i.e., cross-camera soft-labels). To exploit these soft-labels to train the network, we investigate the weighted cross-entropy loss and the weighted triplet loss from the classification and discrimination perspectives, respectively. Particularly, the proposed framework alternately generates progressive cross-camera soft-labels and gradually improves feature representations in the whole learning course. Extensive experiments on five large-scale benchmark datasets show that PCSL significantly outperforms the state-of-the-art unsupervised methods that employ labeled source domains or the images generated by the GAN-based models. Furthermore, the proposed method even has a competitive performance with respect to deep supervised Re-ID methods.

* arXiv admin note: text overlap with arXiv:1908.00862 

  Access Model/Code and Paper
GreyReID: A Two-stream Deep Framework with RGB-grey Information for Person Re-identification

Aug 14, 2019
Lei Qi, Lei Wang, Jing Huo, Yinghuan Shi, Yang Gao

In this paper, we observe that most false positive images (i.e., different identities with query images) in the top ranking list usually have the similar color information with the query image in person re-identification (Re-ID). Meanwhile, when we use the greyscale images generated from RGB images to conduct the person Re-ID task, some hard query images can obtain better performance compared with using RGB images. Therefore, RGB and greyscale images seem to be complementary to each other for person Re-ID. In this paper, we aim to utilize both RGB and greyscale images to improve the person Re-ID performance. To this end, we propose a novel two-stream deep neural network with RGB-grey information, which can effectively fuse RGB and greyscale feature representations to enhance the generalization ability of Re-ID. Firstly, we convert RGB images to greyscale images in each training batch. Based on these RGB and greyscale images, we train the RGB and greyscale branches, respectively. Secondly, to build up connections between RGB and greyscale branches, we merge the RGB and greyscale branches into a new joint branch. Finally, we concatenate the features of all three branches as the final feature representation for Re-ID. Moreover, in the training process, we adopt the joint learning scheme to simultaneously train each branch by the independent loss function, which can enhance the generalization ability of each branch. Besides, a global loss function is utilized to further fine-tune the final concatenated feature. The extensive experiments on multiple benchmark datasets fully show that the proposed method can outperform the state-of-the-art person Re-ID methods. Furthermore, using greyscale images can indeed improve the person Re-ID performance.

  Access Model/Code and Paper
Adversarial Camera Alignment Network for Unsupervised Cross-camera Person Re-identification

Aug 02, 2019
Lei Qi, Lei Wang, Jing Huo, Yinghuan Shi, Yang Gao

In person re-identification (Re-ID), supervised methods usually need a large amount of expensive label information, while unsupervised ones are still unable to deliver satisfactory identification performance. In this paper, we introduce a novel person Re-ID task called unsupervised cross-camera person Re-ID, which only needs the within-camera (intra-camera) label information but not cross-camera (inter-camera) labels which are more expensive to obtain. In real-world applications, the intra-camera label information can be easily captured by tracking algorithms or few manual annotations. In this situation, the main challenge becomes the distribution discrepancy across different camera views, caused by the various body pose, occlusion, image resolution, illumination conditions, and background noises in different cameras. To address this situation, we propose a novel Adversarial Camera Alignment Network (ACAN) for unsupervised cross-camera person Re-ID. It consists of the camera-alignment task and the supervised within-camera learning task. To achieve the camera alignment, we develop a Multi-Camera Adversarial Learning (MCAL) to map images of different cameras into a shared subspace. Particularly, we investigate two different schemes, including the existing GRL (i.e., gradient reversal layer) scheme and the proposed scheme called "other camera equiprobability" (OCE), to conduct the multi-camera adversarial task. Based on this shared subspace, we then leverage the within-camera labels to train the network. Extensive experiments on five large-scale datasets demonstrate the superiority of ACAN over multiple state-of-the-art unsupervised methods that take advantage of labeled source domains and generated images by GAN-based models. In particular, we verify that the proposed multi-camera adversarial task does contribute to the significant improvement.

  Access Model/Code and Paper
MaskReID: A Mask Based Deep Ranking Neural Network for Person Re-identification

Apr 11, 2018
Lei Qi, Jing Huo, Lei Wang, Yinghuan Shi, Yang Gao

In this paper, a novel mask based deep ranking neural network with skipped fusing layer (MaskReID) is proposed for person re-identification (Re-ID). For person Re-ID, there are multiple challenges co-exist throughout the re-identification process, including cluttered background, appearance variations (illumination, pose, occlusion, etc.) among different camera views and interference of samples of similar appearance. A compact framework is proposed to address these problems. Firstly, to address the problem of cluttered background, masked images which are the image segmentations of the original images are incorporated as input in the proposed neural network. Then, to remove the appearance variations so as to obtain more discriminative feature, a new network structure is proposed which fuses feature of different layers as the final feature. This makes the final feature a combination of all the low, middle and high level feature, which is more informative. Lastly, as person Re-ID is a special image retrieval task, a novel ranking loss is designed to optimize the whole network. The ranking loss relieved the interference problem of similar samples while producing ranking results. The experimental results demonstrate that the proposed method consistently outperforms the state-of-the-art methods on many person Re-ID datasets, especially large-scale datasets, such as, CUHK03, Market1501 and DukeMTMC-reID.

  Access Model/Code and Paper
MW-GAN: Multi-Warping GAN for Caricature Generation with Multi-Style Geometric Exaggeration

Jan 07, 2020
Haodi Hou, Jing Huo, Jing Wu, Yu-Kun Lai, Yang Gao

Given an input face photo, the goal of caricature generation is to produce stylized, exaggerated caricatures that share the same identity as the photo. It requires simultaneous style transfer and shape exaggeration with rich diversity, and meanwhile preserving the identity of the input. To address this challenging problem, we propose a novel framework called Multi-Warping GAN (MW-GAN), including a style network and a geometric network that are designed to conduct style transfer and geometric exaggeration respectively. We bridge the gap between the style and landmarks of an image with corresponding latent code spaces by a dual way design, so as to generate caricatures with arbitrary styles and geometric exaggeration, which can be specified either through random sampling of latent code or from a given caricature sample. Besides, we apply identity preserving loss to both image space and landmark space, leading to a great improvement in quality of generated caricatures. Experiments show that caricatures generated by MW-GAN have better quality than existing methods.

  Access Model/Code and Paper
DAGCN: Dual Attention Graph Convolutional Networks

Apr 04, 2019
Fengwen Chen, Shirui Pan, Jing Jiang, Huan Huo, Guodong Long

Graph convolutional networks (GCNs) have recently become one of the most powerful tools for graph analytics tasks in numerous applications, ranging from social networks and natural language processing to bioinformatics and chemoinformatics, thanks to their ability to capture the complex relationships between concepts. At present, the vast majority of GCNs use a neighborhood aggregation framework to learn a continuous and compact vector, then performing a pooling operation to generalize graph embedding for the classification task. These approaches have two disadvantages in the graph classification task: (1)when only the largest sub-graph structure ($k$-hop neighbor) is used for neighborhood aggregation, a large amount of early-stage information is lost during the graph convolution step; (2) simple average/sum pooling or max pooling utilized, which loses the characteristics of each node and the topology between nodes. In this paper, we propose a novel framework called, dual attention graph convolutional networks (DAGCN) to address these problems. DAGCN automatically learns the importance of neighbors at different hops using a novel attention graph convolution layer, and then employs a second attention component, a self-attention pooling layer, to generalize the graph representation from the various aspects of a matrix graph embedding. The dual attention network is trained in an end-to-end manner for the graph classification task. We compare our model with state-of-the-art graph kernels and other deep learning methods. The experimental results show that our framework not only outperforms other baselines but also achieves a better rate of convergence.

  Access Model/Code and Paper
LGAN: Lung Segmentation in CT Scans Using Generative Adversarial Network

Jan 11, 2019
Jiaxing Tan, Longlong Jing, Yumei Huo, Yingli Tian, Oguz Akin

Lung segmentation in computerized tomography (CT) images is an important procedure in various lung disease diagnosis. Most of the current lung segmentation approaches are performed through a series of procedures with manually empirical parameter adjustments in each step. Pursuing an automatic segmentation method with fewer steps, in this paper, we propose a novel deep learning Generative Adversarial Network (GAN) based lung segmentation schema, which we denote as LGAN. Our proposed schema can be generalized to different kinds of neural networks for lung segmentation in CT images and is evaluated on a dataset containing 220 individual CT scans with two metrics: segmentation quality and shape similarity. Also, we compared our work with current state of the art methods. The results obtained with this study demonstrate that the proposed LGAN schema can be used as a promising tool for automatic lung segmentation due to its simplified procedure as well as its good performance.

  Access Model/Code and Paper
WebCaricature: a benchmark for caricature recognition

Aug 09, 2018
Jing Huo, Wenbin Li, Yinghuan Shi, Yang Gao, Hujun Yin

Studying caricature recognition is fundamentally important to understanding of face perception. However, little research has been conducted in the computer vision community, largely due to the shortage of suitable datasets. In this paper, a new caricature dataset is built, with the objective to facilitate research in caricature recognition. All the caricatures and face images were collected from the Web. Compared with two existing datasets, this dataset is much more challenging, with a much greater number of available images, artistic styles and larger intra-personal variations. Evaluation protocols are also offered together with their baseline performances on the dataset to allow fair comparisons. Besides, a framework for caricature face recognition is presented to make a thorough analyze of the challenges of caricature recognition. By analyzing the challenges, the goal is to show problems that worth to be further investigated. Additionally, based on the evaluation protocols and the framework, baseline performances of various state-of-the-art algorithms are provided. A conclusion is that there is still a large space for performance improvement and the analyzed problems still need further investigation.

* 12 pages 

  Access Model/Code and Paper
A Novel Unsupervised Camera-aware Domain Adaptation Framework for Person Re-identification

Apr 06, 2019
Lei Qi, Lei Wang, Jing Huo, Luping Zhou, Yinghuan Shi, Yang Gao

Unsupervised cross-domain person re-identification (Re-ID) faces two key issues. One is the data distribution discrepancy between source and target domains, and the other is the lack of labelling information in target domain. They are addressed in this paper from the perspective of representation learning. For the first issue, we highlight the presence of camera-level sub-domains as a unique characteristic of person Re-ID, and develop camera-aware domain adaptation to reduce the discrepancy not only between source and target domains but also across these sub-domains. For the second issue, we exploit the temporal continuity in each camera of target domain to create discriminative information. This is implemented by dynamically generating online triplets within each batch, in order to maximally take advantage of the steadily improved feature representation in training process. Together, the above two methods give rise to a novel unsupervised deep domain adaptation framework for person Re-ID. Experiments and ablation studies on benchmark datasets demonstrate its superiority and interesting properties.

  Access Model/Code and Paper
Asymmetric Distribution Measure for Few-shot Learning

Feb 01, 2020
Wenbin Li, Lei Wang, Jing Huo, Yinghuan Shi, Yang Gao, Jiebo Luo

The core idea of metric-based few-shot image classification is to directly measure the relations between query images and support classes to learn transferable feature embeddings. Previous work mainly focuses on image-level feature representations, which actually cannot effectively estimate a class's distribution due to the scarcity of samples. Some recent work shows that local descriptor based representations can achieve richer representations than image-level based representations. However, such works are still based on a less effective instance-level metric, especially a symmetric metric, to measure the relations between query images and support classes. Given the natural asymmetric relation between a query image and a support class, we argue that an asymmetric measure is more suitable for metric-based few-shot learning. To that end, we propose a novel Asymmetric Distribution Measure (ADM) network for few-shot learning by calculating a joint local and global asymmetric measure between two multivariate local distributions of queries and classes. Moreover, a task-aware Contrastive Measure Strategy (CMS) is proposed to further enhance the measure function. On popular miniImageNet and tieredImageNet, we achieve $3.02\%$ and $1.56\%$ gains over the state-of-the-art method on the $5$-way $1$-shot task, respectively, validating our innovative design of asymmetric distribution measures for few-shot learning.

* 7 pages 

  Access Model/Code and Paper
Defensive Few-shot Adversarial Learning

Nov 16, 2019
Wenbin Li, Lei Wang, Xingxing Zhang, Jing Huo, Yang Gao, Jiebo Luo

The robustness of deep learning models against adversarial attacks has received increasing attention in recent years. However, both deep learning and adversarial training rely on the availability of a large amount of labeled data and usually do not generalize well to new, unseen classes when only a few training samples are accessible. To address this problem, we explicitly introduce a new challenging problem -- how to learn a robust deep model with limited training samples per class, called defensive few-shot learning in this paper. Simply employing the existing adversarial training techniques in the literature cannot solve this problem. This is because few-shot learning needs to learn transferable knowledge from disjoint auxiliary data, and thus it is invalid to assume the sample-level distribution consistency between the training and test sets as commonly assumed in existing adversarial training techniques. In this paper, instead of assuming such a distribution consistency, we propose to make this assumption at a task-level in the episodic training paradigm in order to better transfer the defense knowledge. Furthermore, inside each task, we design a task-conditioned distribution constraint to narrow the distribution gap between clean and adversarial examples at a sample-level. These give rise to a novel mechanism called multi-level distribution based adversarial training (MDAT) for learning transferable adversarial defense. In addition, a unified $\mathcal{F}_{\beta}$ score is introduced to evaluate different defense methods under the same principle. Extensive experiments demonstrate that MDAT achieves higher effectiveness and robustness over existing alternatives in the few-shot case.

* 10 pages 

  Access Model/Code and Paper
Revisiting Local Descriptor based Image-to-Class Measure for Few-shot Learning

Apr 10, 2019
Wenbin Li, Lei Wang, Jinglin Xu, Jing Huo, Yang Gao, Jiebo Luo

Few-shot learning in image classification aims to learn a classifier to classify images when only few training examples are available for each class. Recent work has achieved promising classification performance, where an image-level feature based measure is usually used. In this paper, we argue that a measure at such a level may not be effective enough in light of the scarcity of examples in few-shot learning. Instead, we think a local descriptor based image-to-class measure should be taken, inspired by its surprising success in the heydays of local invariant features. Specifically, building upon the recent episodic training mechanism, we propose a Deep Nearest Neighbor Neural Network (DN4 in short) and train it in an end-to-end manner. Its key difference from the literature is the replacement of the image-level feature based measure in the final layer by a local descriptor based image-to-class measure. This measure is conducted online via a $k$-nearest neighbor search over the deep local descriptors of convolutional feature maps. The proposed DN4 not only learns the optimal deep local descriptors for the image-to-class measure, but also utilizes the higher efficiency of such a measure in the case of example scarcity, thanks to the exchangeability of visual patterns across the images in the same class. Our work leads to a simple, effective, and computationally efficient framework for few-shot learning. Experimental study on benchmark datasets consistently shows its superiority over the related state-of-the-art, with the largest absolute improvement of $17\%$ over the next best. The source code can be available from \UrlFont{}.

* accepted by CVPR 2019. The code link: 

  Access Model/Code and Paper
CariGAN: Caricature Generation through Weakly Paired Adversarial Learning

Nov 01, 2018
Wenbin Li, Wei Xiong, Haofu Liao, Jing Huo, Yang Gao, Jiebo Luo

Caricature generation is an interesting yet challenging task. The primary goal is to generate plausible caricatures with reasonable exaggerations given face images. Conventional caricature generation approaches mainly use low-level geometric transformations such as image warping to generate exaggerated images, which lack richness and diversity in terms of content and style. The recent progress in generative adversarial networks (GANs) makes it possible to learn an image-to-image transformation from data, so that richer contents and styles can be generated. However, directly applying the GAN-based models to this task leads to unsatisfactory results because there is a large variance in the caricature distribution. Moreover, some models require strictly paired training data which largely limits their usage scenarios. In this paper, we propose CariGAN overcome these problems. Instead of training on paired data, CariGAN learns transformations only from weakly paired images. Specifically, to enforce reasonable exaggeration and facial deformation, facial landmarks are adopted as an additional condition to constrain the generated image. Furthermore, an attention mechanism is introduced to encourage our model to focus on the key facial parts so that more vivid details in these regions can be generated. Finally, a Diversity Loss is proposed to encourage the model to produce diverse results to help alleviate the `mode collapse' problem of the conventional GAN-based models. Extensive experiments on a new large-scale `WebCaricature' dataset show that the proposed CariGAN can generate more plausible caricatures with larger diversity compared with the state-of-the-art models.

* 12 

  Access Model/Code and Paper
Online Deep Metric Learning

May 15, 2018
Wenbin Li, Jing Huo, Yinghuan Shi, Yang Gao, Lei Wang, Jiebo Luo

Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space by feature transformation (e.g., Mahalanobis distance metric). However, traditional metric learning algorithms are shallow, which just learn one metric space (feature transformation). Can we further learn a better metric space from the learnt metric space? In other words, can we learn metric progressively and nonlinearly like deep learning by just using the existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, followed by a nonlinear layer (i.e., ReLU), and then stack these layers modelled after the deep learning. The proposed ODML enjoys some nice properties, indeed can learn metric progressively and performs superiorly on some datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.

* 9 pages 

  Access Model/Code and Paper
OPML: A One-Pass Closed-Form Solution for Online Metric Learning

Sep 29, 2016
Wenbin Li, Yang Gao, Lei Wang, Luping Zhou, Jing Huo, Yinghuan Shi

To achieve a low computational cost when performing online metric learning for large-scale data, we present a one-pass closed-form solution namely OPML in this paper. Typically, the proposed OPML first adopts a one-pass triplet construction strategy, which aims to use only a very small number of triplets to approximate the representation ability of whole original triplets obtained by batch-manner methods. Then, OPML employs a closed-form solution to update the metric for new coming samples, which leads to a low space (i.e., $O(d)$) and time (i.e., $O(d^2)$) complexity, where $d$ is the feature dimensionality. In addition, an extension of OPML (namely COPML) is further proposed to enhance the robustness when in real case the first several samples come from the same class (i.e., cold start problem). In the experiments, we have systematically evaluated our methods (OPML and COPML) on three typical tasks, including UCI data classification, face verification, and abnormal event detection in videos, which aims to fully evaluate the proposed methods on different sample number, different feature dimensionalities and different feature extraction ways (i.e., hand-crafted and deeply-learned). The results show that OPML and COPML can obtain the promising performance with a very low computational cost. Also, the effectiveness of COPML under the cold start setting is experimentally verified.

* 12 pages 

  Access Model/Code and Paper
Self-Attention Enhanced Selective Gate with Entity-Aware Embedding for Distantly Supervised Relation Extraction

Nov 27, 2019
Yang Li, Guodong Long, Tao Shen, Tianyi Zhou, Lina Yao, Huan Huo, Jing Jiang

Distantly supervised relation extraction intrinsically suffers from noisy labels due to the strong assumption of distant supervision. Most prior works adopt a selective attention mechanism over sentences in a bag to denoise from wrongly labeled data, which however could be incompetent when there is only one sentence in a bag. In this paper, we propose a brand-new light-weight neural framework to address the distantly supervised relation extraction problem and alleviate the defects in previous selective attention framework. Specifically, in the proposed framework, 1) we use an entity-aware word embedding method to integrate both relative position information and head/tail entity embeddings, aiming to highlight the essence of entities for this task; 2) we develop a self-attention mechanism to capture the rich contextual dependencies as a complement for local dependencies captured by piecewise CNN; and 3) instead of using selective attention, we design a pooling-equipped gate, which is based on rich contextual representations, as an aggregator to generate bag-level representation for final relation classification. Compared to selective attention, one major advantage of the proposed gating mechanism is that, it performs stably and promisingly even if only one sentence appears in a bag and thus keeps the consistency across all training examples. The experiments on NYT dataset demonstrate that our approach achieves a new state-of-the-art performance in terms of both AUC and top-n precision metrics.

* Accepted to appear at AAAI 2020 

  Access Model/Code and Paper
Lesion Harvester: Iteratively Mining Unlabeled Lesions and Hard-Negative Examples at Scale

Jan 28, 2020
Jinzheng Cai, Adam P. Harrison, Youjing Zheng, Ke Yan, Yuankai Huo, Jing Xiao, Lin Yang, Le Lu

Acquiring large-scale medical image data, necessary for training machine learning algorithms, is frequently intractable, due to prohibitive expert-driven annotation costs. Recent datasets extracted from hospital archives, e.g., DeepLesion, have begun to address this problem. However, these are often incompletely or noisily labeled, e.g., DeepLesion leaves over 50% of its lesions unlabeled. Thus, effective methods to harvest missing annotations are critical for continued progress in medical image analysis. This is the goal of our work, where we develop a powerful system to harvest missing lesions from the DeepLesion dataset at high precision. Accepting the need for some degree of expert labor to achieve high fidelity, we exploit a small fully-labeled subset of medical image volumes and use it to intelligently mine annotations from the remainder. To do this, we chain together a highly sensitive lesion proposal generator and a very selective lesion proposal classifier. While our framework is generic, we optimize our performance by proposing a 3D contextual lesion proposal generator and by using a multi-view multi-scale lesion proposal classifier. These produce harvested and hard-negative proposals, which we then re-use to finetune our proposal generator by using a novel hard negative suppression loss, continuing this process until no extra lesions are found. Extensive experimental analysis demonstrates that our method can harvest an additional 9,805 lesions while keeping precision above 90%. To demonstrate the benefits of our approach, we show that lesion detectors trained on our harvested lesions can significantly outperform the same variants only trained on the original annotations, with boost of average precision of 7% to 10%. We open source our annotations at

* This work has been submitted to the IEEE for possible publication 

  Access Model/Code and Paper
Learning Natural Language Inference with LSTM

Nov 10, 2016
Shuohang Wang, Jing Jiang

Natural language inference (NLI) is a fundamentally important task in natural language processing that has many applications. The recently released Stanford Natural Language Inference (SNLI) corpus has made it possible to develop and evaluate learning-centered methods such as deep neural networks for natural language inference (NLI). In this paper, we propose a special long short-term memory (LSTM) architecture for NLI. Our model builds on top of a recently proposed neural attention model for NLI but is based on a significantly different idea. Instead of deriving sentence embeddings for the premise and the hypothesis to be used for classification, our solution uses a match-LSTM to perform word-by-word matching of the hypothesis with the premise. This LSTM is able to place more emphasis on important word-level matching results. In particular, we observe that this LSTM remembers important mismatches that are critical for predicting the contradiction or the neutral relationship label. On the SNLI corpus, our model achieves an accuracy of 86.1%, outperforming the state of the art.

* 10 pages, 2 figures 

  Access Model/Code and Paper