Models, code, and papers for "Liang Zheng":

Dissecting Person Re-identification from the Viewpoint of Viewpoint

Jan 07, 2019
Xiaoxiao Sun, Liang Zheng

In person re-identification (re-ID),In person re-identification (re-ID), we usually refer the challenges of this task to variances in visual factors such as the viewpoint, pose, illumination and background. In spite of acknowledging these factors to be influential, quantitative studies on how they affect a re-ID system are still lacking.To gain insights in this scientific campaign, this paper makes an early attempt in studying a particular factor, viewpoint. We narrow the viewpoint problem down to the pedestrian rotation angle to obtain focused conclusions. In this regard, this paper makes two contributions to the community. First, we introduce a large-scale synthetic data engine, PersonX. Composed of hand-crafted 3D person models, the salient characteristic of this engine is "controllable". That is, we are able to synthesize pedestrians by setting the visual variables to arbitrary values. Second, on the 3D data engine, we quantitatively analyze the influence of pedestrian rotation angle on re-ID accuracy. Comprehensively, the person rotation angles are precisely customized from 0 to 360, allowing us to investigate its effect on the training, query, and gallery sets. Extensive experiment helps us gain deeper understanding of the fundamental problems in person re-ID. Our research also provides beneficial insights for dataset building and future practical usage, e.g., a person of a side view makes a better query.

* 9 pages, 7 figures 

  Click for Model/Code and Paper
Query Adaptive Late Fusion for Image Retrieval

Oct 31, 2018
Zhongdao Wang, Liang Zheng, Shengjin Wang

Feature fusion is a commonly used strategy in image retrieval tasks, which aggregates the matching responses of multiple visual features. Feasible sets of features can be either descriptors (SIFT, HSV) for an entire image or the same descriptor for different local parts (face, body). Ideally, the to-be-fused heterogeneous features are pre-assumed to be discriminative and complementary to each other. However, the effectiveness of different features varies dramatically according to different queries. That is to say, for some queries, a feature may be neither discriminative nor complementary to existing ones, while for other queries, the feature suffices. As a result, it is important to estimate the effectiveness of features in a query-adaptive manner. To this end, this article proposes a new late fusion scheme at the score level. We base our method on the observation that the sorted score curves contain patterns that describe their effectiveness. For example, an "L"-shaped curve indicates that the feature is discriminative while a gradually descending curve suggests a bad feature. As such, this paper introduces a query-adaptive late fusion pipeline. In the hand-crafted version, it can be an unsupervised approach to tasks like particular object retrieval. In the learning version, it can also be applied to supervised tasks like person recognition and pedestrian retrieval, based on a trainable neural module. Extensive experiments are conducted on two object retrieval datasets and one person recognition dataset. We show that our method is able to highlight the good features and suppress the bad ones, is resilient to distractor features, and achieves very competitive retrieval accuracy compared with the state of the art. In an additional person re-identification dataset, the application scope and limitation of the proposed method are studied.

  Click for Model/Code and Paper
Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in vitro

Aug 22, 2017
Zhedong Zheng, Liang Zheng, Yi Yang

The main contribution of this paper is a simple semi-supervised pipeline that only uses the original training set without collecting extra data. It is challenging in 1) how to obtain more training data only from the training set and 2) how to use the newly generated data. In this work, the generative adversarial network (GAN) is used to generate unlabeled samples. We propose the label smoothing regularization for outliers (LSRO). This method assigns a uniform label distribution to the unlabeled images, which regularizes the supervised model and improves the baseline. We verify the proposed method on a practical problem: person re-identification (re-ID). This task aims to retrieve a query person from other cameras. We adopt the deep convolutional generative adversarial network (DCGAN) for sample generation, and a baseline convolutional neural network (CNN) for representation learning. Experiments show that adding the GAN-generated data effectively improves the discriminative ability of learned CNN embeddings. On three large-scale datasets, Market-1501, CUHK03 and DukeMTMC-reID, we obtain +4.37%, +1.6% and +2.46% improvement in rank-1 precision over the baseline CNN, respectively. We additionally apply the proposed method to fine-grained bird recognition and achieve a +0.6% improvement over a strong baseline. The code is available at

* 9 pages, 6 figures, accepted by ICCV 2017 

  Click for Model/Code and Paper
Pedestrian Alignment Network for Large-scale Person Re-identification

Jul 03, 2017
Zhedong Zheng, Liang Zheng, Yi Yang

Person re-identification (person re-ID) is mostly viewed as an image retrieval problem. This task aims to search a query person in a large image pool. In practice, person re-ID usually adopts automatic detectors to obtain cropped pedestrian images. However, this process suffers from two types of detector errors: excessive background and part missing. Both errors deteriorate the quality of pedestrian alignment and may compromise pedestrian matching due to the position and scale variances. To address the misalignment problem, we propose that alignment can be learned from an identification procedure. We introduce the pedestrian alignment network (PAN) which allows discriminative embedding learning and pedestrian alignment without extra annotations. Our key observation is that when the convolutional neural network (CNN) learns to discriminate between different identities, the learned feature maps usually exhibit strong activations on the human body rather than the background. The proposed network thus takes advantage of this attention mechanism to adaptively locate and align pedestrians within a bounding box. Visual examples show that pedestrians are better aligned with PAN. Experiments on three large-scale re-ID datasets confirm that PAN improves the discriminative ability of the feature embeddings and yields competitive accuracy with the state-of-the-art methods.

  Click for Model/Code and Paper
A Discriminatively Learned CNN Embedding for Person Re-identification

Feb 03, 2017
Zhedong Zheng, Liang Zheng, Yi Yang

We revisit two popular convolutional neural networks (CNN) in person re-identification (re-ID), i.e, verification and classification models. The two models have their respective advantages and limitations due to different loss functions. In this paper, we shed light on how to combine the two models to learn more discriminative pedestrian descriptors. Specifically, we propose a new siamese network that simultaneously computes identification loss and verification loss. Given a pair of training images, the network predicts the identities of the two images and whether they belong to the same identity. Our network learns a discriminative embedding and a similarity measurement at the same time, thus making full usage of the annotations. Albeit simple, the learned embedding improves the state-of-the-art performance on two public person re-ID benchmarks. Further, we show our architecture can also be applied in image retrieval.

  Click for Model/Code and Paper
Learning to Stop in Structured Prediction for Neural Machine Translation

Apr 01, 2019
Mingbo Ma, Renjie Zheng, Liang Huang

Beam search optimization resolves many issues in neural machine translation. However, this method lacks principled stopping criteria and does not learn how to stop during training, and the model naturally prefers the longer hypotheses during the testing time in practice since they use the raw score instead of the probability-based score. We propose a novel ranking method which enables an optimal beam search stopping criteria. We further introduce a structured prediction loss function which penalizes suboptimal finished candidates produced by beam search during training. Experiments of neural machine translation on both synthetic data and real languages (German-to-English and Chinese-to-English) demonstrate our proposed methods lead to better length and BLEU score.

* NAACL 2019 
* 5 pages 

  Click for Model/Code and Paper
Domain Alignment with Triplets

Jan 22, 2019
Weijian Deng, Liang Zheng, Jianbin Jiao

Deep domain adaptation methods can reduce the distribution discrepancy by learning domain-invariant embedddings. However, these methods only focus on aligning the whole data distributions, without considering the class-level relations among source and target images. Thus, a target embeddings of a bird might be aligned to source embeddings of an airplane. This semantic misalignment can directly degrade the classifier performance on the target dataset. To alleviate this problem, we present a similarity constrained alignment (SCA) method for unsupervised domain adaptation. When aligning the distributions in the embedding space, SCA enforces a similarity-preserving constraint to maintain class-level relations among the source and target images, i.e., if a source image and a target image are of the same class label, their corresponding embeddings are supposed to be aligned nearby, and vise versa. In the absence of target labels, we assign pseudo labels for target images. Given labeled source images and pseudo-labeled target images, the similarity-preserving constraint can be implemented by minimizing the triplet loss. With the joint supervision of domain alignment loss and similarity-preserving constraint, we train a network to obtain domain-invariant embeddings with two critical characteristics, intra-class compactness and inter-class separability. Extensive experiments conducted on the two datasets well demonstrate the effectiveness of SCA.

* 10 pages;This version is not fully edited and will be updated soon 

  Click for Model/Code and Paper
Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation

Aug 28, 2018
Renjie Zheng, Mingbo Ma, Liang Huang

Neural text generation, including neural machine translation, image captioning, and summarization, has been quite successful recently. However, during training time, typically only one reference is considered for each example, even though there are often multiple references available, e.g., 4 references in NIST MT evaluations, and 5 references in image captioning data. We first investigate several different ways of utilizing multiple human references during training. But more importantly, we then propose an algorithm to generate exponentially many pseudo-references by first compressing existing human references into lattices and then traversing them to generate new pseudo-references. These approaches lead to substantial improvements over strong baselines in both machine translation (+1.5 BLEU) and image captioning (+3.1 BLEU / +11.7 CIDEr).

* Published in EMNLP 2018 
* 10 pages 

  Click for Model/Code and Paper
Unsupervised Person Re-identification: Clustering and Fine-tuning

Jun 29, 2017
Hehe Fan, Liang Zheng, Yi Yang

The superiority of deeply learned pedestrian representations has been reported in very recent literature of person re-identification (re-ID). In this paper, we consider the more pragmatic issue of learning a deep feature with no or only a few labels. We propose a progressive unsupervised learning (PUL) method to transfer pretrained deep representations to unseen domains. Our method is easy to implement and can be viewed as an effective baseline for unsupervised re-ID feature learning. Specifically, PUL iterates between 1) pedestrian clustering and 2) fine-tuning of the convolutional neural network (CNN) to improve the original model trained on the irrelevant labeled dataset. Since the clustering results can be very noisy, we add a selection operation between the clustering and fine-tuning. At the beginning when the model is weak, CNN is fine-tuned on a small amount of reliable examples which locate near to cluster centroids in the feature space. As the model becomes stronger in subsequent iterations, more images are being adaptively selected as CNN training samples. Progressively, pedestrian clustering and the CNN model are improved simultaneously until algorithm convergence. This process is naturally formulated as self-paced learning. We then point out promising directions that may lead to further improvement. Extensive experiments on three large-scale re-ID datasets demonstrate that PUL outputs discriminative features that improve the re-ID accuracy.

* Add more results, parameter analysis and comparisons 

  Click for Model/Code and Paper
SIFT Meets CNN: A Decade Survey of Instance Retrieval

May 23, 2017
Liang Zheng, Yi Yang, Qi Tian

In the early days, content-based image retrieval (CBIR) was studied with global features. Since 2003, image retrieval based on local descriptors (de facto SIFT) has been extensively studied for over a decade due to the advantage of SIFT in dealing with image transformations. Recently, image representations based on the convolutional neural network (CNN) have attracted increasing interest in the community and demonstrated impressive performance. Given this time of rapid evolution, this article provides a comprehensive survey of instance retrieval over the last decade. Two broad categories, SIFT-based and CNN-based methods, are presented. For the former, according to the codebook size, we organize the literature into using large/medium-sized/small codebooks. For the latter, we discuss three lines of methods, i.e., using pre-trained or fine-tuned CNN models, and hybrid methods. The first two perform a single-pass of an image to the network, while the last category employs a patch-based feature extraction scheme. This survey presents milestones in modern instance retrieval, reviews a broad selection of previous works in different categories, and provides insights on the connection between SIFT and CNN-based methods. After analyzing and comparing retrieval performance of different categories on several datasets, we discuss promising directions towards generic and specialized instance retrieval.

* Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence 

  Click for Model/Code and Paper
Locality Aware Appearance Metric for Multi-Target Multi-Camera Tracking

Nov 27, 2019
Yunzhong Hou, Liang Zheng, Zhongdao Wang, Shengjin Wang

Multi-target multi-camera tracking (MTMCT) systems track targets across cameras. Due to the continuity of target trajectories, tracking systems usually restrict their data association within a local neighborhood. In single camera tracking, local neighborhood refers to consecutive frames; in multi-camera tracking, it refers to neighboring cameras that the target may appear successively. For similarity estimation, tracking systems often adopt appearance features learned from the re-identification (re-ID) perspective. Different from tracking, re-ID usually does not have access to the trajectory cues that can limit the search space to a local neighborhood. Due to its global matching property, the re-ID perspective requires to learn global appearance features. We argue that the mismatch between the local matching procedure in tracking and the global nature of re-ID appearance features may compromise MTMCT performance. To fit the local matching procedure in MTMCT, in this work, we introduce locality aware appearance metric (LAAM). Specifically, we design an intra-camera metric for single camera tracking, and an inter-camera metric for multi-camera tracking. Both metrics are trained with data pairs sampled from their corresponding local neighborhoods, as opposed to global sampling in the re-ID perspective. We show that the locally learned metrics can be successfully applied on top of several globally learned re-ID features. With the proposed method, we report new state-of-the-art performance on the DukeMTMC dataset, and a substantial improvement on the CityFlow dataset.

  Click for Model/Code and Paper
Towards Real-Time Multi-Object Tracking

Sep 27, 2019
Zhongdao Wang, Liang Zheng, Yixuan Liu, Shengjin Wang

Modern multiple object tracking (MOT) systems usually follow the tracking-by-detection paradigm. It has 1) a detection model for target localization and 2) an appearance embedding model for data association. Having the two models separately executed might lead to efficiency problems, as the running time is simply a sum of the two steps without investigating potential structures that can be shared between them. Existing research efforts on real-time MOT usually focus on the association step, so they are essentially real-time association methods but not real-time MOT system. In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model. Specifically, we incorporate the appearance embedding model into a single-shot detector, such that the model can simultaneously output detections and the corresponding embeddings. As such, the system is formulated as a multi-task learning problem: there are multiple objectives, i.e., anchor classification, bounding box regression, and embedding learning; and the individual losses are automatically weighted. To our knowledge, this work reports the first (near) real-time MOT system, with a running speed of 18.8 to 24.1 FPS depending on the input resolution. Meanwhile, its tracking accuracy is comparable to the state-of-the-art trackers embodying separate detection and embedding (SDE) learning (64.4% MOTA v.s. 66.1% MOTA on MOT-16 challenge). The code and models are available at

  Click for Model/Code and Paper
Simpler and Faster Learning of Adaptive Policies for Simultaneous Translation

Sep 12, 2019
Baigong Zheng, Renjie Zheng, Mingbo Ma, Liang Huang

Simultaneous translation is widely useful but remains challenging. Previous work falls into two main categories: (a) fixed-latency policies such as Ma et al. (2019) and (b) adaptive policies such as Gu et al. (2017). The former are simple and effective, but have to aggressively predict future content due to diverging source-target word order; the latter do not anticipate, but suffer from unstable and inefficient training. To combine the merits of both approaches, we propose a simple supervised-learning framework to learn an adaptive policy from oracle READ/WRITE sequences generated from parallel text. At each step, such an oracle sequence chooses to WRITE the next target word if the available source sentence context provides enough information to do so, otherwise READ the next source word. Experiments on German<->English show that our method, without retraining the underlying NMT model, can learn flexible policies with better BLEU scores and similar latencies compared to previous work.

* EMNLP 2019 

  Click for Model/Code and Paper
Speculative Beam Search for Simultaneous Translation

Sep 12, 2019
Renjie Zheng, Mingbo Ma, Baigong Zheng, Liang Huang

Beam search is universally used in full-sentence translation but its application to simultaneous translation remains non-trivial, where output words are committed on the fly. In particular, the recently proposed wait-k policy (Ma et al., 2019a) is a simple and effective method that (after an initial wait) commits one output word on receiving each input word, making beam search seemingly impossible. To address this challenge, we propose a speculative beam search algorithm that hallucinates several steps into the future in order to reach a more accurate decision, implicitly benefiting from a target language model. This makes beam search applicable for the first time to the generation of a single word in each step. Experiments over diverse language pairs show large improvements over previous work.

* accepted by EMNLP 2019 

  Click for Model/Code and Paper
Simultaneous Translation with Flexible Policy via Restricted Imitation Learning

Jun 04, 2019
Baigong Zheng, Renjie Zheng, Mingbo Ma, Liang Huang

Simultaneous translation is widely useful but remains one of the most difficult tasks in NLP. Previous work either uses fixed-latency policies, or train a complicated two-staged model using reinforcement learning. We propose a much simpler single model that adds a `delay' token to the target vocabulary, and design a restricted dynamic oracle to greatly simplify training. Experiments on Chinese<->English simultaneous translation show that our work leads to flexible policies that achieve better BLEU scores and lower latencies compared to both fixed and RL-learned policies.

* ACL 2019 

  Click for Model/Code and Paper
Linkage Based Face Clustering via Graph Convolution Network

Apr 08, 2019
Zhongdao Wang, Liang Zheng, Yali Li, Shengjin Wang

In this paper, we present an accurate and scalable approach to the face clustering task. We aim at grouping a set of faces by their potential identities. We formulate this task as a link prediction problem: a link exists between two faces if they are of the same identity. The key idea is that we find the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors. By constructing sub-graphs around each instance as input data, which depict the local context, we utilize the graph convolution network (GCN) to perform reasoning and infer the likelihood of linkage between pairs in the sub-graphs. Experiments show that our method is more robust to the complex distribution of faces than conventional methods, yielding favorably comparable results to state-of-the-art methods on standard face clustering benchmarks, and is scalable to large datasets. Furthermore, we show that the proposed method does not need the number of clusters as prior, is aware of noises and outliers, and can be extended to a multi-view version for more accurate clustering accuracy.

* To appear in CVPR 2019 

  Click for Model/Code and Paper
Open Set Adversarial Examples

Sep 07, 2018
Zhedong Zheng, Liang Zheng, Zhilan Hu, Yi Yang

Adversarial examples in recent works target at closed set recognition systems, in which the training and testing classes are identical. In real-world scenarios, however, the testing classes may have limited, if any, overlap with the training classes, a problem named open set recognition. To our knowledge, the community does not have a specific design of adversarial examples targeting at this practical setting. Arguably, the new setting compromises traditional closed set attack methods in two aspects. First, closed set attack methods are based on classification and target at classification as well, but the open set problem suggests a different task, \emph{i.e.,} retrieval. It is undesirable that the generation mechanism of closed set recognition is different from the aim of open set recognition. Second, given that the query image is usually of an unseen class, predicting its category from the training classes is not reasonable, which leads to an inferior adversarial gradient. In this work, we view open set recognition as a retrieval task and propose a new approach, Opposite-Direction Feature Attack (ODFA), to generate adversarial examples / queries. When using an attacked example as query, we aim that the true matches be ranked as low as possible. In addressing the two limitations of closed set attack methods, ODFA directly works on the features for retrieval. The idea is to push away the feature of the adversarial query in the opposite direction of the original feature. Albeit simple, ODFA leads to a larger drop in Recall@K and mAP than the close-set attack methods on two open set recognition datasets, \emph{i.e.,} Market-1501 and CUB-200-2011. We also demonstrate that the attack performance of ODFA is not evidently superior to the state-of-the-art methods under closed set recognition (Cifar-10), suggesting its specificity for open set problems.

* 10 pages, 6 figures, 6 tables 

  Click for Model/Code and Paper