Models, code, and papers for "Qi Yin":

##### Multiplication fusion of sparse and collaborative-competitive representation for image classification

Jan 20, 2020
Zi-Qi Li, Jun Sun, Xiao-Jun Wu, He-Feng Yin

Representation based classification methods have become a hot research topic during the past few years, and the two most prominent approaches are sparse representation based classification (SRC) and collaborative representation based classification (CRC). CRC reveals that it is the collaborative representation rather than the sparsity that makes SRC successful. Nevertheless, the dense representation of CRC may not be discriminative, which degrades its performance for classification tasks. To alleviate this problem to some extent, we propose a new method called sparse and collaborative-competitive representation based classification (SCCRC) for image classification. Firstly, the coefficients of the test sample are obtained by SRC and CCRC, respectively. Then the fused coefficient is derived by multiplying the coefficients of SRC and CCRC. Finally, the test sample is assigned to the class that has the minimum residual. Experimental results on several benchmark databases demonstrate the efficacy of our proposed SCCRC. The source code of SCCRC is accessible at https://github.com/li-zi-qi/SCCRC.
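The three steps in the abstract above (two coefficient vectors, element-wise multiplication, minimum class residual) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' released code: the sparse coefficients come from a plain ISTA solver, and ordinary ridge-regularized CRC is swapped in as a stand-in for CCRC, whose exact objective is not given here. The function names and the `lam` values are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, y, lam=0.05, n_iter=500):
    """ISTA for min_a 0.5*||y - D a||^2 + lam*||a||_1 (assumed SRC solver)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - step * D.T @ (D @ a - y), step * lam)
    return a

def ridge_code(D, y, lam=0.01):
    """Closed-form min_a ||y - D a||^2 + lam*||a||^2 (plain CRC, stand-in for CCRC)."""
    G = D.T @ D + lam * np.eye(D.shape[1])
    return np.linalg.solve(G, D.T @ y)

def classify_fused(D, labels, y):
    """Multiply the two coefficient vectors, then pick the class with minimum residual."""
    fused = sparse_code(D, y) * ridge_code(D, y)  # element-wise product
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(y - D[:, labels == c] @ fused[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]

# Toy usage: four training atoms (columns of D), two per class.
D = np.eye(4)
labels = np.array([0, 0, 1, 1])
print(classify_fused(D, labels, np.array([1.0, 0.5, 0.0, 0.0])))
```

Columns of `D` are training samples and `labels` gives the class of each column; in practice the columns would usually be normalized to unit length first.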

* Submitted to International Journal of Machine Learning and Cybernetics
##### Learning efficient structured dictionary for image classification

Feb 09, 2020
Zi-Qi Li, Jun Sun, Xiao-Jun Wu, He-Feng Yin

Recent years have witnessed the success of dictionary learning (DL) based approaches in the domain of pattern classification. In this paper, we present an efficient structured dictionary learning (ESDL) method which takes both the diversity and label information of training samples into account. Specifically, ESDL introduces alternative training samples into the process of dictionary learning. To increase the discriminative capability of representation coefficients for classification, an ideal regularization term is incorporated into the objective function of ESDL. Moreover, in contrast with conventional DL approaches which impose a computationally expensive L1-norm constraint on the coefficient matrix, ESDL employs an L2-norm regularization term. Experimental results on benchmark databases (including four face databases and one scene dataset) demonstrate that ESDL outperforms previous DL approaches. More importantly, ESDL can be applied in a wide range of pattern classification tasks. The demo code of our proposed ESDL will be available at https://github.com/li-zi-qi/ESDL.

* Submitted to Journal of Electronic Imaging
##### Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?

Jan 20, 2015
Erjin Zhou, Zhimin Cao, Qi Yin

Face recognition performance has improved rapidly with recent advances in deep learning and the accumulation of large training datasets. In this paper, we report our observations on how big data impacts the recognition performance. According to these observations, we build our Megvii Face Recognition System, which achieves 99.50% accuracy on the LFW benchmark, outperforming the previous state-of-the-art. Furthermore, we report the performance in a real-world security certification scenario. There still exists a clear gap between machine recognition and human performance. We summarize our experiments and present three challenges lying ahead in face recognition, and we indicate several possible solutions to these challenges. We hope our work will stimulate the community's discussion of the difference between research benchmarks and real-world applications.

##### Quantization and Training of Low Bit-Width Convolutional Neural Networks for Object Detection

Aug 17, 2017
Penghang Yin, Shuai Zhang, Yingyong Qi, Jack Xin

We present LBW-Net, an efficient optimization based method for quantization and training of the low bit-width convolutional neural networks (CNNs). Specifically, we quantize the weights to zero or powers of two by minimizing the Euclidean distance between full-precision weights and quantized weights during backpropagation. We characterize the combinatorial nature of the low bit-width quantization problem. For 2-bit (ternary) CNNs, the quantization of $N$ weights can be done by an exact formula in $O(N\log N)$ complexity. When the bit-width is three and above, we further propose a semi-analytical thresholding scheme with a single free parameter for quantization that is computationally inexpensive. The free parameter is further determined by network retraining and object detection tests. LBW-Net has several desirable advantages over full-precision CNNs, including considerable memory savings, energy efficiency, and faster deployment. Our experiments on PASCAL VOC dataset show that compared with its 32-bit floating-point counterpart, the performance of the 6-bit LBW-Net is nearly lossless in the object detection tasks, and can even do better in some real world visual scenes, while empirically enjoying more than 4$\times$ faster deployment.
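To illustrate why ternary quantization of $N$ weights can be done exactly with one sort, here is a sketch of the generic least-squares problem $\min_{s \ge 0,\, q \in \{-1,0,1\}^N} \|w - s\,q\|^2$. For a support made of the $k$ largest magnitudes, the best scale is their mean, and the error reduction is $(\sum_{i \le k} |w|_{(i)})^2 / k$, so scanning $k$ over sorted magnitudes finds the optimum. This is an assumption-laden illustration, not the paper's formula: in particular, the scale here is a free real number, whereas LBW-Net restricts nonzero weights to powers of two, which this sketch does not handle. The function name `ternarize` is hypothetical.

```python
import numpy as np

def ternarize(w):
    """Least-squares ternary quantization w ~ scale * q with q in {-1, 0, 1}.

    Cost is dominated by the sort, i.e. O(N log N).
    """
    w = np.asarray(w, dtype=float)
    flat = w.ravel()
    if flat.size == 0:
        return 0.0, np.zeros_like(w)
    mags = np.abs(flat)
    order = np.argsort(-mags)            # indices by descending magnitude
    prefix = np.cumsum(mags[order])      # prefix[k-1] = sum of the k largest |w|
    k = np.arange(1, flat.size + 1)
    best = int(np.argmax(prefix ** 2 / k))  # support size minus one
    keep = order[: best + 1]
    q = np.zeros_like(flat)
    q[keep] = np.sign(flat[keep])
    scale = prefix[best] / (best + 1)    # mean magnitude over the kept support
    return scale, q.reshape(w.shape)

scale, q = ternarize([1.0, -0.9, 0.1, 0.05])
print(scale, q)
```

In the toy call, the two large weights are kept and the two small ones are zeroed, since adding a third entry would lower the objective gain.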

##### Sliding-Window Optimization on an Ambiguity-Clearness Graph for Multi-object Tracking

Nov 28, 2015
Qi Guo, Le Dan, Dong Yin, Xiangyang Ji

Multi-object tracking remains challenging due to the frequent occurrence of occlusions and outliers. To handle this problem, we propose an Approximation-Shrink Scheme for sequential optimization. This scheme is realized by introducing an Ambiguity-Clearness Graph to avoid conflicts and maintain sequence independence, as well as a sliding-window optimization framework to constrain the size of the state space and guarantee convergence. Based on this window-wise framework, the states of targets are clustered in a self-organizing manner. Moreover, we show that the traditional online and batch tracking methods can be embraced by the window-wise framework. Experiments indicate that with only a small window, the optimization performance can be much better than that of online methods and approach that of batch methods.

##### ICDAR 2015 Text Reading in the Wild Competition

Jun 10, 2015
Xinyu Zhou, Shuchang Zhou, Cong Yao, Zhimin Cao, Qi Yin

Recently, text detection and recognition in natural scenes have become increasingly popular in the computer vision community as well as the document analysis community. However, the majority of the existing ideas, algorithms and systems are specifically designed for English. This technical report presents the final results of the ICDAR 2015 Text Reading in the Wild (TRW 2015) competition, which aims at establishing a benchmark for assessing detection and recognition algorithms devised for both Chinese and English scripts and providing a playground for researchers from the community. In this article, we describe in detail the dataset, tasks, evaluation protocols and participants of this competition, and report the performance of the participating methods. Moreover, promising directions for future research are discussed.

* 3 pages, 2 figures
##### Collaborative representation-based robust face recognition by discriminative low-rank representation

Dec 17, 2019
Wen Zhao, Xiao-Jun Wu, He-Feng Yin, Zi-Qi Li

We consider the problem of robust face recognition in which both the training and test samples might be corrupted because of disguise and occlusion. Performance of conventional subspace learning methods and recently proposed sparse representation based classification (SRC) might be degraded when corrupted training samples are provided. In addition, sparsity based approaches are time-consuming due to the sparsity constraint. To alleviate the aforementioned problems to some extent, in this paper, we propose a discriminative low-rank representation method for collaborative representation-based (DLRR-CR) robust face recognition. DLRR-CR not only obtains a clean dictionary, but also forces the sub-dictionaries for distinct classes to be as independent as possible by introducing a structural incoherence regularization term. Simultaneously, a low-rank projection matrix can be learned to remove the possible corruptions in the testing samples. The collaborative representation based classification (CRC) method, which has a closed-form solution, is exploited in our proposed method. Experimental results obtained on public face databases verify the effectiveness and robustness of our method.

* 28 pages, 5 figures
##### Learning Deep Face Representation

Mar 12, 2014
Haoqiang Fan, Zhimin Cao, Yuning Jiang, Qi Yin, Chinchilla Doudou

Face representation is a crucial step of face recognition systems. An optimal face representation should be discriminative, robust, compact, and very easy-to-implement. While numerous hand-crafted and learning-based representations have been proposed, considerable room for improvement is still present. In this paper, we present a very easy-to-implement deep learning framework for face representation. Our method is based on a new structure of deep network (called Pyramid CNN). The proposed Pyramid CNN adopts a greedy-filter-and-down-sample operation, which enables the training procedure to be very fast and computation-efficient. In addition, the structure of Pyramid CNN can naturally incorporate feature sharing across multi-scale face representations, increasing the discriminative ability of the resulting representation. Our basic network is capable of achieving high recognition accuracy ($85.8\%$ on LFW benchmark) with only an 8-dimensional representation. When extended to feature-sharing Pyramid CNN, our system achieves the state-of-the-art performance ($97.3\%$) on LFW benchmark. We also introduce a new benchmark of realistic face images on social networks and validate that our proposed representation generalizes well.

##### Better Together: Joint Reasoning for Non-rigid 3D Reconstruction with Specularities and Shading

Aug 04, 2017
Qi Liu-Yin, Rui Yu, Lourdes Agapito, Andrew Fitzgibbon, Chris Russell

We demonstrate the use of shape-from-shading (SfS) to improve both the quality and the robustness of 3D reconstruction of dynamic objects captured by a single camera. Unlike previous approaches that made use of SfS as a post-processing step, we offer a principled integrated approach that solves dynamic object tracking and reconstruction and SfS as a single unified cost function. Moving beyond Lambertian SfS, we propose a general approach that models both specularities and shading while simultaneously tracking and reconstructing general dynamic objects. Solving these problems jointly prevents the kinds of tracking failures which cannot be recovered from by pipeline approaches. We show state-of-the-art results both qualitatively and quantitatively.

* Submitted to IJCV
##### Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets

Mar 13, 2019
Penghang Yin, Jiancheng Lyu, Shuai Zhang, Stanley Osher, Yingyong Qi, Jack Xin

##### MMFNet: A Multi-modality MRI Fusion Network for Segmentation of Nasopharyngeal Carcinoma

Jan 22, 2019
Huai Chen, Yuxiao Qi, Yong Yin, Tengxiang Li, Guanzhong Gong, Lisheng Wang

Segmentation of nasopharyngeal carcinoma (NPC) from Magnetic Resonance Images (MRI) is a crucial step in NPC radiotherapy. However, manual segmentation of NPC is a time-consuming and labor-intensive task. Additionally, single-modality MRI generally cannot provide enough information for the accurate delineation of NPC. Therefore, a multi-modality MRI fusion network (MMFNet) based on three modalities of MRI (T1, T2 and contrast-enhanced T1) is proposed to complete accurate segmentation of NPC. In the MMFNet, the backbone is designed as a multi-encoder-based network, consisting of several modality-specific encoders and one single decoder. It can effectively learn both the low-level and high-level features used implicitly for NPC segmentation in each modality of MRI. A fusion block is proposed in the MMFNet to effectively fuse low-level features from multi-modality MRI. It first recalibrates features captured from multi-modality MRI, which highlights informative features and regions of interest. Then, a residual fusion block is utilized to fuse weighted features before merging them with features from the decoder to keep a balance between high-level and low-level features. Moreover, a training strategy named self-transfer is proposed to initialize encoders for the multi-encoder-based network. It can stimulate encoders to fully mine modality-specific MRI. The proposed method can effectively make use of information in multi-modality MRI. Its effectiveness and advantages are validated by many experiments and comparisons with the related methods.

* 37 pages, 11 figures
##### BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights

Sep 05, 2018
Penghang Yin, Shuai Zhang, Jiancheng Lyu, Stanley Osher, Yingyong Qi, Jack Xin

We propose BinaryRelax, a simple two-phase algorithm, for training deep neural networks with quantized weights. The set constraint that characterizes the quantization of weights is not imposed until the late stage of training, and a sequence of *pseudo* quantized weights is maintained. Specifically, we relax the hard constraint into a continuous regularizer via Moreau envelope, which turns out to be the squared Euclidean distance to the set of quantized weights. The pseudo quantized weights are obtained by linearly interpolating between the float weights and their quantizations. A continuation strategy is adopted to push the weights towards the quantized state by gradually increasing the regularization parameter. In the second phase, exact quantization scheme with a small learning rate is invoked to guarantee fully quantized weights. We test BinaryRelax on the benchmark CIFAR and ImageNet color image datasets to demonstrate the superiority of the relaxed quantization approach and the improved accuracy over the state-of-the-art training methods. Finally, we prove the convergence of BinaryRelax under an approximate orthogonality condition.
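The interpolation described above can be sketched as follows. Minimizing $\tfrac12\|u - w\|^2 + \tfrac{\lambda}{2}\,\mathrm{dist}(u, Q)^2$ over $u$ gives $u = (\lambda\,\mathrm{proj}_Q(w) + w)/(\lambda + 1)$, a linear interpolation that moves toward the quantized weights as $\lambda$ grows. This is a hedged sketch, not the authors' code: the binary quantizer with scale `mean(|w|)` and the names `binarize` and `pseudo_quantize` are assumptions made for illustration.

```python
import numpy as np

def binarize(w):
    """Map w to {-s, +s}^N with s = mean(|w|) (assumed binary quantizer)."""
    s = np.mean(np.abs(w))
    return s * np.where(w >= 0, 1.0, -1.0)

def pseudo_quantize(w, lam):
    """Interpolate between float weights and their quantization.

    lam = 0 returns the float weights; lam -> infinity approaches binarize(w).
    """
    return (lam * binarize(w) + w) / (lam + 1.0)

w = np.array([0.5, -1.5])
for lam in (0.0, 1.0, 100.0):   # continuation: gradually increase lam
    print(lam, pseudo_quantize(w, lam))
```

In a training loop, `lam` would be increased over the first phase, and the second phase would switch to exact quantization as the abstract describes.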

##### Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks

Aug 29, 2018
Penghang Yin, Shuai Zhang, Jiancheng Lyu, Stanley Osher, Yingyong Qi, Jack Xin

Quantized deep neural networks (QDNNs) are attractive due to their much lower memory storage and faster inference speed than their regular full precision counterparts. To maintain the same performance level especially at low bit-widths, QDNNs must be retrained. Their training involves piecewise constant activation functions and discrete weights, hence mathematical challenges arise. We introduce the notion of coarse derivative and propose the blended coarse gradient descent (BCGD) algorithm, for training fully quantized neural networks. Coarse gradient is generally not a gradient of any function but an artificial ascent direction. The weight update of BCGD goes by coarse gradient correction of a weighted average of the full precision weights and their quantization (the so-called blending), which yields sufficient descent in the objective value and thus accelerates the training. Our experiments demonstrate that this simple blending technique is very effective for quantization at extremely low bit-width such as binarization. In full quantization of ResNet-18 for ImageNet classification task, BCGD gives 64.36% top-1 accuracy with binary weights across all layers and 4-bit adaptive activation. If the weights in the first and last layers are kept in full precision, this number increases to 65.46%. As theoretical justification, we provide the convergence analysis of coarse gradient descent for a two-layer neural network model with Gaussian input data, and prove that the expected coarse gradient correlates positively with the underlying true gradient.
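As a rough illustration of the blending step described above, the update can be read as: take a weighted average of the float weights and their quantization, then correct it with a coarse gradient evaluated at the quantized weights. The sketch below follows that reading only; the blending factor `rho`, the learning rate, the sign quantizer and the toy gradient are all assumptions, not values or definitions taken from the paper.

```python
import numpy as np

def bcgd_step(w_float, coarse_grad, quantize, lr=0.1, rho=0.01):
    """One blended update: average float and quantized weights, then correct.

    coarse_grad and quantize are caller-supplied functions (hypothetical here).
    """
    w_quant = quantize(w_float)
    blended = (1.0 - rho) * w_float + rho * w_quant
    return blended - lr * coarse_grad(w_quant)

# Toy usage: sign quantizer and a quadratic loss 0.5*||q - target||^2,
# whose gradient at the quantized point q is (q - target).
target = np.array([1.0, -1.0])
w = np.array([0.4, -2.0])
w_next = bcgd_step(w, lambda q: q - target, np.sign)
print(w_next)
```

With `rho = 0` this reduces to an update that keeps the float weights and only applies the gradient computed at the quantized point; the blending term additionally pulls the float weights toward their quantization at each step.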

##### NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Nov 18, 2019
Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace the redundant operations with the more computationally efficient ones (e.g., skip connection or directly removing the connection). Based on MDP, we learn NAT by exploiting reinforcement learning to obtain the optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed strategies, we apply NAT on both hand-crafted architectures and NAS based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods.

* This paper is accepted by NeurIPS 2019
##### EKT: Exercise-aware Knowledge Tracing for Student Performance Prediction

Jun 07, 2019
Qi Liu, Zhenya Huang, Yu Yin, Enhong Chen, Hui Xiong, Yu Su, Guoping Hu

For offering proactive services to students in intelligent education, one of the fundamental tasks is predicting their performance (e.g., scores) on future exercises, where it is necessary to track each student's knowledge acquisition during her exercising activities. However, existing approaches can only exploit the exercising records of students, and the problem of extracting the rich information existing in the exercise materials (e.g., knowledge concepts, exercise content) to achieve both precise predictions of student performance and interpretable analysis of knowledge acquisition remains underexplored. In this paper, we present a holistic study of student performance prediction. To directly achieve the primary goal of prediction, we first propose a general Exercise-Enhanced Recurrent Neural Network (EERNN) framework by exploring both students' records and the exercise contents. In EERNN, we simply summarize each student's state into an integrated vector and trace it with a recurrent neural network, where we design a bidirectional LSTM to learn the encoding of each exercise's content. For making predictions, we propose two implementations under EERNN with different strategies, i.e., EERNNM with Markov property and EERNNA with Attention mechanism. Then, to explicitly track students' knowledge acquisition on multiple knowledge concepts, we extend EERNN to an explainable Exercise-aware Knowledge Tracing (EKT) by incorporating the knowledge concept effects, where the student's integrated state vector is extended to a knowledge state matrix. In EKT, we further develop a memory network for quantifying how much each exercise can affect the mastery of students on concepts during the exercising process. Finally, we conduct extensive experiments on large-scale real-world data. The results demonstrate the prediction effectiveness of the two frameworks as well as the superior interpretability of EKT.

* Accepted by IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)
##### QuesNet: A Unified Representation for Heterogeneous Test Questions

May 27, 2019
Yu Yin, Qi Liu, Zhenya Huang, Enhong Chen, Wei Tong, Shijin Wang, Yu Su

Understanding learning materials (e.g. test questions) is a crucial issue in online learning systems, which can promote many applications in the education domain. Unfortunately, many supervised approaches suffer from the problem of scarce human labeled data, whereas abundant unlabeled resources are highly underutilized. To alleviate this problem, an effective solution is to use pre-trained representations for question understanding. However, existing pre-training methods in the NLP area are infeasible for learning test question representations due to several domain-specific characteristics in education. First, questions usually comprise heterogeneous data including content text, images and side information. Second, there exist both basic linguistic information and domain logic and knowledge. To this end, in this paper, we propose a novel pre-training method, namely QuesNet, for comprehensively learning question representations. Specifically, we first design a unified framework to aggregate question information with its heterogeneous inputs into a comprehensive vector. Then we propose a two-level hierarchical pre-training algorithm to learn a better understanding of test questions in an unsupervised way. Here, a novel holed language model objective is developed to extract low-level linguistic features, and a domain-oriented objective is proposed to learn high-level logic and knowledge. Moreover, we show that QuesNet can be fine-tuned well for many question-based tasks. We conduct extensive experiments on large-scale real-world question data, where the experimental results clearly demonstrate the effectiveness of QuesNet for question understanding as well as its superior applicability.

##### Incidental Scene Text Understanding: Recent Progresses on ICDAR 2015 Robust Reading Competition Challenge 4

Feb 03, 2016
Cong Yao, Jianan Wu, Xinyu Zhou, Chi Zhang, Shuchang Zhou, Zhimin Cao, Qi Yin

Different from focused texts present in natural images, which are captured with the user's intention and intervention, incidental texts usually exhibit much more diversity, variability and complexity, thus posing significant difficulties and challenges for scene text detection and recognition algorithms. The ICDAR 2015 Robust Reading Competition Challenge 4 was launched to assess the performance of existing scene text detection and recognition methods on incidental texts as well as to stimulate novel ideas and solutions. This report briefly introduces our strategies for this challenging problem and compares them with prior art in this field.

* 3 pages, 2 figures, 5 tables
##### Category-wise Attack: Transferable Adversarial Examples for Anchor Free Object Detection

Feb 10, 2020
Quanyu Liao, Xin Wang, Bin Kong, Siwei Lyu, Youbing Yin, Qi Song, Xi Wu

Deep neural networks have been demonstrated to be vulnerable to adversarial attacks: subtle perturbations can completely change the classification results. Their vulnerability has led to a surge of research in this direction. However, most works are dedicated to attacking anchor-based object detection models. In this work, we aim to present an effective and efficient algorithm to generate adversarial examples to attack anchor-free object models based on two approaches. First, we conduct category-wise instead of instance-wise attacks on the object detectors. Second, we leverage the high-level semantic information to generate the adversarial examples. Surprisingly, the generated adversarial examples are not only able to effectively attack the targeted anchor-free object detector but can also be transferred to attack other object detectors, even anchor-based detectors such as Faster R-CNN.

##### Transcribing Content from Structural Images with Spotlight Mechanism

May 27, 2019
Yu Yin, Zhenya Huang, Enhong Chen, Qi Liu, Fuzheng Zhang, Xing Xie, Guoping Hu

Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task as not only should the content objects be recognized, but the internal structure should also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but are not capable of identifying ones with more complex content (e.g., structured symbols), which often follow a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism to focus on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU based network with the spotlight areas for transcribing the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and Recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework.

* Accepted by KDD2018 Research Track. In proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'18)