Research papers and code for "Lei Zhang":
The world we see is ever-changing and it always changes with people, things, and the environment. Domain is referred to as the state of the world at a certain moment. A research problem is characterized as domain transfer adaptation when it needs knowledge correspondence between different moments. Conventional machine learning aims to find a model with the minimum expected risk on test data by minimizing the regularized empirical risk on the training data, which, however, supposes that the training and test data share similar joint probability distribution. Transfer adaptation learning aims to build models that can perform tasks of target domain by learning knowledge from a semantic related but distribution different source domain. It is an energetic research filed of increasing influence and importance. This paper surveys the recent advances in transfer adaptation learning methodology and potential benchmarks. Broader challenges being faced by transfer adaptation learning researchers are identified, i.e., instance re-weighting adaptation, feature adaptation, classifier adaptation, deep network adaptation, and adversarial adaptation, which are beyond the early semi-supervised and unsupervised split. The survey provides researchers a framework for better understanding and identifying the research status, challenges and future directions of the field.

* 21 pages, 3 figures
Click to Read Paper and Get Code
Deep learning with a convolutional neural network (CNN) has been proved to be very effective in feature extraction and representation of images. For image classification problems, this work aim at finding which classifier is more competitive based on high-level deep features of images. In this report, we have discussed the nearest neighbor, support vector machines and extreme learning machines for image classification under deep convolutional activation feature representation. Specifically, we adopt the benchmark object recognition dataset from multiple sources with domain bias for evaluating different classifiers. The deep features of the object dataset are obtained by a well-trained CNN with five convolutional layers and three fully-connected layers on the challenging ImageNet. Experiments demonstrate that the ELMs outperform SVMs in cross-domain recognition tasks. In particular, state-of-the-art results are obtained by kernel ELM which outperforms SVMs with about 4% of the average accuracy. The features and codes are available in http://www.escience.cn/people/lei/index.html

* 7 pages, 4 figures
Click to Read Paper and Get Code
Image/video data is usually represented with multiple visual features. Fusion of multi-source information for establishing the attributes has been widely recognized. Multi-feature visual recognition has recently received much attention in multimedia applications. This paper studies visual understanding via a newly proposed l_2-norm based multi-feature shared learning framework, which can simultaneously learn a global label matrix and multiple sub-classifiers with the labeled multi-feature data. Additionally, a group graph manifold regularizer composed of the Laplacian and Hessian graph is proposed for better preserving the manifold structure of each feature, such that the label prediction power is much improved through the semi-supervised learning with global label consistency. For convenience, we call the proposed approach Global-Label-Consistent Classifier (GLCC). The merits of the proposed method include: 1) the manifold structure information of each feature is exploited in learning, resulting in a more faithful classification owing to the global label consistency; 2) a group graph manifold regularizer based on the Laplacian and Hessian regularization is constructed; 3) an efficient alternative optimization method is introduced as a fast solver owing to the convex sub-problems. Experiments on several benchmark visual datasets for multimedia understanding, such as the 17-category Oxford Flower dataset, the challenging 101-category Caltech dataset, the YouTube & Consumer Videos dataset and the large-scale NUS-WIDE dataset, demonstrate that the proposed approach compares favorably with the state-of-the-art algorithms. An extensive experiment on the deep convolutional activation features also show the effectiveness of the proposed approach. The code is available on http://www.escience.cn/people/lei/index.html

* 13 pages,6 figures, this paper is accepted for publication in IEEE Transactions on Multimedia
Click to Read Paper and Get Code
Conventional object detection methods essentially suppose that the training and testing data are collected from a restricted target domain with expensive labeling cost. For alleviating the problem of domain dependency and cumbersome labeling, this paper proposes to detect objects in an unrestricted environment by leveraging domain knowledge trained from an auxiliary source domain with sufficient labels. Specifically, we propose a multi-adversarial Faster-RCNN (MAF) framework for unrestricted object detection, which inherently addresses domain disparity minimization for domain adaptation in feature representation. The paper merits are in three-fold: 1) With the idea that object detectors often becomes domain incompatible when image distribution resulted domain disparity appears, we propose a hierarchical domain feature alignment module, in which multiple adversarial domain classifier submodules for layer-wise domain feature confusion are designed; 2) An information invariant scale reduction module (SRM) for hierarchical feature map resizing is proposed for promoting the training efficiency of adversarial domain adaptation; 3) In order to improve the domain adaptability, the aggregated proposal features with detection results are feed into a proposed weighted gradient reversal layer (WGRL) for characterizing hard confused domain samples. We evaluate our MAF on unrestricted tasks, including Cityscapes, KITTI, Sim10k, etc. and the experiments show the state-of-the-art performance over the existing detectors.

* This paper is accepted in ICCV2019
Click to Read Paper and Get Code
Multilayer bootstrap network builds a gradually narrowed multilayer nonlinear network from bottom up for unsupervised nonlinear dimensionality reduction. Each layer of the network is a nonparametric density estimator. It consists of a group of k-centroids clusterings. Each clustering randomly selects data points with randomly selected features as its centroids, and learns a one-hot encoder by one-nearest-neighbor optimization. Geometrically, the nonparametric density estimator at each layer projects the input data space to a uniformly-distributed discrete feature space, where the similarity of two data points in the discrete feature space is measured by the number of the nearest centroids they share in common. The multilayer network gradually reduces the nonlinear variations of data from bottom up by building a vast number of hierarchical trees implicitly on the original data space. Theoretically, the estimation error caused by the nonparametric density estimator is proportional to the correlation between the clusterings, both of which are reduced by the randomization steps.

* accepted for publication by Neural Networks
Click to Read Paper and Get Code
In this abstract paper, we introduce a new kernel learning method by a nonparametric density estimator. The estimator consists of a group of k-centroids clusterings. Each clustering randomly selects data points with randomly selected features as its centroids, and learns a one-hot encoder by one-nearest-neighbor optimization. The estimator generates a sparse representation for each data point. Then, we construct a nonlinear kernel matrix from the sparse representation of data. One major advantage of the proposed kernel method is that it is relatively insensitive to its free parameters, and therefore, it can produce reasonable results without parameter tuning. Another advantage is that it is simple. We conjecture that the proposed method can find its applications in many learning tasks or methods where sparse representation or kernel matrix is explored. In this preliminary study, we have applied the kernel matrix to spectral clustering. Our experimental results demonstrate that the kernel generated by the proposed method outperforms the well-tuned Gaussian RBF kernel. This abstract paper is used to protect the idea, full versions will be updated later.

Click to Read Paper and Get Code
The 3rd International conference on Cloud and Robotics (ICCR2016) was held November 23-23, 2016 in Saint Quentin (France). ICCR is the premier gathering of practitioners and researchers interested in how to bring the power of Cloud computing and Robotics to a new cutting-edge domain: Cloud robotics. The objective of ICCR is a working conference, where together Cloud computing and robotics researchers/practitioners to exchange their ideas. Different with traditional academic conference, ICCR is also a forum for researchers and practitioners in the two disciplines, fostering the collaboration of Cloud computing with robotics, and for practitioners to show their working projects and to discuss and exchange their problems with researchers to find a solution.

Click to Read Paper and Get Code
Conventional extreme learning machines solve a Moore-Penrose generalized inverse of hidden layer activated matrix and analytically determine the output weights to achieve generalized performance, by assuming the same loss from different types of misclassification. The assumption may not hold in cost-sensitive recognition tasks, such as face recognition based access control system, where misclassifying a stranger as a family member may result in more serious disaster than misclassifying a family member as a stranger. Though recent cost-sensitive learning can reduce the total loss with a given cost matrix that quantifies how severe one type of mistake against another, in many realistic cases the cost matrix is unknown to users. Motivated by these concerns, this paper proposes an evolutionary cost-sensitive extreme learning machine (ECSELM), with the following merits: 1) to our best knowledge, it is the first proposal of ELM in evolutionary cost-sensitive classification scenario; 2) it well addresses the open issue of how to define the cost matrix in cost-sensitive learning tasks; 3) an evolutionary backtracking search algorithm is induced for adaptive cost matrix optimization. Experiments in a variety of cost-sensitive tasks well demonstrate the effectiveness of the proposed approaches, with about 5%~10% improvements.

* This paper has been accepted for publication in IEEE Transactions on Neural Networks and Learning Systems
Click to Read Paper and Get Code
We address the problem of visual knowledge adaptation by leveraging labeled patterns from source domain and a very limited number of labeled instances in target domain to learn a robust classifier for visual categorization. This paper proposes a new extreme learning machine based cross-domain network learning framework, that is called Extreme Learning Machine (ELM) based Domain Adaptation (EDA). It allows us to learn a category transformation and an ELM classifier with random projection by minimizing the l_(2,1)-norm of the network output weights and the learning error simultaneously. The unlabeled target data, as useful knowledge, is also integrated as a fidelity term to guarantee the stability during cross domain learning. It minimizes the matching error between the learned classifier and a base classifier, such that many existing classifiers can be readily incorporated as base classifiers. The network output weights cannot only be analytically determined, but also transferrable. Additionally, a manifold regularization with Laplacian graph is incorporated, such that it is beneficial to semi-supervised learning. Extensively, we also propose a model of multiple views, referred as MvEDA. Experiments on benchmark visual datasets for video event recognition and object recognition, demonstrate that our EDA methods outperform existing cross-domain learning methods.

* This paper has been accepted for publication in IEEE Transactions on Image Processing
Click to Read Paper and Get Code
We apply multilayer bootstrap network (MBN), a recent proposed unsupervised learning method, to unsupervised speaker recognition. The proposed method first extracts supervectors from an unsupervised universal background model, then reduces the dimension of the high-dimensional supervectors by multilayer bootstrap network, and finally conducts unsupervised speaker recognition by clustering the low-dimensional data. The comparison results with 2 unsupervised and 1 supervised speaker recognition techniques demonstrate the effectiveness and robustness of the proposed method.

Click to Read Paper and Get Code
This paper addresses an important issue, known as sensor drift that behaves a nonlinear dynamic property in electronic nose (E-nose), from the viewpoint of machine learning. Traditional methods for drift compensation are laborious and costly due to the frequent acquisition and labeling process for gases samples recalibration. Extreme learning machines (ELMs) have been confirmed to be efficient and effective learning techniques for pattern recognition and regression. However, ELMs primarily focus on the supervised, semi-supervised and unsupervised learning problems in single domain (i.e. source domain). To our best knowledge, ELM with cross-domain learning capability has never been studied. This paper proposes a unified framework, referred to as Domain Adaptation Extreme Learning Machine (DAELM), which learns a robust classifier by leveraging a limited number of labeled data from target domain for drift compensation as well as gases recognition in E-nose systems, without loss of the computational efficiency and learning ability of traditional ELM. In the unified framework, two algorithms called DAELM-S and DAELM-T are proposed for the purpose of this paper, respectively. In order to percept the differences among ELM, DAELM-S and DAELM-T, two remarks are provided. Experiments on the popular sensor drift data with multiple batches collected by E-nose system clearly demonstrate that the proposed DAELM significantly outperforms existing drift compensation methods without cumbersome measures, and also bring new perspectives for ELM.

* 11 pages, 9 figures, to appear in IEEE Transactions on Instrumentation and Measurement
Click to Read Paper and Get Code
Recently, multilayer bootstrap network (MBN) has demonstrated promising performance in unsupervised dimensionality reduction. It can learn compact representations in standard data sets, i.e. MNIST and RCV1. However, as a bootstrap method, the prediction complexity of MBN is high. In this paper, we propose an unsupervised model compression framework for this general problem of unsupervised bootstrap methods. The framework compresses a large unsupervised bootstrap model into a small model by taking the bootstrap model and its application together as a black box and learning a mapping function from the input of the bootstrap model to the output of the application by a supervised learner. To specialize the framework, we propose a new technique, named compressive MBN. It takes MBN as the unsupervised bootstrap model and deep neural network (DNN) as the supervised learner. Our initial result on MNIST showed that compressive MBN not only maintains the high prediction accuracy of MBN but also is over thousands of times faster than MBN at the prediction stage. Our result suggests that the new technique integrates the effectiveness of MBN on unsupervised learning and the effectiveness and efficiency of DNN on supervised learning together for the effectiveness and efficiency of compressive MBN on unsupervised learning.

Click to Read Paper and Get Code
In (\cite{zhang2014nonlinear,zhang2014nonlinear2}), we have viewed machine learning as a coding and dimensionality reduction problem, and further proposed a simple unsupervised dimensionality reduction method, entitled deep distributed random samplings (DDRS). In this paper, we further extend it to supervised learning incrementally. The key idea here is to incorporate label information into the coding process by reformulating that each center in DDRS has multiple output units indicating which class the center belongs to. The supervised learning method seems somewhat similar with random forests (\cite{breiman2001random}), here we emphasize their differences as follows. (i) Each layer of our method considers the relationship between part of the data points in training data with all training data points, while random forests focus on building each decision tree on only part of training data points independently. (ii) Our method builds gradually-narrowed network by sampling less and less data points, while random forests builds gradually-narrowed network by merging subclasses. (iii) Our method is trained more straightforward from bottom layer to top layer, while random forests build each tree from top layer to bottom layer by splitting. (iv) Our method encodes output targets implicitly in sparse codes, while random forests encode output targets by remembering the class attributes of the activated nodes. Therefore, our method is a simpler, more straightforward, and maybe a better alternative choice, though both methods use two very basic elements---randomization and nearest neighbor optimization---as the core. This preprint is used to protect the incremental idea from (\cite{zhang2014nonlinear,zhang2014nonlinear2}). Full empirical evaluation will be announced carefully later.

* This paper has been withdrawn by the author. The idea is wrong and is no longer to be posed on site. The paper will no longer be updated
Click to Read Paper and Get Code
One important classifier ensemble for multiclass classification problems is Error-Correcting Output Codes (ECOCs). It bridges multiclass problems and binary-class classifiers by decomposing multiclass problems to a serial binary-class problems. In this paper, we present a heuristic ternary code, named Weight Optimization and Layered Clustering-based ECOC (WOLC-ECOC). It starts with an arbitrary valid ECOC and iterates the following two steps until the training risk converges. The first step, named Layered Clustering based ECOC (LC-ECOC), constructs multiple strong classifiers on the most confusing binary-class problem. The second step adds the new classifiers to ECOC by a novel Optimized Weighted (OW) decoding algorithm, where the optimization problem of the decoding is solved by the cutting plane algorithm. Technically, LC-ECOC makes the heuristic training process not blocked by some difficult binary-class problem. OW decoding guarantees the non-increase of the training risk for ensuring a small code length. Results on 14 UCI datasets and a music genre classification problem demonstrate the effectiveness of WOLC-ECOC.

Click to Read Paper and Get Code
Unsupervised deep learning is one of the most powerful representation learning techniques. Restricted Boltzman machine, sparse coding, regularized auto-encoders, and convolutional neural networks are pioneering building blocks of deep learning. In this paper, we propose a new building block -- distributed random models. The proposed method is a special full implementation of the product of experts: (i) each expert owns multiple hidden units and different experts have different numbers of hidden units; (ii) the model of each expert is a k-center clustering, whose k-centers are only uniformly sampled examples, and whose output (i.e. the hidden units) is a sparse code that only the similarity values from a few nearest neighbors are reserved. The relationship between the pioneering building blocks, several notable research branches and the proposed method is analyzed. Experimental results show that the proposed deep model can learn better representations than deep belief networks and meanwhile can train a much larger network with much less time than deep belief networks.

* This paper has been withdrawn by the author due to a lack of full empirical evaluation
Click to Read Paper and Get Code
In this paper, we propose an extremely simple deep model for the unsupervised nonlinear dimensionality reduction -- deep distributed random samplings, which performs like a stack of unsupervised bootstrap aggregating. First, its network structure is novel: each layer of the network is a group of mutually independent $k$-centers clusterings. Second, its learning method is extremely simple: the $k$ centers of each clustering are only $k$ randomly selected examples from the training data; for small-scale data sets, the $k$ centers are further randomly reconstructed by a simple cyclic-shift operation. Experimental results on nonlinear dimensionality reduction show that the proposed method can learn abstract representations on both large-scale and small-scale problems, and meanwhile is much faster than deep neural networks on large-scale problems.

Click to Read Paper and Get Code
Multitask clustering tries to improve the clustering performance of multiple tasks simultaneously by taking their relationship into account. Most existing multitask clustering algorithms fall into the type of generative clustering, and none are formulated as convex optimization problems. In this paper, we propose two convex Discriminative Multitask Clustering (DMTC) algorithms to address the problems. Specifically, we first propose a Bayesian DMTC framework. Then, we propose two convex DMTC objectives within the framework. The first one, which can be seen as a technical combination of the convex multitask feature learning and the convex Multiclass Maximum Margin Clustering (M3C), aims to learn a shared feature representation. The second one, which can be seen as a combination of the convex multitask relationship learning and M3C, aims to learn the task relationship. The two objectives are solved in a uniform procedure by the efficient cutting-plane algorithm. Experimental results on a toy problem and two benchmark datasets demonstrate the effectiveness of the proposed algorithms.

Click to Read Paper and Get Code
Graph Neural Networks (GNNs) are powerful to learn the representation of graph-structured data. Most of the GNNs use the message-passing scheme, where the embedding of a node is iteratively updated by aggregating the information of its neighbors. To achieve a better expressive capability of node influences, attention mechanism has grown to become a popular way to assign trainable weights of a node's neighbors in the aggregation. However, though the attention-based GNNs have achieved state-of-the-art results on several tasks, a clear understanding of their discriminative capacities is missing. In this work, we present a theoretical analysis of the representational properties of the GNN that adopts attention mechanism as an aggregator. In the analysis, we show all of the cases when those GNNs always fail to distinguish distinct structures. The finding shows existing attention-based aggregators fail to preserve the cardinality of the multiset of node feature vectors in the aggregation, thus limits their discriminative ability. To improve the performance of attention-based GNNs, we propose two cardinality preserved modifications that can be applied to any kind of attention mechanisms. We evaluate them in our GNN framework on benchmark datasets for graph classification. The results validate the improvements and show the competitive performance of our models.

Click to Read Paper and Get Code
Recently, learning to hash has been widely studied for image retrieval thanks to the computation and storage efficiency of binary codes. For most existing learning to hash methods, sufficient training images are required and used to learn precise hashing codes. However, in some real-world applications, there are not always sufficient training images in the domain of interest. In addition, some existing supervised approaches need a amount of labeled data, which is an expensive process in term of time, label and human expertise. To handle such problems, inspired by transfer learning, we propose a simple yet effective unsupervised hashing method named Optimal Projection Guided Transfer Hashing (GTH) where we borrow the images of other different but related domain i.e., source domain to help learn precise hashing codes for the domain of interest i.e., target domain. Besides, we propose to seek for the maximum likelihood estimation (MLE) solution of the hashing functions of target and source domains due to the domain gap. Furthermore,an alternating optimization method is adopted to obtain the two projections of target and source domains such that the domain hashing disparity is reduced gradually. Extensive experiments on various benchmark databases verify that our method outperforms many state-of-the-art learning to hash methods. The implementation details are available at https://github.com/liuji93/GTH.

Click to Read Paper and Get Code