Research papers and code for "Haibo He":
This paper presents a simple but effective density-based outlier detection approach with the local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on extended nearest neighbors of the object. Instead of using only $k$ nearest neighbors, we further consider reverse nearest neighbors and shared nearest neighbors of an object for density distribution estimation. Some theoretical properties of the proposed RDOS including its expected value and false alarm probability are derived. A comprehensive experimental study on both synthetic and real-life data sets demonstrates that our approach is more effective than state-of-the-art outlier detection methods.

* 22 pages, 14 figures, submitted to Pattern Recognition Letters
Click to Read Paper and Get Code
In this paper, we present a new wrapper feature selection approach based on Jensen-Shannon (JS) divergence, termed feature selection with maximum JS-divergence (FSMJ), for text categorization. Unlike most existing feature selection approaches, the proposed FSMJ approach is based on real-valued features which provide more information for discrimination than binary-valued features used in conventional approaches. We show that the FSMJ is a greedy approach and the JS-divergence monotonically increases when more features are selected. We conduct several experiments on real-life data sets, compared with the state-of-the-art feature selection approaches for text categorization. The superior performance of the proposed FSMJ approach demonstrates its effectiveness and further indicates its wide potential applications on data mining.

* 8 pages, 6 figures, World Congress on Intelligent Control and Automation, 2016
Click to Read Paper and Get Code
Different from face verification, face identification is much more demanding. To reach comparable performance, an identifier needs to be roughly N times better than a verifier. To expect a breakthrough in face identification, we need a fresh look at the fundamental building blocks of face recognition. In this paper we focus on the selection of a suitable signal representation and better matching strategy for face identification. We demonstrate how Gabor phase could be leveraged to improve the performance of face identification by using the Block Matching method. Compared to the existing approaches, the proposed method features much lower algorithmic complexity: face images are only filtered by a single-scale Gabor filter pair and the matching is performed between any pairs of face images at hand without involving any training process. Benchmark evaluations show that the proposed approach is totally comparable to and even better than state-of-the-art algorithms, which are typically based on more features extracted from a large set of Gabor faces and/or rely on heavy training processes.

Click to Read Paper and Get Code
Artificial neural networks which are inspired from the learning mechanism of brain have achieved great successes in many problems, especially those with deep layers. In this paper, we propose a nucleus neural network (NNN) and corresponding connecting architecture learning method. In a nucleus, there are no regular layers, i.e., a neuron may connect to all the neurons in the nucleus. This type of architecture gets rid of layer limitation and may lead to more powerful learning capability. It is crucial to determine the connections between them given numerous neurons. Based on the principle that more relevant input and output neuron pair deserves higher connecting density, we propose an efficient architecture learning model for the nucleus. Moreover, we improve the learning method for connecting weights and biases given the optimized architecture. We find that this novel architecture is robust to irrelevant components in test data. So we reconstruct a new dataset based on the MNIST dataset where the types of digital backgrounds in training and test sets are different. Experiments demonstrate that the proposed learner achieves significant improvement over traditional learners on the reconstructed data set.

Click to Read Paper and Get Code
Artificial neural networks which model the neurons and connecting architectures in brain have achieved great successes in many problems, especially those with deep layers. In this paper, we propose a nucleus neural network (NNN) and corresponding architecture and parameter learning methods. In a nucleus, there are no regular layers, i.e., a neuron may connect to all the neurons in the nucleus. This architecture gets rid of layer limitation and may lead to more powerful learning capability. It is crucial to determine the connections given numerous neurons. Based on the principle that more relevant input and output neuron pair deserves higher connecting density, we propose an architecture learning model for the nucleus. Moreover, we propose an improved learning method for learning connecting weights and biases with the optimized architecture. We find that this novel architecture is robust to irrelevant components in test data. So we define a super robust learning problem and test the proposed network with one case where the types of image backgrounds in training and test sets are different. Experiments demonstrate that the proposed learner achieves significant improvement over traditional learners on the reconstructed data set.

Click to Read Paper and Get Code
Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on the Information Theory, which aims to rank the features with their discriminative capacity for classification. We first revisit two information measures: Kullback-Leibler divergence and Jeffreys divergence for binary hypothesis testing, and analyze their asymptotic properties relating to type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination ($MD$) and $MD-\chi^2$ methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.

* This paper has been submitted to the IEEE Trans. Knowledge and Data Engineering. 14 pages, 5 figures
Click to Read Paper and Get Code
Sparse representation, which uses dictionary atoms to reconstruct input vectors, has been studied intensively in recent years. A proper dictionary is a key for the success of sparse representation. In this paper, an active dictionary learning (ADL) method is introduced, in which classification error and reconstruction error are considered as the active learning criteria in selection of the atoms for dictionary construction. The learned dictionaries are caculated in sparse representation based classification (SRC). The classification accuracy and reconstruction error are used to evaluate the proposed dictionary learning method. The performance of the proposed dictionary learning method is compared with other methods, including unsupervised dictionary learning and whole-training-data dictionary. The experimental results based on the UCI data sets and face data set demonstrate the efficiency of the proposed method.

Click to Read Paper and Get Code
With the rapid development of deep learning algorithms, action recognition in video has achieved many important research results. One issue in action recognition, Zero-Shot Action Recognition (ZSAR), has recently attracted considerable attention, which classify new categories without any positive examples. Another difficulty in action recognition is that untrimmed data may seriously affect model performance. We propose a composite two-stream framework with a pre-trained model. Our proposed framework includes a classifier branch and a composite feature branch. The graph network model is adopted in each of the two branches, which effectively improves the feature extraction and reasoning ability of the framework. In the composite feature branch, a 3-channel self-attention models are constructed to weight each frame in the video and give more attention to the key frames. Each self-attention models channel outputs a set of attention weights to focus on a particular aspect of the video, and a set of attention weights corresponds to a one-dimensional vector. The 3-channel self-attention models can evaluate key frames from multiple aspects, and the output sets of attention weight vectors form an attention matrix, which effectively enhances the attention of key frames with strong correlation of action. This model can implement action recognition under zero-shot conditions, and has good recognition performance for untrimmed video data. Experimental results on relevant data sets confirm the validity of our model.

* Accepted to ACPR 2019
Click to Read Paper and Get Code
Millions of people with severe speech disorders around the world may regain their communication capabilities through techniques of silent speech recognition (SSR). Using electroencephalography (EEG) as a biomarker for speech decoding has been popular for SSR. However, the lack of SSR text corpus has impeded the development of this technique. Here, we construct a novel task-oriented text corpus, which is utilized in the field of SSR. In the process of construction, we propose a task-oriented hybrid construction method based on natural language generation algorithm. The algorithm focuses on the strategy of data-to-text generation, and has two advantages including linguistic quality and high diversity. These two advantages use template-based method and deep neural networks respectively. In an SSR experiment with the generated text corpus, analysis results show that the performance of our hybrid construction method outperforms the pure method such as template-based natural language generation or neural natural language generation models.

* Accepted for publication in the 3rd International Conference on Natural Language Processing and Information Retrieval, 2019
Click to Read Paper and Get Code
Predicting facial attributes from faces in the wild is very challenging due to pose and lighting variations in the real world. The key to this problem is to build proper feature representations to cope with these unfavourable conditions. Given the success of Convolutional Neural Network (CNN) in image classification, the high-level CNN feature, as an intuitive and reasonable choice, has been widely utilized for this problem. In this paper, however, we consider the mid-level CNN features as an alternative to the high-level ones for attribute prediction. This is based on the observation that face attributes are different: some of them are locally oriented while others are globally defined. Our investigations reveal that the mid-level deep representations outperform the prediction accuracy achieved by the (fine-tuned) high-level abstractions. We empirically demonstrate that the midlevel representations achieve state-of-the-art prediction performance on CelebA and LFWA datasets. Our investigations also show that by utilizing the mid-level representations one can employ a single deep network to achieve both face recognition and attribute prediction.

* In proceedings of 2016 International Conference on Image Processing (ICIP)
Click to Read Paper and Get Code
Predicting attributes from face images in the wild is a challenging computer vision problem. To automatically describe face attributes from face containing images, traditionally one needs to cascade three technical blocks --- face localization, facial descriptor construction, and attribute classification --- in a pipeline. As a typical classification problem, face attribute prediction has been addressed using deep learning. Current state-of-the-art performance was achieved by using two cascaded Convolutional Neural Networks (CNNs), which were specifically trained to learn face localization and attribute description. In this paper, we experiment with an alternative way of employing the power of deep representations from CNNs. Combining with conventional face localization techniques, we use off-the-shelf architectures trained for face recognition to build facial descriptors. Recognizing that the describable face attributes are diverse, our face descriptors are constructed from different levels of the CNNs for different attributes to best facilitate face attribute prediction. Experiments on two large datasets, LFWA and CelebA, show that our approach is entirely comparable to the state-of-the-art. Our findings not only demonstrate an efficient face attribute prediction approach, but also raise an important question: how to leverage the power of off-the-shelf CNN representations for novel tasks.

* In proceeding of 2016 International Conference on Biometrics (ICB)
Click to Read Paper and Get Code
This paper presents a novel kernel-based generative classifier which is defined in a distortion subspace using polynomial series expansion, named Kernel-Distortion (KD) classifier. An iterative kernel selection algorithm is developed to steadily improve classification performance by repeatedly removing and adding kernels. The experimental results on character recognition application not only show that the proposed generative classifier performs better than many existing classifiers, but also illustrate that it has different recognition capability compared to the state-of-the-art discriminative classifier - deep belief network. The recognition diversity indicates that a hybrid combination of the proposed generative classifier and the discriminative classifier could further improve the classification performance. Two hybrid combination methods, cascading and stacking, have been implemented to verify the diversity and the improvement of the proposed classifier.

* 29 pages, 7 figures
Click to Read Paper and Get Code
This is the preprint version of our paper on IEEE Virtual Reality Conference 2015. A touch-less interaction technology on vision based wearable device is designed and evaluated. Users interact with the application with dynamic hands/feet gestures in front of the camera. Several proof-of-concept prototypes with eleven dynamic gestures are developed based on the touch-less interaction. At last, a comparing user study evaluation is proposed to demonstrate the usability of the touch-less approach, as well as the impact on user's emotion, running on a wearable framework or Google Glass.

* This is the preprint version of our paper on IEEE Virtual Reality Conference 2015
Click to Read Paper and Get Code
In this work, we consider the resilience of distributed algorithms based on stochastic gradient descent (SGD) in distributed learning with potentially Byzantine attackers, who could send arbitrary information to the parameter server to disrupt the training process. Toward this end, we propose a new Lipschitz-inspired coordinate-wise median approach (LICM-SGD) to mitigate Byzantine attacks. We show that our LICM-SGD algorithm can resist up to half of the workers being Byzantine attackers, while still converging almost surely to a stationary region in non-convex settings. Also, our LICM-SGD method does not require any information about the number of attackers and the Lipschitz constant, which makes it attractive for practical implementations. Moreover, our LICM-SGD method enjoys the optimal $O(md)$ computational time-complexity in the sense that the time-complexity is the same as that of the standard SGD under no attacks. We conduct extensive experiments to show that our LICM-SGD algorithm consistently outperforms existing methods in training multi-class logistic regression and convolutional neural networks with MNIST and CIFAR-10 datasets. In our experiments, LICM-SGD also achieves a much faster running time thanks to its low computational time-complexity.

Click to Read Paper and Get Code
We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors in our model. It consists of two separate steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D poses from the multi-view 2D poses. First, we introduce a cross-view fusion scheme into CNN to jointly estimate 2D poses for multiple views. Consequently, the 2D pose estimation for each view already benefits from other views. Second, we present a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses. It gradually improves the accuracy of 3D pose with affordable computational cost. We test our method on two public datasets H36M and Total Capture. The Mean Per Joint Position Errors on the two datasets are 26mm and 29mm, which outperforms the state-of-the-arts remarkably (26mm vs 52mm, 29mm vs 35mm). Our code is released at \url{https://github.com/microsoft/multiview-human-pose-estimation-pytorch}.

* Accepted by ICCV 2019
Click to Read Paper and Get Code
In this letter, we present a novel exponentially embedded families (EEF) based classification method, in which the probability density function (PDF) on raw data is estimated from the PDF on features. With the PDF construction, we show that class-specific features can be used in the proposed classification method, instead of a common feature subset for all classes as used in conventional approaches. We apply the proposed EEF classifier for text categorization as a case study and derive an optimal Bayesian classification rule with class-specific feature selection based on the Information Gain (IG) score. The promising performance on real-life data sets demonstrates the effectiveness of the proposed approach and indicates its wide potential applications.

* 9 pages, 3 figures, to be published in IEEE Signal Processing Letter. IEEE Signal Processing Letter, 2016
Click to Read Paper and Get Code
The introduction of pre-trained language models has revolutionized natural language research communities. However, researchers still know relatively little regarding their theoretical and empirical properties. In this regard, Peters et al. perform several experiments which demonstrate that it is better to adapt BERT with a light-weight task-specific head, rather than building a complex one on top of the pre-trained language model, and freeze the parameters in the said language model. However, there is another option to adopt. In this paper, we propose a new adaptation method which we first train the task model with the BERT parameters frozen and then fine-tune the entire model together. Our experimental results show that our model adaptation method can achieve 4.7% accuracy improvement in semantic similarity task, 0.99% accuracy improvement in sequence labeling task and 0.72% accuracy improvement in the text classification task.

Click to Read Paper and Get Code
The recent proposal of learned index structures opens up a new perspective on how traditional range indexes can be optimized. However, the current learned indexes assume the data distribution is relatively static and the access pattern is uniform, while real-world scenarios consist of skew query distribution and evolving data. In this paper, we demonstrate that the missing consideration of access patterns and dynamic data distribution notably hinders the applicability of learned indexes. To this end, we propose solutions for learned indexes for dynamic workloads (called Doraemon). To improve the latency for skew queries, Doraemon augments the training data with access frequencies. To address the slow model re-training when data distribution shifts, Doraemon caches the previously-trained models and incrementally fine-tunes them for similar access patterns and data distribution. Our preliminary result shows that, Doraemon improves the query latency by 45.1% and reduces the model re-training time to 1/20.

Click to Read Paper and Get Code
The graph embedding (GE) methods have been widely applied for dimensionality reduction of hyperspectral imagery (HSI). However, a major challenge of GE is how to choose proper neighbors for graph construction and explore the spatial information of HSI data. In this paper, we proposed an unsupervised dimensionality reduction algorithm termed spatial-spectral manifold reconstruction preserving embedding (SSMRPE) for HSI classification. At first, a weighted mean filter (WMF) is employed to preprocess the image, which aims to reduce the influence of background noise. According to the spatial consistency property of HSI, the SSMRPE method utilizes a new spatial-spectral combined distance (SSCD) to fuse the spatial structure and spectral information for selecting effective spatial-spectral neighbors of HSI pixels. Then, it explores the spatial relationship between each point and its neighbors to adjusts the reconstruction weights for improving the efficiency of manifold reconstruction. As a result, the proposed method can extract the discriminant features and subsequently improve the classification performance of HSI. The experimental results on PaviaU and Salinas hyperspectral datasets indicate that SSMRPE can achieve better classification accuracies in comparison with some state-of-the-art methods.

* under review in IEEE Transactions On Cybernetics
Click to Read Paper and Get Code