Models, code, and papers for "Matti Pietikainen":

Two decades of local binary patterns: A survey

Jan 16, 2017
Matti Pietikäinen, Guoying Zhao

Texture is an important characteristic for many types of images. In recent years very discriminative and computationally efficient local texture descriptors based on local binary patterns (LBP) have been developed, which has led to significant progress in applying texture methods to different problems and applications. Due to this progress, the division between texture descriptors and more generic image or video descriptors has been disappearing. A large number of different variants of LBP have been developed to improve its robustness, and to increase its discriminative power and applicability to different types of problems. In this chapter, the most recent and important variants of LBP in 2D, spatiotemporal, 3D, and 4D domains are surveyed. Interesting new developments of LBP in 1D signal analysis are also considered. Finally, some future challenges for research are presented.
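For readers unfamiliar with the operator that the surveyed variants build on, the basic 3x3 LBP code can be sketched as below. This is a minimal illustration and not code from the chapter: the names `lbp_code` and `lbp_histogram`, the clockwise neighbor ordering, and the sample patch are assumptions made for the example.

```python
# Clockwise neighbor offsets (row, col), starting at the top-left pixel.
# The starting point and direction are a convention assumed for this sketch.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        # A neighbor at least as bright as the center sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (256 bins)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Hypothetical 3x3 grayscale patch used only for illustration.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch, 1, 1))
```

The histogram of such codes over a region is what typically serves as the texture feature; the variants covered in the survey change the sampling, thresholding, or encoding steps.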

* In Advances in Independent Component Analysis and Learning Machines, Academic Press, 2015, Pages 175-210 

Probing the Intra-Component Correlations within Fisher Vector for Material Classification

Apr 15, 2016
Xiaopeng Hong, Xianbiao Qi, Guoying Zhao, Matti Pietikäinen

Fisher vector (FV) has become a popular image representation. One notable underlying assumption of the FV framework is that local descriptors are well decorrelated within each cluster so that the covariance matrix for each Gaussian can be simplified to be diagonal. Though the FV usually relies on Principal Component Analysis (PCA) to decorrelate local features, the PCA is applied to the entire training data and hence only diagonalizes the universal covariance matrix, rather than those w.r.t. the local components. As a result, the local decorrelation assumption is usually not supported in practice. To relax this assumption, this paper proposes a completed model of the Fisher vector, termed the Completed Fisher Vector (CFV). The CFV is a more general framework than the FV, since it encodes not only the variances but also the correlations of the whitened local descriptors. The CFV thus leads to improved discriminative power. We take the task of material categorization as an example and experimentally show that: 1) the CFV outperforms the FV under all parameter settings; 2) the CFV is robust to changes in the number of components in the mixture; 3) even with a relatively small visual vocabulary the CFV still works well on two challenging datasets.
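The gap between global and per-cluster decorrelation described in this abstract can be seen in a small worked example. The sketch below is only an illustration under stated assumptions: the four 2-D points and the helper `covariance` are made up for the example and are not taken from the paper.

```python
def covariance(points):
    """Population covariance matrix of a list of 2-D points."""
    n = len(points)
    mean = [sum(p[d] for p in points) / n for d in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        for i in range(2):
            for j in range(2):
                cov[i][j] += (p[i] - mean[i]) * (p[j] - mean[j]) / n
    return cov


# Two hypothetical clusters with opposite internal correlation.
cluster_a = [(2.0, -1.0), (4.0, 1.0)]
cluster_b = [(-4.0, 1.0), (-2.0, -1.0)]

global_cov = covariance(cluster_a + cluster_b)
cov_a = covariance(cluster_a)
cov_b = covariance(cluster_b)

print("global off-diagonal:", global_cov[0][1])
print("cluster A off-diagonal:", cov_a[0][1])
print("cluster B off-diagonal:", cov_b[0][1])
```

In this toy data the covariance of the pooled points is already diagonal, so a PCA fitted on all of it would only reorder or rescale the axes, yet each cluster on its own still has a nonzero off-diagonal term. That residual within-cluster correlation is the information a diagonal-covariance FV discards.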

* Manuscript submitted to Neurocomputing at the end of April 2015 (!). One year has passed, but we have not yet received any review comments!

HEp-2 Cell Classification: The Role of Gaussian Scale Space Theory as A Pre-processing Approach

Sep 08, 2015
Xianbiao Qi, Guoying Zhao, Jie Chen, Matti Pietikäinen

Indirect Immunofluorescence Imaging of Human Epithelial Type 2 (HEp-2) cells is an effective way to identify the presence of Anti-Nuclear Antibody (ANA). Most existing works on HEp-2 cell classification focus mainly on feature extraction, feature encoding and classifier design. Very little effort has been devoted to studying the importance of pre-processing techniques. In this paper, we analyze the importance of pre-processing, and investigate the role of Gaussian Scale Space (GSS) theory as a pre-processing approach for the HEp-2 cell classification task. We validate the GSS pre-processing under the Local Binary Pattern (LBP) and the Bag-of-Words (BoW) frameworks. Under the BoW framework, the introduced pre-processing approach, using only one Local Orientation Adaptive Descriptor (LOAD), achieved superior performance on the Executable Thematic on Pattern Recognition Techniques for Indirect Immunofluorescence (ET-PRT-IIF) image analysis. Our system, using only one feature, outperformed the winner of the ICPR 2014 contest that combined four types of features. Moreover, the proposed pre-processing method is not restricted to this work; it can be generalized to many existing works.
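A Gaussian scale space is, in its simplest form, the same image smoothed with Gaussians of increasing width. The sketch below shows that generic construction only; the function names, the 3-sigma kernel radius, the clamped border handling, and the choice of sigmas are assumptions for illustration, and the abstract does not say how the paper feeds the smoothed images into the LBP or BoW pipelines.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_1d(signal, kernel):
    """Filter a 1-D signal, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * signal[j]
        out.append(acc)
    return out


def gaussian_blur(image, sigma):
    """Separable Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [convolve_1d(row, kernel) for row in image]
    cols = [convolve_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def scale_space(image, sigmas):
    """One smoothed copy of the image per sigma."""
    return [gaussian_blur(image, s) for s in sigmas]
```

A descriptor would then be computed on each level of `scale_space(image, sigmas)`; how those per-scale descriptors are combined is left open here.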

* 9 pages, 6 figures 

A Global Alignment Kernel based Approach for Group-level Happiness Intensity Estimation

Sep 03, 2018
Xiaohua Huang, Abhinav Dhall, Roland Goecke, Matti Pietikainen, Guoying Zhao

With the progress in automatic human behavior understanding, analysing the perceived affect of multiple people has received interest in the affective computing community. Unlike conventional facial expression analysis, this paper primarily focuses on analysing the behaviour of multiple people in an image. The proposed method is based on support vector regression with combined global alignment kernels (GAKs) to estimate the happiness intensity of a group of people. We first exploit Riesz-based volume local binary pattern (RVLBP) and deep convolutional neural network (CNN) based features for characterizing facial images. Furthermore, we propose to use a GAK for the RVLBP and deep CNN features, respectively, to explicitly measure the similarity of two group-level images. Specifically, we exploit a global weight sort scheme to sort the face images from a group-level image according to their spatial weights, yielding an efficient data structure for the GAK. Lastly, we propose multiple kernel learning based on three combination strategies for combining the two respective GAKs based on RVLBP and deep CNN features, thereby enhancing the discriminative ability of each GAK. Intensive experiments are performed on the challenging group-level happiness intensity database HAPPEI. Our experimental results demonstrate that the proposed approach achieves promising performance for group happiness intensity analysis when compared with recent state-of-the-art methods.

LOAD: Local Orientation Adaptive Descriptor for Texture and Material Classification

Apr 22, 2015
Xianbiao Qi, Guoying Zhao, Linlin Shen, Qingquan Li, Matti Pietikainen

In this paper, we propose a novel local feature, called Local Orientation Adaptive Descriptor (LOAD), to capture regional texture in an image. In LOAD, we propose to define point description on an Adaptive Coordinate System (ACS), adopt a binary sequence descriptor to capture relationships between one point and its neighbors, and use a multi-scale strategy to enhance the discriminative power of the descriptor. The proposed LOAD not only has the discriminative power to capture texture information, but is also strongly robust to illumination variation and image rotation. Extensive experiments on benchmark data sets of texture classification and real-world material recognition show that the proposed LOAD yields state-of-the-art performance. It is worth mentioning that we achieve a 65.4% classification accuracy (to the best of our knowledge, the highest reported so far) on the Flickr Material Database using a single feature. Moreover, by combining LOAD with features extracted by Convolutional Neural Networks (CNN), we obtain significantly better performance than either LOAD or CNN alone. This result confirms that LOAD is complementary to learning-based features.

* 13 pages, 7 figures 

Extended Local Binary Patterns for Efficient and Robust Spontaneous Facial Micro-Expression Recognition

Jul 22, 2019
Chengyu Guo, Jingyun Liang, Geng Zhan, Zhong Liu, Matti Pietikäinen, Li Liu

Facial MicroExpressions (MEs) are spontaneous, involuntary facial movements that occur when a person experiences an emotion but deliberately or unconsciously attempts to conceal his or her genuine emotions. Recently, ME recognition has attracted increasing attention due to its potential applications such as clinical diagnosis, business negotiation, interrogations and security. However, it is expensive to build large scale ME datasets, mainly due to the difficulty of naturally inducing spontaneous MEs. This limits the application of deep learning techniques, which require large amounts of training data. In this paper, we propose a simple, efficient yet robust descriptor called Extended Local Binary Patterns on Three Orthogonal Planes (ELBPTOP) for ME recognition. ELBPTOP consists of three complementary binary descriptors: LBPTOP and two novel ones, Radial Difference LBPTOP (RDLBPTOP) and Angular Difference LBPTOP (ADLBPTOP), which explore the local second order information along radial and angular directions contained in ME video sequences. ELBPTOP is a novel ME descriptor inspired by the unique and subtle facial movements. It is computationally efficient and only marginally increases the cost of computing LBPTOP, yet is extremely effective for ME recognition. In addition, by introducing Whitened Principal Component Analysis (WPCA) to ME recognition for the first time, we can obtain more compact and discriminative feature representations and achieve significant computational savings. Extensive experimental evaluation on three popular spontaneous ME datasets (SMIC, CASMEII and SAMM) shows that our proposed ELBPTOP approach significantly outperforms the previous state of the art on all three evaluated datasets. Our proposed ELBPTOP achieves 73.94% on CASMEII, which is 6.6% higher than the state of the art on this dataset. More impressively, ELBPTOP increases recognition accuracy from 44.7% to 63.44% on the SAMM dataset.

Texture Classification in Extreme Scale Variations using GANet

Feb 13, 2018
Li Liu, Jie Chen, Guoying Zhao, Paul Fieguth, Xilin Chen, Matti Pietikäinen

Research in texture recognition often concentrates on recognizing textures with intraclass variations such as illumination, rotation, viewpoint and small scale changes. In contrast, in real-world applications a change in scale can have a dramatic impact on texture appearance, to the point of changing completely from one texture category to another. As a result, texture variations due to changes in scale are amongst the hardest to handle. In this work we conduct the first study of classifying textures with extreme variations in scale. To address this issue, we first propose and then reduce scale proposals on the basis of dominant texture patterns. Motivated by the challenges posed by this problem, we propose a new GANet network where we use a Genetic Algorithm to change the units in the hidden layers during network training, in order to promote the learning of more informative semantic texture patterns. Finally, we adopt a FVCNN (Fisher Vector pooling of a Convolutional Neural Network filter bank) feature encoder for global texture representation. Because extreme scale variations are not necessarily present in most standard texture databases, to support the proposed extreme-scale aspects of texture understanding we are developing a new dataset, the Extreme Scale Variation Textures (ESVaT), to test the performance of our framework. It is demonstrated that the proposed framework significantly outperforms gold-standard texture features by more than 10% on ESVaT. We also test the performance of our proposed approach on the KTHTIPS2b and OS datasets and a further dataset synthetically derived from Forrest, showing superior performance compared to the state of the art.

* submitted to IEEE Transactions on Image Processing 

HEp-2 Cell Classification via Fusing Texture and Shape Information

Feb 16, 2015
Xianbiao Qi, Guoying Zhao, Chun-Guang Li, Jun Guo, Matti Pietikäinen

An Indirect Immunofluorescence (IIF) HEp-2 cell image provides effective evidence for the diagnosis of autoimmune diseases. Recently, computer-aided diagnosis of autoimmune diseases by IIF HEp-2 cell classification has attracted great attention. However, the HEp-2 cell classification task is quite challenging due to large intra-class variation and small between-class variation. In this paper we propose an effective and efficient approach for the automatic classification of IIF HEp-2 cell images by fusing multi-resolution texture information and richer shape information. To be specific, we propose to: a) capture the multi-resolution texture information by a novel Pairwise Rotation Invariant Co-occurrence of Local Gabor Binary Pattern (PRICoLGBP) descriptor, b) depict the richer shape information by using an Improved Fisher Vector (IFV) model with RootSIFT features which are sampled from large image patches in multiple scales, and c) combine them properly. We systematically evaluate the proposed approach on the IEEE International Conference on Pattern Recognition (ICPR) 2012, IEEE International Conference on Image Processing (ICIP) 2013 and ICPR 2014 contest data sets. The experimental results show that the proposed method significantly outperforms the winners of the ICPR 2012 and ICIP 2013 contests, and achieves performance comparable to the winner of the newly released ICPR 2014 contest.

* 11 pages, 7 figures 

Dynamic texture and scene classification by transferring deep image features

Feb 01, 2015
Xianbiao Qi, Chun-Guang Li, Guoying Zhao, Xiaopeng Hong, Matti Pietikäinen

Dynamic texture and scene classification are two fundamental problems in understanding natural video content. Extracting robust and effective features is a crucial step towards solving these problems. However, the existing approaches suffer from sensitivity to varying illumination, viewpoint changes, or even camera motion, and/or from a lack of spatial information. Inspired by the success of deep structures in image classification, we attempt to leverage a deep structure to extract features for dynamic texture and scene classification. To tackle the challenges in training a deep structure, we propose to transfer some prior knowledge from the image domain to the video domain. To be specific, we propose to apply a well-trained Convolutional Neural Network (ConvNet) as a mid-level feature extractor to extract features from each frame, and then form a representation of a video by concatenating the first and the second order statistics over the mid-level features. We term this two-level feature extraction scheme a Transferred ConvNet Feature (TCoF). Moreover, we explore two different implementations of the TCoF scheme, i.e., the spatial TCoF and the temporal TCoF, in which the mean-removed frames and the difference between two adjacent frames are used as the inputs of the ConvNet, respectively. We systematically evaluate the proposed spatial TCoF and temporal TCoF schemes on three benchmark data sets, including DynTex, YUPENN, and Maryland, and demonstrate that the proposed approach yields superior performance.
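The pooling step described in this abstract can be sketched as follows. This is a rough illustration, not the authors' implementation: the per-frame ConvNet features are taken as given, the second order statistic is assumed here to be the per-dimension variance (the abstract does not specify it), and the names `tcof_pool` and `frame_differences` are hypothetical.

```python
def tcof_pool(frame_features):
    """Concatenate per-dimension mean and variance over all frames.

    frame_features: one feature vector (list of floats) per frame,
    assumed to come from some pretrained ConvNet layer.
    """
    n = len(frame_features)
    dim = len(frame_features[0])
    mean = [sum(f[d] for f in frame_features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frame_features) / n
           for d in range(dim)]
    return mean + var


def frame_differences(frames):
    """Differences between adjacent frames, as used for the temporal variant.

    frames: one flattened frame (list of pixel values) per time step.
    """
    return [[b - a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(frames, frames[1:])]


# Hypothetical two-frame video with 2-D features per frame.
features = [[1.0, 2.0], [3.0, 6.0]]
print(tcof_pool(features))
print(frame_differences([[1, 2], [4, 6], [5, 5]]))
```

For the temporal variant, the output of `frame_differences` would be passed through the ConvNet before pooling; for the spatial variant, mean-removed frames would be used instead.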

From BoW to CNN: Two Decades of Texture Representation for Texture Classification

Oct 03, 2018
Li Liu, Jie Chen, Paul Fieguth, Guoying Zhao, Rama Chellappa, Matti Pietikainen

Texture is a fundamental characteristic of many types of images, and texture representation is one of the essential and challenging problems in computer vision and pattern recognition which has attracted extensive research attention. Since 2000, texture representations based on Bag of Words (BoW) and on Convolutional Neural Networks (CNNs) have been extensively studied with impressive performance. Given this period of remarkable evolution, this paper aims to present a comprehensive survey of advances in texture representation over the last two decades. More than 200 major publications are cited in this survey covering different aspects of the research, which includes (i) problem description; (ii) recent advances in the broad categories of BoW-based, CNN-based and attribute-based methods; and (iii) evaluation issues, specifically benchmark datasets and state of the art results. In retrospect of what has been achieved so far, the survey discusses open challenges and directions for future research.

* Accepted by IJCV 

PCANet-II: When PCANet Meets the Second Order Pooling

Sep 30, 2017
Lei Tian, Xiaopeng Hong, Guoying Zhao, Chunxiao Fan, Yue Ming, Matti Pietikäinen

PCANet, a notable shallow network, employs the histogram representation for feature pooling. However, there are three main problems with this kind of pooling method. First, the histogram-based pooling method binarizes the feature maps and leads to inevitable loss of discriminative information. Second, it is difficult to effectively combine other visual cues into a compact representation, because the simple concatenation of various visual cues leads to an inefficient feature representation. Third, the dimensionality of the histogram-based output grows exponentially with the number of feature maps used. In order to overcome these problems, we propose a novel shallow network model, named PCANet-II. Compared with the histogram-based output, second order pooling not only provides more discriminative information by preserving both the magnitude and sign of convolutional responses, but also dramatically reduces the size of the output features. Thus we combine the second order statistical pooling method with the shallow network, i.e., PCANet. Moreover, it is easy to combine other discriminative and robust cues by using second order pooling. So we introduce the binary feature difference encoding scheme into our PCANet-II to further improve robustness. Experiments demonstrate the effectiveness and robustness of our proposed PCANet-II method.
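The dimensionality argument in this abstract can be illustrated with a generic form of second order pooling: the average outer product of the local response vectors, keeping only the upper triangle. The sketch below is an assumption-laden illustration; the abstract does not state the exact pooling formula used in PCANet-II, and `second_order_pool` is a hypothetical name.

```python
def second_order_pool(features):
    """Average outer product of local feature vectors.

    features: one response vector per spatial location, with one entry
    per feature map. Returns the flattened upper triangle of the
    averaged outer-product matrix.
    """
    n = len(features)
    dim = len(features[0])
    pooled = []
    for i in range(dim):
        for j in range(i, dim):
            pooled.append(sum(f[i] * f[j] for f in features) / n)
    return pooled


# Hypothetical signed responses at two locations from two feature maps.
responses = [[1.0, -2.0], [3.0, 0.0]]
print(second_order_pool(responses))

# Output size for 8 feature maps: quadratic here, versus 2**8 bins
# when binarized maps are hashed into a single histogram index.
print(len(second_order_pool([[0.0] * 8])), 2 ** 8)
```

Because the products use the raw signed values, both magnitude and sign of the responses influence the pooled output, in contrast to a histogram over binarized maps.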

Spontaneous Facial Micro-Expression Recognition using Discriminative Spatiotemporal Local Binary Pattern with an Improved Integral Projection

Aug 07, 2016
Xiaohua Huang, Sujing Wang, Xin Liu, Guoying Zhao, Xiaoyi Feng, Matti Pietikainen

Recently, there has been increasing interest in inferring micro-expressions from facial image sequences. Due to the subtle facial movement of micro-expressions, feature extraction has become an important and critical issue for spontaneous facial micro-expression recognition. Recent works usually use the spatiotemporal local binary pattern for micro-expression analysis. However, the commonly used spatiotemporal local binary pattern considers dynamic texture information to represent face images while missing the shape attribute of face images. In addition, these works extract the spatiotemporal features from the global face regions, which ignores the discriminative information between two micro-expression classes. These problems seriously limit the application of the spatiotemporal local binary pattern to micro-expression recognition. In this paper, we propose a discriminative spatiotemporal local binary pattern based on an improved integral projection to resolve the problems of the spatiotemporal local binary pattern for micro-expression recognition. Firstly, we develop an improved integral projection for preserving the shape attribute of micro-expressions. Furthermore, the improved integral projection is incorporated with local binary pattern operators across spatial and temporal domains. Specifically, we extract novel spatiotemporal features that incorporate shape attributes into spatiotemporal texture features. To increase the discrimination of micro-expressions, we propose a new feature selection method based on the Laplacian method to extract the discriminative information for facial micro-expression recognition. Intensive experiments are conducted on three publicly available micro-expression databases. We compare our method with state-of-the-art algorithms. Experimental results demonstrate that our proposed method achieves promising performance for micro-expression recognition.
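For context, the classical integral projection that the paper's improved version starts from simply sums pixel intensities along rows and columns. The sketch below shows only that classical form; the function name and the toy image are assumptions, and the improvement proposed in the paper is not reproduced here.

```python
def integral_projections(image):
    """Return (horizontal, vertical) integral projections of a 2-D image.

    horizontal[r] is the sum of row r; vertical[c] is the sum of column c.
    """
    horizontal = [sum(row) for row in image]
    vertical = [sum(col) for col in zip(*image)]
    return horizontal, vertical


# Hypothetical 2x3 grayscale image used only for illustration.
toy_image = [[1, 2, 3],
             [4, 5, 6]]
print(integral_projections(toy_image))
```

The two 1-D profiles summarize the shape of the face region along each axis, and it is to projections of this kind that the local binary pattern operators are applied across space and time.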

* 13 pages, 8 figures, 5 tables, submitted to IEEE Transactions on Image Processing 

Deep Learning for Generic Object Detection: A Survey

Sep 06, 2018
Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, Matti Pietikäinen

Generic object detection, aiming at locating object instances from a large number of predefined categories in natural images, is one of the most fundamental and challenging problems in computer vision. Deep learning techniques have emerged in recent years as powerful methods for learning feature representations directly from data, and have led to remarkable breakthroughs in the field of generic object detection. Given this time of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought by deep learning techniques. More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance. We finish by identifying promising directions for future research.

* Submitted to IJCV, 30 pages 

Towards Reading Hidden Emotions: A comparative Study of Spontaneous Micro-expression Spotting and Recognition Methods

Feb 08, 2017
Xiaobai Li, Xiaopeng Hong, Antti Moilanen, Xiaohua Huang, Tomas Pfister, Guoying Zhao, Matti Pietikäinen

Micro-expressions (MEs) are rapid, involuntary facial expressions which reveal emotions that people do not intend to show. Studying MEs is valuable as recognizing them has many important applications, particularly in forensic science and psychotherapy. However, analyzing spontaneous MEs is very challenging due to their short duration and low intensity. Automatic ME analysis includes two tasks: ME spotting and ME recognition. For ME spotting, previous studies have focused on posed rather than spontaneous videos. For ME recognition, the performance of previous studies is low. To address these challenges, we make the following contributions: (i) We propose the first method for spotting spontaneous MEs in long videos (by exploiting feature difference contrast). This method is training free and works on arbitrary unseen videos. (ii) We present an advanced ME recognition framework, which outperforms previous work by a large margin on two challenging spontaneous ME databases (SMIC and CASMEII). (iii) We propose the first automatic ME analysis system (MESR), which can spot and recognize MEs from spontaneous video data. Finally, we show our method outperforms humans in the ME recognition task by a large margin, and achieves comparable performance to humans at the very challenging task of spotting and then recognizing spontaneous MEs.

Analyzing the Affect of a Group of People Using Multi-modal Framework

Oct 13, 2016
Xiaohua Huang, Abhinav Dhall, Xin Liu, Guoying Zhao, Jingang Shi, Roland Goecke, Matti Pietikainen

Millions of images on the web enable us to explore images from social events such as a family party, so it is of interest to understand and model the affect exhibited by a group of people in images. However, analysis of the affect expressed by multiple people is challenging due to varied indoor and outdoor settings and interactions taking place between various numbers of people. A few existing works on Group-level Emotion Recognition (GER) have investigated face-level information. Due to the challenging environments, the face alone may not provide enough information for GER. Relatively few studies have investigated multi-modal GER. Therefore, we propose a novel multi-modal approach based on a new feature description for understanding the emotional state of a group of people in an image. In this paper, we first exploit three kinds of rich information in a group-level image: face, upperbody and scene. Furthermore, in order to integrate the information of multiple persons in a group-level image, we propose an information aggregation method to generate three features for face, upperbody and scene, respectively. We fuse face, upperbody and scene information to make GER robust against the challenging environments. Intensive experiments are performed on two challenging group-level emotion databases to investigate the role of face, upperbody and scene as well as the multi-modal framework. Experimental results demonstrate that our framework achieves very promising performance for GER.

* 11 pages. Submitted to the IEEE Transactions on Cybernetics 
