Models, code, and papers for "Guoying Zhao":

Two decades of local binary patterns: A survey

Jan 16, 2017
Matti Pietikäinen, Guoying Zhao

Texture is an important characteristic for many types of images. In recent years very discriminative and computationally efficient local texture descriptors based on local binary patterns (LBP) have been developed, which has led to significant progress in applying texture methods to different problems and applications. Due to this progress, the division between texture descriptors and more generic image or video descriptors has been disappearing. A large number of different variants of LBP have been developed to improve its robustness, and to increase its discriminative power and applicability to different types of problems. In this chapter, the most recent and important variants of LBP in 2D, spatiotemporal, 3D, and 4D domains are surveyed. Interesting new developments of LBP in 1D signal analysis are also considered. Finally, some future challenges for research are presented.

* In Advances in Independent Component Analysis and Learning Machines, Academic Press, 2015, Pages 175-210 

  Click for Model/Code and Paper
Micro-expression Action Unit Detection withSpatio-temporal Adaptive Pooling

Jul 11, 2019
Yante Li, Xiaohua Huang, Guoying Zhao

Action Unit (AU) detection plays an important role for facial expression recognition. To the best of our knowledge, there is little research about AU analysis for micro-expressions. In this paper, we focus on AU detection in micro-expressions. Microexpression AU detection is challenging due to the small quantity of micro-expression databases, low intensity, short duration of facial muscle change, and class imbalance. In order to alleviate the problems, we propose a novel Spatio-Temporal Adaptive Pooling (STAP) network for AU detection in micro-expressions. Firstly, STAP is aggregated by a series of convolutional filters of different sizes. In this way, STAP can obtain multi-scale information on spatial and temporal domains. On the other hand, STAP contains less parameters, thus it has less computational cost and is suitable for micro-expression AU detection on very small databases. Furthermore, STAP module is designed to pool discriminative information for micro-expression AUs on spatial and temporal domains.Finally, Focal loss is employed to prevent the vast number of negatives from overwhelming the microexpression AU detector. In experiments, we firstly polish the AU annotations on three commonly used databases. We conduct intensive experiments on three micro-expression databases, and provide several baseline results on micro-expression AU detection. The results show that our proposed approach outperforms the basic Inflated inception-v1 (I3D) in terms of an average of F1- score. We also evaluate the performance of our proposed method on cross-database protocol. It demonstrates that our proposed approach is feasible for cross-database micro-expression AU detection. Importantly, the results on three micro-expression databases and cross-database protocol provide extensive baseline results for future research on micro-expression AU detection.

* 10 pages, 4 figures 

  Click for Model/Code and Paper
Video Action Recognition Via Neural Architecture Searching

Jul 10, 2019
Wei Peng, Xiaopeng Hong, Guoying Zhao

Deep neural networks have achieved great success for video analysis and understanding. However, designing a high-performance neural architecture requires substantial efforts and expertise. In this paper, we make the first attempt to let algorithm automatically design neural networks for video action recognition tasks. Specifically, a spatio-temporal network is developed in a differentiable space modeled by a directed acyclic graph, thus a gradient-based strategy can be performed to search an optimal architecture. Nonetheless, it is computationally expensive, since the computational burden to evaluate each architecture candidate is still heavy. To alleviate this issue, we, for the video input, introduce a temporal segment approach to reduce the computational cost without losing global video information. For the architecture, we explore in an efficient search space by introducing pseudo 3D operators. Experiments show that, our architecture outperforms popular neural architectures, under the training from scratch protocol, on the challenging UCF101 dataset, surprisingly, with only around one percentage of parameters of its manual-design counterparts.

* Accepted by IEEE ICIP2019 

  Click for Model/Code and Paper
Recovering remote Photoplethysmograph Signal from Facial videos Using Spatio-Temporal Convolutional Networks

May 07, 2019
Zitong Yu, Xiaobai Li, Guoying Zhao

Recently average heart rate (HR) can be measured relatively accurately from human face videos based on non-contact remote photoplethysmography (rPPG). However in many healthcare applications, knowing only the average HR is not enough, and measured blood volume pulse signal and its heart rate variability (HRV) features are also important. We propose the first end-to-end rPPG signal recovering system (PhysNet) using deep spatio-temporal convolutional networks to measure both HR and HRV features. PhysNet extracts the spatial and temporal hidden features simultaneously from raw face sequences while outputs the corresponding rPPG signal directly. The temporal context information helps the network learn more robust features with less fluctuation. Our approach was tested on two datasets, and achieved superior performance of HR and HRV features comparing to the state-of-the-art methods.

  Click for Model/Code and Paper
Micro-Expression Spotting: A Benchmark

Oct 08, 2017
Xiaopeng Hong, Thuong-Khanh Tran, Guoying Zhao

Micro-expressions are rapid and involuntary facial expressions, which indicate the suppressed or concealed emotions. Recently, the research on automatic micro-expression (ME) spotting obtains increasing attention. ME spotting is a crucial step prior to further ME analysis tasks. The spotting results can be used as important cues to assist many other human-oriented tasks and thus have many potential applications. In this paper, by investigating existing ME spotting methods, we recognize the immediacy of standardizing the performance evaluation of micro-expression spotting methods. To this end, we construct a micro-expression spotting benchmark (MESB). Firstly, we set up a sliding window based multi-scale evaluation framework. Secondly, we introduce a series of protocols. Thirdly, we also provide baseline results of popular methods. The MESB facilitates the research on ME spotting with fairer and more comprehensive evaluation and also enables to leverage the cutting-edge machine learning tools widely.

  Click for Model/Code and Paper
Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network

May 07, 2019
Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao

This paper proposes a novel 4D Facial Expression Recognition (FER) method using Collaborative Cross-domain Dynamic Image Network (CCDN). Given a 4D data of face scans, we first compute its geometrical images, and then combine their correlated information in the proposed cross-domain image representations. The acquired set is then used to generate cross-domain dynamic images (CDI) via rank pooling that encapsulates facial deformations over time in terms of a single image. For the training phase, these CDIs are fed into an end-to-end deep learning model, and the resultant predictions collaborate over multi-views for performance gain in expression classification. Furthermore, we propose a 4D augmentation scheme that not only expands the training data scale but also introduces significant facial muscle movement patterns to improve the FER performance. Results from extensive experiments on the commonly used BU-4DFE dataset under widely adopted settings show that our proposed method outperforms the state-of-the-art 4D FER methods by achieving an accuracy of 96.5% indicating its effectiveness.

* 11 pages, 4 figures, submitted paper 

  Click for Model/Code and Paper
A Boost in Revealing Subtle Facial Expressions: A Consolidated Eulerian Framework

Jan 23, 2019
Wei Peng, Xiaopeng Hong, Yingyue Xu, Guoying Zhao

Facial Micro-expression Recognition (MER) distinguishes the underlying emotional states of spontaneous subtle facialexpressions. Automatic MER is challenging because that 1) the intensity of subtle facial muscle movement is extremely lowand 2) the duration of ME is transient.Recent works adopt motion magnification or time interpolation to resolve these issues. Nevertheless, existing works dividethem into two separate modules due to their non-linearity. Though such operation eases the difficulty in implementation, itignores their underlying connections and thus results in inevitable losses in both accuracy and speed. Instead, in this paper, weexplore their underlying joint formulations and propose a consolidated Eulerian framework to reveal the subtle facial movements.It expands the temporal duration and amplifies the muscle movements in micro-expressions simultaneously. Compared toexisting approaches, the proposed method can not only process ME clips more efficiently but also make subtle ME movementsmore distinguishable. Experiments on two public MER databases indicate that our model outperforms the state-of-the-art inboth speed and accuracy.

* conference IEEE FG2019 

  Click for Model/Code and Paper
Recurrent Convolutional Neural Network Regression for Continuous Pain Intensity Estimation in Video

May 03, 2016
Jing Zhou, Xiaopeng Hong, Fei Su, Guoying Zhao

Automatic pain intensity estimation possesses a significant position in healthcare and medical field. Traditional static methods prefer to extract features from frames separately in a video, which would result in unstable changes and peaks among adjacent frames. To overcome this problem, we propose a real-time regression framework based on the recurrent convolutional neural network for automatic frame-level pain intensity estimation. Given vector sequences of AAM-warped facial images, we used a sliding-window strategy to obtain fixed-length input samples for the recurrent network. We then carefully design the architecture of the recurrent network to output continuous-valued pain intensity. The proposed end-to-end pain intensity regression framework can predict the pain intensity of each frame by considering a sufficiently large historical frames while limiting the scale of the parameters within the model. Our method achieves promising results regarding both accuracy and running speed on the published UNBC-McMaster Shoulder Pain Expression Archive Database.

* This paper is the pre-print technical report of the paper accepted by the IEEE CVPR Workshop of Affect "in-the-wild". The final version will be available after the workshop 

  Click for Model/Code and Paper
Probing the Intra-Component Correlations within Fisher Vector for Material Classification

Apr 15, 2016
Xiaopeng Hong, Xianbiao Qi, Guoying Zhao, Matti Pietikäinen

Fisher vector (FV) has become a popular image representation. One notable underlying assumption of the FV framework is that local descriptors are well decorrelated within each cluster so that the covariance matrix for each Gaussian can be simplified to be diagonal. Though the FV usually relies on the Principal Component Analysis (PCA) to decorrelate local features, the PCA is applied to the entire training data and hence it only diagonalizes the \textit{universal} covariance matrix, rather than those w.r.t. the local components. As a result, the local decorrelation assumption is usually not supported in practice. To relax this assumption, this paper proposes a completed model of the Fisher vector, which is termed as the Completed Fisher vector (CFV). The CFV is a more general framework of the FV, since it encodes not only the variances but also the correlations of the whitened local descriptors. The CFV thus leads to improved discriminative power. We take the task of material categorization as an example and experimentally show that: 1) the CFV outperforms the FV under all parameter settings; 2) the CFV is robust to the changes in the number of components in the mixture; 3) even with a relatively small visual vocabulary the CFV still works well on two challenging datasets.

* It is manuscript submitted to Neurocomputing on the end of April, 2015 (!). One year past but no review comments we received yet! 

  Click for Model/Code and Paper
HEp-2 Cell Classification: The Role of Gaussian Scale Space Theory as A Pre-processing Approach

Sep 08, 2015
Xianbiao Qi, Guoying Zhao, Jie Chen, Matti Pietikäinen

\textit{Indirect Immunofluorescence Imaging of Human Epithelial Type 2} (HEp-2) cells is an effective way to identify the presence of Anti-Nuclear Antibody (ANA). Most existing works on HEp-2 cell classification mainly focus on feature extraction, feature encoding and classifier design. Very few efforts have been devoted to study the importance of the pre-processing techniques. In this paper, we analyze the importance of the pre-processing, and investigate the role of Gaussian Scale Space (GSS) theory as a pre-processing approach for the HEp-2 cell classification task. We validate the GSS pre-processing under the Local Binary Pattern (LBP) and the Bag-of-Words (BoW) frameworks. Under the BoW framework, the introduced pre-processing approach, using only one Local Orientation Adaptive Descriptor (LOAD), achieved superior performance on the Executable Thematic on Pattern Recognition Techniques for Indirect Immunofluorescence (ET-PRT-IIF) image analysis. Our system, using only one feature, outperformed the winner of the ICPR 2014 contest that combined four types of features. Meanwhile, the proposed pre-processing method is not restricted to this work; it can be generalized to many existing works.

* 9 pages, 6 figures 

  Click for Model/Code and Paper
Remote Heart Rate Measurement from Highly Compressed Facial Videos: an End-to-end Deep Learning Solution with Video Enhancement

Jul 27, 2019
Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, Guoying Zhao

Remote photoplethysmography (rPPG), which aims at measuring heart activities without any contact, has great potential in many applications (e.g., remote healthcare). Existing rPPG approaches rely on analyzing very fine details of facial videos, which are prone to be affected by video compression. Here we propose a two-stage, end-to-end method using hidden rPPG information enhancement and attention networks, which is the first attempt to counter video compression loss and recover rPPG signals from highly compressed videos. The method includes two parts: 1) a Spatio-Temporal Video Enhancement Network (STVEN) for video enhancement, and 2) an rPPG network (rPPGNet) for rPPG signal recovery. The rPPGNet can work on its own for robust rPPG measurement, and the STVEN network can be added and jointly trained to further boost the performance especially on highly compressed videos. Comprehensive experiments are performed on two benchmark datasets to show that, 1) the proposed method not only achieves superior performance on compressed videos with high-quality videos pair, 2) it also generalizes well on novel data with only compressed videos available, which implies the promising potential for real world applications.

* IEEE ICCV2019, accepted 

  Click for Model/Code and Paper
Spatiotemporal Recurrent Convolutional Networks for Recognizing Spontaneous Micro-expressions

Jan 15, 2019
Zhaoqiang Xia, Xiaopeng Hong, Xingyu Gao, Xiaoyi Feng, Guoying Zhao

Recently, the recognition task of spontaneous facial micro-expressions has attracted much attention with its various real-world applications. Plenty of handcrafted or learned features have been employed for a variety of classifiers and achieved promising performances for recognizing micro-expressions. However, the micro-expression recognition is still challenging due to the subtle spatiotemporal changes of micro-expressions. To exploit the merits of deep learning, we propose a novel deep recurrent convolutional networks based micro-expression recognition approach, capturing the spatial-temporal deformations of micro-expression sequence. Specifically, the proposed deep model is constituted of several recurrent convolutional layers for extracting visual features and a classificatory layer for recognition. It is optimized by an end-to-end manner and obviates manual feature design. To handle sequential data, we exploit two types of extending the connectivity of convolutional networks across temporal domain, in which the spatiotemporal deformations are modeled in views of facial appearance and geometry separately. Besides, to overcome the shortcomings of limited and imbalanced training samples, temporal data augmentation strategies as well as a balanced loss are jointly used for our deep network. By performing the experiments on three spontaneous micro-expression datasets, we verify the effectiveness of our proposed micro-expression recognition approach compared to the state-of-the-art methods.

* Submitted to IEEE TMM 

  Click for Model/Code and Paper
A Global Alignment Kernel based Approach for Group-level Happiness Intensity Estimation

Sep 03, 2018
Xiaohua Huang, Abhinav Dhall, Roland Goecke, Matti Pietikainen, Guoying Zhao

With the progress in automatic human behavior understanding, analysing the perceived affect of multiple people has been recieved interest in affective computing community. Unlike conventional facial expression analysis, this paper primarily focuses on analysing the behaviour of multiple people in an image. The proposed method is based on support vector regression with the combined global alignment kernels (GAKs) to estimate the happiness intensity of a group of people. We first exploit Riesz-based volume local binary pattern (RVLBP) and deep convolutional neural network (CNN) based features for characterizing facial images. Furthermore, we propose to use the GAK for RVLBP and deep CNN features, respectively for explicitly measuring the similarity of two group-level images. Specifically, we exploit the global weight sort scheme to sort the face images from group-level image according to their spatial weights, making an efficient data structure to GAK. Lastly, we propose Multiple kernel learning based on three combination strategies for combining two respective GAKs based on RVLBP and deep CNN features, such that enhancing the discriminative ability of each GAK. Intensive experiments are performed on the challenging group-level happiness intensity database, namely HAPPEI. Our experimental results demonstrate that the proposed approach achieves promising performance for group happiness intensity analysis, when compared with the recent state-of-the-art methods.

  Click for Model/Code and Paper
SRN: Side-output Residual Network for Object Reflection Symmetry Detection and Beyond

Jul 17, 2018
Wei Ke, Jie Chen, Jianbin Jiao, Guoying Zhao, Qixiang Ye

In this paper, we establish a baseline for object reflection symmetry detection in complex backgrounds by presenting a new benchmark and an end-to-end deep learning approach, opening up a promising direction for symmetry detection in the wild. The new benchmark, Sym-PASCAL, spans challenges including object diversity, multi-objects, part-invisibility, and various complex backgrounds that are far beyond those in existing datasets. The end-to-end deep learning approach, referred to as a side-output residual network (SRN), leverages the output residual units (RUs) to fit the errors between the object ground-truth symmetry and the side-outputs of multiple stages. By cascading RUs in a deep-to-shallow manner, SRN exploits the 'flow' of errors among multiple stages to address the challenges of fitting complex output with limited convolutional layers, suppressing the complex backgrounds, and effectively matching object symmetry at different scales. SRN is further upgraded to a multi-task side-output residual network (MT-SRN) for joint symmetry and edge detection, demonstrating its generality to image-to-mask learning tasks. Experimental results validate both the challenging aspects of Sym-PASCAL benchmark related to real-world images and the state-of-the-art performance of the proposed SRN approach.

* submitted to PAMI, under major revision 

  Click for Model/Code and Paper
Learning a Target Sample Re-Generator for Cross-Database Micro-Expression Recognition

Jul 26, 2017
Yuan Zong, Xiaohua Huang, Wenming Zheng, Zhen Cui, Guoying Zhao

In this paper, we investigate the cross-database micro-expression recognition problem, where the training and testing samples are from two different micro-expression databases. Under this setting, the training and testing samples would have different feature distributions and hence the performance of most existing micro-expression recognition methods may decrease greatly. To solve this problem, we propose a simple yet effective method called Target Sample Re-Generator (TSRG) in this paper. By using TSRG, we are able to re-generate the samples from target micro-expression database and the re-generated target samples would share same or similar feature distributions with the original source samples. For this reason, we can then use the classifier learned based on the labeled source samples to accurately predict the micro-expression categories of the unlabeled target samples. To evaluate the performance of the proposed TSRG method, extensive cross-database micro-expression recognition experiments designed based on SMIC and CASME II databases are conducted. Compared with recent state-of-the-art cross-database emotion recognition methods, the proposed TSRG achieves more promising results.

* To appear at ACM Multimedia 2017 

  Click for Model/Code and Paper
SRN: Side-output Residual Network for Object Symmetry Detection in the Wild

Apr 01, 2017
Wei Ke, Jie Chen, Jianbin Jiao, Guoying Zhao, Qixiang Ye

In this paper, we establish a baseline for object symmetry detection in complex backgrounds by presenting a new benchmark and an end-to-end deep learning approach, opening up a promising direction for symmetry detection in the wild. The new benchmark, named Sym-PASCAL, spans challenges including object diversity, multi-objects, part-invisibility, and various complex backgrounds that are far beyond those in existing datasets. The proposed symmetry detection approach, named Side-output Residual Network (SRN), leverages output Residual Units (RUs) to fit the errors between the object symmetry groundtruth and the outputs of RUs. By stacking RUs in a deep-to-shallow manner, SRN exploits the 'flow' of errors among multiple scales to ease the problems of fitting complex outputs with limited layers, suppressing the complex backgrounds, and effectively matching object symmetry of different scales. Experimental results validate both the benchmark and its challenging aspects related to realworld images, and the state-of-the-art performance of our symmetry detection approach. The benchmark and the code for SRN are publicly available at

* Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017 

  Click for Model/Code and Paper
LOAD: Local Orientation Adaptive Descriptor for Texture and Material Classification

Apr 22, 2015
Xianbiao Qi, Guoying Zhao, Linlin Shen, Qingquan Li, Matti Pietikainen

In this paper, we propose a novel local feature, called Local Orientation Adaptive Descriptor (LOAD), to capture regional texture in an image. In LOAD, we proposed to define point description on an Adaptive Coordinate System (ACS), adopt a binary sequence descriptor to capture relationships between one point and its neighbors and use multi-scale strategy to enhance the discriminative power of the descriptor. The proposed LOAD enjoys not only discriminative power to capture the texture information, but also has strong robustness to illumination variation and image rotation. Extensive experiments on benchmark data sets of texture classification and real-world material recognition show that the proposed LOAD yields the state-of-the-art performance. It is worth to mention that we achieve a 65.4\% classification accuracy-- which is, to the best of our knowledge, the highest record by far --on Flickr Material Database by using a single feature. Moreover, by combining LOAD with the feature extracted by Convolutional Neural Networks (CNN), we obtain significantly better performance than both the LOAD and CNN. This result confirms that the LOAD is complementary to the learning-based features.

* 13 pages, 7 figures 

  Click for Model/Code and Paper
Texture Classification in Extreme Scale Variations using GANet

Feb 13, 2018
Li Liu, Jie Chen, Guoying Zhao, Paul Fieguth, Xilin Chen, Matti Pietikäinen

Research in texture recognition often concentrates on recognizing textures with intraclass variations such as illumination, rotation, viewpoint and small scale changes. In contrast, in real-world applications a change in scale can have a dramatic impact on texture appearance, to the point of changing completely from one texture category to another. As a result, texture variations due to changes in scale are amongst the hardest to handle. In this work we conduct the first study of classifying textures with extreme variations in scale. To address this issue, we first propose and then reduce scale proposals on the basis of dominant texture patterns. Motivated by the challenges posed by this problem, we propose a new GANet network where we use a Genetic Algorithm to change the units in the hidden layers during network training, in order to promote the learning of more informative semantic texture patterns. Finally, we adopt a FVCNN (Fisher Vector pooling of a Convolutional Neural Network filter bank) feature encoder for global texture representation. Because extreme scale variations are not necessarily present in most standard texture databases, to support the proposed extreme-scale aspects of texture understanding we are developing a new dataset, the Extreme Scale Variation Textures (ESVaT), to test the performance of our framework. It is demonstrated that the proposed framework significantly outperforms gold-standard texture features by more than 10% on ESVaT. We also test the performance of our proposed approach on the KTHTIPS2b and OS datasets and a further dataset synthetically derived from Forrest, showing superior performance compared to the state of the art.

* submitted to IEEE Transactions on Image Processing 

  Click for Model/Code and Paper
HEp-2 Cell Classification via Fusing Texture and Shape Information

Feb 16, 2015
Xianbiao Qi, Guoying Zhao, Chun-Guang Li, Jun Guo, Matti Pietikäinen

Indirect Immunofluorescence (IIF) HEp-2 cell image is an effective evidence for diagnosis of autoimmune diseases. Recently computer-aided diagnosis of autoimmune diseases by IIF HEp-2 cell classification has attracted great attention. However the HEp-2 cell classification task is quite challenging due to large intra-class variation and small between-class variation. In this paper we propose an effective and efficient approach for the automatic classification of IIF HEp-2 cell image by fusing multi-resolution texture information and richer shape information. To be specific, we propose to: a) capture the multi-resolution texture information by a novel Pairwise Rotation Invariant Co-occurrence of Local Gabor Binary Pattern (PRICoLGBP) descriptor, b) depict the richer shape information by using an Improved Fisher Vector (IFV) model with RootSIFT features which are sampled from large image patches in multiple scales, and c) combine them properly. We evaluate systematically the proposed approach on the IEEE International Conference on Pattern Recognition (ICPR) 2012, IEEE International Conference on Image Processing (ICIP) 2013 and ICPR 2014 contest data sets. The experimental results for the proposed methods significantly outperform the winners of ICPR 2012 and ICIP 2013 contest, and achieve comparable performance with the winner of the newly released ICPR 2014 contest.

* 11 pages, 7 figures 

  Click for Model/Code and Paper
Dynamic texture and scene classification by transferring deep image features

Feb 01, 2015
Xianbiao Qi, Chun-Guang Li, Guoying Zhao, Xiaopeng Hong, Matti Pietikäinen

Dynamic texture and scene classification are two fundamental problems in understanding natural video content. Extracting robust and effective features is a crucial step towards solving these problems. However the existing approaches suffer from the sensitivity to either varying illumination, or viewpoint changing, or even camera motion, and/or the lack of spatial information. Inspired by the success of deep structures in image classification, we attempt to leverage a deep structure to extract feature for dynamic texture and scene classification. To tackle with the challenges in training a deep structure, we propose to transfer some prior knowledge from image domain to video domain. To be specific, we propose to apply a well-trained Convolutional Neural Network (ConvNet) as a mid-level feature extractor to extract features from each frame, and then form a representation of a video by concatenating the first and the second order statistics over the mid-level features. We term this two-level feature extraction scheme as a Transferred ConvNet Feature (TCoF). Moreover we explore two different implementations of the TCoF scheme, i.e., the \textit{spatial} TCoF and the \textit{temporal} TCoF, in which the mean-removed frames and the difference between two adjacent frames are used as the inputs of the ConvNet, respectively. We evaluate systematically the proposed spatial TCoF and the temporal TCoF schemes on three benchmark data sets, including DynTex, YUPENN, and Maryland, and demonstrate that the proposed approach yields superior performance.

  Click for Model/Code and Paper