Models, code, and papers for "Guoying Zhao":

Two decades of local binary patterns: A survey

Jan 16, 2017
Matti Pietikäinen, Guoying Zhao

Texture is an important characteristic for many types of images. In recent years very discriminative and computationally efficient local texture descriptors based on local binary patterns (LBP) have been developed, which has led to significant progress in applying texture methods to different problems and applications. Due to this progress, the division between texture descriptors and more generic image or video descriptors has been disappearing. A large number of different variants of LBP have been developed to improve its robustness, and to increase its discriminative power and applicability to different types of problems. In this chapter, the most recent and important variants of LBP in 2D, spatiotemporal, 3D, and 4D domains are surveyed. Interesting new developments of LBP in 1D signal analysis are also considered. Finally, some future challenges for research are presented.

* In Advances in Independent Component Analysis and Learning Machines, Academic Press, 2015, Pages 175-210 
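
For readers unfamiliar with the descriptor this survey covers, the basic 3x3 LBP operator can be sketched as follows. This is a minimal illustration of the original 8-neighbour operator only, not of any variant surveyed in the chapter; the function names `lbp_code` and `lbp_histogram`, and the clockwise bit order, are assumptions made for this example.

```python
def lbp_code(patch):
    """Return the basic 8-neighbour LBP code of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left pixel (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbour contributes a 1-bit when it is at least as bright as the center.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Build a 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

The histogram of codes, rather than the codes themselves, is what is typically used as the texture feature.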

Micro-expression Action Unit Detection with Spatio-temporal Adaptive Pooling

Jul 11, 2019
Yante Li, Xiaohua Huang, Guoying Zhao

Action Unit (AU) detection plays an important role in facial expression recognition. To the best of our knowledge, there is little research on AU analysis for micro-expressions. In this paper, we focus on AU detection in micro-expressions. Micro-expression AU detection is challenging due to the small size of micro-expression databases, the low intensity and short duration of facial muscle changes, and class imbalance. To alleviate these problems, we propose a novel Spatio-Temporal Adaptive Pooling (STAP) network for AU detection in micro-expressions. First, STAP aggregates a series of convolutional filters of different sizes, so it can capture multi-scale information in the spatial and temporal domains. In addition, STAP contains fewer parameters, so it has a lower computational cost and is suitable for micro-expression AU detection on very small databases. Furthermore, the STAP module is designed to pool discriminative information for micro-expression AUs in the spatial and temporal domains. Finally, focal loss is employed to prevent the vast number of negatives from overwhelming the micro-expression AU detector. In experiments, we first polish the AU annotations of three commonly used databases. We conduct intensive experiments on three micro-expression databases and provide several baseline results on micro-expression AU detection. The results show that our proposed approach outperforms the basic Inflated Inception-v1 (I3D) in terms of average F1-score. We also evaluate the proposed method under a cross-database protocol, which demonstrates that the approach is feasible for cross-database micro-expression AU detection. Importantly, the results on the three micro-expression databases and the cross-database protocol provide extensive baseline results for future research on micro-expression AU detection.

* 10 pages, 4 figures 
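
The abstract above relies on focal loss to keep the many easy negatives from dominating training. A minimal sketch of the binary form of that loss is given below for a single prediction. The function name and the default values `gamma=2.0` and `alpha=0.25` are assumptions for illustration; the abstract does not state which settings the authors used.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one prediction p in (0, 1) with label y in {0, 1}.

    gamma and alpha are illustrative defaults, not values taken from the paper.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the expression reduces to a class-weighted cross-entropy, which makes the down-weighting effect of a larger `gamma` easy to compare.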

Video Action Recognition Via Neural Architecture Searching

Jul 10, 2019
Wei Peng, Xiaopeng Hong, Guoying Zhao

Deep neural networks have achieved great success in video analysis and understanding. However, designing a high-performance neural architecture requires substantial effort and expertise. In this paper, we make the first attempt to let an algorithm automatically design neural networks for video action recognition tasks. Specifically, a spatio-temporal network is developed in a differentiable space modeled by a directed acyclic graph, so that a gradient-based strategy can be used to search for an optimal architecture. Nonetheless, the search is computationally expensive, since evaluating each architecture candidate is still costly. To alleviate this issue, for the video input we introduce a temporal segment approach that reduces the computational cost without losing global video information. For the architecture, we explore an efficient search space by introducing pseudo-3D operators. Experiments show that our architecture outperforms popular neural architectures under the training-from-scratch protocol on the challenging UCF101 dataset, surprisingly with only around one percent of the parameters of its manually designed counterparts.

* Accepted by IEEE ICIP2019 
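
The temporal segment idea mentioned in the abstract above can be illustrated with a short sketch: the video is split into equal segments and one frame is taken from each, so the sampled frames still span the whole clip. Picking the middle frame of each segment and the name `segment_indices` are assumptions of this example; the paper may sample differently.

```python
def segment_indices(num_frames, num_segments):
    """Return one frame index per segment, taken at each segment's midpoint."""
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    seg_len = num_frames / num_segments
    # The midpoint of segment k lies at (k + 0.5) * seg_len, rounded down.
    return [int(seg_len * (k + 0.5)) for k in range(num_segments)]
```

For a 100-frame clip and 4 segments this yields 4 frames instead of 100, while still covering the start, middle, and end of the video.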

Recovering remote Photoplethysmograph Signal from Facial videos Using Spatio-Temporal Convolutional Networks

May 07, 2019
Zitong Yu, Xiaobai Li, Guoying Zhao

Recently, average heart rate (HR) can be measured relatively accurately from human face videos using non-contact remote photoplethysmography (rPPG). However, in many healthcare applications, knowing only the average HR is not enough; the measured blood volume pulse signal and its heart rate variability (HRV) features are also important. We propose the first end-to-end rPPG signal recovery system (PhysNet), which uses deep spatio-temporal convolutional networks to measure both HR and HRV features. PhysNet extracts spatial and temporal hidden features simultaneously from raw face sequences and outputs the corresponding rPPG signal directly. The temporal context information helps the network learn more robust features with less fluctuation. Our approach was tested on two datasets and achieved superior performance on HR and HRV features compared to state-of-the-art methods.
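
To make the link between a recovered pulse signal and an average HR concrete, the sketch below scans candidate heart rates and returns the one with the strongest spectral power. This is a generic post-processing step assumed for illustration, not a description of PhysNet; the function name and the 40 to 240 bpm search range are hypothetical choices.

```python
import math


def estimate_hr_bpm(signal, fps, lo_bpm=40, hi_bpm=240):
    """Return the integer bpm whose frequency carries the most signal power."""
    if not signal:
        raise ValueError("signal must not be empty")
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]  # remove the DC component
    best_bpm, best_power = None, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        freq = bpm / 60.0  # candidate frequency in Hz
        re = sum(x * math.cos(2 * math.pi * freq * i / fps)
                 for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * freq * i / fps)
                 for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm
```

HRV features, by contrast, depend on the timing of individual beats, which is why the abstract stresses recovering the full signal rather than only its dominant frequency.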

Micro-Expression Spotting: A Benchmark

Oct 08, 2017
Xiaopeng Hong, Thuong-Khanh Tran, Guoying Zhao

Micro-expressions are rapid and involuntary facial expressions that indicate suppressed or concealed emotions. Recently, research on automatic micro-expression (ME) spotting has attracted increasing attention. ME spotting is a crucial step prior to further ME analysis tasks. The spotting results can be used as important cues to assist many other human-oriented tasks and thus have many potential applications. In this paper, by investigating existing ME spotting methods, we recognize the urgency of standardizing the performance evaluation of micro-expression spotting methods. To this end, we construct a micro-expression spotting benchmark (MESB). First, we set up a sliding-window-based multi-scale evaluation framework. Second, we introduce a series of protocols. Third, we provide baseline results of popular methods. The MESB facilitates research on ME spotting with fairer and more comprehensive evaluation, and also makes it possible to leverage cutting-edge machine learning tools widely.

Sparsity-Aware Deep Learning for Automatic 4D Facial Expression Recognition

Feb 08, 2020
Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao

In this paper, we present a sparsity-aware deep network for automatic 4D facial expression recognition (FER). Given 4D data, we first propose a novel augmentation method to combat the data limitation problem for deep learning. This is achieved by projecting the input data into RGB and depth map images and then iteratively performing channel concatenation. Encoded in the given 3D landmarks, we also introduce TOP-landmarks over multi-views, an effective way to capture the facial muscle movements from three orthogonal planes. Importantly, we then present a sparsity-aware network to compute sparse representations of convolutional features over multi-views for effective and computationally convenient deep learning. For training, the TOP-landmarks and sparse representations are used to train a long short-term memory (LSTM) network. Refined predictions are obtained when the learned features collaborate over multi-views. Extensive experimental results on the BU-4DFE dataset show the advantage of our method over the state-of-the-art methods, reaching a promising accuracy of 99.69% for 4D FER.

* Submitted to IEEE ICIP 2020 

Learning Graph Convolutional Network for Skeleton-based Human Action Recognition by Neural Searching

Nov 11, 2019
Wei Peng, Xiaopeng Hong, Haoyu Chen, Guoying Zhao

Human action recognition from skeleton data, fueled by the Graph Convolutional Network (GCN), has attracted much attention due to its powerful capability of modeling non-Euclidean structured data. However, many existing GCN methods provide a pre-defined graph and fix it through the entire network, which can lose implicit joint correlations. Besides, the mainstream spectral GCN is approximated by a first-order hop, so higher-order connections are not well captured. Therefore, huge efforts are required to explore a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for skeleton-based action recognition. Specifically, we enrich the search space by providing multiple dynamic graph modules after fully exploring the spatial-temporal correlations between nodes. Besides, we introduce multiple-hop modules, aiming to break the limitation on representational capacity caused by the first-order approximation. Moreover, a sampling- and memory-efficient evolution strategy is proposed to search for an optimal architecture for this task. The resulting architecture proves the effectiveness of the higher-order approximation and of the dynamic graph modeling mechanism with temporal interactions, which has barely been discussed before. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale datasets, and the results show that our model achieves state-of-the-art results.

* Accepted by AAAI2020 
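
The difference between a first-order hop and higher-order connections, as discussed in the abstract above, can be shown on a toy skeleton graph. The sketch below computes which joints are reachable within k hops by taking powers of the adjacency matrix with self-loops. It only illustrates the notion of multi-hop neighbourhoods; it is not the paper's multiple-hop module, and the function names are assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def k_hop_reachability(adj, k):
    """Return a 0/1 matrix marking joints reachable within k hops (k >= 1)."""
    n = len(adj)
    # Add self-loops so that shorter paths stay counted in higher powers.
    a_hat = [[1 if i == j or adj[i][j] else 0 for j in range(n)]
             for i in range(n)]
    power = a_hat
    for _ in range(k - 1):
        power = matmul(power, a_hat)
    return [[1 if v > 0 else 0 for v in row] for row in power]
```

On a 4-joint chain, a single hop from the first joint reaches only its direct neighbour, while three hops reach the far end of the chain.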

Landmarks-assisted Collaborative Deep Framework for Automatic 4D Facial Expression Recognition

Oct 11, 2019
Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao

We propose a novel landmarks-assisted collaborative end-to-end deep framework for automatic 4D FER. Using 4D face scan data, we calculate its various geometrical images, and afterwards use rank pooling to generate their dynamic images encapsulating important facial muscle movements over time. In addition, the given 3D landmarks are projected onto a 2D plane as binary images, and convolutional layers are used to extract sequences of feature vectors for every landmark video. During the training stage, the dynamic images are used to train an end-to-end deep network, while the feature vectors of landmark images are used to train a long short-term memory (LSTM) network. The final, improved set of expression predictions is obtained when the dynamic and landmark images collaborate over multi-views using the proposed deep framework. Performance results obtained from extensive experimentation on the widely adopted BU-4DFE database under globally used settings show that our proposed collaborative framework outperforms the state-of-the-art 4D FER methods and reaches a promising classification accuracy of 96.7%, demonstrating its effectiveness.

* 5 pages, 2 figures, 2 tables 
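
The abstract above uses rank pooling to summarize a sequence in a single dynamic image. A minimal sketch of the idea is shown below, assuming the simple linear weighting 2t - T - 1 that is often used as an approximation to rank pooling; whether the authors use this approximation or solve the full ranking problem is not stated in the abstract, and `dynamic_image` is a hypothetical name.

```python
def dynamic_image(frames):
    """Collapse T frames (each a 2-D list of numbers) into one weighted image."""
    total = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0.0] * width for _ in range(height)]
    for t, frame in enumerate(frames, start=1):
        # Assumed approximation: later frames get larger weights, and the
        # weights sum to zero, so static content cancels out.
        weight = 2 * t - total - 1
        for r in range(height):
            for c in range(width):
                out[r][c] += weight * frame[r][c]
    return out
```

Because the weights sum to zero, a sequence with no motion maps to an all-zero image, and only pixels that change over time survive.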

Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network

May 07, 2019
Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao

This paper proposes a novel 4D Facial Expression Recognition (FER) method using a Collaborative Cross-domain Dynamic Image Network (CCDN). Given 4D face scan data, we first compute its geometrical images, and then combine their correlated information in the proposed cross-domain image representations. The acquired set is then used to generate cross-domain dynamic images (CDI) via rank pooling, which encapsulates facial deformations over time in a single image. For the training phase, these CDIs are fed into an end-to-end deep learning model, and the resultant predictions collaborate over multi-views for a performance gain in expression classification. Furthermore, we propose a 4D augmentation scheme that not only expands the training data scale but also introduces significant facial muscle movement patterns to improve the FER performance. Results from extensive experiments on the commonly used BU-4DFE dataset under widely adopted settings show that our proposed method outperforms the state-of-the-art 4D FER methods, achieving an accuracy of 96.5%, which indicates its effectiveness.

* 11 pages, 4 figures, submitted paper 

A Boost in Revealing Subtle Facial Expressions: A Consolidated Eulerian Framework

Jan 23, 2019
Wei Peng, Xiaopeng Hong, Yingyue Xu, Guoying Zhao

Facial Micro-expression Recognition (MER) distinguishes the underlying emotional states of spontaneous subtle facial expressions. Automatic MER is challenging because 1) the intensity of subtle facial muscle movement is extremely low and 2) the duration of an ME is transient. Recent works adopt motion magnification or time interpolation to resolve these issues. Nevertheless, existing works divide them into two separate modules due to their non-linearity. Though such an operation eases the difficulty of implementation, it ignores their underlying connections and thus results in inevitable losses in both accuracy and speed. Instead, in this paper, we explore their underlying joint formulations and propose a consolidated Eulerian framework to reveal the subtle facial movements. It expands the temporal duration and amplifies the muscle movements in micro-expressions simultaneously. Compared to existing approaches, the proposed method can not only process ME clips more efficiently but also make subtle ME movements more distinguishable. Experiments on two public MER databases indicate that our model outperforms the state-of-the-art in both speed and accuracy.

* conference IEEE FG2019 

Recurrent Convolutional Neural Network Regression for Continuous Pain Intensity Estimation in Video

May 03, 2016
Jing Zhou, Xiaopeng Hong, Fei Su, Guoying Zhao

Automatic pain intensity estimation holds a significant position in the healthcare and medical fields. Traditional static methods extract features from each video frame separately, which results in unstable changes and peaks among adjacent frames. To overcome this problem, we propose a real-time regression framework based on the recurrent convolutional neural network for automatic frame-level pain intensity estimation. Given vector sequences of AAM-warped facial images, we use a sliding-window strategy to obtain fixed-length input samples for the recurrent network. We then carefully design the architecture of the recurrent network to output continuous-valued pain intensity. The proposed end-to-end pain intensity regression framework can predict the pain intensity of each frame by considering a sufficiently large number of historical frames while limiting the scale of the parameters within the model. Our method achieves promising results regarding both accuracy and running speed on the published UNBC-McMaster Shoulder Pain Expression Archive Database.

* This paper is the pre-print technical report of the paper accepted by the IEEE CVPR Workshop of Affect "in-the-wild". The final version will be available after the workshop 
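
The sliding-window strategy described in the abstract above, which turns a variable-length frame sequence into fixed-length input samples, can be sketched in a few lines. The function name, the `stride` parameter, and the choice to drop a trailing partial window are assumptions of this example rather than details from the paper.

```python
def sliding_windows(sequence, window, stride=1):
    """Return every full window of `window` consecutive items, `stride` apart."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # Sequences shorter than the window produce no samples at all.
    return [sequence[i:i + window]
            for i in range(0, len(sequence) - window + 1, stride)]
```

Each returned window would then serve as one input sample for the recurrent network, with the label taken, for example, from its last frame.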

Probing the Intra-Component Correlations within Fisher Vector for Material Classification

Apr 15, 2016
Xiaopeng Hong, Xianbiao Qi, Guoying Zhao, Matti Pietikäinen

Fisher vector (FV) has become a popular image representation. One notable underlying assumption of the FV framework is that local descriptors are well decorrelated within each cluster so that the covariance matrix for each Gaussian can be simplified to be diagonal. Though the FV usually relies on Principal Component Analysis (PCA) to decorrelate local features, the PCA is applied to the entire training data and hence only diagonalizes the universal covariance matrix, rather than those of the local components. As a result, the local decorrelation assumption is usually not supported in practice. To relax this assumption, this paper proposes a completed model of the Fisher vector, termed the Completed Fisher vector (CFV). The CFV is a more general framework than the FV, since it encodes not only the variances but also the correlations of the whitened local descriptors. The CFV thus leads to improved discriminative power. We take the task of material categorization as an example and experimentally show that: 1) the CFV outperforms the FV under all parameter settings; 2) the CFV is robust to changes in the number of components in the mixture; 3) even with a relatively small visual vocabulary, the CFV still works well on two challenging datasets.

* Manuscript submitted to Neurocomputing at the end of April 2015 (!). One year has passed, but we have received no review comments yet!

HEp-2 Cell Classification: The Role of Gaussian Scale Space Theory as A Pre-processing Approach

Sep 08, 2015
Xianbiao Qi, Guoying Zhao, Jie Chen, Matti Pietikäinen

Indirect Immunofluorescence Imaging of Human Epithelial Type 2 (HEp-2) cells is an effective way to identify the presence of Anti-Nuclear Antibody (ANA). Most existing works on HEp-2 cell classification mainly focus on feature extraction, feature encoding and classifier design. Very few efforts have been devoted to studying the importance of pre-processing techniques. In this paper, we analyze the importance of pre-processing, and investigate the role of Gaussian Scale Space (GSS) theory as a pre-processing approach for the HEp-2 cell classification task. We validate the GSS pre-processing under the Local Binary Pattern (LBP) and the Bag-of-Words (BoW) frameworks. Under the BoW framework, the introduced pre-processing approach, using only one Local Orientation Adaptive Descriptor (LOAD), achieved superior performance on the Executable Thematic on Pattern Recognition Techniques for Indirect Immunofluorescence (ET-PRT-IIF) image analysis. Our system, using only one feature, outperformed the winner of the ICPR 2014 contest, which combined four types of features. Meanwhile, the proposed pre-processing method is not restricted to this work; it can be generalized to many existing works.

* 9 pages, 6 figures 
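
As a rough illustration of what Gaussian scale space pre-processing involves, the sketch below smooths a single image row with Gaussian kernels of increasing width, producing one smoothed copy per scale. It is a one-dimensional toy version for brevity; the sigma values, the kernel radius rule, the border handling, and all function names are assumptions, and the paper's actual pre-processing pipeline may differ.

```python
import math


def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel with radius about 3 * sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_row(row, sigma):
    """Convolve one row with a Gaussian, replicating the border pixels."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * row[j]
        out.append(acc)
    return out


def scale_space(row, sigmas=(1.0, 2.0, 4.0)):
    """Return one smoothed copy of the row per scale (sigma values assumed)."""
    return [smooth_row(row, s) for s in sigmas]
```

For a full image, the same smoothing would be applied along rows and then columns, since the 2-D Gaussian is separable.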

Deep-HR: Fast Heart Rate Estimation from Face Video Under Realistic Conditions

Feb 12, 2020
Mohammad Sabokrou, Masoud Pourreza, Xiaobai Li, Mahmood Fathy, Guoying Zhao

This paper presents a novel method for remote heart rate (HR) estimation. Recent studies have shown that blood pumping by the heart is highly correlated with the color intensity of face pixels and, surprisingly, can be utilized for remote HR estimation. Researchers have successfully proposed several methods for this task, but making it work in realistic situations is still a challenging problem in the computer vision community. Furthermore, learning to solve such a complex task on a dataset with very limited annotated samples is not reasonable. Consequently, researchers tend not to use deep learning approaches for this problem. In this paper, we propose a simple yet efficient approach that exploits the advantages of the Deep Neural Network (DNN) by simplifying HR estimation from a complex task to learning from a representation that is highly correlated with HR. Inspired by previous work, we learn a component called the Front-End (FE) to provide a discriminative representation of face videos; afterward, a light deep regression auto-encoder, the Back-End (BE), is learned to map the FE representation to HR. The regression task on the informative representation is simple and can be learned efficiently from limited training samples. Besides, to be more accurate and to work well on low-quality videos, two deep encoder-decoder networks are trained to refine the output of the FE. We also introduce a challenging dataset (HR-D) to show that our method can work efficiently in realistic conditions. Experimental results on the HR-D and MAHNOB datasets confirm that our method can run in real time and estimates the average HR better than state-of-the-art methods.

Remote Heart Rate Measurement from Highly Compressed Facial Videos: an End-to-end Deep Learning Solution with Video Enhancement

Jul 27, 2019
Zitong Yu, Wei Peng, Xiaobai Li, Xiaopeng Hong, Guoying Zhao

Remote photoplethysmography (rPPG), which aims at measuring heart activities without any contact, has great potential in many applications (e.g., remote healthcare). Existing rPPG approaches rely on analyzing very fine details of facial videos, which are prone to be affected by video compression. Here we propose a two-stage, end-to-end method using hidden rPPG information enhancement and attention networks, which is the first attempt to counter video compression loss and recover rPPG signals from highly compressed videos. The method includes two parts: 1) a Spatio-Temporal Video Enhancement Network (STVEN) for video enhancement, and 2) an rPPG network (rPPGNet) for rPPG signal recovery. The rPPGNet can work on its own for robust rPPG measurement, and the STVEN network can be added and jointly trained to further boost the performance, especially on highly compressed videos. Comprehensive experiments on two benchmark datasets show that the proposed method 1) achieves superior performance on compressed videos when paired high-quality videos are available, and 2) generalizes well to novel data with only compressed videos available, which implies promising potential for real-world applications.

* IEEE ICCV2019, accepted 

Spatiotemporal Recurrent Convolutional Networks for Recognizing Spontaneous Micro-expressions

Jan 15, 2019
Zhaoqiang Xia, Xiaopeng Hong, Xingyu Gao, Xiaoyi Feng, Guoying Zhao

Recently, the task of recognizing spontaneous facial micro-expressions has attracted much attention due to its various real-world applications. Plenty of handcrafted or learned features have been employed with a variety of classifiers and have achieved promising performance in recognizing micro-expressions. However, micro-expression recognition is still challenging due to the subtle spatiotemporal changes of micro-expressions. To exploit the merits of deep learning, we propose a novel micro-expression recognition approach based on deep recurrent convolutional networks, capturing the spatial-temporal deformations of the micro-expression sequence. Specifically, the proposed deep model consists of several recurrent convolutional layers for extracting visual features and a classification layer for recognition. It is optimized in an end-to-end manner and obviates manual feature design. To handle sequential data, we exploit two ways of extending the connectivity of convolutional networks across the temporal domain, in which the spatiotemporal deformations are modeled in terms of facial appearance and geometry separately. Besides, to overcome the shortcomings of limited and imbalanced training samples, temporal data augmentation strategies as well as a balanced loss are jointly used for our deep network. By performing experiments on three spontaneous micro-expression datasets, we verify the effectiveness of our proposed micro-expression recognition approach compared to the state-of-the-art methods.

* Submitted to IEEE TMM 

A Global Alignment Kernel based Approach for Group-level Happiness Intensity Estimation

Sep 03, 2018
Xiaohua Huang, Abhinav Dhall, Roland Goecke, Matti Pietikäinen, Guoying Zhao

With the progress in automatic human behavior understanding, analysing the perceived affect of multiple people has received interest in the affective computing community. Unlike conventional facial expression analysis, this paper primarily focuses on analysing the behaviour of multiple people in an image. The proposed method is based on support vector regression with combined global alignment kernels (GAKs) to estimate the happiness intensity of a group of people. We first exploit Riesz-based volume local binary pattern (RVLBP) and deep convolutional neural network (CNN) based features for characterizing facial images. Furthermore, we propose to use the GAK for RVLBP and deep CNN features, respectively, to explicitly measure the similarity of two group-level images. Specifically, we exploit a global weight sort scheme to sort the face images from a group-level image according to their spatial weights, yielding an efficient data structure for the GAK. Lastly, we propose multiple kernel learning based on three combination strategies for combining the two respective GAKs based on RVLBP and deep CNN features, thereby enhancing the discriminative ability of each GAK. Intensive experiments are performed on the challenging group-level happiness intensity database, namely HAPPEI. Our experimental results demonstrate that the proposed approach achieves promising performance for group happiness intensity analysis when compared with recent state-of-the-art methods.
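
A global alignment kernel compares two sequences of feature vectors by summing over all their alignments, which can be computed with a dynamic-programming recursion. The sketch below shows that recursion with a Gaussian local kernel, assuming each group-level image has already been turned into an ordered list of per-face feature vectors. The Gaussian local kernel, the `sigma` default, and the function name are assumptions; the paper's sorting scheme and kernel combination are not reproduced here.

```python
import math


def global_alignment_kernel(x, y, sigma=1.0):
    """Sum of local-kernel products over all alignments of sequences x and y.

    x and y are lists of equal-dimension feature vectors (lists of floats).
    """
    n, m = len(x), len(y)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 1.0  # the empty alignment
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x[i - 1], y[j - 1]))
            local = math.exp(-sq_dist / (2 * sigma ** 2))  # assumed local kernel
            # An alignment ending at (i, j) extends one ending at any of
            # the three neighbouring cells.
            table[i][j] = local * (table[i - 1][j] + table[i][j - 1]
                                   + table[i - 1][j - 1])
    return table[n][m]
```

Because the sum runs over alignments, the two sequences may have different lengths, which suits groups with different numbers of faces.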

SRN: Side-output Residual Network for Object Reflection Symmetry Detection and Beyond

Jul 17, 2018
Wei Ke, Jie Chen, Jianbin Jiao, Guoying Zhao, Qixiang Ye

In this paper, we establish a baseline for object reflection symmetry detection in complex backgrounds by presenting a new benchmark and an end-to-end deep learning approach, opening up a promising direction for symmetry detection in the wild. The new benchmark, Sym-PASCAL, spans challenges including object diversity, multi-objects, part-invisibility, and various complex backgrounds that are far beyond those in existing datasets. The end-to-end deep learning approach, referred to as a side-output residual network (SRN), leverages the output residual units (RUs) to fit the errors between the object ground-truth symmetry and the side-outputs of multiple stages. By cascading RUs in a deep-to-shallow manner, SRN exploits the 'flow' of errors among multiple stages to address the challenges of fitting complex output with limited convolutional layers, suppressing the complex backgrounds, and effectively matching object symmetry at different scales. SRN is further upgraded to a multi-task side-output residual network (MT-SRN) for joint symmetry and edge detection, demonstrating its generality to image-to-mask learning tasks. Experimental results validate both the challenging aspects of Sym-PASCAL benchmark related to real-world images and the state-of-the-art performance of the proposed SRN approach.

* submitted to PAMI, under major revision 

Learning a Target Sample Re-Generator for Cross-Database Micro-Expression Recognition

Jul 26, 2017
Yuan Zong, Xiaohua Huang, Wenming Zheng, Zhen Cui, Guoying Zhao

In this paper, we investigate the cross-database micro-expression recognition problem, where the training and testing samples come from two different micro-expression databases. Under this setting, the training and testing samples have different feature distributions, and hence the performance of most existing micro-expression recognition methods may decrease greatly. To solve this problem, we propose a simple yet effective method called Target Sample Re-Generator (TSRG). Using TSRG, we are able to re-generate the samples from the target micro-expression database so that the re-generated target samples share the same or similar feature distributions with the original source samples. For this reason, we can then use the classifier learned from the labeled source samples to accurately predict the micro-expression categories of the unlabeled target samples. To evaluate the performance of the proposed TSRG method, extensive cross-database micro-expression recognition experiments based on the SMIC and CASME II databases are conducted. Compared with recent state-of-the-art cross-database emotion recognition methods, the proposed TSRG achieves more promising results.

* To appear at ACM Multimedia 2017 

SRN: Side-output Residual Network for Object Symmetry Detection in the Wild

Apr 01, 2017
Wei Ke, Jie Chen, Jianbin Jiao, Guoying Zhao, Qixiang Ye

In this paper, we establish a baseline for object symmetry detection in complex backgrounds by presenting a new benchmark and an end-to-end deep learning approach, opening up a promising direction for symmetry detection in the wild. The new benchmark, named Sym-PASCAL, spans challenges including object diversity, multi-objects, part-invisibility, and various complex backgrounds that are far beyond those in existing datasets. The proposed symmetry detection approach, named Side-output Residual Network (SRN), leverages output Residual Units (RUs) to fit the errors between the object symmetry ground truth and the outputs of RUs. By stacking RUs in a deep-to-shallow manner, SRN exploits the 'flow' of errors among multiple scales to ease the problems of fitting complex outputs with limited layers, suppressing the complex backgrounds, and effectively matching object symmetry of different scales. Experimental results validate both the benchmark and its challenging aspects related to real-world images, and the state-of-the-art performance of our symmetry detection approach. The benchmark and the code for SRN are publicly available at

* Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017 
