Models, code, and papers for "Wenming Yang":

Recurrent Regression for Face Recognition

Jul 24, 2016
Yang Li, Wenming Zheng, Zhen Cui

To address the sequential changes of images including poses, in this paper we propose a recurrent regression neural network(RRNN) framework to unify two classic tasks of cross-pose face recognition on still images and video-based face recognition. To imitate the changes of images, we explicitly construct the potential dependencies of sequential images so as to regularize the final learning model. By performing progressive transforms for sequentially adjacent images, RRNN can adaptively memorize and forget the information that benefits for the final classification. For face recognition of still images, given any one image with any one pose, we recurrently predict the images with its sequential poses to expect to capture some useful information of others poses. For video-based face recognition, the recurrent regression takes one entire sequence rather than one image as its input. We verify RRNN in static face dataset MultiPIE and face video dataset YouTube Celebrities(YTC). The comprehensive experimental results demonstrate the effectiveness of the proposed RRNN method.


  Click for Model/Code and Paper
CFSNet: Toward a Controllable Feature Space for Image Restoration

Apr 01, 2019
Wei Wang, Ruiming Guo, Yapeng Tian, Wenming Yang

Deep learning methods have witnessed the great progress in image restoration with specific metrics (e.g., PSNR, SSIM). However, the perceptual quality of the restored image is relatively subjective, and it is necessary for users to control the reconstruction result according to personal preferences or image characteristics, which cannot be done using existing deterministic networks. This motivates us to exquisitely design a unified interactive framework for general image restoration tasks. Under this framework, users can control continuous transition of different objectives, e.g., the perception-distortion trade-off of image super-resolution, the trade-off between noise reduction and detail preservation. We achieve this goal by controlling latent features of the designed network. To be specific, our proposed framework, named Controllable Feature Space Network (CFSNet), is entangled by two branches based on different objectives. Our model can adaptively learn the coupling coefficients of different layers and channels, which provides finer control of the restored image quality. Experiments on several typical image restoration tasks fully validate the effective benefits of the proposed method.


  Click for Model/Code and Paper
Context-Dependent Diffusion Network for Visual Relationship Detection

Sep 11, 2018
Zhen Cui, Chunyan Xu, Wenming Zheng, Jian Yang

Visual relationship detection can bridge the gap between computer vision and natural language for scene understanding of images. Different from pure object recognition tasks, the relation triplets of subject-predicate-object lie on an extreme diversity space, such as \textit{person-behind-person} and \textit{car-behind-building}, while suffering from the problem of combinatorial explosion. In this paper, we propose a context-dependent diffusion network (CDDN) framework to deal with visual relationship detection. To capture the interactions of different object instances, two types of graphs, word semantic graph and visual scene graph, are constructed to encode global context interdependency. The semantic graph is built through language priors to model semantic correlations across objects, whilst the visual scene graph defines the connections of scene objects so as to utilize the surrounding scene information. For the graph-structured data, we design a diffusion network to adaptively aggregate information from contexts, which can effectively learn latent representations of visual relationships and well cater to visual relationship detection in view of its isomorphic invariance to graphs. Experiments on two widely-used datasets demonstrate that our proposed method is more effective and achieves the state-of-the-art performance.

* 8 pages, 3 figures, 2018 ACM Multimedia Conference (MM'18) 

  Click for Model/Code and Paper
Tensor graph convolutional neural network

Mar 27, 2018
Tong Zhang, Wenming Zheng, Zhen Cui, Yang Li

In this paper, we propose a novel tensor graph convolutional neural network (TGCNN) to conduct convolution on factorizable graphs, for which here two types of problems are focused, one is sequential dynamic graphs and the other is cross-attribute graphs. Especially, we propose a graph preserving layer to memorize salient nodes of those factorized subgraphs, i.e. cross graph convolution and graph pooling. For cross graph convolution, a parameterized Kronecker sum operation is proposed to generate a conjunctive adjacency matrix characterizing the relationship between every pair of nodes across two subgraphs. Taking this operation, then general graph convolution may be efficiently performed followed by the composition of small matrices, which thus reduces high memory and computational burden. Encapsuling sequence graphs into a recursive learning, the dynamics of graphs can be efficiently encoded as well as the spatial layout of graphs. To validate the proposed TGCNN, experiments are conducted on skeleton action datasets as well as matrix completion dataset. The experiment results demonstrate that our method can achieve more competitive performance with the state-of-the-art methods.


  Click for Model/Code and Paper
Lightweight Feature Fusion Network for Single Image Super-Resolution

Apr 13, 2019
Wenming Yang, Wei Wang, Xuechen Zhang, Shuifa Sun, Qingmin Liao

Single image super-resolution(SISR) has witnessed great progress as convolutional neural network(CNN) gets deeper and wider. However, enormous parameters hinder its application to real world problems. In this letter, We propose a lightweight feature fusion network (LFFN) that can fully explore multi-scale contextual information and greatly reduce network parameters while maximizing SISR results. LFFN is built on spindle blocks and a softmax feature fusion module (SFFM). Specifically, a spindle block is composed of a dimension extension unit, a feature exploration unit and a feature refinement unit. The dimension extension layer expands low dimension to high dimension and implicitly learns the feature maps which is suitable for the next unit. The feature exploration unit performs linear and nonlinear feature exploration aimed at different feature maps. The feature refinement layer is used to fuse and refine features. SFFM fuses the features from different modules in a self-adaptive learning manner with softmax function, making full use of hierarchical information with a small amount of parameter cost. Both qualitative and quantitative experiments on benchmark datasets show that LFFN achieves favorable performance against state-of-the-art methods with similar parameters.

* Accepted by IEEE Signal Processing Letters (Volume:26, Issue:4, April 2019) 

  Click for Model/Code and Paper
Metabolize Neural Network

Sep 04, 2018
Dan Dai, Zhiwen Yu, Yang Hu, Wenming Cao, Mingnan Luo

The metabolism of cells is the most basic and important part of human function. Neural networks in deep learning stem from neuronal activity. It is self-evident that the significance of metabolize neuronal network(MetaNet) in model construction. In this study, we explore neuronal metabolism for shallow network from proliferation and autophagy two aspects. First, we propose different neuron proliferate methods that constructive the selfgrowing network in metabolism cycle. Proliferate neurons alleviate resources wasting and insufficient model learning problem when network initializes more or less parameters. Then combined with autophagy mechanism in the process of model self construction to ablate under-expressed neurons. The MetaNet can automatically determine the number of neurons during training, further, save more resource consumption. We verify the performance of the proposed methods on datasets: MNIST, Fashion-MNIST and CIFAR-10.


  Click for Model/Code and Paper
Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition

Feb 27, 2018
Chaolong Li, Zhen Cui, Wenming Zheng, Chunyan Xu, Jian Yang

Variations of human body skeletons may be considered as dynamic graphs, which are generic data representation for numerous real-world applications. In this paper, we propose a spatio-temporal graph convolution (STGC) approach for assembling the successes of local convolutional filtering and sequence learning ability of autoregressive moving average. To encode dynamic graphs, the constructed multi-scale local graph convolution filters, consisting of matrices of local receptive fields and signal mappings, are recursively performed on structured graph data of temporal and spatial domain. The proposed model is generic and principled as it can be generalized into other dynamic models. We theoretically prove the stability of STGC and provide an upper-bound of the signal transformation to be learnt. Further, the proposed recursive model can be stacked into a multi-layer architecture. To evaluate our model, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D. The experimental results demonstrate the effectiveness of our proposed model and the improvement over the state-of-the-art.

* Accepted by AAAI 2018 

  Click for Model/Code and Paper
Spatial-Temporal Recurrent Neural Network for Emotion Recognition

May 12, 2017
Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong, Yang Li

Emotion analysis is a crucial problem to endow artifact machines with real intelligence in many large potential applications. As external appearances of human emotions, electroencephalogram (EEG) signals and video face signals are widely used to track and analyze human's affective information. According to their common characteristics of spatial-temporal volumes, in this paper we propose a novel deep learning framework named spatial-temporal recurrent neural network (STRNN) to unify the learning of two different signal sources into a spatial-temporal dependency model. In STRNN, to capture those spatially cooccurrent variations of human emotions, a multi-directional recurrent neural network (RNN) layer is employed to capture longrange contextual cues by traversing the spatial region of each time slice from multiple angles. Then a bi-directional temporal RNN layer is further used to learn discriminative temporal dependencies from the sequences concatenating spatial features of each time slice produced from the spatial RNN layer. To further select those salient regions of emotion representation, we impose sparse projection onto those hidden states of spatial and temporal domains, which actually also increases the model discriminant ability because of this global consideration. Consequently, such a two-layer RNN model builds spatial dependencies as well as temporal dependencies of the input signals. Experimental results on the public emotion datasets of EEG and facial expression demonstrate the proposed STRNN method is more competitive over those state-of-the-art methods.


  Click for Model/Code and Paper
Deep Learning for Single Image Super-Resolution: A Brief Review

Aug 09, 2018
Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue

Single image super-resolution (SISR) is a notoriously challenging ill-posed problem, which aims to obtain a high- resolution (HR) output from one of its low-resolution (LR) versions. To solve the SISR problem, recently powerful deep learning algorithms have been employed and achieved the state- of-the-art performance. In this survey, we review representative deep learning-based SISR methods, and group them into two categories according to their major contributions to two essential aspects of SISR: the exploration of efficient neural network archi- tectures for SISR, and the development of effective optimization objectives for deep SISR learning. For each category, a baseline is firstly established and several critical limitations of the baseline are summarized. Then representative works on overcoming these limitations are presented based on their original contents as well as our critical understandings and analyses, and relevant comparisons are conducted from a variety of perspectives. Finally we conclude this review with some vital current challenges and future trends in SISR leveraging deep learning algorithms.


  Click for Model/Code and Paper
Inner-Imaging Convolutional Networks

Apr 22, 2019
Yang Hu, Guihua Wen, Mingnan Luo, Dan Dai, Wenming Cao, Zhiwen Yu

Despite the tremendous success in computer vision, deep convolutional networks suffer from serious computation cost and redundancies. Although previous works address this issue by enhancing diversities of filters, they ignore that both complementarity and completeness are required in the internal structure of convolutional network. In this setting, we propose a novel Inner-Imaging architecture, which allows relationships between channels to meet the above requirement. Specifically, we organize the filter signal points in groups using convolutional kernels to model both the intra- and inter-group relationships simultaneously. Consequently, we not only increase diversities of channels but also explicitly enhance the complementarity and completeness. Our proposed architecture is lightweight and easy to be implemented for improving the modelling efficiency and performance. We conduct extensive experiments on CIFAR, SVHN and ImageNet and verify the effectiveness of our inner-imaging architecture with residual networks as the backbone.

* 7 pages, 6 figures 

  Click for Model/Code and Paper
Action-Attending Graphic Neural Network

Nov 17, 2017
Chaolong Li, Zhen Cui, Wenming Zheng, Chunyan Xu, Rongrong Ji, Jian Yang

The motion analysis of human skeletons is crucial for human action recognition, which is one of the most active topics in computer vision. In this paper, we propose a fully end-to-end action-attending graphic neural network (A$^2$GNN) for skeleton-based action recognition, in which each irregular skeleton is structured as an undirected attribute graph. To extract high-level semantic representation from skeletons, we perform the local spectral graph filtering on the constructed attribute graphs like the standard image convolution operation. Considering not all joints are informative for action analysis, we design an action-attending layer to detect those salient action units (AUs) by adaptively weighting skeletal joints. Herein the filtering responses are parameterized into a weighting function irrelevant to the order of input nodes. To further encode continuous motion variations, the deep features learnt from skeletal graphs are gathered along consecutive temporal slices and then fed into a recurrent gated network. Finally, the spectral graph filtering, action-attending and recurrent temporal encoding are integrated together to jointly train for the sake of robust action recognition as well as the intelligibility of human actions. To evaluate our A$^2$GNN, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D dataset. The experimental results demonstrate that our network achieves the state-of-the-art performances.


  Click for Model/Code and Paper
LCSCNet: Linear Compressing Based Skip-Connecting Network for Image Super-Resolution

Sep 09, 2019
Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, Qingmin Liao

In this paper, we develop a concise but efficient network architecture called linear compressing based skip-connecting network (LCSCNet) for image super-resolution. Compared with two representative network architectures with skip connections, ResNet and DenseNet, a linear compressing layer is designed in LCSCNet for skip connection, which connects former feature maps and distinguishes them from newly-explored feature maps. In this way, the proposed LCSCNet enjoys the merits of the distinguish feature treatment of DenseNet and the parameter-economic form of ResNet. Moreover, to better exploit hierarchical information from both low and high levels of various receptive fields in deep models, inspired by gate units in LSTM, we also propose an adaptive element-wise fusion strategy with multi-supervised training. Experimental results in comparison with state-of-the-art algorithms validate the effectiveness of LCSCNet.

* Accepted by IEEE Transactions on Image Processing (IEEE-TIP) 

  Click for Model/Code and Paper
Breast Tumor Classification Based on Decision Information Genes and Inverse Projection Sparse Representation

Apr 17, 2018
Xiaohui Yang, Wenming Wu, Yunmei Chen, Xianqi Li, Juan Zhang, Dan Long, Lijun Yang

Microarray gene expression data-based breast tumor classification is an active and challenging issue. In this paper, a robust framework of breast tumor recognition is presented aiming at reducing clinical misdiagnosis rate and exploiting available information in existing samples. A wrapper gene selection method is established from a new perspective of reducing clinical misdiagnosis rate. The further feature selection of information genes is achieved using the modified NMF model, which is rooted in the use of hierarchical learning and layer-wise pre-training strategy in deep learning. For completing the classification, an inverse projection sparse representation (IPSR) model is constructed to exploit information embedded in existing samples, especially in the test ones. Moreover, the IPSR model is optimized through generalized ADMM and the corresponding convergence is analyzed. Extensive experiments on public microarray gene expression datasets show that the proposed method is stable and effective for breast tumor classification. Compared to the latest open literature, there is 14% higher in classification accuracy. Specificity and sensitivity achieve 94.17% and 97.5%, respectively.

* 14 pages, 17 figures, 7 tables 

  Click for Model/Code and Paper
Low Rank Variation Dictionary and Inverse Projection Group Sparse Representation Model for Breast Tumor Classification

Mar 10, 2018
Xiaohui Yang, Xiaoying Jiang, Wenming Wu, Juan Zhang, Dan Long, Funa Zhou, Yiming Xu

Sparse representation classification achieves good results by addressing recognition problem with sufficient training samples per subject. However, SRC performs not very well for small sample data. In this paper, an inverse-projection group sparse representation model is presented for breast tumor classification, which is based on constructing low-rank variation dictionary. The proposed low-rank variation dictionary tackles tumor recognition problem from the viewpoint of detecting and using variations in gene expression profiles of normal and patients, rather than directly using these samples. The inverse projection group sparsity representation model is constructed based on taking full using of exist samples and group effect of microarray gene data. Extensive experiments on public breast tumor microarray gene expression datasets demonstrate the proposed technique is competitive with state-of-the-art methods. The results of Breast-1, Breast-2 and Breast-3 databases are 80.81%, 89.10% and 100% respectively, which are better than the latest literature.

* 31 pages, 14 figures, 12 tables. arXiv admin note: text overlap with arXiv:1803.03562 

  Click for Model/Code and Paper
Extreme Network Compression via Filter Group Approximation

Jul 31, 2018
Bo Peng, Wenming Tan, Zheyang Li, Shun Zhang, Di Xie, Shiliang Pu

In this paper we propose a novel decomposition method based on filter group approximation, which can significantly reduce the redundancy of deep convolutional neural networks (CNNs) while maintaining the majority of feature representation. Unlike other low-rank decomposition algorithms which operate on spatial or channel dimension of filters, our proposed method mainly focuses on exploiting the filter group structure for each layer. For several commonly used CNN models, including VGG and ResNet, our method can reduce over 80% floating-point operations (FLOPs) with less accuracy drop than state-of-the-art methods on various image classification datasets. Besides, experiments demonstrate that our method is conducive to alleviating degeneracy of the compressed network, which hurts the convergence and performance of the network.

* Accepted by ECCV2018 

  Click for Model/Code and Paper