Research papers and code for "Guoqiang Zhong":
Other than vector representations, the direct objects of human cognition are generally high-order tensors, such as 2D images and 3D textures. From this fact, two interesting questions naturally arise: How does the human brain represent these tensor perceptions in a "manifold" way, and how can they be recognized on the "manifold"? In this paper, we present a supervised model to learn the intrinsic structure of the tensors embedded in a high dimensional Euclidean space. With the fixed point continuation procedures, our model automatically and jointly discovers the optimal dimensionality and the representations of the low dimensional embeddings. This makes it an effective simulation of the cognitive process of human brain. Furthermore, the generalization of our model based on similarity between the learned low dimensional embeddings can be viewed as counterpart of recognition of human brain. Experiments on applications for object recognition and face recognition demonstrate the superiority of our proposed model over state-of-the-art approaches.

* 30 pages
Click to Read Paper and Get Code
Recently, with convolutional neural networks gaining significant achievements in many challenging machine learning fields, hand-crafted neural networks no longer satisfy our requirements as designing a network will cost a lot, and automatically generating architectures has attracted increasingly more attention and focus. Some research on auto-generated networks has achieved promising results. However, they mainly aim at picking a series of single layers such as convolution or pooling layers one by one. There are many elegant and creative designs in the carefully hand-crafted neural networks, such as Inception-block in GoogLeNet, residual block in residual network and dense block in dense convolutional network. Based on reinforcement learning and taking advantages of the superiority of these networks, we propose a novel automatic process to design a multi-block neural network, whose architecture contains multiple types of blocks mentioned above, with the purpose to do structure learning of deep neural networks and explore the possibility whether different blocks can be composed together to form a well-behaved neural network. The optimal network is created by the Q-learning agent who is trained to sequentially pick different types of blocks. To verify the validity of our proposed method, we use the auto-generated multi-block neural network to conduct experiments on image benchmark datasets MNIST, SVHN and CIFAR-10 image classification task with restricted computational resources. The results demonstrate that our method is very effective, achieving comparable or better performance than hand-crafted networks and advanced auto-generated neural networks.

Click to Read Paper and Get Code
Recurrent Neural Network (RNN) has been successfully applied in many sequence learning problems. Such as handwriting recognition, image description, natural language processing and video motion analysis. After years of development, researchers have improved the internal structure of the RNN and introduced many variants. Among others, Gated Recurrent Unit (GRU) is one of the most widely used RNN model. However, GRU lacks the capability of adaptively paying attention to certain regions or locations, so that it may cause information redundancy or loss during leaning. In this paper, we propose a RNN model, called Recurrent Attention Unit (RAU), which seamlessly integrates the attention mechanism into the interior of GRU by adding an attention gate. The attention gate can enhance GRU's ability to remember long-term memory and help memory cells quickly discard unimportant content. RAU is capable of extracting information from the sequential data by adaptively selecting a sequence of regions or locations and pay more attention to the selected regions during learning. Extensive experiments on image classification, sentiment classification and language modeling show that RAU consistently outperforms GRU and other baseline methods.

Click to Read Paper and Get Code
In order to learn effective features from temporal sequences, the long short-term memory (LSTM) network is widely applied. A critical component of LSTM is the memory cell, which is able to extract, process and store temporal information. Nevertheless, in LSTM, the memory cell is not directly enforced to pay attention to a part of the sequence. Alternatively, the attention mechanism can help to pay attention to specific information of data. In this paper, we present a novel neural model, called long short-term attention (LSTA), which seamlessly merges the attention mechanism into LSTM. More than processing long short term sequences, it can distill effective and valuable information from the sequences with the attention mechanism. Experiments show that LSTA achieves promising learning performance in various deep learning tasks.

Click to Read Paper and Get Code
Since about 100 years ago, to learn the intrinsic structure of data, many representation learning approaches have been proposed, including both linear ones and nonlinear ones, supervised ones and unsupervised ones. Particularly, deep architectures are widely applied for representation learning in recent years, and have delivered top results in many tasks, such as image classification, object detection and speech recognition. In this paper, we review the development of data representation learning methods. Specifically, we investigate both traditional feature learning algorithms and state-of-the-art deep learning models. The history of data representation learning is introduced, while available resources (e.g. online course, tutorial and book information) and toolboxes are provided. Finally, we conclude this paper with remarks and some interesting research directions on data representation learning.

* About 20 pages. Submitted to Journal of Finance and Data Science as an invited paper
Click to Read Paper and Get Code
Convolutional Neural Network (CNN) has gained state-of-the-art results in many pattern recognition and computer vision tasks. However, most of the CNN structures are manually designed by experienced researchers. Therefore, auto- matically building high performance networks becomes an important problem. In this paper, we introduce the idea of using DNA computing algorithm to automatically learn high-performance architectures. In DNA computing algorithm, we use short DNA strands to represent layers and long DNA strands to represent overall networks. We found that most of the learned models perform similarly, and only those performing worse during the first runs of training will perform worse finally than others. The indicates that: 1) Using DNA computing algorithm to learn deep architectures is feasible; 2) Local minima should not be a problem of deep networks; 3) We can use early stop to kill the models with the bad performance just after several runs of training. In our experiments, an accuracy 99.73% was obtained on the MNIST data set and an accuracy 95.10% was obtained on the CIFAR-10 data set.

Click to Read Paper and Get Code
In recent years, research on image generation methods has been developing fast. The auto-encoding variational Bayes method (VAEs) was proposed in 2013, which uses variational inference to learn a latent space from the image database and then generates images using the decoder. The generative adversarial networks (GANs) came out as a promising framework, which uses adversarial training to improve the generative ability of the generator. However, the generated pictures by GANs are generally blurry. The deep convolutional generative adversarial networks (DCGANs) were then proposed to leverage the quality of generated images. Since the input noise vectors are randomly sampled from a Gaussian distribution, the generator has to map from a whole normal distribution to the images. This makes DCGANs unable to reflect the inherent structure of the training data. In this paper, we propose a novel deep model, called generative adversarial networks with decoder-encoder output noise (DE-GANs), which takes advantage of both the adversarial training and the variational Bayesain inference to improve the performance of image generation. DE-GANs use a pre-trained decoder-encoder architecture to map the random Gaussian noise vectors to informative ones and pass them to the generator of the adversarial networks. Since the decoder-encoder architecture is trained by the same images as the generators, the output vectors could carry the intrinsic distribution information of the original images. Moreover, the loss function of DE-GANs is different from GANs and DCGANs. A hidden-space loss function is added to the adversarial loss function to enhance the robustness of the model. Extensive empirical results show that DE-GANs can accelerate the convergence of the adversarial training process and improve the quality of the generated images.

Click to Read Paper and Get Code
Random forests are a type of ensemble method which makes predictions by combining the results of several independent trees. However, the theory of random forests has long been outpaced by their application. In this paper, we propose a novel random forests algorithm based on cooperative game theory. Banzhaf power index is employed to evaluate the power of each feature by traversing possible feature coalitions. Unlike the previously used information gain rate of information theory, which simply chooses the most informative feature, the Banzhaf power index can be considered as a metric of the importance of each feature on the dependency among a group of features. More importantly, we have proved the consistency of the proposed algorithm, named Banzhaf random forests (BRF). This theoretical analysis takes a step towards narrowing the gap between the theory and practice of random forests for classification problems. Experiments on several UCI benchmark data sets show that BRF is competitive with state-of-the-art classifiers and dramatically outperforms previous consistent random forests. Particularly, it is much more efficient than previous consistent random forests.

* arXiv admin note: text overlap with arXiv:1302.4853 by other authors
Click to Read Paper and Get Code
Hashing-based methods seek compact and efficient binary codes that preserve the neighborhood structure in the original data space. For most existing hashing methods, an image is first encoded as a vector of hand-crafted visual feature, followed by a hash projection and quantization step to get the compact binary vector. Most of the hand-crafted features just encode the low-level information of the input, the feature may not preserve the semantic similarities of images pairs. Meanwhile, the hashing function learning process is independent with the feature representation, so the feature may not be optimal for the hashing projection. In this paper, we propose a supervised hashing method based on a well designed deep convolutional neural network, which tries to learn hashing code and compact representations of data simultaneously. The proposed model learn the binary codes by adding a compact sigmoid layer before the loss layer. Experiments on several image data sets show that the proposed model outperforms other state-of-the-art methods.

* 7 pages, 5 figures
Click to Read Paper and Get Code
In this paper, we propose a novel unsupervised deep learning model, called PCA-based Convolutional Network (PCN). The architecture of PCN is composed of several feature extraction stages and a nonlinear output stage. Particularly, each feature extraction stage includes two layers: a convolutional layer and a feature pooling layer. In the convolutional layer, the filter banks are simply learned by PCA. In the nonlinear output stage, binary hashing is applied. For the higher convolutional layers, the filter banks are learned from the feature maps that were obtained in the previous stage. To test PCN, we conducted extensive experiments on some challenging tasks, including handwritten digits recognition, face recognition and texture classification. The results show that PCN performs competitive with or even better than state-of-the-art deep learning models. More importantly, since there is no back propagation for supervised finetuning, PCN is much more efficient than existing deep networks.

* 8 pages,5 figures
Click to Read Paper and Get Code
This letter adopts long short-term memory(LSTM) to predict sea surface temperature(SST), which is the first attempt, to our knowledge, to use recurrent neural network to solve the problem of SST prediction, and to make one week and one month daily prediction. We formulate the SST prediction problem as a time series regression problem. LSTM is a special kind of recurrent neural network, which introduces gate mechanism into vanilla RNN to prevent the vanished or exploding gradient problem. It has strong ability to model the temporal relationship of time series data and can handle the long-term dependency problem well. The proposed network architecture is composed of two kinds of layers: LSTM layer and full-connected dense layer. LSTM layer is utilized to model the time series relationship. Full-connected layer is utilized to map the output of LSTM layer to a final prediction. We explore the optimal setting of this architecture by experiments and report the accuracy of coastal seas of China to confirm the effectiveness of the proposed method. In addition, we also show its online updated characteristics.

* 5 page, 5 figures
Click to Read Paper and Get Code
Generative Adversarial Networks (GANs) have a great performance in image generation, but they need a large scale of data to train the entire framework, and often result in nonsensical results. We propose a new method referring to conditional GAN, which equipments the latent noise with mixture of Student's t-distribution with attention mechanism in addition to class information. Student's t-distribution has long tails that can provide more diversity to the latent noise. Meanwhile, the discriminator in our model implements two tasks simultaneously, judging whether the images come from the true data distribution, and identifying the class of each generated images. The parameters of the mixture model can be learned along with those of GANs. Moreover, we mathematically prove that any multivariate Student's t-distribution can be obtained by a linear transformation of a normal multivariate Student's t-distribution. Experiments comparing the proposed method with typical GAN, DeliGAN and DCGAN indicate that, our method has a great performance on generating diverse and legible objects with limited data.

Click to Read Paper and Get Code
This paper investigates a novel task of generating texture images from perceptual descriptions. Previous work on texture generation focused on either synthesis from examples or generation from procedural models. Generating textures from perceptual attributes have not been well studied yet. Meanwhile, perceptual attributes, such as directionality, regularity and roughness are important factors for human observers to describe a texture. In this paper, we propose a joint deep network model that combines adversarial training and perceptual feature regression for texture generation, while only random noise and user-defined perceptual attributes are required as input. In this model, a preliminary trained convolutional neural network is essentially integrated with the adversarial framework, which can drive the generated textures to possess given perceptual attributes. An important aspect of the proposed model is that, if we change one of the input perceptual features, the corresponding appearance of the generated textures will also be changed. We design several experiments to validate the effectiveness of the proposed method. The results show that the proposed method can produce high quality texture images with desired perceptual properties.

* 7 pages, 4 figures, icme2017
Click to Read Paper and Get Code