Models, code, and papers for "Qi Zhang":

Self-driving scale car trained by Deep reinforcement Learning

Sep 08, 2019
Qi Zhang, Tao Du

This paper considers the problem of self-driving algorithm based on deep learning. This is a hot topic because self-driving is the most important application field of artificial intelligence. Existing work focused on deep learning which has the ability to learn end-to-end self-driving control directly from raw sensory data, but this method is just a mapping between images and driving. We prefer deep reinforcement learning to train a self-driving car in a virtual simulation environment created by Unity and then migrate to reality. Deep reinforcement learning makes the machine own the driving descision-making ability like human. The virtual to realistic training method can efficiently handle the problem that reinforcement learning requires reward from the environment which probably cause cars damge. We have derived a theoretical model and analysis on how to use Deep Q-learning to control a car to drive. We have carried out simulations in the Unity virtual environment for evaluating the performance. Finally, we successfully migrate te model to the real world and realize self-driving.

  Click for Model/Code and Paper
Address Instance-level Label Prediction in Multiple Instance Learning

May 29, 2019
Minlong Peng, Qi Zhang

\textit{Multiple Instance Learning} (MIL) is concerned with learning from bags of instances, where only bag labels are given and instance labels are unknown. Existent approaches in this field were mainly designed for the bag-level label prediction (predict labels for bags) but not the instance-level (predict labels for instances), with the task loss being only defined at the bag level. This restricts their application in many tasks, where the instance-level labels are more interested. In this paper, we propose a novel algorithm, whose loss is specifically defined at the instance level, to address instance-level label prediction in MIL. We prove that the loss of this algorithm can be unbiasedly and consistently estimated without using instance labels, under the i.i.d assumption. Empirical study validates the above statements and shows that the proposed algorithm can achieve superior instance-level and comparative bag-level performance, compared to state-of-the-art MIL methods. In addition, it shows that the proposed method can achieve similar results as the fully supervised model (trained with instance labels) for label prediction at the instance level.

* This work address Multiple Instance Learning 

  Click for Model/Code and Paper
A Regressive Convolution Neural network and Support Vector Regression Model for Electricity Consumption Forecasting

Oct 21, 2018
Youshan Zhang, Qi Li

Electricity consumption forecasting has important implications for the mineral companies on guiding quarterly work, normal power system operation, and the management. However, electricity consumption prediction for the mineral company is different from traditional electricity load prediction since mineral company electricity consumption can be affected by various factors (e.g., ore grade, processing quantity of the crude ore, ball milling fill rate). The problem is non-trivial due to three major challenges for traditional methods: insufficient training data, high computational cost and low prediction accu-racy. To tackle these challenges, we firstly propose a Regressive Convolution Neural Network (RCNN) to predict the electricity consumption. While RCNN still suffers from high computation overhead, we utilize RCNN to extract features from the history data and Regressive Support Vector Machine (SVR) trained with the features to predict the electricity consumption. The experimental results show that the proposed RCNN-SVR model achieves higher accuracy than using the traditional RNN or SVM alone. The MSE, MAPE, and CV-RMSE of RCNN-SVR model are 0.8564, 1.975%, and 0.0687% respectively, which illustrates the low predicting error rate of the proposed model.

* Future of Information and Communications Conference (FICC) 2019 

  Click for Model/Code and Paper
Face Recognition via Centralized Coordinate Learning

Jan 17, 2018
Xianbiao Qi, Lei Zhang

Owe to the rapid development of deep neural network (DNN) techniques and the emergence of large scale face databases, face recognition has achieved a great success in recent years. During the training process of DNN, the face features and classification vectors to be learned will interact with each other, while the distribution of face features will largely affect the convergence status of network and the face similarity computing in test stage. In this work, we formulate jointly the learning of face features and classification vectors, and propose a simple yet effective centralized coordinate learning (CCL) method, which enforces the features to be dispersedly spanned in the coordinate space while ensuring the classification vectors to lie on a hypersphere. An adaptive angular margin is further proposed to enhance the discrimination capability of face features. Extensive experiments are conducted on six face benchmarks, including those have large age gap and hard negative samples. Trained only on the small-scale CASIA Webface dataset with 460K face images from about 10K subjects, our CCL model demonstrates high effectiveness and generality, showing consistently competitive performance across all the six benchmark databases.

* 14 pages, 9 figures 

  Click for Model/Code and Paper
Stochastic trajectory prediction with social graph network

Jul 24, 2019
Lidan Zhang, Qi She, Ping Guo

Pedestrian trajectory prediction is a challenging task because of the complexity of real-world human social behaviors and uncertainty of the future motion. For the first issue, existing methods adopt fully connected topology for modeling the social behaviors, while ignoring non-symmetric pairwise relationships. To effectively capture social behaviors of relevant pedestrians, we utilize a directed social graph which is dynamically constructed on timely location and speed direction. Based on the social graph, we further propose a network to collect social effects and accumulate with individual representation, in order to generate destination-oriented and social-aware representations. For the second issue, instead of modeling the uncertainty of the entire future as a whole, we utilize a temporal stochastic method for sequentially learning a prior model of uncertainty during social interactions. The prediction on the next step is then generated by sampling on the prior model and progressively decoding with a hierarchical LSTMs. Experimental results on two public datasets show the effectiveness of our method, especially when predicting trajectories in very crowded scenes.

* 10 pages, 5 figures 

  Click for Model/Code and Paper
Weighed Domain-Invariant Representation Learning for Cross-domain Sentiment Analysis

Sep 18, 2019
Minlong Peng, Qi Zhang, Xuanjing Huang

Cross-domain sentiment analysis is currently a hot topic in the research and engineering areas. One of the most popular frameworks in this field is the domain-invariant representation learning (DIRL) paradigm, which aims to learn a distribution-invariant feature representation across domains. However, in this work, we find out that applying DIRL may harm domain adaptation when the label distribution $\rm{P}(\rm{Y})$ changes across domains. To address this problem, we propose a modification to DIRL, obtaining a novel weighted domain-invariant representation learning (WDIRL) framework. We show that it is easy to transfer existing SOTA DIRL models to WDIRL. Empirical studies on extensive cross-domain sentiment analysis tasks verified our statements and showed the effectiveness of our proposed solution.

* Address the problem of the domain-invariant representation learning framework under target shift 

  Click for Model/Code and Paper
Dynamic Malware Analysis with Feature Engineering and Feature Learning

Sep 09, 2019
Zhaoqi Zhang, Panpan Qi, Wei Wang

Dynamic malware analysis executes the program in an isolated environment and monitors its run-time behaviour (e.g., system API calls) for malware detection. This technique has been proven to be effective against various code obfuscation techniques and newly released ("zero-day") malware. However, existing works typically only consider the API name while ignoring the arguments, or require complex feature engineering operations and expert knowledge to process the arguments. In this paper, we propose a novel and low-cost feature extraction approach, and an effective deep neural network architecture for accurate and fast malware detection. Specifically, the feature representation approach utilizes a feature hashing trick to encode the API call arguments associated with the API name. The deep neural network architecture applies multiple Gated-CNNs (convolutional neural networks) to transform the extracted features of each API call. The outputs are further processed through LSTM (long-short term memory networks) to learn the sequential correlation among API calls. Experiments show that our solution outperforms baselines significantly on a large real dataset. Valuable insights about feature engineering and architecture design are derived from ablation study.

  Click for Model/Code and Paper
Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control

Sep 06, 2019
Sai Qian Zhang, Qi Zhang, Jieyu Lin

Multi-agent reinforcement learning (MARL) has recently received considerable attention due to its applicability to a wide range of real-world applications. However, achieving efficient communication among agents has always been an overarching problem in MARL. In this work, we propose Variance Based Control (VBC), a simple yet efficient technique to improve communication efficiency in MARL. By limiting the variance of the exchanged messages between agents during the training phase, the noisy component in the messages can be eliminated effectively, while the useful part can be preserved and utilized by the agents for better performance. Our evaluation using a challenging set of StarCraft II benchmarks indicates that our method achieves $2-10\times$ lower in communication overhead than state-of-the-art MARL algorithms, while allowing agents to better collaborate by developing sophisticated strategies.

  Click for Model/Code and Paper
Image Super-Resolution Using a Wavelet-based Generative Adversarial Network

Jul 24, 2019
Qi Zhang, Huafeng Wang, Sichen Yang

In this paper, we consider the problem of super-resolution recons-truction. This is a hot topic because super-resolution reconstruction has a wide range of applications in the medical field, remote sensing monitoring, and criminal investigation. Compared with traditional algorithms, the current super-resolution reconstruction algorithm based on deep learning greatly improves the clarity of reconstructed pictures. Existing work like Super-Resolution Using a Generative Adversarial Network (SRGAN) can effectively restore the texture details of the image. However, experimentally verified that the texture details of the image recovered by the SRGAN are not robust. In order to get super-resolution reconstructed images with richer high-frequency details, we improve the network structure and propose a super-resolution reconstruction algorithm combining wavelet transform and Generative Adversarial Network. The proposed algorithm can efficiently reconstruct high-resolution images with rich global information and local texture details. We have trained our model by PyTorch framework and VOC2012 dataset, and tested it by Set5, Set14, BSD100 and Urban100 test datasets.

  Click for Model/Code and Paper
Neural Learning of Online Consumer Credit Risk

Jun 05, 2019
Di Wang, Qi Wu, Wen Zhang

This paper takes a deep learning approach to understand consumer credit risk when e-commerce platforms issue unsecured credit to finance customers' purchase. The "NeuCredit" model can capture both serial dependences in multi-dimensional time series data when event frequencies in each dimension differ. It also captures nonlinear cross-sectional interactions among different time-evolving features. Also, the predicted default probability is designed to be interpretable such that risks can be decomposed into three components: the subjective risk indicating the consumers' willingness to repay, the objective risk indicating their ability to repay, and the behavioral risk indicating consumers' behavioral differences. Using a unique dataset from one of the largest global e-commerce platforms, we show that the inclusion of shopping behavioral data, besides conventional payment records, requires a deep learning approach to extract the information content of these data, which turns out significantly enhancing forecasting performance than the traditional machine learning methods.

* 49 pages, 11 tables, 7 figures 

  Click for Model/Code and Paper
Optimal Clustering Framework for Hyperspectral Band Selection

Apr 30, 2019
Qi Wang, Fahong Zhang, Xuelong Li

Band selection, by choosing a set of representative bands in hyperspectral image (HSI), is an effective method to reduce the redundant information without compromising the original contents. Recently, various unsupervised band selection methods have been proposed, but most of them are based on approximation algorithms which can only obtain suboptimal solutions toward a specific objective function. This paper focuses on clustering-based band selection, and proposes a new framework to solve the above dilemma, claiming the following contributions: 1) An optimal clustering framework (OCF), which can obtain the optimal clustering result for a particular form of objective function under a reasonable constraint. 2) A rank on clusters strategy (RCS), which provides an effective criterion to select bands on existing clustering structure. 3) An automatic method to determine the number of the required bands, which can better evaluate the distinctive information produced by certain number of bands. In experiments, the proposed algorithm is compared to some state-of-the-art competitors. According to the experimental results, the proposed algorithm is robust and significantly outperform the other methods on various data sets.

* IEEE Trans. Geoscience and Remote Sensing, vol. 56, no. 10, pp. 5910-5922, 2018 

  Click for Model/Code and Paper
CBHE: Corner-based Building Height Estimation for Complex Street Scene Images

Apr 25, 2019
Yunxiang Zhao, Jianzhong Qi, Rui Zhang

Building height estimation is important in many applications such as 3D city reconstruction, urban planning, and navigation. Recently, a new building height estimation method using street scene images and 2D maps was proposed. This method is more scalable than traditional methods that use high-resolution optical data, LiDAR data, or RADAR data which are expensive to obtain. The method needs to detect building rooflines and then compute building height via the pinhole camera model. We observe that this method has limitations in handling complex street scene images in which buildings overlap with each other and the rooflines are difficult to locate. We propose CBHE, a building height estimation algorithm considering both building corners and rooflines. CBHE first obtains building corner and roofline candidates in street scene images based on building footprints from 2D maps and the camera parameters. Then, we use a deep neural network named BuildingNet to classify and filter corner and roofline candidates. Based on the valid corners and rooflines from BuildingNet, CBHE computes building height via the pinhole camera model. Experimental results show that the proposed BuildingNet yields a higher accuracy on building corner and roofline candidate filtering compared with the state-of-the-art open set classifiers. Meanwhile, CBHE outperforms the baseline algorithm by over 10% in building height estimation accuracy.

  Click for Model/Code and Paper
Sentence-State LSTM for Text Representation

May 07, 2018
Yue Zhang, Qi Liu, Linfeng Song

Bi-directional LSTMs are a powerful tool for text representation. On the other hand, they have been shown to suffer various limitations due to their sequential nature. We investigate an alternative LSTM structure for encoding text, which consists of a parallel state for each word. Recurrent steps are used to perform local and global information exchange between words simultaneously, rather than incremental reading of a sequence of words. Results on various classification and sequence labelling benchmarks show that the proposed model has strong representation power, giving highly competitive performances compared to stacked BiLSTM models with similar parameter numbers.

* ACL 18 camera-ready version 

  Click for Model/Code and Paper
Joint Voxel and Coordinate Regression for Accurate 3D Facial Landmark Localization

Jan 28, 2018
Hongwen Zhang, Qi Li, Zhenan Sun

3D face shape is more expressive and viewpoint-consistent than its 2D counterpart. However, 3D facial landmark localization in a single image is challenging due to the ambiguous nature of landmarks under 3D perspective. Existing approaches typically adopt a suboptimal two-step strategy, performing 2D landmark localization followed by depth estimation. In this paper, we propose the Joint Voxel and Coordinate Regression (JVCR) method for 3D facial landmark localization, addressing it more effectively in an end-to-end fashion. First, a compact volumetric representation is proposed to encode the per-voxel likelihood of positions being the 3D landmarks. The dimensionality of such a representation is fixed regardless of the number of target landmarks, so that the curse of dimensionality could be avoided. Then, a stacked hourglass network is adopted to estimate the volumetric representation from coarse to fine, followed by a 3D convolution network that takes the estimated volume as input and regresses 3D coordinates of the face shape. In this way, the 3D structural constraints between landmarks could be learned by the neural network in a more efficient manner. Moreover, the proposed pipeline enables end-to-end training and improves the robustness and accuracy of 3D facial landmark localization. The effectiveness of our approach is validated on the 3DFAW and AFLW2000-3D datasets. Experimental results show that the proposed method achieves state-of-the-art performance in comparison with existing methods.

* Code available at 

  Click for Model/Code and Paper
Decoupled Learning for Conditional Adversarial Networks

Jan 21, 2018
Zhifei Zhang, Yang Song, Hairong Qi

Incorporating encoding-decoding nets with adversarial nets has been widely adopted in image generation tasks. We observe that the state-of-the-art achievements were obtained by carefully balancing the reconstruction loss and adversarial loss, and such balance shifts with different network structures, datasets, and training strategies. Empirical studies have demonstrated that an inappropriate weight between the two losses may cause instability, and it is tricky to search for the optimal setting, especially when lacking prior knowledge on the data and network. This paper gives the first attempt to relax the need of manual balancing by proposing the concept of \textit{decoupled learning}, where a novel network structure is designed that explicitly disentangles the backpropagation paths of the two losses. Experimental results demonstrate the effectiveness, robustness, and generality of the proposed method. The other contribution of the paper is the design of a new evaluation metric to measure the image quality of generative models. We propose the so-called \textit{normalized relative discriminative score} (NRDS), which introduces the idea of relative comparison, rather than providing absolute estimates like existing metrics.

  Click for Model/Code and Paper
r-BTN: Cross-domain Face Composite and Synthesis from Limited Facial Patches

Dec 06, 2017
Yang Song, Zhifei Zhang, Hairong Qi

We start by asking an interesting yet challenging question, "If an eyewitness can only recall the eye features of the suspect, such that the forensic artist can only produce a sketch of the eyes (e.g., the top-left sketch shown in Fig. 1), can advanced computer vision techniques help generate the whole face image?" A more generalized question is that if a large proportion (e.g., more than 50%) of the face/sketch is missing, can a realistic whole face sketch/image still be estimated. Existing face completion and generation methods either do not conduct domain transfer learning or can not handle large missing area. For example, the inpainting approach tends to blur the generated region when the missing area is large (i.e., more than 50%). In this paper, we exploit the potential of deep learning networks in filling large missing region (e.g., as high as 95% missing) and generating realistic faces with high-fidelity in cross domains. We propose the recursive generation by bidirectional transformation networks (r-BTN) that recursively generates a whole face/sketch from a small sketch/face patch. The large missing area and the cross domain challenge make it difficult to generate satisfactory results using a unidirectional cross-domain learning structure. On the other hand, a forward and backward bidirectional learning between the face and sketch domains would enable recursive estimation of the missing region in an incremental manner (Fig. 1) and yield appealing results. r-BTN also adopts an adversarial constraint to encourage the generation of realistic faces/sketches. Extensive experiments have been conducted to demonstrate the superior performance from r-BTN as compared to existing potential solutions.

* Accepted by AAAI 2018 

  Click for Model/Code and Paper
Age Progression/Regression by Conditional Adversarial Autoencoder

Mar 28, 2017
Zhifei Zhang, Yang Song, Hairong Qi

"If I provide you a face image of mine (without telling you the actual age when I took the picture) and a large amount of face images that I crawled (containing labeled faces of different ages but not necessarily paired), can you show me what I would look like when I am 80 or what I was like when I was 5?" The answer is probably a "No." Most existing face aging works attempt to learn the transformation between age groups and thus would require the paired samples as well as the labeled query image. In this paper, we look at the problem from a generative modeling perspective such that no paired samples is required. In addition, given an unlabeled image, the generative model can directly produce the image with desired age attribute. We propose a conditional adversarial autoencoder (CAAE) that learns a face manifold, traversing on which smooth age progression and regression can be realized simultaneously. In CAAE, the face is first mapped to a latent vector through a convolutional encoder, and then the vector is projected to the face manifold conditional on age through a deconvolutional generator. The latent vector preserves personalized face features (i.e., personality) and the age condition controls progression vs. regression. Two adversarial networks are imposed on the encoder and generator, respectively, forcing to generate more photo-realistic faces. Experimental results demonstrate the appealing performance and flexibility of the proposed framework by comparing with the state-of-the-art and ground truth.

* Accepted by The IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) 

  Click for Model/Code and Paper
Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making

Mar 14, 2017
Qi Zhang, Satinder Singh, Edmund Durfee

In cooperative multiagent planning, it can often be beneficial for an agent to make commitments about aspects of its behavior to others, allowing them in turn to plan their own behaviors without taking the agent's detailed behavior into account. Extending previous work in the Bayesian setting, we consider instead a worst-case setting in which the agent has a set of possible environments (MDPs) it could be in, and develop a commitment semantics that allows for probabilistic guarantees on the agent's behavior in any of the environments it could end up facing. Crucially, an agent receives observations (of reward and state transitions) that allow it to potentially eliminate possible environments and thus obtain higher utility by adapting its policy to the history of observations. We develop algorithms and provide theory and some preliminary empirical results showing that they ensure an agent meets its commitments with history-dependent policies while minimizing maximum regret over the possible environments.

  Click for Model/Code and Paper
Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction

Jan 10, 2017
Junbo Zhang, Yu Zheng, Dekang Qi

Forecasting the flow of crowds is of great importance to traffic management and public safety, yet a very challenging task affected by many complex factors, such as inter-region traffic, events and weather. In this paper, we propose a deep-learning-based approach, called ST-ResNet, to collectively forecast the in-flow and out-flow of crowds in each and every region through a city. We design an end-to-end structure of ST-ResNet based on unique properties of spatio-temporal data. More specifically, we employ the framework of the residual neural networks to model the temporal closeness, period, and trend properties of the crowd traffic, respectively. For each property, we design a branch of residual convolutional units, each of which models the spatial properties of the crowd traffic. ST-ResNet learns to dynamically aggregate the output of the three residual neural networks based on data, assigning different weights to different branches and regions. The aggregation is further combined with external factors, such as weather and day of the week, to predict the final traffic of crowds in each and every region. We evaluate ST-ResNet based on two types of crowd flows in Beijing and NYC, finding that its performance exceeds six well-know methods.

* AAAI 2017 

  Click for Model/Code and Paper