Research papers and code for "Pengfei Zhang":
We examine various geometric active contour methods for radar image segmentation. Owing to the special properties of radar images, we propose a new model based on a modified Chan-Vese functional. Our method is efficient at separating non-meteorological noise from meteorological images.
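The two-phase fitting term that the Chan-Vese functional is built on can be sketched in a few lines. This is a generic illustration, not the paper's modified functional: the curvature/length term is omitted, and the region weights and the toy "radar-like" image are assumptions.

```python
import numpy as np

def chan_vese_step(img, mask, lam1=1.0, lam2=1.0):
    # Means of the two regions separated by the current contour.
    c1 = img[mask].mean() if mask.any() else 0.0
    c2 = img[~mask].mean() if (~mask).any() else 0.0
    # Pointwise comparison of the fitting energies: a pixel joins the
    # region whose mean it is closer to (curvature term omitted).
    force = lam2 * (img - c2) ** 2 - lam1 * (img - c1) ** 2
    return force > 0  # new "inside" region

# Toy radar-like image: a bright square on a noisy background.
rng = np.random.default_rng(0)
img = rng.normal(0.1, 0.02, (32, 32))
img[8:24, 8:24] = 0.9
mask = np.zeros((32, 32), dtype=bool)
mask[14:18, 14:18] = True  # small seed inside the object
for _ in range(10):
    mask = chan_vese_step(img, mask)
```

After a few iterations the evolving region locks onto the bright square, which is the behavior the segmentation model exploits.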

We present a 3D Convolutional Neural Network (CNN) based single-shot detector for spatio-temporal action detection tasks. Our model includes: (1) two short-term appearance and motion streams, taking a single RGB image and a single optical flow image as input, respectively, to capture the spatial and temporal information of the current frame; (2) two long-term 3D ConvNet based streams, working on sequences of consecutive RGB and optical flow images to capture context from past frames. Our model achieves strong performance for action detection in video and can be easily integrated into any current two-stream action detection method. We report a frame-mAP of 71.30% on the challenging UCF101-24 actions dataset, achieving the state-of-the-art result among one-stage methods. To the best of our knowledge, our work is the first system to combine a 3D CNN and SSD for action detection tasks.

* 26th IEEE International Conference on Image Processing (ICIP 2019)
Extracting appropriate features to represent a corpus is an important task in textual mining. Previous attention-based work usually enhances features at the lexical level and lacks exploration of feature augmentation at the sentence level. In this paper, we exploit a Dynamic Feature Generation Network (DFGN) to solve this problem. Specifically, DFGN generates features based on a variety of attention mechanisms and attaches them to the sentence representation. A thresholder is then designed to filter the mined features automatically. DFGN extracts the most significant characteristics from datasets to maintain its practicability and robustness. Experimental results on multiple well-known answer selection datasets show that our proposed approach significantly outperforms state-of-the-art baselines. We give a detailed analysis of the experiments to illustrate why DFGN provides excellent retrieval and interpretative ability.

The attention mechanism has been proven effective in natural language processing. This paper proposes an attention-boosted natural language inference model named aESIM, which adds word attention and adaptive direction-oriented attention mechanisms to the traditional Bi-LSTM layer of natural language inference models such as ESIM. This enables aESIM to effectively learn word representations and model the local subsentential inference between pairs of premise and hypothesis. Empirical studies on the SNLI, MultiNLI and Quora benchmarks show that aESIM is superior to the original ESIM model.
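A generic word-level attention over Bi-LSTM outputs can be sketched as follows. This is only an illustration of the mechanism: the scoring vector is random here, and the actual parameterization of aESIM's word attention may differ.

```python
import numpy as np

def word_attention(h, w):
    # h: (seq_len, dim) Bi-LSTM hidden states; w: (dim,) scoring vector.
    # Score each word, normalize the scores with a softmax, and reweight
    # the hidden states so informative words contribute more.
    scores = h @ w
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()
    return alpha[:, None] * h, alpha

rng = np.random.default_rng(0)
h = rng.normal(size=(7, 16))            # 7 words, 16-dim hidden states
weighted, alpha = word_attention(h, rng.normal(size=16))
```

The attention weights `alpha` form a distribution over the words; the reweighted states keep the original shape and can be fed to the next layer unchanged.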

This paper proposes a hierarchical multi-label matcher for patch-based face recognition. In signature generation, a face image is iteratively divided into multi-level patches. Two different types of patch division and signature are introduced for the 2D facial image and the texture-lifted image, respectively. The matcher training consists of three steps. First, local classifiers are built to learn the local matching of each patch. Second, the hierarchical relationships defined between local patches are used to learn the global matching of each patch. Three ways of learning the global matching are introduced: majority voting, l1-regularized weighting, and a decision rule. Last, the global matchings of the different levels are combined into the final matching. Experimental results on different face recognition tasks demonstrate the effectiveness of the proposed matcher, at the cost of gallery generalization. Compared with the UR2D system, the proposed matcher improves the Rank-1 accuracy by 3% and 0.18% on the UHDB31 and IJB-A datasets, respectively.

* accepted in IVC: Biometrics in the Wild. arXiv admin note: text overlap with arXiv:1803.09359
This paper focuses on improving face recognition performance with a new signature that combines implicit facial features with explicit soft facial attributes. The signature has two components: the existing patch-based features and the soft facial attributes. A deep convolutional neural network adapted from state-of-the-art networks is used to learn the soft facial attributes. Then, a signature matcher is introduced that merges the contributions of both the patch-based features and the facial attributes: the matching scores computed from each are combined to obtain a final matching score. The matcher is also extended so that different weights are assigned to different facial attributes. The proposed signature and matcher have been evaluated with the UR2D system on the UHDB31 and IJB-A datasets. The experimental results indicate that the proposed signature achieves better performance than using patch-based features alone. The Rank-1 accuracy is improved by 4% and 0.37% on the two datasets, respectively, compared with the UR2D system.

In this Letter we train neural networks in a supervised fashion to distinguish different topological phases of topological band insulators. After training with Hamiltonians of one-dimensional insulators with chiral symmetry, the neural network can predict their topological winding numbers with nearly 100% accuracy, even for Hamiltonians with larger winding numbers that are not included in the training data. These results show, remarkably, that the neural network can capture the global and nonlinear topological features of quantum phases from local inputs. By opening up the neural network, we confirm that it learns the discrete version of the winding number formula. We also make a couple of remarks regarding the role of symmetry and the opposite effect of regularization techniques when applying machine learning to physical systems.
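The discrete winding-number formula that the network reportedly rediscovers can be written down directly: accumulate the (wrapped) angle increments of the Hamiltonian vector as the momentum sweeps the Brillouin zone. A minimal sketch, with an SSH-like test Hamiltonian chosen purely for illustration:

```python
import numpy as np

def winding_number(hx, hy):
    # Discrete winding number of the closed curve (hx(k), hy(k)) around
    # the origin: sum angle increments, each wrapped into (-pi, pi].
    theta = np.arctan2(hy, hx)
    dtheta = np.diff(np.concatenate([theta, theta[:1]]))
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi
    return int(round(dtheta.sum() / (2 * np.pi)))

# SSH-like two-band model h(k) = (t1 + t2 cos k, t2 sin k):
# topological (winding 1) for t2 > t1, trivial (winding 0) otherwise.
k = np.linspace(0, 2 * np.pi, 200, endpoint=False)
nu_topological = winding_number(0.5 + np.cos(k), np.sin(k))
nu_trivial = winding_number(1.5 + np.cos(k), np.sin(k))
```

Because the sum of wrapped increments around a closed sampled curve is an exact multiple of 2π (as long as no single step exceeds π), the result is a robust integer.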

* Phys. Rev. Lett. 120, 066401 (2018)
* 6 pages, 4 figures and 1 table + 2 pages of supplemental material
Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs), since DNNs can easily overfit noisy labels. Most recent efforts have been devoted to defending against noisy labels by discarding noisy samples from the training set or assigning weights to training samples, where the weight associated with a noisy sample is expected to be small. These approaches therefore waste samples, especially those assigned small weights. The input $x$ is always useful regardless of whether its observed label $y$ is clean. To make full use of all samples, we introduce a manifold regularizer, named Paired Softmax Divergence Regularization (PSDR), to penalize the Kullback-Leibler (KL) divergence between the softmax outputs of similar inputs. In particular, similar inputs can be effectively generated by data augmentation. PSDR can be easily implemented on any type of DNN to improve robustness against noisy labels. As empirically demonstrated on benchmark datasets, PSDR improves state-of-the-art results by a significant margin.
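The KL penalty between softmax outputs of two augmented views of the same inputs can be sketched in numpy. Whether PSDR symmetrizes the divergence is an assumption of this sketch, as are the example logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def psdr_loss(logits_a, logits_b):
    # logits_a, logits_b: (batch, classes) outputs for two augmented
    # views of the same inputs. Penalize the divergence between their
    # softmax distributions (symmetrized KL, averaged over the batch).
    p, q = softmax(logits_a), softmax(logits_b)
    kl_pq = (p * np.log(p / q)).sum(axis=-1)
    kl_qp = (q * np.log(q / p)).sum(axis=-1)
    return 0.5 * (kl_pq + kl_qp).mean()

logits = np.array([[2.0, 0.5, -1.0]])
shifted = logits + np.array([[0.0, 1.0, 0.0]])
```

The penalty is zero when the two views agree and grows as their predicted distributions drift apart, which is what discourages fitting a noisy label for one view but not the other.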

* Correspondence to: Guangyong Chen <gycchen@tencent.com>
Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs), as DNNs usually have a high capacity to memorize noisy labels. In this paper, we find that the test accuracy can be quantitatively characterized in terms of the noise ratio of a dataset. In particular, the test accuracy is a quadratic function of the noise ratio in the case of symmetric noise, which explains previously published experimental findings. Based on our analysis, we apply cross-validation to randomly split noisy datasets, identifying most of the samples that have correct labels. We then adopt the Co-teaching strategy, which takes full advantage of the identified samples to train DNNs robustly against noisy labels. Compared with extensive state-of-the-art methods, our strategy consistently improves the generalization performance of DNNs under both synthetic and real-world training noise.

* Correspondence to: Guangyong Chen <gycchen@tencent.com>
Machine reading comprehension is the task of modeling the relationship between a passage and a query. Within the deep learning framework, most state-of-the-art models simply concatenate word- and character-level representations, which has been shown to be suboptimal for this task. In this paper, we empirically explore different strategies for integrating word and character embeddings and propose a character-augmented reader that attends over character-level representations to augment word embeddings, using a short list to improve word representations, especially for rare words. Experimental results show that the proposed approach helps the baseline model significantly outperform state-of-the-art baselines on various public benchmarks.

* Accepted by NLPCC 2018. arXiv admin note: text overlap with arXiv:1806.09103
In this letter, motivated by the question of whether the empirical fitting of data by a neural network can yield the same structure as physical laws, we apply a neural network to a simple quantum mechanical two-body scattering problem with short-range potentials, which by itself plays an important role in many branches of physics. We train the network to accurately predict the $s$-wave scattering length, which governs the low-energy scattering physics, directly from the scattering potential, without solving the Schr\"odinger equation or obtaining the wavefunction. By analyzing the trained network, we show that it develops perturbation theory order by order as the potential increases. This provides an important benchmark for machine-assisted physics research, or even automated machine learning of physical laws.
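As a point of reference for the perturbative structure involved (standard scattering theory, not stated in this abstract): in one common convention, the first Born approximation gives the leading-order $s$-wave scattering length

$$ a^{(1)} \;=\; \frac{\mu}{2\pi\hbar^2}\int V(\mathbf{r})\,\mathrm{d}^3r, $$

with $\mu$ the reduced mass, and higher orders add repeated insertions of the potential $V$. The claim is that the network's output reproduces such a series term by term as the potential strength grows.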

* Phys. Rev. A 98, 010701 (2018)
* 5 pages, 4 figures
In crowd scenarios, reliable trajectory prediction of pedestrians requires insightful understanding of their social behaviors. These behaviors have been well investigated in many studies, yet they are hard to fully express with hand-crafted rules. Recent studies based on LSTM networks have shown a great ability to learn social behaviors. However, many of these methods rely on previous neighboring hidden states and ignore the important current intentions of the neighbors. To address this issue, we propose a data-driven state refinement module for LSTM networks (SR-LSTM), which activates the utilization of the current intentions of neighbors and jointly and iteratively refines the current states of all participants in the crowd through a message-passing mechanism. To effectively extract the social effect of neighbors, we further introduce a social-aware information selection mechanism, consisting of an element-wise motion gate and pedestrian-wise attention, to select useful messages from neighboring pedestrians. Experimental results on two public datasets, i.e., ETH and UCY, demonstrate the effectiveness of our proposed SR-LSTM, which achieves state-of-the-art results.

* Accepted by CVPR2019
Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma,\lambda)$ is the first approach that unifies them with eligibility traces through the sampling degree $\sigma$. However, it is limited to the tabular case; for large-scale learning, Q$(\sigma,\lambda)$ would require a prohibitively large number of table entries to accurately store the value functions. To address this problem, we propose GQ$(\sigma,\lambda)$, which extends the tabular Q$(\sigma,\lambda)$ with linear function approximation, and we prove its convergence. Empirical results on several standard domains show that GQ$(\sigma,\lambda)$, with its combination of full sampling and pure expectation, achieves better performance than either full-sampling or pure-expectation methods alone.
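The role of the sampling degree $\sigma$ can be illustrated with a one-step tabular backup target. This is a deliberate simplification, omitting eligibility traces and the linear function approximation that GQ$(\sigma,\lambda)$ adds; all names and numbers are illustrative:

```python
import numpy as np

def sigma_target(r, q_next, pi_next, a_next, sigma, gamma=0.9):
    # One-step backup target mixing a sampled and an expected bootstrap.
    # sigma = 1 recovers the full-sampling (Sarsa-style) target,
    # sigma = 0 the pure-expectation (Expected-Sarsa-style) target.
    expectation = float(np.dot(pi_next, q_next))
    sample = q_next[a_next]
    return r + gamma * (sigma * sample + (1 - sigma) * expectation)

q_next = np.array([1.0, 3.0])           # action values at the next state
pi_next = np.array([0.5, 0.5])          # target policy at the next state
t_exp = sigma_target(0.0, q_next, pi_next, a_next=1, sigma=0.0)
t_smp = sigma_target(0.0, q_next, pi_next, a_next=1, sigma=1.0)
```

Intermediate values of $\sigma$ interpolate between the two targets, trading the variance of sampling against the bias structure of the expectation.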

Graph Neural Networks (GNNs) achieve impressive performance on structured graphs by recursively updating the representation vector of each node based on its neighbors, during which parameterized transformation matrices are learned for updating the node features. However, existing propagation schemes are far from optimal, since they do not fully utilize the relational information between nodes. We propose the information-maximizing graph neural network (IGNN), which maximizes the mutual information between edge states and transformation parameters. We reformulate the mutual information as a differentiable objective via a variational approach. We compare our model against several recent variants of GNNs and show that it achieves state-of-the-art performance on multiple tasks, including quantum chemistry regression on the QM9 dataset, generalization from QM9 to larger molecular graphs, and prediction of molecular bioactivities relevant to drug discovery. The IGNN model is based on an elegant and fundamental idea from information theory, as explained in the main text, and it can easily be generalized beyond the molecular graphs considered in this work. To encourage more future work in this area, all datasets and code used in this paper will be released for public access.

* 1. Pengfei Chen and Weiwen Liu have equal contribution. 2. Correspondence to: Guangyong Chen <gycchen@tencent.com>
Skeleton-based human action recognition has attracted a lot of interest. Recently, there has been a trend of using deep feedforward neural networks to model the skeleton sequence, taking as input the 2D spatio-temporal map derived from the 3D coordinates of the joints. Some semantics of the joints (frame index and joint type) are implicitly captured and exploited by the large receptive fields of deep convolutions, at the cost of high complexity. In this paper, we propose a simple yet effective semantics-guided neural network (SGN) for skeleton-based action recognition. We explicitly introduce the high-level semantics of the joints as part of the network input to enhance the feature representation capability. The model exploits global and local information through two semantics-aware graph convolutional layers followed by a convolutional layer. We first leverage the semantics and dynamics (coordinates and velocities) of the joints to learn a content-adaptive graph that captures the global spatio-temporal correlations of the joints. A convolutional layer is then used to further enhance the representation power of the features. With an order of magnitude smaller model size and higher speed than some previous works, SGN achieves state-of-the-art performance on the NTU, SYSU, and N-UCLA datasets. Experimental results demonstrate the effectiveness of explicitly exploiting semantic information in reducing model complexity and improving recognition accuracy.
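Attaching joint semantics explicitly to the input can be sketched as follows. The one-hot joint-type encoding and the normalized frame index are assumptions for illustration, not necessarily the paper's exact embedding:

```python
import numpy as np

def add_joint_semantics(coords):
    # coords: (frames, joints, 3) xyz positions. Append a one-hot
    # joint-type vector and a normalized frame index so the network
    # receives the joint semantics explicitly rather than having to
    # infer them through large receptive fields.
    f, j, _ = coords.shape
    joint_type = np.tile(np.eye(j), (f, 1, 1))                  # (f, j, j)
    frame_idx = np.broadcast_to(
        (np.arange(f) / max(f - 1, 1))[:, None, None], (f, j, 1))
    return np.concatenate([coords, joint_type, frame_idx], axis=-1)

x = np.zeros((4, 25, 3))          # e.g., 4 frames, 25 joints
out = add_joint_semantics(x)      # (4, 25, 3 + 25 + 1)
```

In practice learned embeddings would replace the raw one-hot vectors, but the principle — semantics as explicit input channels — is the same.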

Multi-turn conversation understanding is a major challenge in building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation, where related work simply concatenates the conversation utterances and ignores the interactions among previous utterances in context modeling. In this paper, we formulate the previous utterances into a context using a proposed deep utterance aggregation model, forming a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. The model then matches a response with each refined utterance, and the final matching score is obtained after attentive turn aggregation. Experimental results show that our model outperforms state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.

* COLING 2018, pages 3740-3752
* Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)
Traditional chatbots usually require a mass of human dialogue data, especially when using supervised machine learning methods. Though they can easily handle single-turn question answering, their performance on multi-turn conversation is usually unsatisfactory. In this paper, we present Lingke, an information-retrieval-augmented chatbot that can answer questions based on a given product introduction document and handle multi-turn conversations. We introduce a fine-grained pipeline that distills responses from unstructured documents, and attentive sequential context-response matching for multi-turn conversations.

* Accepted by COLING 2018 demonstration paper
In this work we design and train deep neural networks to predict topological invariants for one-dimensional four-band insulators in the AIII class, whose topological invariant is the winding number, and for two-dimensional two-band insulators in the A class, whose topological invariant is the Chern number. Given Hamiltonians in momentum space as input, the neural networks can predict the topological invariants for both classes with accuracy close to or higher than 90%, even for Hamiltonians whose invariants lie beyond the training data set. Despite the complexity of the neural networks, we find that the output of certain intermediate hidden layers resembles either the winding angle for models in the AIII class or the solid angle (Berry curvature) for models in the A class, indicating that the neural networks essentially capture the mathematical formula of the topological invariants. Our work demonstrates the ability of neural networks to predict topological invariants for complicated models with local Hamiltonians as the only input, and offers an example that even a deep neural network is understandable.

* Phys. Rev. B 98, 085402 (2018)
* 8 pages, 5 figures
Many robotic tasks require heavy computation, which can easily exceed the capability of a robot's onboard computer. A promising solution to this challenge is outsourcing the computation to the cloud. However, exploiting the potential of cloud resources in robotic software is difficult, because it involves complex code modification and extensive (re)configuration procedures. Moreover, quality of service (QoS) properties such as timeliness, which are critical to the robot's behavior, have to be considered. In this paper, we propose a transparent and QoS-aware software framework called Cloudroid for cloud robotic applications. The framework supports direct deployment of existing robotic software packages to the cloud, transparently transforming them into Internet-accessible cloud services. With the automatically generated service stubs, robotic applications can outsource their computation to the cloud without any code modification. Furthermore, the robot and the cloud can cooperate to maintain specific QoS properties, such as request response time, even in a highly dynamic and resource-competitive environment. We evaluated Cloudroid on a group of typical robotic scenarios and a set of software packages widely adopted in real-world robotic practice. The results show that the robot's capability can be enhanced significantly without code modification and that specific QoS objectives can be guaranteed. In certain tasks, the "cloud + robot" setup improves performance by orders of magnitude compared with the robot-native setup.

* Accepted by 10th IEEE International Conference on Cloud Computing in 2017
The RGB-D camera has a limited working range and struggles to accurately measure depth information at far distances. Besides, the RGB-D camera is easily influenced by strong lighting and other external factors, which leads to poor accuracy in the acquired environmental depth information. Recently, deep learning technologies have achieved great success in the visual SLAM area; they can directly learn high-level features from the visual inputs and improve the estimation accuracy of the depth information. Therefore, deep learning technologies have the potential to extend the sources of depth information and improve the performance of SLAM systems. However, existing deep learning-based methods are mainly supervised and require a large amount of ground-truth depth data, which is hard to acquire because of realistic constraints. In this paper, we first present an unsupervised learning framework which not only uses image reconstruction for supervision but also exploits pose estimation to enhance the supervisory signal and add training constraints for the task of monocular depth and camera motion estimation. Furthermore, we successfully exploit our unsupervised learning framework to assist the traditional ORB-SLAM system when the initialization module of ORB-SLAM cannot match enough features. Qualitative and quantitative experiments show that our unsupervised learning framework performs the depth estimation task comparably to supervised methods and outperforms the previous state-of-the-art approach by $13.5\%$ on the KITTI dataset. Besides, our unsupervised learning framework significantly accelerates the initialization process of the ORB-SLAM system and effectively improves the accuracy of environmental mapping in strong lighting and weak texture scenes.
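The image-reconstruction supervision at the core of such frameworks reduces to a photometric error between the target frame and a view synthesized using the predicted depth and pose. A minimal sketch, simplified to a plain L1 error with a validity mask (the SSIM and smoothness terms used in practice are omitted, and the toy arrays are assumptions):

```python
import numpy as np

def photometric_loss(target, source_warped, valid_mask):
    # Mean L1 error over pixels where the inverse warp landed inside
    # the source image; this reconstruction error is the
    # self-supervisory signal that replaces ground-truth depth.
    err = np.abs(target - source_warped)
    return float(err[valid_mask].mean())

target = np.full((4, 4), 0.5)      # toy target frame
mask = np.ones((4, 4), dtype=bool)  # all warped pixels assumed valid
```

A perfect reconstruction gives zero loss; any depth or pose error shows up as residual photometric error, which is what gradient descent pushes down.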

* 27 pages