Models, code, and papers for "Pengfei Zhang":
By analyzing Bayesian inference of generative model for random networks with both relations (edges) and node features (discrete labels), we perform an asymptotically exact analysis of the semi-supervised classfication problems on graph-structured data using the cavity method of statistical physics. We unveil detectability phase transitions which put fundamental limit on ability of classfications for all possible algorithms. Our theory naturally converts to a message passing algorithm which works all the way down to the phase transition in the underlying generative model, and can be translated to a graph convolution neural network algorithm which greatly outperforms existing algorithms including popular graph neural networks in synthetic networks. When applied to real-world datasets, our algorithm achieves comparable performance with the state-of-the art algorithms. Our approach provides benchmark datasets with continuously tunable parameters and optimal results, which can be used to evaluate performance of exiting graph neural networks, and to find and understand their strengths and limitations. In particular, we observe that popular GCNs have sparsity issue and ovefitting issue on large synthetic benchmarks, we also show how to overcome the issues by combining strengths of our approach.
We exam various geometric active contour methods for radar image segmentation. Due to special properties of radar images, we propose our new model based on modified Chan-Vese functional. Our method is efficient in separating non-meteorological noises from meteorological images.
We present a 3D Convolutional Neural Networks (CNNs) based single shot detector for spatial-temporal action detection tasks. Our model includes: (1) two short-term appearance and motion streams, with single RGB and optical flow image input separately, in order to capture the spatial and temporal information for the current frame; (2) two long-term 3D ConvNet based stream, working on sequences of continuous RGB and optical flow images to capture the context from past frames. Our model achieves strong performance for action detection in video and can be easily integrated into any current two-stream action detection methods. We report a frame-mAP of 71.30% on the challenging UCF101-24 actions dataset, achieving the state-of-the-art result of the one-stage methods. To the best of our knowledge, our work is the first system that combined 3D CNN and SSD in action detection tasks.
Extracting appropriate features to represent a corpus is an important task for textual mining. Previous attention based work usually enhance feature at the lexical level, which lacks the exploration of feature augmentation at the sentence level. In this paper, we exploit a Dynamic Feature Generation Network (DFGN) to solve this problem. Specifically, DFGN generates features based on a variety of attention mechanisms and attaches features to sentence representation. Then a thresholder is designed to filter the mined features automatically. DFGN extracts the most significant characteristics from datasets to keep its practicability and robustness. Experimental results on multiple well-known answer selection datasets show that our proposed approach significantly outperforms state-of-the-art baselines. We give a detailed analysis of the experiments to illustrate why DFGN provides excellent retrieval and interpretative ability.
Attention mechanism has been proven effective on natural language processing. This paper proposes an attention boosted natural language inference model named aESIM by adding word attention and adaptive direction-oriented attention mechanisms to the traditional Bi-LSTM layer of natural language inference models, e.g. ESIM. This makes the inference model aESIM has the ability to effectively learn the representation of words and model the local subsentential inference between pairs of premise and hypothesis. The empirical studies on the SNLI, MultiNLI and Quora benchmarks manifest that aESIM is superior to the original ESIM model.
This paper proposes a hierarchical multi-label matcher for patch-based face recognition. In signature generation, a face image is iteratively divided into multi-level patches. Two different types of patch divisions and signatures are introduced for 2D facial image and texture-lifted image, respectively. The matcher training consists of three steps. First, local classifiers are built to learn the local matching of each patch. Second, the hierarchical relationships defined between local patches are used to learn the global matching of each patch. Three ways are introduced to learn the global matching: majority voting, l1-regularized weighting, and decision rule. Last, the global matchings of different levels are combined as the final matching. Experimental results on different face recognition tasks demonstrate the effectiveness of the proposed matcher at the cost of gallery generalization. Compared with the UR2D system, the proposed matcher improves the Rank-1 accuracy significantly by 3% and 0.18% on the UHDB31 dataset and IJB-A dataset, respectively.
This paper focuses on improving face recognition performance with a new signature combining implicit facial features with explicit soft facial attributes. This signature has two components: the existing patch-based features and the soft facial attributes. A deep convolutional neural network adapted from state-of-the-art networks is used to learn the soft facial attributes. Then, a signature matcher is introduced that merges the contributions of both patch-based features and the facial attributes. In this matcher, the matching scores computed from patch-based features and the facial attributes are combined to obtain a final matching score. The matcher is also extended so that different weights are assigned to different facial attributes. The proposed signature and matcher have been evaluated with the UR2D system on the UHDB31 and IJB-A datasets. The experimental results indicate that the proposed signature achieve better performance than using only patch-based features. The Rank-1 accuracy is improved significantly by 4% and 0.37% on the two datasets when compared with the UR2D system.
In this Letter we supervisedly train neural networks to distinguish different topological phases in the context of topological band insulators. After training with Hamiltonians of one-dimensional insulators with chiral symmetry, the neural network can predict their topological winding numbers with nearly 100% accuracy, even for Hamiltonians with larger winding numbers that are not included in the training data. These results show a remarkable success that the neural network can capture the global and nonlinear topological features of quantum phases from local inputs. By opening up the neural network, we confirm that the network does learn the discrete version of the winding number formula. We also make a couple of remarks regarding the role of the symmetry and the opposite effect of regularization techniques when applying machine learning to physical systems.
Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs) since DNNs can easily overfit to the noisy labels. Most recent efforts have been devoted to defending noisy labels by discarding noisy samples from the training set or assigning weights to training samples, where the weight associated with a noisy sample is expected to be small. Thereby, these previous efforts result in a waste of samples, especially those assigned with small weights. The input $x$ is always useful regardless of whether its observed label $y$ is clean. To make full use of all samples, we introduce a manifold regularizer, named as Paired Softmax Divergence Regularization (PSDR), to penalize the Kullback-Leibler (KL) divergence between softmax outputs of similar inputs. In particular, similar inputs can be effectively generated by data augmentation. PSDR can be easily implemented on any type of DNNs to improve the robustness against noisy labels. As empirically demonstrated on benchmark datasets, our PSDR impressively improve state-of-the-art results by a significant margin.
Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs) as DNNs usually have the high capacity to memorize the noisy labels. In this paper, we find that the test accuracy can be quantitatively characterized in terms of the noise ratio in datasets. In particular, the test accuracy is a quadratic function of the noise ratio in the case of symmetric noise, which explains the experimental findings previously published. Based on our analysis, we apply cross-validation to randomly split noisy datasets, which identifies most samples that have correct labels. Then we adopt the Co-teaching strategy which takes full advantage of the identified samples to train DNNs robustly against noisy labels. Compared with extensive state-of-the-art methods, our strategy consistently improves the generalization performance of DNNs under both synthetic and real-world training noise.
Quantum neural networks are one of the promising applications for near-term noisy intermediate-scale quantum computers. A quantum neural network distills the information from the input wavefunction into the output qubits. In this Letter, we show that this process can also be viewed from the opposite direction: the quantum information in the output qubits is scrambled into the input. This observation motivates us to use the tripartite information, a quantity recently developed to characterize information scrambling, to diagnose the training dynamics of quantum neural networks. We empirically find strong correlation between the dynamical behavior of the tripartite information and the loss function in the training process, from which we identify that the training process has two stages for randomly initialized networks. In the early stage, the network performance improves rapidly and the tripartite information increases linearly with a universal slope, meaning that the neural network becomes less scrambled than the random unitary. In the latter stage, the network performance improves slowly while the tripartite information decreases. We present evidences that the network constructs local correlations in the early stage and learns large-scale structures in the latter stage. We believe this two-stage training dynamics is universal and is applicable to a wide range of problems. Our work builds bridges between two research subjects of quantum neural networks and information scrambling, which opens up a new perspective to understand quantum neural networks.
Machine reading comprehension is a task to model relationship between passage and query. In terms of deep learning framework, most of state-of-the-art models simply concatenate word and character level representations, which has been shown suboptimal for the concerned task. In this paper, we empirically explore different integration strategies of word and character embeddings and propose a character-augmented reader which attends character-level representation to augment word embedding with a short list to improve word representations, especially for rare words. Experimental results show that the proposed approach helps the baseline model significantly outperform state-of-the-art baselines on various public benchmarks.
In this letter, motivated by the question that whether the empirical fitting of data by neural network can yield the same structure of physical laws, we apply the neural network to a simple quantum mechanical two-body scattering problem with short-range potentials, which by itself also plays an important role in many branches of physics. We train a neural network to accurately predict $ s $-wave scattering length, which governs the low-energy scattering physics, directly from the scattering potential without solving Schr\"odinger equation or obtaining the wavefunction. After analyzing the neural network, it is shown that the neural network develops perturbation theory order by order when the potential increases. This provides an important benchmark to the machine-assisted physics research or even automated machine learning physics laws.
In crowd scenarios, reliable trajectory prediction of pedestrians requires insightful understanding of their social behaviors. These behaviors have been well investigated by plenty of studies, while it is hard to be fully expressed by hand-craft rules. Recent studies based on LSTM networks have shown great ability to learn social behaviors. However, many of these methods rely on previous neighboring hidden states but ignore the important current intention of the neighbors. In order to address this issue, we propose a data-driven state refinement module for LSTM network (SR-LSTM), which activates the utilization of the current intention of neighbors, and jointly and iteratively refines the current states of all participants in the crowd through a message passing mechanism. To effectively extract the social effect of neighbors, we further introduce a social-aware information selection mechanism consisting of an element-wise motion gate and a pedestrian-wise attention to select useful message from neighboring pedestrians. Experimental results on two public datasets, i.e. ETH and UCY, demonstrate the effectiveness of our proposed SR-LSTM and we achieves state-of-the-art results.
Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma,\lambda)$ is the first approach unifies them with eligibility trace through the sampling degree $\sigma$. However, it is limited to the tabular case, for large-scale learning, the Q$(\sigma,\lambda)$ is too expensive to require a huge volume of tables to accurately storage value functions. To address above problem, we propose a GQ$(\sigma,\lambda)$ that extends tabular Q$(\sigma,\lambda)$ with linear function approximation. We prove the convergence of GQ$(\sigma,\lambda)$. Empirical results on some standard domains show that GQ$(\sigma,\lambda)$ with a combination of full-sampling with pure-expectation reach a better performance than full-sampling and pure-expectation methods.
Graph Neural Networks (GNNs) achieve an impressive performance on structured graphs by recursively updating the representation vector of each node based on its neighbors, during which parameterized transformation matrices should be learned for the node feature updating. However, existing propagation schemes are far from being optimal since they do not fully utilize the relational information between nodes. We propose the information maximizing graph neural networks (IGNN), which maximizes the mutual information between edge states and transform parameters. We reformulate the mutual information as a differentiable objective via a variational approach. We compare our model against several recent variants of GNNs and show that our model achieves the state-of-the-art performance on multiple tasks including quantum chemistry regression on QM9 dataset, generalization capability from QM9 to larger molecular graphs, and prediction of molecular bioactivities relevant for drug discovery. The IGNN model is based on an elegant and fundamental idea in information theory as explained in the main text, and it could be easily generalized beyond the contexts of molecular graphs considered in this work. To encourage more future work in this area, all datasets and codes used in this paper will be released for public access.
Skeleton-based human action recognition has attracted a lot of interests. Recently, there is a trend of using deep feedforward neural networks to model the skeleton sequence which takes the 2D spatio-temporal map derived from the 3D coordinates of joints as input. Some semantics of the joints (frame index and joint type) are implicitly captured and exploited by large receptive fields of deep convolutions at the cost of high complexity. In this paper, we propose a simple yet effective semantics-guided neural network (SGN) for skeleton-based action recognition. We explicitly introduce the high level semantics of joints as part of the network input to enhance the feature representation capability. The model exploits the global and local information through two semantics-aware graph convolutional layers followed by a convolutional layer. We first leverage the semantics and dynamics (coordinate and velocity) of joints to learn a content adaptive graph for capturing the global spatio-temporal correlations of joints. Then a convolutional layer is used to further enhance the representation power of the features. With an order of magnitude smaller model size and higher speed than some previous works, SGN achieves the state-of-the-art performance on the NTU, SYSU, and N-UCLA datasets. Experimental results demonstrate the effectiveness of explicitly exploiting semantic information in reducing model complexity and improving the recognition accuracy.
Multi-turn conversation understanding is a major challenge for building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation whose related work simply concatenates the conversation utterances, ignoring the interactions among previous utterances for context modeling. In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation. In detail, a self-matching attention is first introduced to route the vital information in each utterance. Then the model matches a response with each refined utterance and the final matching score is obtained after attentive turns aggregation. Experimental results show our model outperforms the state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.
Traditional chatbots usually need a mass of human dialogue data, especially when using supervised machine learning method. Though they can easily deal with single-turn question answering, for multi-turn the performance is usually unsatisfactory. In this paper, we present Lingke, an information retrieval augmented chatbot which is able to answer questions based on given product introduction document and deal with multi-turn conversations. We will introduce a fine-grained pipeline processing to distill responses based on unstructured documents, and attentive sequential context-response matching for multi-turn conversations.
In this work we design and train deep neural networks to predict topological invariants for one-dimensional four-band insulators in AIII class whose topological invariant is the winding number, and two-dimensional two-band insulators in A class whose topological invariant is the Chern number. Given Hamiltonians in the momentum space as the input, neural networks can predict topological invariants for both classes with accuracy close to or higher than 90%, even for Hamiltonians whose invariants are beyond the training data set. Despite the complexity of the neural network, we find that the output of certain intermediate hidden layers resembles either the winding angle for models in AIII class or the solid angle (Berry curvature) for models in A class, indicating that neural networks essentially capture the mathematical formula of topological invariants. Our work demonstrates the ability of neural networks to predict topological invariants for complicated models with local Hamiltonians as the only input, and offers an example that even a deep neural network is understandable.