Models, code, and papers for "Pengfei Zhang":
By analyzing Bayesian inference of generative model for random networks with both relations (edges) and node features (discrete labels), we perform an asymptotically exact analysis of the semi-supervised classfication problems on graph-structured data using the cavity method of statistical physics. We unveil detectability phase transitions which put fundamental limit on ability of classfications for all possible algorithms. Our theory naturally converts to a message passing algorithm which works all the way down to the phase transition in the underlying generative model, and can be translated to a graph convolution neural network algorithm which greatly outperforms existing algorithms including popular graph neural networks in synthetic networks. When applied to real-world datasets, our algorithm achieves comparable performance with the state-of-the art algorithms. Our approach provides benchmark datasets with continuously tunable parameters and optimal results, which can be used to evaluate performance of exiting graph neural networks, and to find and understand their strengths and limitations. In particular, we observe that popular GCNs have sparsity issue and ovefitting issue on large synthetic benchmarks, we also show how to overcome the issues by combining strengths of our approach.
We exam various geometric active contour methods for radar image segmentation. Due to special properties of radar images, we propose our new model based on modified Chan-Vese functional. Our method is efficient in separating non-meteorological noises from meteorological images.
We perform theoretical and algorithmic studies for the problem of clustering and semi-supervised classification on graphs with both pairwise relational information and single-point feature information, upon a joint stochastic block model for generating synthetic graphs with both edges and node features. Asymptotically exact analysis based on the Bayesian inference of the underlying model are conducted, using the cavity method in statistical physics. Theoretically, we identify a phase transition of the generative model, which puts fundamental limits on the ability of all possible algorithms in the clustering task of the underlying model. Algorithmically, we propose a belief propagation algorithm that is asymptotically optimal on the generative model, and can be further extended to a belief propagation graph convolution neural network (BPGCN) for semi-supervised classification on graphs. For the first time, well-controlled benchmark datasets with asymptotially exact properties and optimal solutions could be produced for the evaluation of graph convolution neural networks, and for the theoretical understanding of their strengths and weaknesses. In particular, on these synthetic benchmark networks we observe that existing graph convolution neural networks are subject to an sparsity issue and an ovefitting issue in practice, both of which are successfully overcome by our BPGCN. Moreover, when combined with classic neural network methods, BPGCN yields extraordinary classification performances on some real-world datasets that have never been achieved before.
We present a 3D Convolutional Neural Networks (CNNs) based single shot detector for spatial-temporal action detection tasks. Our model includes: (1) two short-term appearance and motion streams, with single RGB and optical flow image input separately, in order to capture the spatial and temporal information for the current frame; (2) two long-term 3D ConvNet based stream, working on sequences of continuous RGB and optical flow images to capture the context from past frames. Our model achieves strong performance for action detection in video and can be easily integrated into any current two-stream action detection methods. We report a frame-mAP of 71.30% on the challenging UCF101-24 actions dataset, achieving the state-of-the-art result of the one-stage methods. To the best of our knowledge, our work is the first system that combined 3D CNN and SSD in action detection tasks.
Extracting appropriate features to represent a corpus is an important task for textual mining. Previous attention based work usually enhance feature at the lexical level, which lacks the exploration of feature augmentation at the sentence level. In this paper, we exploit a Dynamic Feature Generation Network (DFGN) to solve this problem. Specifically, DFGN generates features based on a variety of attention mechanisms and attaches features to sentence representation. Then a thresholder is designed to filter the mined features automatically. DFGN extracts the most significant characteristics from datasets to keep its practicability and robustness. Experimental results on multiple well-known answer selection datasets show that our proposed approach significantly outperforms state-of-the-art baselines. We give a detailed analysis of the experiments to illustrate why DFGN provides excellent retrieval and interpretative ability.
Attention mechanism has been proven effective on natural language processing. This paper proposes an attention boosted natural language inference model named aESIM by adding word attention and adaptive direction-oriented attention mechanisms to the traditional Bi-LSTM layer of natural language inference models, e.g. ESIM. This makes the inference model aESIM has the ability to effectively learn the representation of words and model the local subsentential inference between pairs of premise and hypothesis. The empirical studies on the SNLI, MultiNLI and Quora benchmarks manifest that aESIM is superior to the original ESIM model.
This paper proposes a hierarchical multi-label matcher for patch-based face recognition. In signature generation, a face image is iteratively divided into multi-level patches. Two different types of patch divisions and signatures are introduced for 2D facial image and texture-lifted image, respectively. The matcher training consists of three steps. First, local classifiers are built to learn the local matching of each patch. Second, the hierarchical relationships defined between local patches are used to learn the global matching of each patch. Three ways are introduced to learn the global matching: majority voting, l1-regularized weighting, and decision rule. Last, the global matchings of different levels are combined as the final matching. Experimental results on different face recognition tasks demonstrate the effectiveness of the proposed matcher at the cost of gallery generalization. Compared with the UR2D system, the proposed matcher improves the Rank-1 accuracy significantly by 3% and 0.18% on the UHDB31 dataset and IJB-A dataset, respectively.
This paper focuses on improving face recognition performance with a new signature combining implicit facial features with explicit soft facial attributes. This signature has two components: the existing patch-based features and the soft facial attributes. A deep convolutional neural network adapted from state-of-the-art networks is used to learn the soft facial attributes. Then, a signature matcher is introduced that merges the contributions of both patch-based features and the facial attributes. In this matcher, the matching scores computed from patch-based features and the facial attributes are combined to obtain a final matching score. The matcher is also extended so that different weights are assigned to different facial attributes. The proposed signature and matcher have been evaluated with the UR2D system on the UHDB31 and IJB-A datasets. The experimental results indicate that the proposed signature achieve better performance than using only patch-based features. The Rank-1 accuracy is improved significantly by 4% and 0.37% on the two datasets when compared with the UR2D system.
In this Letter we supervisedly train neural networks to distinguish different topological phases in the context of topological band insulators. After training with Hamiltonians of one-dimensional insulators with chiral symmetry, the neural network can predict their topological winding numbers with nearly 100% accuracy, even for Hamiltonians with larger winding numbers that are not included in the training data. These results show a remarkable success that the neural network can capture the global and nonlinear topological features of quantum phases from local inputs. By opening up the neural network, we confirm that the network does learn the discrete version of the winding number formula. We also make a couple of remarks regarding the role of the symmetry and the opposite effect of regularization techniques when applying machine learning to physical systems.
Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs) since DNNs can easily overfit to the noisy labels. Most recent efforts have been devoted to defending noisy labels by discarding noisy samples from the training set or assigning weights to training samples, where the weight associated with a noisy sample is expected to be small. Thereby, these previous efforts result in a waste of samples, especially those assigned with small weights. The input $x$ is always useful regardless of whether its observed label $y$ is clean. To make full use of all samples, we introduce a manifold regularizer, named as Paired Softmax Divergence Regularization (PSDR), to penalize the Kullback-Leibler (KL) divergence between softmax outputs of similar inputs. In particular, similar inputs can be effectively generated by data augmentation. PSDR can be easily implemented on any type of DNNs to improve the robustness against noisy labels. As empirically demonstrated on benchmark datasets, our PSDR impressively improve state-of-the-art results by a significant margin.
Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs) as DNNs usually have the high capacity to memorize the noisy labels. In this paper, we find that the test accuracy can be quantitatively characterized in terms of the noise ratio in datasets. In particular, the test accuracy is a quadratic function of the noise ratio in the case of symmetric noise, which explains the experimental findings previously published. Based on our analysis, we apply cross-validation to randomly split noisy datasets, which identifies most samples that have correct labels. Then we adopt the Co-teaching strategy which takes full advantage of the identified samples to train DNNs robustly against noisy labels. Compared with extensive state-of-the-art methods, our strategy consistently improves the generalization performance of DNNs under both synthetic and real-world training noise.
While neural network-based models have achieved impressive performance on a large body of NLP tasks, the generalization behavior of different models remains poorly understood: Does this excellent performance imply a perfect generalization model, or are there still some limitations? In this paper, we take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives and characterize the differences of their generalization abilities through the lens of our proposed measures, which guides us to better design models and training methods. Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models in terms of breakdown performance analysis, annotation errors, dataset bias, and category relationships, which suggest directions for improvement. We have released the datasets: (ReCoNLL, PLONER) for the future research at our project page: http://pfliu.com/InterpretNER/. As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers and classifies them into different research topics: https://github.com/pfliu-nlp/Named-Entity-Recognition-NER-Papers.
Quantum neural networks are one of the promising applications for near-term noisy intermediate-scale quantum computers. A quantum neural network distills the information from the input wavefunction into the output qubits. In this Letter, we show that this process can also be viewed from the opposite direction: the quantum information in the output qubits is scrambled into the input. This observation motivates us to use the tripartite information, a quantity recently developed to characterize information scrambling, to diagnose the training dynamics of quantum neural networks. We empirically find strong correlation between the dynamical behavior of the tripartite information and the loss function in the training process, from which we identify that the training process has two stages for randomly initialized networks. In the early stage, the network performance improves rapidly and the tripartite information increases linearly with a universal slope, meaning that the neural network becomes less scrambled than the random unitary. In the latter stage, the network performance improves slowly while the tripartite information decreases. We present evidences that the network constructs local correlations in the early stage and learns large-scale structures in the latter stage. We believe this two-stage training dynamics is universal and is applicable to a wide range of problems. Our work builds bridges between two research subjects of quantum neural networks and information scrambling, which opens up a new perspective to understand quantum neural networks.
Machine reading comprehension is a task to model relationship between passage and query. In terms of deep learning framework, most of state-of-the-art models simply concatenate word and character level representations, which has been shown suboptimal for the concerned task. In this paper, we empirically explore different integration strategies of word and character embeddings and propose a character-augmented reader which attends character-level representation to augment word embedding with a short list to improve word representations, especially for rare words. Experimental results show that the proposed approach helps the baseline model significantly outperform state-of-the-art baselines on various public benchmarks.
In this letter, motivated by the question that whether the empirical fitting of data by neural network can yield the same structure of physical laws, we apply the neural network to a simple quantum mechanical two-body scattering problem with short-range potentials, which by itself also plays an important role in many branches of physics. We train a neural network to accurately predict $ s $-wave scattering length, which governs the low-energy scattering physics, directly from the scattering potential without solving Schr\"odinger equation or obtaining the wavefunction. After analyzing the neural network, it is shown that the neural network develops perturbation theory order by order when the potential increases. This provides an important benchmark to the machine-assisted physics research or even automated machine learning physics laws.
In crowd scenarios, reliable trajectory prediction of pedestrians requires insightful understanding of their social behaviors. These behaviors have been well investigated by plenty of studies, while it is hard to be fully expressed by hand-craft rules. Recent studies based on LSTM networks have shown great ability to learn social behaviors. However, many of these methods rely on previous neighboring hidden states but ignore the important current intention of the neighbors. In order to address this issue, we propose a data-driven state refinement module for LSTM network (SR-LSTM), which activates the utilization of the current intention of neighbors, and jointly and iteratively refines the current states of all participants in the crowd through a message passing mechanism. To effectively extract the social effect of neighbors, we further introduce a social-aware information selection mechanism consisting of an element-wise motion gate and a pedestrian-wise attention to select useful message from neighboring pedestrians. Experimental results on two public datasets, i.e. ETH and UCY, demonstrate the effectiveness of our proposed SR-LSTM and we achieves state-of-the-art results.
Colorectal cancer (CRC) is one of the most commonly diagnosed cancers and a leading cause of cancer deaths in the United States. Colorectal polyps that grow on the intima of the colon or rectum is an important precursor for CRC. Currently, the most common way for colorectal polyp detection and precancerous pathology is the colonoscopy. Therefore, accurate colorectal polyp segmentation during the colonoscopy procedure has great clinical significance in CRC early detection and prevention. In this paper, we propose a novel end-to-end deep learning framework for the colorectal polyp segmentation. The model we design consists of an encoder to extract multi-scale semantic features and a decoder to expand the feature maps to a polyp segmentation map. We improve the feature representation ability of the encoder by introducing the dilated convolution to learn high-level semantic features without resolution reduction. We further design a simplified decoder which combines multi-scale semantic features with fewer parameters than the traditional architecture. Furthermore, we apply three post processing techniques on the output segmentation map to improve colorectal polyp detection performance. Our method achieves state-of-the-art results on CVC-ClinicDB and ETIS-Larib Polyp DB.
In this paper, we introduce the prior knowledge, multi-scale structure, into self-attention modules. We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features from different scales. Based on the linguistic perspective and the analysis of pre-trained Transformer (BERT) on a huge corpus, we further design a strategy to control the scale distribution for each layer. Results of three different kinds of tasks (21 datasets) show our Multi-Scale Transformer outperforms the standard Transformer consistently and significantly on small and moderate size datasets.
Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma,\lambda)$ is the first approach unifies them with eligibility trace through the sampling degree $\sigma$. However, it is limited to the tabular case, for large-scale learning, the Q$(\sigma,\lambda)$ is too expensive to require a huge volume of tables to accurately storage value functions. To address above problem, we propose a GQ$(\sigma,\lambda)$ that extends tabular Q$(\sigma,\lambda)$ with linear function approximation. We prove the convergence of GQ$(\sigma,\lambda)$. Empirical results on some standard domains show that GQ$(\sigma,\lambda)$ with a combination of full-sampling with pure-expectation reach a better performance than full-sampling and pure-expectation methods.
Graph Neural Networks (GNNs) achieve an impressive performance on structured graphs by recursively updating the representation vector of each node based on its neighbors, during which parameterized transformation matrices should be learned for the node feature updating. However, existing propagation schemes are far from being optimal since they do not fully utilize the relational information between nodes. We propose the information maximizing graph neural networks (IGNN), which maximizes the mutual information between edge states and transform parameters. We reformulate the mutual information as a differentiable objective via a variational approach. We compare our model against several recent variants of GNNs and show that our model achieves the state-of-the-art performance on multiple tasks including quantum chemistry regression on QM9 dataset, generalization capability from QM9 to larger molecular graphs, and prediction of molecular bioactivities relevant for drug discovery. The IGNN model is based on an elegant and fundamental idea in information theory as explained in the main text, and it could be easily generalized beyond the contexts of molecular graphs considered in this work. To encourage more future work in this area, all datasets and codes used in this paper will be released for public access.