Models, code, and papers for "Ji He":

Convolutional neural networks are widely used in mobile applications. However, GPU convolution algorithms are designed for mini-batch neural network training, and single-image convolutional neural network inference on mobile GPUs is not well studied. After discussing the usage difference and examining the existing convolution algorithms, we propose the HNTMP convolution algorithm. The HNTMP convolution algorithm achieves a $14.6 \times$ speedup over the most popular \textit{im2col} convolution algorithm, and a $2.30 \times$ speedup over the fastest existing convolution algorithm (direct convolution) that we know of.

Convolutional neural networks are widely used in mobile applications. However, GPU convolution algorithms are designed for mini-batch neural network training, and single-image convolutional neural network inference on mobile GPUs is not well studied. After discussing the usage difference and examining the existing convolution algorithms, we propose the HNTMP convolution algorithm. The HNTMP convolution algorithm achieves a $14.6 \times$ speedup over the most popular \textit{im2col} convolution algorithm, and a $2.1 \times$ speedup over the fastest existing convolution algorithm (direct convolution) that we know of.
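
For readers unfamiliar with the two baselines named above, the following is a minimal NumPy sketch of im2col convolution and direct convolution for a single image. It is an illustration only, not the HNTMP algorithm and not GPU code; the function names, the stride-1/no-padding setting, and the (C, H, W) / (K, C, R, S) layouts are assumptions made for this example.

```python
import numpy as np


def im2col_conv2d(image, kernels):
    """Single-image convolution (stride 1, no padding) via im2col + one matmul.

    image:   array of shape (C, H, W)
    kernels: array of shape (K, C, R, S)
    returns: array of shape (K, H - R + 1, W - S + 1)
    """
    C, H, W = image.shape
    K, _, R, S = kernels.shape
    out_h, out_w = H - R + 1, W - S + 1
    # Unroll the input so that each column holds one receptive field.
    cols = np.empty((C * R * S, out_h * out_w), dtype=image.dtype)
    row = 0
    for c in range(C):
        for r in range(R):
            for s in range(S):
                cols[row] = image[c, r:r + out_h, s:s + out_w].reshape(-1)
                row += 1
    # Row order (c, r, s) matches the C-order flattening of each kernel.
    return (kernels.reshape(K, -1) @ cols).reshape(K, out_h, out_w)


def direct_conv2d(image, kernels):
    """Reference direct convolution: slide each kernel over the image."""
    C, H, W = image.shape
    K, _, R, S = kernels.shape
    out_h, out_w = H - R + 1, W - S + 1
    out = np.zeros((K, out_h, out_w), dtype=image.dtype)
    for k in range(K):
        for i in range(out_h):
            for j in range(out_w):
                out[k, i, j] = np.sum(image[:, i:i + R, j:j + S] * kernels[k])
    return out
```

Both functions compute the same result; im2col trades extra memory (the unrolled matrix) for a single large matrix multiplication, whereas direct convolution avoids the copy.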

Finding the relative pose between two calibrated views is a fundamental task in computer vision. Given the minimal number $5$ of required point correspondences, the classical five-point method can be used to calculate the essential matrix. For the non-minimal case in which $N$ ($N > 5$) correct point correspondences are given, known as the $N$-point problem, existing methods are less mature. In this paper, we solve the $N$-point problem by minimizing the algebraic error and formulate it as a quadratically constrained quadratic program (QCQP). The formulation is based on a simpler parameterization of the feasible region -- the normalized essential matrix manifold -- than previous approaches. A globally optimal solution to this problem is then obtained by semidefinite relaxation. This allows us to obtain certifiably global solutions to an important non-convex problem in polynomial time. We provide the condition to recover the optimal essential matrix from the relaxed problems. The theoretical guarantees of the semidefinite relaxation are investigated, including tightness and local stability. Experiments demonstrate that our approach always finds and certifies (a posteriori) the global optimum of the cost function, and that it is dozens of times faster than state-of-the-art globally optimal solutions.
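
As a small illustration of the cost being minimized, the sketch below builds an essential matrix from a rotation and a unit-norm translation and evaluates the algebraic error over $N$ correspondences. The function names, the convention $X_2 = R X_1 + t$, and the unit-translation normalization are assumptions made for this example; the QCQP and semidefinite relaxation themselves are not shown.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def essential_from_pose(R, t):
    """E = [t]_x R with the translation scaled to unit norm (assumed normalization)."""
    t = np.asarray(t, dtype=float)
    return skew(t / np.linalg.norm(t)) @ R


def algebraic_error(E, x1, x2):
    """Sum over correspondences of (x2_i^T E x1_i)^2.

    x1, x2: arrays of shape (N, 3) holding normalized homogeneous image points.
    """
    residuals = np.einsum("ni,ij,nj->n", x2, E, x1)
    return float(np.sum(residuals ** 2))
```

For noise-free correspondences generated from the true pose, the algebraic error is zero up to floating-point rounding; with noisy data, the $N$-point problem seeks the essential matrix that makes it smallest.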

Breakthroughs in the fields of deep learning and mobile system-on-chips are radically changing the way we use our smartphones. However, deep neural network inference is still a challenging task for edge AI devices due to the computational overhead on mobile CPUs and the severe drain on batteries. In this paper, we present a deep neural network inference engine named HG-Caffe, which supports GPUs with half precision. HG-Caffe provides up to 20 times speedup with GPUs compared to the original implementations. In addition to the speedup, the peak memory usage is also reduced to about 80%. With HG-Caffe, more innovative and fascinating mobile applications will be turned into reality.

Relation classification is an important semantic processing task in the field of natural language processing (NLP). In this paper, we present a novel model, the Structure Regularized Bidirectional Recurrent Convolutional Neural Network (SR-BRCNN), to classify the relation between two entities in a sentence, together with a new Chinese Sanwen dataset for named entity recognition and relation classification. Some state-of-the-art systems concentrate on modeling the shortest dependency path (SDP) between two entities using convolutional or recurrent neural networks. We further explore how to make full use of the dependency relation information in the SDP and how to improve the model through structure regularization. We propose a structure regularized model to learn relation representations along the SDP extracted from the forest formed by the structure regularized dependency tree, which reduces the complexity of the whole model and helps improve the $F_{1}$ score by 10.3. Experimental results show that our method outperforms the state-of-the-art approaches on the Chinese Sanwen task and performs as well on the SemEval-2010 Task 8 dataset\footnote{The Chinese Sanwen corpus this paper developed and used will be released in the future.}

With the further development of informatization, more and more data is stored in the form of text. Some text is lost during its generation and transmission. This paper aims to establish a language model based on a large-scale corpus to restore the missing text. We introduce a novel measurement to find the missing words, and a way of establishing a comprehensive candidate lexicon from which to insert the correct choice of words. The paper also introduces some effective optimization methods, which largely improve the efficiency of the text restoration and reduce the time needed to process 1000 sentences to 3.6 seconds. \keywords{language model, sentence correction, word imputation, parallel optimization}

This report presents the results and details of a content-based image retrieval project using the Top-surf descriptor. Although the experimental results are preliminary, they show the capability of deducing objects from parts of the objects or from similar objects. This paper uses a dataset consisting of 1200 images, of which 800 images are equally divided into 8 categories, namely airplane, beach, motorbike, forest, elephants, horses, bus and building, while the other 400 images are randomly picked from the Internet. The best results are achieved on the building category.

In game theory and artificial intelligence, decision making models often involve maximizing expected utility, which does not respect ordinal invariance. In this paper, the author discusses the possibility of preserving ordinal invariance and still making a rational decision under uncertainty.
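
A small worked example may help show why expected-utility maximization does not respect ordinal invariance: applying a strictly increasing transformation to the utilities keeps the ranking of outcomes unchanged, yet it can flip the preferred lottery. The lotteries, numbers, and function names below are hypothetical and chosen only for illustration.

```python
def expected_utility(lottery, utility):
    """lottery is a list of (probability, outcome) pairs."""
    return sum(p * utility(x) for p, x in lottery)


def best_choice(lotteries, utility):
    """Return the name of the lottery with the highest expected utility."""
    return max(lotteries, key=lambda name: expected_utility(lotteries[name], utility))


lotteries = {
    "safe": [(1.0, 5.0)],                # outcome 5 for certain
    "risky": [(0.5, 0.0), (0.5, 9.0)],   # outcome 0 or 9 with equal chance
}


def identity(x):
    return x


def squared(x):
    # Strictly increasing for x >= 0, so the ordering 0 < 5 < 9 is preserved.
    return x ** 2


choice_before = best_choice(lotteries, identity)  # compares 5.0 with 4.5
choice_after = best_choice(lotteries, squared)    # compares 25.0 with 40.5
```

Under the original utilities the safe option wins (5.0 against 4.5), while under the squared utilities the risky option wins (40.5 against 25.0), even though both utility functions order the outcomes identically.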

The most important notations of Communicating Sequential Processes (CSP) are the process and the prefix (event)$\rightarrow$(process) operator. While we can formally apply the $\rightarrow$ operator to define a live process's behavior, the STOP process, which usually results from deadlock, starvation, or livelock, lacks a formal description and is defined in most of the literature as "doing nothing but halt". In this paper, we argue that the STOP process should not be considered a black box; it should follow the prefix $\rightarrow$ schema and the same inference rules so that a unified and consistent process algebra model can be established. To achieve this goal, we introduce a special event called "nil" that any process can take. This nil event does nothing meaningful and leaves nothing on a process's observable record. With the nil event and its well-defined rules, we can use the $\rightarrow$ operator to formally describe a process's complete behavior over its whole life cycle. More interestingly, we can use the prefix $\rightarrow$ and the nil event to fully describe the STOP process's internal behavior and conclude that STOP's formal equation can be given as simply as $\mathrm{STOP}_{\alpha X} = \mu X.\ \mathrm{nil} \rightarrow X$.
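
The recursive equation for STOP can be mimicked in a few lines of code. The sketch below is a toy model only, not the paper's formal semantics: the `Prefix` class, the `run` helper, and the decision to represent the observable record as a Python list are all assumptions made for this illustration.

```python
NIL = "nil"  # hypothetical spelling of the special event described above


class Prefix:
    """(event -> process): engage in `event`, then behave as `then()`."""

    def __init__(self, event, then):
        self.event = event
        self.then = then  # a thunk, so that recursive (mu X) definitions work


def make_stop():
    """STOP = mu X. nil -> X: a process that can only take nil and remain itself."""
    stop = Prefix(NIL, lambda: stop)
    return stop


def run(process, steps):
    """Advance `process` by `steps` events and return (observable trace, final process)."""
    trace = []
    for _ in range(steps):
        if process.event != NIL:  # nil leaves nothing on the observable record
            trace.append(process.event)
        process = process.then()
    return trace, process


# A hypothetical process: coin -> choc -> STOP
vending = Prefix("coin", lambda: Prefix("choc", make_stop))
trace, final = run(vending, 10)
```

After ten steps the trace contains only `coin` and `choc`: the remaining eight steps are nil events taken by STOP, which change neither the trace nor the process.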

It is important for machines to interpret human emotions properly for better human-machine communication, as emotion is an essential part of human-to-human communication. One aspect of emotion is reflected in the language we use. How to represent emotions in text is a challenge in natural language processing (NLP). Although continuous vector representations like word2vec have become the new norm for NLP problems, their limitations are that they do not take emotions into consideration and can unintentionally contain bias toward certain identities, such as different genders. This thesis focuses on improving existing representations at both the word and sentence levels by explicitly taking emotions inside text and model bias into account in their training process. Our improved representations can help to build more robust machine learning models for affect-related text classification tasks such as sentiment/emotion analysis and abusive language detection. We first propose representations called emotional word vectors (EVEC), which are learned by a convolutional neural network model from an emotion-labeled corpus constructed using hashtags. Secondly, we extend this to learning sentence-level representations from a huge corpus of texts with the pseudo task of recognizing emojis. Our results show that, with the representations trained from millions of tweets with weakly supervised labels such as hashtags and emojis, we can solve sentiment/emotion analysis tasks more effectively. Lastly, as an example of model bias in the representations of existing approaches, we explore the specific problem of automatic detection of abusive language. We address the issue of gender bias in various neural network models by conducting experiments to measure and reduce those biases in the representations in order to build more robust classification models.

The Radon transform is widely used in the physical and life sciences, and one of its major applications is X-ray computed tomography (X-ray CT), which is significant in modern health examination. Radon inversion, or image reconstruction, is challenging due to potentially defective Radon projections. Conventionally, the reconstruction process contains several ad hoc stages to approximate the corresponding Radon inversion. Each of the stages is highly dependent on the results of the previous stage. In this paper, we propose a novel unified framework for Radon inversion via deep learning (DL). The Radon inversion can be approximated by the proposed framework in an end-to-end fashion instead of being processed step by step in multiple stages. For simplicity, the proposed framework is referred to as iRadonMap (inverse Radon transform approximation). Specifically, we implement the iRadonMap as an appropriative neural network, whose architecture can be divided into two segments. In the first segment, a learnable fully-connected filtering layer is used to filter the Radon projections along the view-angle direction, followed by a learnable sinusoidal back-projection layer that transfers the filtered Radon projections into an image. The second segment is a common neural network architecture that further improves the reconstruction performance in the image domain. The iRadonMap is optimized as a whole by training on a large number of generic images from the ImageNet database. To evaluate the performance of the iRadonMap, clinical patient data is used. Qualitative results show promising reconstruction performance of the iRadonMap.

In order to cluster or partition data, we often use Expectation-Maximization (EM) or variational approximation with a Gaussian Mixture Model (GMM), which is a parametric probability density function represented as a weighted sum of $\hat{K}$ Gaussian component densities. However, model selection to find the underlying $\hat{K}$ is one of the key concerns in GMM clustering, since we can obtain the desired clusters only when $\hat{K}$ is known. In this paper, we propose a new model selection algorithm to explore $\hat{K}$ in a Bayesian framework. The proposed algorithm builds the density of the model order, which information criteria such as AIC and BIC fundamentally fail to reconstruct. In addition, this algorithm reconstructs the density quickly compared to time-consuming Monte Carlo simulation.

Single molecule fluorescence microscopy is a powerful technique for uncovering detailed information about biological systems, both in vitro and in vivo. In such experiments, the inherently low signal to noise ratios mean that accurate algorithms to separate true signal and background noise are essential to generate meaningful results. To this end, we have developed a new and robust method to reduce noise in single molecule fluorescence images by using a Gaussian Markov Random Field (GMRF) prior in a Bayesian framework. Two different strategies are proposed to build the prior - an intrinsic GMRF, with a stationary relationship between pixels and a heterogeneous intrinsic GMRF, with a differently weighted relationship between pixels classified as molecules and background. Testing with synthetic and real experimental fluorescence images demonstrates that the heterogeneous intrinsic GMRF is superior to other conventional de-noising approaches.

Adversarial Training (AT) and Virtual Adversarial Training (VAT) are regularization techniques that train Deep Neural Networks (DNNs) with adversarial examples generated by adding small but worst-case perturbations to input examples. In this paper, we propose xAT and xVAT, new adversarial training algorithms that generate \textbf{multiplicative} perturbations to input examples for robust training of DNNs. Such perturbations are much more perceptible and interpretable than their \textbf{additive} counterparts exploited by AT and VAT. Furthermore, the multiplicative perturbations can be generated transductively or inductively, while the standard AT and VAT only support a transductive implementation. We conduct a series of experiments that analyze the behavior of the multiplicative perturbations and demonstrate that xAT and xVAT match or outperform state-of-the-art classification accuracies across multiple established benchmarks while being about 30\% faster than their additive counterparts. Furthermore, the resulting DNNs also demonstrate distinct weight distributions.

Stochastic variational inference (SVI) plays a key role in Bayesian deep learning. Recently, various divergences have been proposed to design the surrogate loss for variational inference. We present a simple upper bound of the evidence as the surrogate loss. This evidence upper bound (EUBO) equals the log marginal likelihood plus the KL-divergence between the posterior and the proposal. We show that the proposed EUBO is tighter than previous upper bounds introduced by $\chi$-divergence or $\alpha$-divergence. To facilitate scalable inference, we present the numerical approximation of the gradient of the EUBO and apply the SGD algorithm to optimize the variational parameters iteratively. A simulation study with Bayesian logistic regression shows that the upper and lower bounds sandwich the evidence well and that the proposed upper bound is favorably tight. For Bayesian neural networks, the proposed EUBO-VI algorithm outperforms state-of-the-art results on various examples.
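
The sandwich property mentioned above can be checked numerically on a toy problem. The sketch below assumes a discrete latent variable with a handful of states, so the posterior and all expectations can be computed exactly; the function name and the example numbers are hypothetical, and the gradient approximation and SGD steps described in the abstract are not reproduced here.

```python
import numpy as np


def evidence_bounds(joint, q):
    """Return (ELBO, log evidence, EUBO) for a discrete latent variable z.

    joint[k] = p(x, z=k) for the observed x; q[k] = proposal probability q(z=k).
    """
    joint = np.asarray(joint, dtype=float)
    q = np.asarray(q, dtype=float)
    evidence = joint.sum()                # p(x)
    posterior = joint / evidence          # p(z | x)
    log_ratio = np.log(joint) - np.log(q)
    elbo = float(np.sum(q * log_ratio))          # log p(x) - KL(q || posterior)
    eubo = float(np.sum(posterior * log_ratio))  # log p(x) + KL(posterior || q)
    return elbo, float(np.log(evidence)), eubo


joint = [0.10, 0.05, 0.15]   # hypothetical values of p(x, z) for one observed x
q = [0.5, 0.3, 0.2]          # hypothetical proposal
elbo, log_evidence, eubo = evidence_bounds(joint, q)
```

For any proposal with full support, `elbo <= log_evidence <= eubo`, and both gaps close when the proposal equals the posterior. In realistic models the posterior is not available in closed form, which is why the expectation under it has to be approximated.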

Graph Neural Networks (GNNs) have proved to be an effective representation learning framework for graph-structured data, and have achieved state-of-the-art performance on many practical tasks, such as node classification, link prediction and graph classification. Among the variants of GNNs, Graph Attention Networks (GATs) learn to assign dense attention coefficients over all neighbors of a node for feature aggregation, and improve the performance of many graph learning tasks. However, real-world graphs are often very large and noisy, and GATs are prone to overfitting if not regularized properly. In this paper, we propose Sparse Graph Attention Networks (SGATs) that learn sparse attention coefficients under an $L_0$-norm regularization, and the learned sparse attentions are then used for all GNN layers, resulting in an edge-sparsified graph. By doing so, we can identify noisy or insignificant edges, and thus focus computation on the more important portions of a graph. Extensive experiments on synthetic and real-world graph learning benchmarks demonstrate the superior performance of SGATs. In particular, SGATs can remove about 50\%-80\% of the edges from large graphs, such as PPI and Reddit, while retaining similar classification accuracies. Furthermore, the removed edges can be interpreted intuitively and quantitatively. To the best of our knowledge, this is the first graph learning algorithm that sparsifies graphs for the purpose of identifying important relationships between nodes and for robust training.

Sentiment analysis has been widely used by businesses for social media opinion mining, especially in the financial services industry, where customer feedback is critical for companies. Recent neural network models have achieved remarkable performance on sentiment classification, but the lack of classification interpretation may raise trustworthiness and many other issues in practice. In this work, we study the problem of improving the explainability of existing sentiment classifiers. We propose two data augmentation methods that create additional training examples to help improve model explainability: one method with a predefined sentiment word list as external knowledge and the other with adversarial examples. We test the proposed methods on both CNN and RNN classifiers with three benchmark sentiment datasets. The model explainability is assessed by both human evaluators and a simple automatic evaluation measurement. Experiments show that the proposed data augmentation methods significantly improve the explainability of both neural classifiers.

Recent theoretical work has guaranteed that overparameterized networks trained by gradient descent achieve arbitrarily low training error, and sometimes even low test error. The required width, however, is always polynomial in at least one of the sample size $n$, the (inverse) target error $1/\epsilon$, and the (inverse) failure probability $1/\delta$. This work shows that $\widetilde{O}(1/\epsilon)$ iterations of gradient descent with $\widetilde{\Omega}(1/\epsilon^2)$ training examples on two-layer ReLU networks of any width exceeding $\mathrm{polylog}(n,1/\epsilon,1/\delta)$ suffice to achieve a test misclassification error of $\epsilon$. The analysis further relies upon a margin property of the limiting kernel, which is guaranteed positive, and can distinguish between true labels and random labels.

Recent neural network models have achieved remarkable performance on sentiment classification, but the lack of classification interpretation may raise trustworthiness and many other issues in practice. In this work, we study the problem of improving the interpretability of existing sentiment classifiers. We propose two data augmentation methods that create additional training examples to help improve model interpretability: one method with a predefined sentiment word list as external knowledge and the other with adversarial examples. We test the proposed methods on both CNN and RNN classifiers with three benchmark sentiment datasets. The model interpretability is assessed by both human evaluators and a simple automatic evaluation measurement. Experiments show that the proposed data augmentation methods significantly improve the interpretability of both neural classifiers.