Models, code, and papers for "Jiaxin Ding":

IPO: Interior-point Policy Optimization under Constraints

Oct 21, 2019
Yongshuai Liu, Jiaxin Ding, Xin Liu

In this paper, we study reinforcement learning (RL) algorithms for solving real-world decision problems, with the objective of maximizing the long-term reward while satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement, comes with performance guarantees, and can handle general cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines, and our algorithm outperforms them in terms of both reward maximization and constraint satisfaction.
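A minimal sketch of the barrier-augmented objective described above (the function name, the sharpness parameter `t`, and the scalar interface are our illustration, not the paper's implementation):

```python
import numpy as np

def log_barrier_objective(reward, constraint_cost, limit, t=20.0):
    """Augment a policy's reward objective with a logarithmic barrier
    that penalizes approaching the cumulative-constraint limit.

    reward          : estimated expected return of the policy
    constraint_cost : estimated cumulative constraint cost J_C
    limit           : constraint threshold d (we require J_C <= d)
    t               : barrier sharpness; larger t weakens the barrier
    """
    slack = limit - constraint_cost
    if slack <= 0:
        return -np.inf  # infeasible: the barrier is undefined here
    return reward + np.log(slack) / t
```

Near the constraint boundary the log term dominates and pushes the optimizer back into the feasible region; far from the boundary the objective is essentially the unmodified reward.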

PBGen: Partial Binarization of Deconvolution-Based Generators for Edge Intelligence

Mar 21, 2018
Jinglan Liu, Jiaxin Zhang, Yukun Ding, Xiaowei Xu, Meng Jiang, Yiyu Shi

This work explores the binarization of the deconvolution-based generator in a GAN for memory saving and speedup of image construction. Our study suggests that different from convolutional neural networks (including the discriminator) where all layers can be binarized, only some of the layers in the generator can be binarized without significant performance loss. Supported by theoretical analysis and verified by experiments, a direct metric based on the dimension of deconvolution operations is established, which can be used to quickly decide which layers in the generator can be binarized. Our results also indicate that both the generator and the discriminator should be binarized simultaneously for balanced competition and better performance. Experimental results based on CelebA suggest that directly applying state-of-the-art binarization techniques to all the layers of the generator will lead to 2.83$\times$ performance loss measured by sliced Wasserstein distance compared with the original generator, while applying them to selected layers only can yield up to 25.81$\times$ saving in memory consumption, and 1.96$\times$ and 1.32$\times$ speedup in inference and training respectively with little performance loss.
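As an illustration of the kind of layer binarization the paper studies, here is an XNOR-Net-style sketch that replaces a real-valued filter with a scaled 1-bit filter (a generic technique for context, not PBGen's exact procedure):

```python
import numpy as np

def binarize_weights(w):
    """Approximate a real-valued filter w by alpha * sign(w), where
    alpha is the mean absolute value of w. The 1-bit tensor b can be
    stored and multiplied cheaply; alpha restores the scale."""
    alpha = np.mean(np.abs(w))          # per-filter scaling factor
    b = np.where(w >= 0, 1.0, -1.0)     # 1-bit weights
    return alpha * b, alpha, b
```

The paper's contribution is deciding *which* deconvolution layers of the generator tolerate this approximation, via a metric based on the dimension of the deconvolution operations.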

* 17 pages, paper re-organized 

Circulant Binary Convolutional Networks: Enhancing the Performance of 1-bit DCNNs with Circulant Back Propagation

Oct 24, 2019
Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, David Doermann

Rapidly decreasing computation and memory costs have recently driven the success of many applications in the field of deep learning. Practical deployment of deep learning on resource-limited hardware, such as embedded devices and smartphones, however, remains challenging. For binary convolutional networks, the difficulty lies in the degraded representation caused by binarizing full-precision filters. To address this problem, we propose new circulant filters (CiFs) and a circulant binary convolution (CBConv) to enhance the capacity of binarized convolutional features via circulant back propagation (CBP). The CiFs can be easily incorporated into existing deep convolutional neural networks (DCNNs), leading to new Circulant Binary Convolutional Networks (CBCNs). Extensive experiments confirm that the performance gap between 1-bit and full-precision DCNNs is minimized by increasing the filter diversity, which further increases the representational ability of our networks. Our experiments on ImageNet show that CBCNs achieve 61.4% top-1 accuracy with ResNet18. Compared to state-of-the-art methods such as XNOR, CBCNs achieve up to 10% higher top-1 accuracy with more powerful representational ability.

* Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019: 2691-2699 
* Published in CVPR2019 

RGB Video Based Tennis Action Recognition Using a Deep Historical Long Short-Term Memory

Sep 25, 2018
Jiaxin Cai, Xin Tang

Action recognition from RGB input has attracted increasing attention in computer vision, partially due to potential applications in somatic simulation and sports statistics, such as virtual tennis games and video-based analysis of tennis techniques and tactics. Recently, deep learning based methods have achieved promising performance for action recognition. In this paper, we propose a weighted Long Short-Term Memory adapted with convolutional neural network representations for three-dimensional tennis shot recognition. First, local two-dimensional convolutional neural network spatial representations are extracted from each video frame individually using a pre-trained Inception network. Then, a weighted Long Short-Term Memory decoder is introduced that takes the output state at time t and the historical embedding feature at time t-1 to generate a feature vector using a score weighting scheme. Finally, we use the adapted CNN and weighted LSTM to map the original visual features into a vector space, generating a spatio-temporal semantic description of the visual sequence to classify the action video content. Experiments on the benchmark demonstrate that our method, using only simple raw RGB video, can achieve better performance than state-of-the-art baselines for tennis shot recognition.

Building Memory with Concept Learning Capabilities from Large-scale Knowledge Base

Dec 03, 2015
Jiaxin Shi, Jun Zhu

We present a new perspective on neural knowledge base (KB) embeddings, from which we build a framework that models symbolic knowledge in the KB together with its learning process. We show that this framework effectively regularizes previous neural KB embedding models for superior performance on reasoning tasks, while also being able to handle unseen entities, that is, to learn their embeddings from natural language descriptions, much like the human behavior of learning semantic concepts.

* Accepted to NIPS 2015 Cognitive Computation workshop (CoCo@NIPS 2015) 

Generalized tensor regression with covariates on multiple modes

Oct 21, 2019
Zhuoyan Xu, Jiaxin Hu, Miaoyan Wang

We consider the problem of tensor-response regression given covariates on multiple modes. Such data problems arise frequently in applications such as neuroimaging, network analysis, and spatial-temporal modeling. We propose a new family of tensor response regression models that incorporate covariates, and establish theoretical accuracy guarantees. Unlike earlier methods, our estimation allows high dimensionality in both the tensor response and the covariate matrices on multiple modes. An efficient alternating updating algorithm is further developed. Our proposal handles a broad range of data types, including continuous, count, and binary observations. Through simulations and applications to two real datasets, we demonstrate that our approach outperforms the state of the art.
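A toy sketch of a tensor regression mean with covariates on every mode, under the assumed form of a coefficient core multiplied by a covariate matrix along each mode (identity link); the function names and interface are ours, not the paper's:

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """Multiply a 3-way tensor by a matrix along the given mode."""
    t = np.moveaxis(tensor, mode, 0)
    shp = t.shape
    out = matrix @ t.reshape(shp[0], -1)
    return np.moveaxis(out.reshape(matrix.shape[0], *shp[1:]), 0, mode)

def predicted_mean(core, covariates):
    """Mean of the tensor response: the coefficient core multiplied by
    the covariate matrix on every mode."""
    theta = core
    for mode, X in enumerate(covariates):
        theta = mode_product(theta, X, mode)
    return theta
```

For count or binary responses, an exponential-family link would be applied elementwise to this mean, in the spirit of the "broad range of data types" the abstract mentions.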

* 25 pages, 6 figures 

Targeted Sentiment Analysis: A Data-Driven Categorization

May 09, 2019
Jiaxin Pei, Aixin Sun, Chenliang Li

Targeted sentiment analysis (TSA), also known as aspect based sentiment analysis (ABSA), aims at detecting fine-grained sentiment polarity towards targets in a given opinion document. Due to the lack of labeled datasets and effective technology, TSA had been intractable for many years. The newly released datasets and the rapid development of deep learning technologies are key enablers for the recent significant progress made in this area. However, the TSA tasks have been defined in various ways with different understandings towards basic concepts like `target' and `aspect'. In this paper, we categorize the different tasks and highlight the differences in the available datasets and their specific tasks. We then further discuss the challenges related to data collection and data annotation which are overlooked in many previous studies.

* Draft 

A Scalable Evolution Strategy with Directional Gaussian Smoothing for Blackbox Optimization

Feb 07, 2020
Jiaxin Zhang, Hoang Tran, Dan Lu, Guannan Zhang

We developed a new scalable evolution strategy with directional Gaussian smoothing (DGS-ES) for high-dimensional blackbox optimization. Standard ES methods have been shown to suffer from the curse of dimensionality, due to random directional search and the low accuracy of Monte Carlo estimation. The key idea of this work is a Gaussian smoothing approach that averages the original objective function only along $d$ orthogonal directions. In this way, the partial derivatives of the smoothed function along those directions can be represented by one-dimensional integrals, instead of the $d$-dimensional integrals in standard ES methods. As such, the averaged partial derivatives can be approximated using the Gauss-Hermite quadrature rule, as opposed to MC, which significantly improves the accuracy of the averaged gradients. Moreover, the smoothing technique lowers the barriers around local minima, so that global minima become easier to reach. We provide three sets of examples to demonstrate the performance of our method, including benchmark functions for global optimization and a rocket shell design problem.
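The Gauss-Hermite estimate of the smoothed directional derivative can be sketched as follows. This is our reading of the abstract; the function and parameter names are hypothetical, and a full DGS-ES step would repeat this along each of the $d$ orthogonal directions:

```python
import numpy as np

def dgs_directional_grad(f, x, xi, sigma=0.5, n_quad=7):
    """Estimate the derivative of the Gaussian-smoothed objective
    F_sigma(x) = E_{v~N(0,1)}[ f(x + sigma*v*xi) ]
    along the unit direction xi, using Gauss-Hermite quadrature
    instead of Monte Carlo sampling."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    # E_{v~N(0,1)}[ g(v) ] = (1/sqrt(pi)) * sum_i w_i g(sqrt(2)*t_i)
    # and  dF_sigma/d(xi) = (1/sigma) * E[ f(x + sigma*v*xi) * v ]
    total = 0.0
    for t, w in zip(nodes, weights):
        v = np.sqrt(2.0) * t
        total += w * v * f(x + sigma * v * xi)
    return total / (np.sqrt(np.pi) * sigma)
```

On a linear objective the estimate is exact for any smoothing radius, which illustrates why quadrature beats a small Monte Carlo sample along each direction.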

Improving Interpretability of Word Embeddings by Generating Definition and Usage

Dec 12, 2019
Haitong Zhang, Yongping Du, Jiaxin Sun, Qingxiao Li

Word embeddings, which encode semantic and syntactic features, have recently achieved success in many natural language processing tasks. However, the lexical semantics captured by these embeddings are difficult to interpret due to the dense vector representations. To improve the interpretability of word vectors, we explore the definition modeling task and propose a novel framework (Semantics-Generator) to generate more reasonable and understandable context-dependent definitions. Moreover, we introduce usage modeling and study whether it is possible to utilize distributed representations to generate example sentences for words. Both forms of semantics generation offer a more direct and explicit expression of an embedding's semantics. Two multi-task learning methods are used to combine usage modeling and definition modeling. To verify our approach, we construct the Oxford-2019 dataset, where each entry contains a word, context, example sentence, and corresponding definition. Experimental results show that Semantics-Generator achieves the state-of-the-art result in definition modeling and that the multi-task learning methods help improve performance on both tasks.

Two Causal Principles for Improving Visual Dialog

Nov 24, 2019
Jiaxin Qi, Yulei Niu, Jianqiang Huang, Hanwang Zhang

This paper is a winner report from team MReaL-BDAI for the Visual Dialog Challenge 2019. We present two causal principles for improving Visual Dialog (VisDial). By "improving", we mean that they can promote almost every existing VisDial model to state-of-the-art performance on the Visual Dialog 2019 Challenge leaderboard. This major improvement comes solely from careful inspection of the causality behind the model and data, which reveals that the community has overlooked two causalities in VisDial. Intuitively, Principle 1 suggests that we should remove the direct input of the dialog history to the answer model; otherwise a harmful shortcut bias is introduced. Principle 2 says that there is an unobserved confounder for history, question, and answer, leading to spurious correlations in the training data. In particular, to remove the confounder suggested by Principle 2, we propose several causal intervention algorithms, which make the training fundamentally different from traditional likelihood estimation. Note that the two principles are model-agnostic, so they are applicable to any VisDial model.

* Visual Dialog Challenge 2019 winner report 

SUM: Suboptimal Unitary Multi-task Learning Framework for Spatiotemporal Data Prediction

Oct 11, 2019
Qichen Li, Jiaxin Pei, Jianding Zhang, Bo Han

Typical multi-task learning methods for spatio-temporal data prediction involve low-rank tensor computation. However, such methods have relatively weak performance when the number of tasks is small, and they cannot be integrated into non-linear models. In this paper, we propose a two-step suboptimal unitary method (SUM) that combines a meta-learning strategy with multi-task models. In the first step, it searches for a global pattern by optimising the general parameters with gradient descent under constraints, acting as a geological regularizer that enables model learning with less training data. In the second step, we derive an optimised model for each specific task from the global pattern using only a few local training samples. Compared with traditional multi-task learning methods, SUM shows better generalisation on distant tasks. It can be applied to any multi-task model that uses gradient descent as its optimiser, regardless of whether the prediction function is linear. Moreover, the framework can enable traditional prediction models to perform coKriging. Experiments on public datasets suggest that our framework, when combined with current multi-task models, yields conspicuously better predictions than low-rank tensor learning when the number of tasks is small, and achieves quite satisfactory results when adapting current prediction models for coKriging.

* 5 pages 

Functional Variational Bayesian Neural Networks

Mar 14, 2019
Shengyang Sun, Guodong Zhang, Jiaxin Shi, Roger Grosse

Variational Bayesian neural networks (BNNs) perform variational inference over weights, but it is difficult to specify meaningful priors and approximate posteriors in a high-dimensional weight space. We introduce functional variational Bayesian neural networks (fBNNs), which maximize an Evidence Lower BOund (ELBO) defined directly on stochastic processes, i.e. distributions over functions. We prove that the KL divergence between stochastic processes equals the supremum of marginal KL divergences over all finite sets of inputs. Based on this, we introduce a practical training objective which approximates the functional ELBO using finite measurement sets and the spectral Stein gradient estimator. With fBNNs, we can specify priors entailing rich structures, including Gaussian processes and implicit stochastic processes. Empirically, we find fBNNs extrapolate well using various structured priors, provide reliable uncertainty estimates, and scale to large datasets.
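The marginal KL that the paper's supremum ranges over is just the closed-form KL between two multivariate Gaussians, obtained by evaluating both stochastic processes (e.g. two GPs) at a finite measurement set. A sketch of that marginal computation only (the function name is ours):

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) in closed form.
    Evaluating two Gaussian processes at a finite measurement set
    X = {x_1, ..., x_n} yields two such n-dimensional Gaussians;
    the functional KL is the supremum of these marginal KLs over
    all finite measurement sets."""
    n = len(mu0)
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - n
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```

The fBNN training objective approximates the functional ELBO by drawing such finite measurement sets rather than computing the supremum exactly.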

* ICLR 2019 

Understanding and Predicting the Memorability of Natural Scene Images

Oct 17, 2018
Jiaxin Lu, Mai Xu, Ren Yang, Zulin Wang

Memorability measures how easily an image is memorized after a glance, which may contribute to designing magazine covers, tourism publicity materials, and so forth. Recent works have shed light on the visual features that make generic images, object images, or face photographs memorable. However, a clear understanding and reliable estimation of natural scene memorability remain elusive. In this paper, we attempt to answer: "what exactly makes a natural scene memorable?" To this end, we first establish a large-scale natural scene image memorability (LNSIM) database, containing 2,632 natural scene images and their ground-truth memorability scores. Then, we mine our database to investigate how low-, middle-, and high-level handcrafted features affect the memorability of natural scenes. In particular, we find that the high-level feature of scene category is closely correlated with natural scene memorability. We also find that deep features are effective in predicting memorability scores. Therefore, we propose a deep neural network based natural scene memorability (DeepNSM) predictor, which takes advantage of scene category. Finally, the experimental results validate the effectiveness of DeepNSM, which exceeds the state-of-the-art methods.

* arXiv admin note: substantial text overlap with arXiv:1808.08754 

What Makes Natural Scene Memorable?

Aug 27, 2018
Jiaxin Lu, Mai Xu, Ren Yang, Zulin Wang

Recent studies on image memorability have shed light on the visual features that make generic images, object images, or face photographs memorable. However, a clear understanding and reliable estimation of natural scene memorability remain elusive. In this paper, we attempt to answer: "what exactly makes a natural scene memorable?" Specifically, we first build LNSIM, a large-scale natural scene image memorability database containing 2,632 images with memorability annotations. Then, we mine our database to investigate how low-, middle-, and high-level handcrafted features affect the memorability of natural scenes. In particular, we find that the high-level feature of scene category is closely correlated with natural scene memorability. Thus, we propose a deep neural network based natural scene memorability (DeepNSM) predictor, which takes advantage of scene category. Finally, the experimental results validate the effectiveness of DeepNSM.

* Accepted to ACM MM Workshops 

SO-Net: Self-Organizing Network for Point Cloud Analysis

Mar 27, 2018
Jiaxin Li, Ben M. Chen, Gim Hee Lee

This paper presents SO-Net, a permutation-invariant architecture for deep learning with orderless point clouds. SO-Net models the spatial distribution of a point cloud by building a Self-Organizing Map (SOM). Based on the SOM, SO-Net performs hierarchical feature extraction on individual points and SOM nodes, and ultimately represents the input point cloud by a single feature vector. The receptive field of the network can be systematically adjusted by conducting a point-to-node k-nearest-neighbor search. In tasks such as point cloud reconstruction, classification, object part segmentation, and shape retrieval, our proposed network demonstrates performance similar to or better than state-of-the-art approaches. In addition, training is significantly faster than with existing point cloud recognition networks because of the parallelizability and simplicity of the proposed architecture. Our code is available at the project website.
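The point-to-node k-nearest-neighbor search that controls the receptive field can be sketched with plain NumPy (a brute-force illustration, not the paper's implementation):

```python
import numpy as np

def point_to_node_knn(points, nodes, k=3):
    """For each input point (N, 3), return the indices of its k nearest
    SOM nodes (M, 3). SO-Net associates points with nearby nodes before
    hierarchical feature extraction; k controls receptive-field overlap."""
    # pairwise squared distances, shape (N, M)
    d2 = ((points[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]
```

Increasing k makes each SOM node aggregate features from a wider neighborhood of points, which is the "systematically adjusted" receptive field mentioned above.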

* 17 pages, CVPR 2018 

Video Depth Estimation by Fusing Flow-to-Depth Proposals

Dec 30, 2019
Jiaxin Xie, Chenyang Lei, Zhuwen Li, Li Erran Li, Qifeng Chen

We present an approach with a novel differentiable flow-to-depth layer for video depth estimation. The model consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network. Given optical flow and camera pose, our flow-to-depth layer generates depth proposals and the corresponding confidence maps by explicitly solving an epipolar geometry optimization problem. Unlike other methods, our flow-to-depth layer is differentiable, and thus we can refine camera poses by maximizing the aggregated confidence in the camera pose refinement module. Our depth fusion network can utilize depth proposals and their confidence maps inferred from different adjacent frames to produce the final depth map. Furthermore, the depth fusion network can additionally take the depth proposals generated by other methods to improve the results further. Experiments on three public datasets show that our approach outperforms state-of-the-art depth estimation methods and has strong generalization capability: our model trained on KITTI performs well on the unseen Waymo dataset, while other methods degrade significantly.

QATM: Quality-Aware Template Matching For Deep Learning

Apr 09, 2019
Jiaxin Cheng, Yue Wu, Wael Abd-Almageed, Premkumar Natarajan

Finding a template in a search image is one of the core problems in many computer vision applications, such as semantic image alignment and image-to-GPS verification. We propose a novel quality-aware template matching method, QATM, which can be used not only as a standalone template matching algorithm but also as a trainable layer that can be easily embedded into any deep neural network. Specifically, we assess the quality of a matching pair using soft-ranking among all matching pairs, so that different matching scenarios such as 1-to-1, 1-to-many, and many-to-many are reflected in different quality values. Our extensive evaluation on classic template matching benchmarks and deep learning tasks demonstrates the effectiveness of QATM. It not only outperforms state-of-the-art template matching methods when used alone, but also largely improves existing deep network solutions.
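The soft-ranking idea can be sketched as follows: given a patch-similarity matrix between template and search image, take a softmax over each axis and multiply, so only pairs that rank highly in both directions score near 1. This is our reading of the abstract; the parameter names (and the sharpness `alpha`) are ours:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def qatm_score(sim, alpha=10.0):
    """Quality-aware matching score for a similarity matrix sim[i, j]
    between template patches i and search patches j: the product of
    the soft-ranking of j among search patches (per template patch)
    and of i among template patches (per search patch)."""
    like_t = softmax(alpha * sim, axis=1)  # rank search patches for each i
    like_s = softmax(alpha * sim, axis=0)  # rank template patches for each j
    return like_t * like_s
```

A 1-to-1 match scores near 1, while 1-to-many and many-to-many matches are diluted by the softmax normalization, which is how the different scenarios map to different quality values.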

* Accepted as CVPR 2019 paper. Camera ready version 

ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations

Nov 02, 2019
Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang

The pre-training of text encoders normally processes text as a sequence of tokens corresponding to small text units, such as word pieces in English and characters in Chinese. It omits information carried by larger text granularities, and thus the encoders cannot easily adapt to certain combinations of characters. This leads to a loss of important semantic information, which is especially problematic for Chinese because the language does not have explicit word boundaries. In this paper, we propose ZEN, a BERT-based Chinese (Z) text encoder Enhanced by N-gram representations, where different combinations of characters are considered during training. As a result, potential word or phrase boundaries are explicitly pre-trained and fine-tuned with the character encoder (BERT). ZEN therefore incorporates the comprehensive information of both the character sequence and the words or phrases it contains. Experimental results illustrate the effectiveness of ZEN on a series of Chinese NLP tasks. We show that ZEN, using fewer resources than other published encoders, can achieve state-of-the-art performance on most tasks. Moreover, reasonable performance can be obtained when ZEN is trained on a small corpus, which is important for applying pre-training techniques in scenarios with limited data. The code and pre-trained models of ZEN are available at
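A toy sketch of matching a character sequence against a pre-built n-gram lexicon, the kind of input an n-gram-enhanced encoder like ZEN consumes alongside the character sequence (the interface and span convention here are our illustration, not ZEN's exact code):

```python
def extract_ngrams(chars, lexicon, max_n=4):
    """Enumerate the n-grams (n >= 2) of a character sequence that
    appear in a pre-built n-gram lexicon, with their character spans
    [start, end). The matched n-grams can then be embedded and fused
    with the character encoder's representations."""
    matches = []
    for n in range(2, max_n + 1):
        for i in range(len(chars) - n + 1):
            gram = chars[i:i + n]
            if gram in lexicon:
                matches.append((gram, i, i + n))
    return matches
```

Because Chinese has no explicit word boundaries, overlapping matches are expected and kept; the model, not the matcher, learns which combinations matter.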

* Natural Language Processing. 11 pages, 7 figures 

Learning to Embed Sentences Using Attentive Recursive Trees

Nov 15, 2018
Jiaxin Shi, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang

Sentence embedding is an effective feature representation for most deep learning-based NLP tasks. One prevailing line of methods is using recursive latent tree-structured networks to embed sentences with task-specific structures. However, existing models have no explicit mechanism to emphasize task-informative words in the tree structure. To this end, we propose an Attentive Recursive Tree model (AR-Tree), where the words are dynamically located according to their importance in the task. Specifically, we construct the latent tree for a sentence in a proposed important-first strategy, and place more attentive words nearer to the root; thus, AR-Tree can inherently emphasize important words during the bottom-up composition of the sentence embedding. We propose an end-to-end reinforced training strategy for AR-Tree, which is demonstrated to consistently outperform, or be at least comparable to, the state-of-the-art sentence embedding methods on three sentence understanding tasks.

* AAAI Conference of Artificial Intelligence, 2019 
