Models, code, and papers for "Jie Liu":

MLE-induced Likelihood for Markov Random Fields

Mar 27, 2018
Jie Liu, Hao Zheng

Due to the intractable partition function, the exact likelihood function for a Markov random field (MRF), in many situations, can only be approximated. Major approximation approaches include pseudolikelihood and Laplace approximation. In this paper, we propose a novel way of approximating the likelihood function through first approximating the marginal likelihood functions of individual parameters and then reconstructing the joint likelihood function from these marginal likelihood functions. For approximating the marginal likelihood functions, we derive a particular likelihood function from a modified scenario of coin tossing which is useful for capturing how one parameter interacts with the remaining parameters in the likelihood function. For reconstructing the joint likelihood function, we use an appropriate copula to link up these marginal likelihood functions. Numerical investigation suggests the superior performance of our approach. Especially as the size of the MRF increases, both the numerical performance and the computational cost of our approach remain consistently satisfactory, whereas Laplace approximation deteriorates and pseudolikelihood becomes computationally unbearable.

  Click for Model/Code and Paper
Projected Semi-Stochastic Gradient Descent Method with Mini-Batch Scheme under Weak Strong Convexity Assumption

May 05, 2017
Jie Liu, Martin Takac

We propose a projected semi-stochastic gradient descent method with mini-batch for improving both the theoretical complexity and practical performance of the general stochastic gradient descent method (SGD). We are able to prove linear convergence under weak strong convexity assumption. This requires no strong convexity assumption for minimizing the sum of smooth convex functions subject to a compact polyhedral set, which remains popular across machine learning community. Our PS2GD preserves the low-cost per iteration and high optimization accuracy via stochastic gradient variance-reduced technique, and admits a simple parallel implementation with mini-batches. Moreover, PS2GD is also applicable to dual problem of SVM with hinge loss.

  Click for Model/Code and Paper
D-SPIDER-SFO: A Decentralized Optimization Algorithm with Faster Convergence Rate for Nonconvex Problems

Nov 28, 2019
Taoxing Pan, Jun Liu, Jie Wang

Decentralized optimization algorithms have attracted intensive interests recently, as it has a balanced communication pattern, especially when solving large-scale machine learning problems. Stochastic Path Integrated Differential Estimator Stochastic First-Order method (SPIDER-SFO) nearly achieves the algorithmic lower bound in certain regimes for nonconvex problems. However, whether we can find a decentralized algorithm which achieves a similar convergence rate to SPIDER-SFO is still unclear. To tackle this problem, we propose a decentralized variant of SPIDER-SFO, called decentralized SPIDER-SFO (D-SPIDER-SFO). We show that D-SPIDER-SFO achieves a similar gradient computation cost---that is, $\mathcal{O}(\epsilon^{-3})$ for finding an $\epsilon$-approximate first-order stationary point---to its centralized counterpart. To the best of our knowledge, D-SPIDER-SFO achieves the state-of-the-art performance for solving nonconvex optimization problems on decentralized networks in terms of the computational cost. Experiments on different network configurations demonstrate the efficiency of the proposed method.

  Click for Model/Code and Paper
Hyper-Path-Based Representation Learning for Hyper-Networks

Aug 24, 2019
Jie Huang, Xin Liu, Yangqiu Song

Network representation learning has aroused widespread interests in recent years. While most of the existing methods deal with edges as pairwise relations, only a few studies have been proposed for hyper-networks to capture more complicated tuplewise relationships among multiple nodes. A hyper-network is a network where each edge, called hyperedge, connects an arbitrary number of nodes. Different from conventional networks, hyper-networks have certain degrees of indecomposability such that the nodes in a subset of a hyperedge may not possess a strong relationship. That is the main reason why traditional algorithms fail in learning representations in hyper-networks by simply decomposing hyperedges into pairwise relationships. In this paper, we firstly define a metric to depict the degrees of indecomposability for hyper-networks. Then we propose a new concept called hyper-path and design hyper-path-based random walks to preserve the structural information of hyper-networks according to the analysis of the indecomposability. Then a carefully designed algorithm, Hyper-gram, utilizes these random walks to capture both pairwise relationships and tuplewise relationships in the whole hyper-networks. Finally, we conduct extensive experiments on several real-world datasets covering the tasks of link prediction and hyper-network reconstruction, and results demonstrate the rationality, validity, and effectiveness of our methods compared with those existing state-of-the-art models designed for conventional networks or hyper-networks.

* Accepted by CIKM 2019 

  Click for Model/Code and Paper
Heterogeneous Transfer Learning: An Unsupervised Approach

Oct 04, 2018
Feng Liu, Guanquan Zhang, Jie Lu

Transfer learning leverages the knowledge in one domain, the source domain, to improve learning efficiency in another domain, the target domain. Existing transfer learning research is relatively well-progressed, but only in situations where the feature spaces of the domains are homogeneous and the target domain contains at least a few labeled instances. However, transfer learning has not been well-studied in heterogeneous settings with an unlabeled target domain. To contribute to the research in this emerging field, this paper presents: (1) an unsupervised knowledge transfer theorem that prevents negative transfer; and (2) a principal angle-based metric to measure the distance between two pairs of domains. The metric shows the extent to which homogeneous representations have preserved the information in original source and target domains. The unsupervised knowledge transfer theorem sets out the transfer conditions necessary to prevent negative transfer. Linear monotonic maps meet the transfer conditions of the theorem and, hence, are used to construct homogeneous representations of the heterogeneous domains, which in principle prevents negative transfer. The metric and the theorem have been implemented in an innovative transfer model, called a Grassmann-LMM-geodesic flow kernel (GLG), that is specifically designed for knowledge transfer across heterogeneous domains. The GLG model learns homogeneous representations of heterogeneous domains by minimizing the proposed metric. Knowledge is transferred through these learned representations via a geodesic flow kernel. Notably, the theorem presented in this paper provides the sufficient transfer conditions needed to guarantee that knowledge is transferred from a source domain to an unlabeled target domain with correctness.

  Click for Model/Code and Paper
Robustness of classification ability of spiking neural networks

Jan 30, 2018
Jie Yang, Pingping Zhang, Yan Liu

It is well-known that the robustness of artificial neural networks (ANNs) is important for their wide ranges of applications. In this paper, we focus on the robustness of the classification ability of a spiking neural network which receives perturbed inputs. Actually, the perturbation is allowed to be arbitrary styles. However, Gaussian perturbation and other regular ones have been rarely investigated. For classification problems, the closer to the desired point, the more perturbed points there are in the input space. In addition, the perturbation may be periodic. Based on these facts, we only consider sinusoidal and Gaussian perturbations in this paper. With the SpikeProp algorithm, we perform extensive experiments on the classical XOR problem and other three benchmark datasets. The numerical results show that there is not significant reduction in the classification ability of the network if the input signals are subject to sinusoidal and Gaussian perturbations.

* Aceppted by Nonlinear Dynamics 

  Click for Model/Code and Paper
Indefinite Kernel Logistic Regression

Jul 06, 2017
Fanghui Liu, Xiaolin Huang, Jie Yang

Traditionally, kernel learning methods requires positive definitiveness on the kernel, which is too strict and excludes many sophisticated similarities, that are indefinite, in multimedia area. To utilize those indefinite kernels, indefinite learning methods are of great interests. This paper aims at the extension of the logistic regression from positive semi-definite kernels to indefinite kernels. The model, called indefinite kernel logistic regression (IKLR), keeps consistency to the regular KLR in formulation but it essentially becomes non-convex. Thanks to the positive decomposition of an indefinite matrix, IKLR can be transformed into a difference of two convex models, which follows the use of concave-convex procedure. Moreover, we employ an inexact solving scheme to speed up the sub-problem and develop a concave-inexact-convex procedure (CCICP) algorithm with theoretical convergence analysis. Systematical experiments on multi-modal datasets demonstrate the superiority of the proposed IKLR method over kernel logistic regression with positive definite kernels and other state-of-the-art indefinite learning based algorithms.

* Note that this is not the camera-ready version 

  Click for Model/Code and Paper
Real-Time Robot Localization, Vision, and Speech Recognition on Nvidia Jetson TX1

May 31, 2017
Jie Tang, Yong Ren, Shaoshan Liu

Robotics systems are complex, often consisted of basic services including SLAM for localization and mapping, Convolution Neural Networks for scene understanding, and Speech Recognition for user interaction, etc. Meanwhile, robots are mobile and usually have tight energy constraints, integrating these services onto an embedded platform with around 10 W of power consumption is critical to the proliferation of mobile robots. In this paper, we present a case study on integrating real-time localization, vision, and speech recognition services on a mobile SoC, Nvidia Jetson TX1, within about 10 W of power envelope. In addition, we explore whether offloading some of the services to cloud platform can lead to further energy efficiency while meeting the real-time requirements

* 12 pages, 8 figures 

  Click for Model/Code and Paper
On Bochner's and Polya's Characterizations of Positive-Definite Kernels and the Respective Random Feature Maps

Oct 27, 2016
Jie Chen, Dehua Cheng, Yan Liu

Positive-definite kernel functions are fundamental elements of kernel methods and Gaussian processes. A well-known construction of such functions comes from Bochner's characterization, which connects a positive-definite function with a probability distribution. Another construction, which appears to have attracted less attention, is Polya's criterion that characterizes a subset of these functions. In this paper, we study the latter characterization and derive a number of novel kernels little known previously. In the context of large-scale kernel machines, Rahimi and Recht (2007) proposed a random feature map (random Fourier) that approximates a kernel function, through independent sampling of the probability distribution in Bochner's characterization. The authors also suggested another feature map (random binning), which, although not explicitly stated, comes from Polya's characterization. We show that with the same number of random samples, the random binning map results in an Euclidean inner product closer to the kernel than does the random Fourier map. The superiority of the random binning map is confirmed empirically through regressions and classifications in the reproducing kernel Hilbert space.

  Click for Model/Code and Paper
Exploring Lexical, Syntactic, and Semantic Features for Chinese Textual Entailment in NTCIR RITE Evaluation Tasks

Apr 08, 2015
Wei-Jie Huang, Chao-Lin Liu

We computed linguistic information at the lexical, syntactic, and semantic levels for Recognizing Inference in Text (RITE) tasks for both traditional and simplified Chinese in NTCIR-9 and NTCIR-10. Techniques for syntactic parsing, named-entity recognition, and near synonym recognition were employed, and features like counts of common words, statement lengths, negation words, and antonyms were considered to judge the entailment relationships of two statements, while we explored both heuristics-based functions and machine-learning approaches. The reported systems showed robustness by simultaneously achieving second positions in the binary-classification subtasks for both simplified and traditional Chinese in NTCIR-10 RITE-2. We conducted more experiments with the test data of NTCIR-9 RITE, with good results. We also extended our work to search for better configurations of our classifiers and investigated contributions of individual features. This extended work showed interesting results and should encourage further discussion.

* 20 pages, 1 figure, 26 tables, Journal article in Soft Computing (Spinger). Soft Computing, online. Springer, Germany, 2015 

  Click for Model/Code and Paper
Learning to Rank Binary Codes

Oct 21, 2014
Jie Feng, Wei Liu, Yan Wang

Binary codes have been widely used in vision problems as a compact feature representation to achieve both space and time advantages. Various methods have been proposed to learn data-dependent hash functions which map a feature vector to a binary code. However, considerable data information is inevitably lost during the binarization step which also causes ambiguity in measuring sample similarity using Hamming distance. Besides, the learned hash functions cannot be changed after training, which makes them incapable of adapting to new data outside the training data set. To address both issues, in this paper we propose a flexible bitwise weight learning framework based on the binary codes obtained by state-of-the-art hashing methods, and incorporate the learned weights into the weighted Hamming distance computation. We then formulate the proposed framework as a ranking problem and leverage the Ranking SVM model to offline tackle the weight learning. The framework is further extended to an online mode which updates the weights at each time new data comes, thereby making it scalable to large and dynamic data sets. Extensive experimental results demonstrate significant performance gains of using binary codes with bitwise weighting in image retrieval tasks. It is appealing that the online weight learning leads to comparable accuracy with its offline counterpart, which thus makes our approach practical for realistic applications.

  Click for Model/Code and Paper
A brief network analysis of Artificial Intelligence publication

Nov 23, 2013
Yunpeng Li, Jie Liu, Yong Deng

In this paper, we present an illustration to the history of Artificial Intelligence(AI) with a statistical analysis of publish since 1940. We collected and mined through the IEEE publish data base to analysis the geological and chronological variance of the activeness of research in AI. The connections between different institutes are showed. The result shows that the leading community of AI research are mainly in the USA, China, the Europe and Japan. The key institutes, authors and the research hotspots are revealed. It is found that the research institutes in the fields like Data Mining, Computer Vision, Pattern Recognition and some other fields of Machine Learning are quite consistent, implying a strong interaction between the community of each field. It is also showed that the research of Electronic Engineering and Industrial or Commercial applications are very active in California. Japan is also publishing a lot of papers in robotics. Due to the limitation of data source, the result might be overly influenced by the number of published articles, which is to our best improved by applying network keynode analysis on the research community instead of merely count the number of publish.

* 18 pages, 7 figures 

  Click for Model/Code and Paper
Efficient Mixed-Norm Regularization: Algorithms and Safe Screening Methods

Jul 16, 2013
Jie Wang, Jun Liu, Jieping Ye

Sparse learning has recently received increasing attention in many areas including machine learning, statistics, and applied mathematics. The mixed-norm regularization based on the l1q norm with q>1 is attractive in many applications of regression and classification in that it facilitates group sparsity in the model. The resulting optimization problem is, however, challenging to solve due to the inherent structure of the mixed-norm regularization. Existing work deals with special cases with q=1, 2, infinity, and they cannot be easily extended to the general case. In this paper, we propose an efficient algorithm based on the accelerated gradient method for solving the general l1q-regularized problem. One key building block of the proposed algorithm is the l1q-regularized Euclidean projection (EP_1q). Our theoretical analysis reveals the key properties of EP_1q and illustrates why EP_1q for the general q is significantly more challenging to solve than the special cases. Based on our theoretical analysis, we develop an efficient algorithm for EP_1q by solving two zero finding problems. To further improve the efficiency of solving large dimensional mixed-norm regularized problems, we propose a screening method which is able to quickly identify the inactive groups, i.e., groups that have 0 components in the solution. This may lead to substantial reduction in the number of groups to be entered to the optimization. An appealing feature of our screening method is that the data set needs to be scanned only once to run the screening. Compared to that of solving the mixed-norm regularized problems, the computational cost of our screening test is negligible. The key of the proposed screening method is an accurate sensitivity analysis of the dual optimal solution when the regularization parameter varies. Experimental results demonstrate the efficiency of the proposed algorithm.

* arXiv admin note: text overlap with arXiv:1009.4766 

  Click for Model/Code and Paper
Learning to Predict More Accurate Text Instances for Scene Text Detection

Nov 18, 2019
XiaoQian Li, Jie Liu, ShuWu Zhang, GuiXuan Zhang

At present, multi-oriented text detection methods based on deep neural network have achieved promising performances on various benchmarks. Nevertheless, there are still some difficulties for arbitrary shape text detection, especially for a simple and proper representation of arbitrary shape text instances. In this paper, a pixel-based text detector is proposed to facilitate the representation and prediction of text instances with arbitrary shapes in a simple manner. Firstly, to alleviate the effect of the target vertex sorting and achieve the direct regression of arbitrary shape text instances, the starting-point-independent coordinates regression loss is proposed. Furthermore, to predict more accurate text instances, the text instance accuracy loss is proposed as an assistant task to refine the predicted coordinates under the guidance of IoU. To evaluate the effectiveness of our detector, extensive experiments have been carried on public benchmarks. On the ICDAR 2015 Incidental Scene Text benchmark, our method achieves 86.5% of F-measure, and we obtain 84.8% of F-measure on Total-Text benchmark. The results show that our method can reach state-of-the-art performance.

  Click for Model/Code and Paper
SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders

Oct 02, 2019
Peter J. Liu, Yu-An Chung, Jie Ren

We propose an end-to-end neural model for zero-shot abstractive text summarization of paragraphs, and introduce a benchmark task, ROCSumm, based on ROCStories, a subset for which we collected human summaries. In this task, five-sentence stories (paragraphs) are summarized with one sentence, using human summaries only for evaluation. We show results for extractive and human baselines to demonstrate a large abstractive gap in performance. Our model, SummAE, consists of a denoising auto-encoder that embeds sentences and paragraphs in a common space, from which either can be decoded. Summaries for paragraphs are generated by decoding a sentence from the paragraph representations. We find that traditional sequence-to-sequence auto-encoders fail to produce good summaries and describe how specific architectural choices and pre-training techniques can significantly improve performance, outperforming extractive baselines. The data, training, evaluation code, and best model weights are open-sourced.

* first two authors contributed equally 

  Click for Model/Code and Paper
Performance Analysis and Characterization of Training Deep Learning Models on NVIDIA TX2

Jun 10, 2019
Jie Liu, Jiawen Liu, Wan Du, Dong Li

Training deep learning models on mobile devices recently becomes possible, because of increasing computation power on mobile hardware and the advantages of enabling high user experiences. Most of the existing work on machine learning at mobile devices is focused on the inference of deep learning models (particularly convolutional neural network and recurrent neural network), but not training. The performance characterization of training deep learning models on mobile devices is largely unexplored, although understanding the performance characterization is critical for designing and implementing deep learning models on mobile devices. In this paper, we perform a variety of experiments on a representative mobile device (the NVIDIA TX2) to study the performance of training deep learning models. We introduce a benchmark suite and tools to study performance of training deep learning models on mobile devices, from the perspectives of memory consumption, hardware utilization, and power consumption. The tools can correlate performance results with fine-grained operations in deep learning models, providing capabilities to capture performance variance and problems at a fine granularity. We reveal interesting performance problems and opportunities, including under-utilization of heterogeneous hardware, large energy consumption of the memory, and high predictability of workload characterization. Based on the performance analysis, we suggest interesting research directions.

  Click for Model/Code and Paper
TIGS: An Inference Algorithm for Text Infilling with Gradient Search

May 26, 2019
Dayiheng Liu, Jie Fu, Pengfei Liu, Jiancheng Lv

Text infilling is defined as a task for filling in the missing part of a sentence or paragraph, which is suitable for many real-world natural language generation scenarios. However, given a well-trained sequential generative model, generating missing symbols conditioned on the context is challenging for existing greedy approximate inference algorithms. In this paper, we propose an iterative inference algorithm based on gradient search, which is the first inference algorithm that can be broadly applied to any neural sequence generative models for text infilling tasks. We compare the proposed method with strong baselines on three text infilling tasks with various mask ratios and different mask strategies. The results show that our proposed method is effective and efficient for fill-in-the-blank tasks, consistently outperforming all baselines.

* The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019) 

  Click for Model/Code and Paper
P-MCGS: Parallel Monte Carlo Acyclic Graph Search

Oct 28, 2018
Chen Yu, Jianshu Chen, Jie Zhong, Ji Liu

Recently, there have been great interests in Monte Carlo Tree Search (MCTS) in AI research. Although the sequential version of MCTS has been studied widely, its parallel counterpart still lacks systematic study. This leads us to the following questions: \emph{how to design efficient parallel MCTS (or more general cases) algorithms with rigorous theoretical guarantee? Is it possible to achieve linear speedup?} In this paper, we consider the search problem on a more general acyclic one-root graph (namely, Monte Carlo Graph Search (MCGS)), which generalizes MCTS. We develop a parallel algorithm (P-MCGS) to assign multiple workers to investigate appropriate leaf nodes simultaneously. Our analysis shows that P-MCGS algorithm achieves linear speedup and that the sample complexity is comparable to its sequential counterpart.

  Click for Model/Code and Paper
Performing Co-Membership Attacks Against Deep Generative Models

Oct 04, 2018
Kin Sum Liu, Bo Li, Jie Gao

In this paper we propose new membership attacks and new attack methods against deep generative models including Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). Specifically, a membership attack is to check whether a given instance x was used in the training data or not. And a co-membership attack is to check whether the given bundle n instances were in the training, with the prior knowledge that the bundle was either entirely used in the training or none at all. Successful membership attacks can compromise privacy of training data when the generative model is published. Our main idea is to cast membership inference of target data x as the optimization of another neural network (called the attacker network) to search for the seed to reproduce x. The final reconstruction error is used directly to conclude whether x is in the training data or not. We show through experiments on a variety of data sets and a suite of training parameters that our attacker network can be more successful than prior membership attacks; co-membership attack can be more powerful than single attacks; and VAEs are more susceptible to membership attacks compared to GANs in general. We also discussed membership attack with model generalization, overfitting, and diversity of the model.

  Click for Model/Code and Paper
Bottom-up Pose Estimation of Multiple Person with Bounding Box Constraint

Jul 26, 2018
Miaopeng Li, Zimeng Zhou, Jie Li, Xinguo Liu

In this work, we propose a new method for multi-person pose estimation which combines the traditional bottom-up and the top-down methods. Specifically, we perform the network feed-forwarding in a bottom-up manner, and then parse the poses with bounding box constraints in a top-down manner. In contrast to the previous top-down methods, our method is robust to bounding box shift and tightness. We extract features from an original image by a residual network and train the network to learn both the confidence maps of joints and the connection relationships between joints. During testing, the predicted confidence maps, the connection relationships and the bounding boxes are used to parse the poses of all persons. The experimental results showed that our method learns more accurate human poses especially in challenging situations and gains better time performance, compared with the bottom-up and the top-down methods.

  Click for Model/Code and Paper