Research papers and code for "Jie Liu":
Due to the intractable partition function, the exact likelihood function for a Markov random field (MRF), in many situations, can only be approximated. Major approximation approaches include pseudolikelihood and Laplace approximation. In this paper, we propose a novel way of approximating the likelihood function through first approximating the marginal likelihood functions of individual parameters and then reconstructing the joint likelihood function from these marginal likelihood functions. For approximating the marginal likelihood functions, we derive a particular likelihood function from a modified scenario of coin tossing which is useful for capturing how one parameter interacts with the remaining parameters in the likelihood function. For reconstructing the joint likelihood function, we use an appropriate copula to link up these marginal likelihood functions. Numerical investigation suggests the superior performance of our approach. Especially as the size of the MRF increases, both the numerical performance and the computational cost of our approach remain consistently satisfactory, whereas Laplace approximation deteriorates and pseudolikelihood becomes computationally unbearable.

Click to Read Paper and Get Code
We propose a projected semi-stochastic gradient descent method with mini-batch for improving both the theoretical complexity and practical performance of the general stochastic gradient descent method (SGD). We are able to prove linear convergence under weak strong convexity assumption. This requires no strong convexity assumption for minimizing the sum of smooth convex functions subject to a compact polyhedral set, which remains popular across machine learning community. Our PS2GD preserves the low-cost per iteration and high optimization accuracy via stochastic gradient variance-reduced technique, and admits a simple parallel implementation with mini-batches. Moreover, PS2GD is also applicable to dual problem of SVM with hinge loss.

Click to Read Paper and Get Code
Transfer learning leverages the knowledge in one domain, the source domain, to improve learning efficiency in another domain, the target domain. Existing transfer learning research is relatively well-progressed, but only in situations where the feature spaces of the domains are homogeneous and the target domain contains at least a few labeled instances. However, transfer learning has not been well-studied in heterogeneous settings with an unlabeled target domain. To contribute to the research in this emerging field, this paper presents: (1) an unsupervised knowledge transfer theorem that prevents negative transfer; and (2) a principal angle-based metric to measure the distance between two pairs of domains. The metric shows the extent to which homogeneous representations have preserved the information in original source and target domains. The unsupervised knowledge transfer theorem sets out the transfer conditions necessary to prevent negative transfer. Linear monotonic maps meet the transfer conditions of the theorem and, hence, are used to construct homogeneous representations of the heterogeneous domains, which in principle prevents negative transfer. The metric and the theorem have been implemented in an innovative transfer model, called a Grassmann-LMM-geodesic flow kernel (GLG), that is specifically designed for knowledge transfer across heterogeneous domains. The GLG model learns homogeneous representations of heterogeneous domains by minimizing the proposed metric. Knowledge is transferred through these learned representations via a geodesic flow kernel. Notably, the theorem presented in this paper provides the sufficient transfer conditions needed to guarantee that knowledge is transferred from a source domain to an unlabeled target domain with correctness.

Click to Read Paper and Get Code
It is well-known that the robustness of artificial neural networks (ANNs) is important for their wide ranges of applications. In this paper, we focus on the robustness of the classification ability of a spiking neural network which receives perturbed inputs. Actually, the perturbation is allowed to be arbitrary styles. However, Gaussian perturbation and other regular ones have been rarely investigated. For classification problems, the closer to the desired point, the more perturbed points there are in the input space. In addition, the perturbation may be periodic. Based on these facts, we only consider sinusoidal and Gaussian perturbations in this paper. With the SpikeProp algorithm, we perform extensive experiments on the classical XOR problem and other three benchmark datasets. The numerical results show that there is not significant reduction in the classification ability of the network if the input signals are subject to sinusoidal and Gaussian perturbations.

* Aceppted by Nonlinear Dynamics
Click to Read Paper and Get Code
Traditionally, kernel learning methods requires positive definitiveness on the kernel, which is too strict and excludes many sophisticated similarities, that are indefinite, in multimedia area. To utilize those indefinite kernels, indefinite learning methods are of great interests. This paper aims at the extension of the logistic regression from positive semi-definite kernels to indefinite kernels. The model, called indefinite kernel logistic regression (IKLR), keeps consistency to the regular KLR in formulation but it essentially becomes non-convex. Thanks to the positive decomposition of an indefinite matrix, IKLR can be transformed into a difference of two convex models, which follows the use of concave-convex procedure. Moreover, we employ an inexact solving scheme to speed up the sub-problem and develop a concave-inexact-convex procedure (CCICP) algorithm with theoretical convergence analysis. Systematical experiments on multi-modal datasets demonstrate the superiority of the proposed IKLR method over kernel logistic regression with positive definite kernels and other state-of-the-art indefinite learning based algorithms.

* Note that this is not the camera-ready version
Click to Read Paper and Get Code
Robotics systems are complex, often consisted of basic services including SLAM for localization and mapping, Convolution Neural Networks for scene understanding, and Speech Recognition for user interaction, etc. Meanwhile, robots are mobile and usually have tight energy constraints, integrating these services onto an embedded platform with around 10 W of power consumption is critical to the proliferation of mobile robots. In this paper, we present a case study on integrating real-time localization, vision, and speech recognition services on a mobile SoC, Nvidia Jetson TX1, within about 10 W of power envelope. In addition, we explore whether offloading some of the services to cloud platform can lead to further energy efficiency while meeting the real-time requirements

* 12 pages, 8 figures
Click to Read Paper and Get Code
Positive-definite kernel functions are fundamental elements of kernel methods and Gaussian processes. A well-known construction of such functions comes from Bochner's characterization, which connects a positive-definite function with a probability distribution. Another construction, which appears to have attracted less attention, is Polya's criterion that characterizes a subset of these functions. In this paper, we study the latter characterization and derive a number of novel kernels little known previously. In the context of large-scale kernel machines, Rahimi and Recht (2007) proposed a random feature map (random Fourier) that approximates a kernel function, through independent sampling of the probability distribution in Bochner's characterization. The authors also suggested another feature map (random binning), which, although not explicitly stated, comes from Polya's characterization. We show that with the same number of random samples, the random binning map results in an Euclidean inner product closer to the kernel than does the random Fourier map. The superiority of the random binning map is confirmed empirically through regressions and classifications in the reproducing kernel Hilbert space.

Click to Read Paper and Get Code
We computed linguistic information at the lexical, syntactic, and semantic levels for Recognizing Inference in Text (RITE) tasks for both traditional and simplified Chinese in NTCIR-9 and NTCIR-10. Techniques for syntactic parsing, named-entity recognition, and near synonym recognition were employed, and features like counts of common words, statement lengths, negation words, and antonyms were considered to judge the entailment relationships of two statements, while we explored both heuristics-based functions and machine-learning approaches. The reported systems showed robustness by simultaneously achieving second positions in the binary-classification subtasks for both simplified and traditional Chinese in NTCIR-10 RITE-2. We conducted more experiments with the test data of NTCIR-9 RITE, with good results. We also extended our work to search for better configurations of our classifiers and investigated contributions of individual features. This extended work showed interesting results and should encourage further discussion.

* 20 pages, 1 figure, 26 tables, Journal article in Soft Computing (Spinger). Soft Computing, online. Springer, Germany, 2015
Click to Read Paper and Get Code
Binary codes have been widely used in vision problems as a compact feature representation to achieve both space and time advantages. Various methods have been proposed to learn data-dependent hash functions which map a feature vector to a binary code. However, considerable data information is inevitably lost during the binarization step which also causes ambiguity in measuring sample similarity using Hamming distance. Besides, the learned hash functions cannot be changed after training, which makes them incapable of adapting to new data outside the training data set. To address both issues, in this paper we propose a flexible bitwise weight learning framework based on the binary codes obtained by state-of-the-art hashing methods, and incorporate the learned weights into the weighted Hamming distance computation. We then formulate the proposed framework as a ranking problem and leverage the Ranking SVM model to offline tackle the weight learning. The framework is further extended to an online mode which updates the weights at each time new data comes, thereby making it scalable to large and dynamic data sets. Extensive experimental results demonstrate significant performance gains of using binary codes with bitwise weighting in image retrieval tasks. It is appealing that the online weight learning leads to comparable accuracy with its offline counterpart, which thus makes our approach practical for realistic applications.

Click to Read Paper and Get Code
In this paper, we present an illustration to the history of Artificial Intelligence(AI) with a statistical analysis of publish since 1940. We collected and mined through the IEEE publish data base to analysis the geological and chronological variance of the activeness of research in AI. The connections between different institutes are showed. The result shows that the leading community of AI research are mainly in the USA, China, the Europe and Japan. The key institutes, authors and the research hotspots are revealed. It is found that the research institutes in the fields like Data Mining, Computer Vision, Pattern Recognition and some other fields of Machine Learning are quite consistent, implying a strong interaction between the community of each field. It is also showed that the research of Electronic Engineering and Industrial or Commercial applications are very active in California. Japan is also publishing a lot of papers in robotics. Due to the limitation of data source, the result might be overly influenced by the number of published articles, which is to our best improved by applying network keynode analysis on the research community instead of merely count the number of publish.

* 18 pages, 7 figures
Click to Read Paper and Get Code
Sparse learning has recently received increasing attention in many areas including machine learning, statistics, and applied mathematics. The mixed-norm regularization based on the l1q norm with q>1 is attractive in many applications of regression and classification in that it facilitates group sparsity in the model. The resulting optimization problem is, however, challenging to solve due to the inherent structure of the mixed-norm regularization. Existing work deals with special cases with q=1, 2, infinity, and they cannot be easily extended to the general case. In this paper, we propose an efficient algorithm based on the accelerated gradient method for solving the general l1q-regularized problem. One key building block of the proposed algorithm is the l1q-regularized Euclidean projection (EP_1q). Our theoretical analysis reveals the key properties of EP_1q and illustrates why EP_1q for the general q is significantly more challenging to solve than the special cases. Based on our theoretical analysis, we develop an efficient algorithm for EP_1q by solving two zero finding problems. To further improve the efficiency of solving large dimensional mixed-norm regularized problems, we propose a screening method which is able to quickly identify the inactive groups, i.e., groups that have 0 components in the solution. This may lead to substantial reduction in the number of groups to be entered to the optimization. An appealing feature of our screening method is that the data set needs to be scanned only once to run the screening. Compared to that of solving the mixed-norm regularized problems, the computational cost of our screening test is negligible. The key of the proposed screening method is an accurate sensitivity analysis of the dual optimal solution when the regularization parameter varies. Experimental results demonstrate the efficiency of the proposed algorithm.

* arXiv admin note: text overlap with arXiv:1009.4766
Click to Read Paper and Get Code
Training deep learning models on mobile devices recently becomes possible, because of increasing computation power on mobile hardware and the advantages of enabling high user experiences. Most of the existing work on machine learning at mobile devices is focused on the inference of deep learning models (particularly convolutional neural network and recurrent neural network), but not training. The performance characterization of training deep learning models on mobile devices is largely unexplored, although understanding the performance characterization is critical for designing and implementing deep learning models on mobile devices. In this paper, we perform a variety of experiments on a representative mobile device (the NVIDIA TX2) to study the performance of training deep learning models. We introduce a benchmark suite and tools to study performance of training deep learning models on mobile devices, from the perspectives of memory consumption, hardware utilization, and power consumption. The tools can correlate performance results with fine-grained operations in deep learning models, providing capabilities to capture performance variance and problems at a fine granularity. We reveal interesting performance problems and opportunities, including under-utilization of heterogeneous hardware, large energy consumption of the memory, and high predictability of workload characterization. Based on the performance analysis, we suggest interesting research directions.

Click to Read Paper and Get Code
Text infilling is defined as a task for filling in the missing part of a sentence or paragraph, which is suitable for many real-world natural language generation scenarios. However, given a well-trained sequential generative model, generating missing symbols conditioned on the context is challenging for existing greedy approximate inference algorithms. In this paper, we propose an iterative inference algorithm based on gradient search, which is the first inference algorithm that can be broadly applied to any neural sequence generative models for text infilling tasks. We compare the proposed method with strong baselines on three text infilling tasks with various mask ratios and different mask strategies. The results show that our proposed method is effective and efficient for fill-in-the-blank tasks, consistently outperforming all baselines.

* The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)
Click to Read Paper and Get Code
Recently, there have been great interests in Monte Carlo Tree Search (MCTS) in AI research. Although the sequential version of MCTS has been studied widely, its parallel counterpart still lacks systematic study. This leads us to the following questions: \emph{how to design efficient parallel MCTS (or more general cases) algorithms with rigorous theoretical guarantee? Is it possible to achieve linear speedup?} In this paper, we consider the search problem on a more general acyclic one-root graph (namely, Monte Carlo Graph Search (MCGS)), which generalizes MCTS. We develop a parallel algorithm (P-MCGS) to assign multiple workers to investigate appropriate leaf nodes simultaneously. Our analysis shows that P-MCGS algorithm achieves linear speedup and that the sample complexity is comparable to its sequential counterpart.

Click to Read Paper and Get Code
In this paper we propose new membership attacks and new attack methods against deep generative models including Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). Specifically, a membership attack is to check whether a given instance x was used in the training data or not. And a co-membership attack is to check whether the given bundle n instances were in the training, with the prior knowledge that the bundle was either entirely used in the training or none at all. Successful membership attacks can compromise privacy of training data when the generative model is published. Our main idea is to cast membership inference of target data x as the optimization of another neural network (called the attacker network) to search for the seed to reproduce x. The final reconstruction error is used directly to conclude whether x is in the training data or not. We show through experiments on a variety of data sets and a suite of training parameters that our attacker network can be more successful than prior membership attacks; co-membership attack can be more powerful than single attacks; and VAEs are more susceptible to membership attacks compared to GANs in general. We also discussed membership attack with model generalization, overfitting, and diversity of the model.

Click to Read Paper and Get Code
In this work, we propose a new method for multi-person pose estimation which combines the traditional bottom-up and the top-down methods. Specifically, we perform the network feed-forwarding in a bottom-up manner, and then parse the poses with bounding box constraints in a top-down manner. In contrast to the previous top-down methods, our method is robust to bounding box shift and tightness. We extract features from an original image by a residual network and train the network to learn both the confidence maps of joints and the connection relationships between joints. During testing, the predicted confidence maps, the connection relationships and the bounding boxes are used to parse the poses of all persons. The experimental results showed that our method learns more accurate human poses especially in challenging situations and gains better time performance, compared with the bottom-up and the top-down methods.

Click to Read Paper and Get Code
Autonomous vehicle safety and reliability are the paramount requirements when developing autonomous vehicles. These requirements are guaranteed by massive functional and performance tests. Conducting these tests on real vehicles is extremely expensive and time consuming, and thus it is imperative to develop a simulation platform to perform these tasks. For simulation, we can utilize the Robot Operating System (ROS) for data playback to test newly developed algorithms. However, due to the massive amount of simulation data, performing simulation on single machines is not practical. Hence, a high-performance distributed simulation platform is a critical piece in autonomous driving development. In this paper we present our experiences of building a production distributed autonomous driving simulation platform. This platform is built upon Spark distributed framework, for distributed computing management, and ROS, for data playback simulations.

* 12 pages, 7 figures
Click to Read Paper and Get Code
Decision-Making Trial and Evaluation Laboratory (DEMATEL) method is widely used in many real applications. With the desirable property of efficient handling with the uncertain information in decision making, the fuzzy DEMATEL is heavily studied. Recently, Dytczak and Ginda suggested to defuzzify the fuzzy numbers firstly and then use the classical DEMATEL to obtain the final result. In this short paper, we show that it is not reasonable in some situations. The results of defuzzification at the first step are not coincide with the results of defuzzification at the final step.It seems that the alternative is to defuzzification in the final step in fuzzy DEMATEL.

Click to Read Paper and Get Code
We introduce a novel single-shot object detector to ease the imbalance of foreground-background class by suppressing the easy negatives while increasing the positives. To achieve this, we propose an Anchor Promotion Module (APM) which predicts the probability of each anchor as positive and adjusts their initial locations and shapes to promote both the quality and quantity of positive anchors. In addition, we design an efficient Feature Alignment Module (FAM) to extract aligned features for fitting the promoted anchors with the help of both the location and shape transformation information from the APM. We assemble the two proposed modules to the backbone of VGG-16 and ResNet-101 network with an encoder-decoder architecture. Extensive experiments on MS COCO well demonstrate our model performs competitively with alternative methods (40.0\% mAP on \textit{test-dev} set) and runs faster (28.6 \textit{fps}).

* Submitted to a conference, under review
Click to Read Paper and Get Code
General detectors follow the pipeline that feature maps extracted from ConvNets are shared between classification and regression tasks. However, there exists obvious conflicting requirements in multi-orientation object detection that classification is insensitive to orientations, while regression is quite sensitive. To address this issue, we provide an Encoder-Decoder architecture, called Rotated Feature Network (RFN), which produces rotation-sensitive feature maps (RS) for regression and rotation-invariant feature maps (RI) for classification. Specifically, the Encoder unit assigns weights for rotated feature maps. The Decoder unit extracts RS and RI by performing resuming operator on rotated and reweighed feature maps, respectively. To make the rotation-invariant characteristics more reliable, we adopt a metric to quantitatively evaluate the rotation-invariance by adding a constrain item in the loss, yielding a promising detection performance. Compared with the state-of-the-art methods, our method can achieve significant improvement on NWPU VHR-10 and RSOD datasets. We further evaluate the RFN on the scene classification in remote sensing images and object detection in natural images, demonstrating its good generalization ability. The proposed RFN can be integrated into an existing framework, leading to great performance with only a slight increase in model complexity.

* 9 pages, 7 figures
Click to Read Paper and Get Code