Models, code, and papers for "Jun Xing":

Sparse Topical Coding

Feb 14, 2012
Jun Zhu, Eric P. Xing

We present sparse topical coding (STC), a non-probabilistic formulation of topic models for discovering latent representations of large collections of data. Unlike probabilistic topic models, STC relaxes the normalization constraint of admixture proportions and the constraint of defining a normalized likelihood function. Such relaxations make STC amenable to: 1) directly control the sparsity of inferred representations by using sparsity-inducing regularizers; 2) be seamlessly integrated with a convex error function (e.g., SVM hinge loss) for supervised learning; and 3) be efficiently learned with a simply structured coordinate descent algorithm. Our results demonstrate the advantages of STC and supervised MedSTC on identifying topical meanings of words and improving classification accuracy and time efficiency.


  Click for Model/Code and Paper
Maximum Entropy Discrimination Markov Networks

Jan 18, 2009
Jun Zhu, Eric P. Xing

In this paper, we present a novel and general framework called {\it Maximum Entropy Discrimination Markov Networks} (MaxEnDNet), which integrates the max-margin structured learning and Bayesian-style estimation and combines and extends their merits. Major innovations of this model include: 1) It generalizes the extant Markov network prediction rule based on a point estimator of weights to a Bayesian-style estimator that integrates over a learned distribution of the weights. 2) It extends the conventional max-entropy discrimination learning of classification rule to a new structural max-entropy discrimination paradigm of learning the distribution of Markov networks. 3) It subsumes the well-known and powerful Maximum Margin Markov network (M$^3$N) as a special case, and leads to a model similar to an $L_1$-regularized M$^3$N that is simultaneously primal and dual sparse, or other types of Markov network by plugging in different prior distributions of the weights. 4) It offers a simple inference algorithm that combines existing variational inference and convex-optimization based M$^3$N solvers as subroutines. 5) It offers a PAC-Bayesian style generalization bound. This work represents the first successful attempt to combine Bayesian-style learning (based on generative models) with structured maximum margin learning (based on a discriminative model), and outperforms a wide array of competing methods for structured input/output learning on both synthetic and real data sets.

* Journal of Machine Learning Research, 10(Nov):2531-2569, 2009 
* 39 pages 

  Click for Model/Code and Paper
Diversity-Promoting Bayesian Learning of Latent Variable Models

Nov 23, 2017
Pengtao Xie, Jun Zhu, Eric P. Xing

To address three important issues involved in latent variable models (LVMs), including capturing infrequent patterns, achieving small-sized but expressive models and alleviating overfitting, several studies have been devoted to "diversifying" LVMs, which aim at encouraging the components in LVMs to be diverse. Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper, we investigate how to "diversify" LVMs in the paradigm of Bayesian learning. We propose two approaches that have complementary advantages. One is to define a diversity-promoting mutual angular prior which assigns larger density to components with larger mutual angles and use this prior to affect the posterior via Bayes' rule. We develop two efficient approximate posterior inference algorithms based on variational inference and MCMC sampling. The other approach is to impose diversity-promoting regularization directly over the post-data distribution of components. We also extend our approach to "diversify" Bayesian nonparametric models where the number of components is infinite. A sampling algorithm based on slice sampling and Hamiltonian Monte Carlo is developed. We apply these methods to "diversify" Bayesian mixture of experts model and infinite latent feature model. Experiments on various datasets demonstrate the effectiveness and efficiency of our methods.


  Click for Model/Code and Paper
Bayesian Inference with Posterior Regularization and applications to Infinite Latent SVMs

Feb 12, 2014
Jun Zhu, Ning Chen, Eric P. Xing

Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors can affect posterior distributions through Bayes' rule, imposing posterior regularization is arguably more direct and in some cases more natural and general. In this paper, we present regularized Bayesian inference (RegBayes), a novel computational framework that performs posterior inference with a regularization term on the desired post-data posterior distribution under an information theoretical formulation. RegBayes is more flexible than the procedure that elicits expert knowledge via priors, and it covers both directed Bayesian networks and undirected Markov networks whose Bayesian formulation results in hybrid chain graph models. When the regularization is induced from a linear operator on the posterior distributions, such as the expectation operator, we present a general convex-analysis theorem to characterize the solution of RegBayes. Furthermore, we present two concrete examples of RegBayes, infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets, which appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics. Such results were not available until now, and contribute to push forward the interface between these two important subfields, which have been largely treated as isolated in the community.

* 49 pages, 11 figures 

  Click for Model/Code and Paper
MedLDA: A General Framework of Maximum Margin Supervised Topic Models

Dec 30, 2009
Jun Zhu, Amr Ahmed, Eric P. Xing

Supervised topic models utilize document's side information for discovering predictive low dimensional representations of documents. Existing models apply the likelihood-based estimation. In this paper, we present a general framework of max-margin supervised topic models for both continuous and categorical response variables. Our approach, the maximum entropy discrimination latent Dirichlet allocation (MedLDA), utilizes the max-margin principle to train supervised topic models and estimate predictive topic representations that are arguably more suitable for prediction tasks. The general principle of MedLDA can be applied to perform joint max-margin learning and maximum likelihood estimation for arbitrary topic models, directed or undirected, and supervised or unsupervised, when the supervised side information is available. We develop efficient variational methods for posterior inference and parameter estimation, and demonstrate qualitatively and quantitatively the advantages of MedLDA over likelihood-based topic models on movie review and 20 Newsgroups data sets.

* Journal of Machine Learning Research, 13(Aug): 2237--2278, 2012 
* 27 Pages 

  Click for Model/Code and Paper
2DR1-PCA and 2DL1-PCA: two variant 2DPCA algorithms based on none L2 norm

Dec 23, 2019
Xing Liu, Xiao-Jun Wu, Zi-Qi Li

In this paper, two novel methods: 2DR1-PCA and 2DL1-PCA are proposed for face recognition. Compared to the traditional 2DPCA algorithm, 2DR1-PCA and 2DL1-PCA are based on the R1 norm and L1 norm, respectively. The advantage of these proposed methods is they are less sensitive to outliers. These proposed methods are tested on the ORL, YALE and XM2VTS databases and the performance of the related methods is compared experimentally.

* 15 pages, 4 figures 

  Click for Model/Code and Paper
A Compared Study Between Some Subspace Based Algorithms

Dec 23, 2019
Xing Liu, Xiao-Jun Wu, Zhen Liu, He-Feng Yin

The technology of face recognition has made some progress in recent years. After studying the PCA, 2DPCA, R1-PCA, L1-PCA, KPCA and KECA algorithms, in this paper ECA (2DECA) is proposed by extracting features in PCA (2DPCA) based on Renyi entropy contribution. And then we conduct a study on the 2DL1-PCA and 2DR1-PCA algorithms. On the basis of the experiments, this paper compares the difference of the recognition accuracy and operational efficiency between the above algorithms.

* 13 pages, 5 figures 

  Click for Model/Code and Paper
SeDMiD for Confusion Detection: Uncovering Mind State from Time Series Brain Wave Data

Nov 29, 2016
Jingkang Yang, Haohan Wang, Jun Zhu, Eric P. Xing

Understanding how brain functions has been an intriguing topic for years. With the recent progress on collecting massive data and developing advanced technology, people have become interested in addressing the challenge of decoding brain wave data into meaningful mind states, with many machine learning models and algorithms being revisited and developed, especially the ones that handle time series data because of the nature of brain waves. However, many of these time series models, like HMM with hidden state in discrete space or State Space Model with hidden state in continuous space, only work with one source of data and cannot handle different sources of information simultaneously. In this paper, we propose an extension of State Space Model to work with different sources of information together with its learning and inference algorithms. We apply this model to decode the mind state of students during lectures based on their brain waves and reach a significant better results compared to traditional methods.

* 11 pages, 2 figures, NIPS 2016 Time Series Workshop 

  Click for Model/Code and Paper
Vehicle Tracking in Wireless Sensor Networks via Deep Reinforcement Learning

Feb 22, 2020
Jun Li, Zhichao Xing, Weibin Zhang, Yan Lin, Feng Shu

Vehicle tracking has become one of the key applications of wireless sensor networks (WSNs) in the fields of rescue, surveillance, traffic monitoring, etc. However, the increased tracking accuracy requires more energy consumption. In this letter, a decentralized vehicle tracking strategy is conceived for improving both tracking accuracy and energy saving, which is based on adjusting the intersection area between the fixed sensing area and the dynamic activation area. Then, two deep reinforcement learning (DRL) aided solutions are proposed relying on the dynamic selection of the activation area radius. Finally, simulation results show the superiority of our DRL aided design.


  Click for Model/Code and Paper
Low-Rank Phase Retrieval via Variational Bayesian Learning

Nov 05, 2018
Kaihui Liu, Jiayi Wang, Zhengli Xing, Linxiao Yang, Jun Fang

In this paper, we consider the problem of low-rank phase retrieval whose objective is to estimate a complex low-rank matrix from magnitude-only measurements. We propose a hierarchical prior model for low-rank phase retrieval, in which a Gaussian-Wishart hierarchical prior is placed on the underlying low-rank matrix to promote the low-rankness of the matrix. Based on the proposed hierarchical model, a variational expectation-maximization (EM) algorithm is developed. The proposed method is less sensitive to the choice of the initialization point and works well with random initialization. Simulation results are provided to illustrate the effectiveness of the proposed algorithm.


  Click for Model/Code and Paper
Simultaneous Block-Sparse Signal Recovery Using Pattern-Coupled Sparse Bayesian Learning

Nov 06, 2017
Hang Xiao, Zhengli Xing, Linxiao Yang, Jun Fang, Yanlun Wu

In this paper, we consider the block-sparse signals recovery problem in the context of multiple measurement vectors (MMV) with common row sparsity patterns. We develop a new method for recovery of common row sparsity MMV signals, where a pattern-coupled hierarchical Gaussian prior model is introduced to characterize both the block-sparsity of the coefficients and the statistical dependency between neighboring coefficients of the common row sparsity MMV signals. Unlike many other methods, the proposed method is able to automatically capture the block sparse structure of the unknown signal. Our method is developed using an expectation-maximization (EM) framework. Simulation results show that our proposed method offers competitive performance in recovering block-sparse common row sparsity pattern MMV signals.


  Click for Model/Code and Paper
Multiple Independent Subspace Clusterings

May 10, 2019
Xing Wang, Jun Wang, Carlotta Domeniconi, Guoxian Yu, Guoqiang Xiao, Maozu Guo

Multiple clustering aims at discovering diverse ways of organizing data into clusters. Despite the progress made, it's still a challenge for users to analyze and understand the distinctive structure of each output clustering. To ease this process, we consider diverse clusterings embedded in different subspaces, and analyze the embedding subspaces to shed light into the structure of each clustering. To this end, we provide a two-stage approach called MISC (Multiple Independent Subspace Clusterings). In the first stage, MISC uses independent subspace analysis to seek multiple and statistical independent (i.e. non-redundant) subspaces, and determines the number of subspaces via the minimum description length principle. In the second stage, to account for the intrinsic geometric structure of samples embedded in each subspace, MISC performs graph regularized semi-nonnegative matrix factorization to explore clusters. It additionally integrates the kernel trick into matrix factorization to handle non-linearly separable clusters. Experimental results on synthetic datasets show that MISC can find different interesting clusterings from the sought independent subspaces, and it also outperforms other related and competitive approaches on real-world datasets.

* AAAI2019 

  Click for Model/Code and Paper
Multi-View Multi-Instance Multi-Label Learning based on Collaborative Matrix Factorization

May 15, 2019
Yuying Xing, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Zili Zhang, Maozu Guo

Multi-view Multi-instance Multi-label Learning(M3L) deals with complex objects encompassing diverse instances, represented with different feature views, and annotated with multiple labels. Existing M3L solutions only partially explore the inter or intra relations between objects (or bags), instances, and labels, which can convey important contextual information for M3L. As such, they may have a compromised performance. In this paper, we propose a collaborative matrix factorization based solution called M3Lcmf. M3Lcmf first uses a heterogeneous network composed of nodes of bags, instances, and labels, to encode different types of relations via multiple relational data matrices. To preserve the intrinsic structure of the data matrices, M3Lcmf collaboratively factorizes them into low-rank matrices, explores the latent relationships between bags, instances, and labels, and selectively merges the data matrices. An aggregation scheme is further introduced to aggregate the instance-level labels into bag-level and to guide the factorization. An empirical study on benchmark datasets show that M3Lcmf outperforms other related competitive solutions both in the instance-level and bag-level prediction.

* 8 pages, 8 figures, uses aaai19.sty, accepted to AAAI2019 

  Click for Model/Code and Paper
Deep RBFNet: Point Cloud Feature Learning using Radial Basis Functions

Dec 11, 2018
Weikai Chen, Xiaoguang Han, Guanbin Li, Chao Chen, Jun Xing, Yajie Zhao, Hao Li

Three-dimensional object recognition has recently achieved great progress thanks to the development of effective point cloud-based learning frameworks, such as PointNet and its extensions. However, existing methods rely heavily on fully connected layers, which introduce a significant amount of parameters, making the network harder to train and prone to overfitting problems. In this paper, we propose a simple yet effective framework for point set feature learning by leveraging a nonlinear activation layer encoded by Radial Basis Function (RBF) kernels. Unlike PointNet variants, that fail to recognize local point patterns, our approach explicitly models the spatial distribution of point clouds by aggregating features from sparsely distributed RBF kernels. A typical RBF kernel, e.g. Gaussian function, naturally penalizes long-distance response and is only activated by neighboring points. Such localized response generates highly discriminative features given different point distributions. In addition, our framework allows the joint optimization of kernel distribution and its receptive field, automatically evolving kernel configurations in an end-to-end manner. We demonstrate that the proposed network with a single RBF layer can outperform the state-of-the-art Pointnet++ in terms of classification accuracy for 3D object recognition tasks. Moreover, the introduction of nonlinear mappings significantly reduces the number of network parameters and computational cost, enabling significantly faster training and a deployable point cloud recognition solution on portable devices with limited resources.

* Technical Report 

  Click for Model/Code and Paper
Representation based and Attention augmented Meta learning

Nov 26, 2018
Yunxiao Qin, Chenxu Zhao, Zezheng Wang, Junliang Xing, Jun Wan, Zhen Lei

Deep learning based computer vision fails to work when labeled images are scarce. Recently, Meta learning algorithm has been confirmed as a promising way to improve the ability of learning from few images for computer vision. However, previous Meta learning approaches expose problems: 1) they ignored the importance of attention mechanism for the Meta learner; 2) they didn't give the Meta learner the ability of well using the past knowledge which can help to express images into high representations, resulting in that the Meta learner has to solve few shot learning task directly from the original high dimensional RGB images. In this paper, we argue that the attention mechanism and the past knowledge are crucial for the Meta learner, and the Meta learner should be trained on high representations of the RGB images instead of directly on the original ones. Based on these arguments, we propose two methods: Attention augmented Meta Learning (AML) and Representation based and Attention augmented Meta Learning(RAML). The method AML aims to improve the Meta learner's attention ability by explicitly embedding an attention model into its network. The method RAML aims to give the Meta learner the ability of leveraging the past learned knowledge to reduce the dimension of the original input data by expressing it into high representations, and help the Meta learner to perform well. Extensive experiments demonstrate the effectiveness of the proposed models, with state-of-the-art few shot learning performances on several few shot learning benchmarks. The source code of our proposed methods will be released soon to facilitate further studies on those aforementioned problem.

* 10 pages, 6 figures 

  Click for Model/Code and Paper
Minimax Nonparametric Two-sample Test

Nov 08, 2019
Xin Xing, Zuofeng Shang, Pang Du, Ping Ma, Wenxuan Zhong, Jun S. Liu

We consider the problem of comparing probability densities between two groups. To model the complex pattern of the underlying densities, we formulate the problem as a nonparametric density hypothesis testing problem. The major difficulty is that conventional tests may fail to distinguish the alternative from the null hypothesis under the controlled type I error. In this paper, we model log-transformed densities in a tensor product reproducing kernel Hilbert space (RKHS) and propose a probabilistic decomposition of this space. Under such a decomposition, we quantify the difference of the densities between two groups by the component norm in the probabilistic decomposition. Based on the Bernstein width, a sharp minimax lower bound of the distinguishable rate is established for the nonparametric two-sample test. We then propose a penalized likelihood ratio (PLR) test possessing the Wilks' phenomenon with an asymptotically Chi-square distributed test statistic and achieving the established minimax testing rate. Simulations and real applications demonstrate that the proposed test outperforms the conventional approaches under various scenarios.

* 56 pages 

  Click for Model/Code and Paper
Learning Clustered Representation for Complex Free Energy Landscapes

Jun 07, 2019
Jun Zhang, Yao-Kun Lei, Xing Che, Zhen Zhang, Yi Isaac Yang, Yi Qin Gao

In this paper we first analyzed the inductive bias underlying the data scattered across complex free energy landscapes (FEL), and exploited it to train deep neural networks which yield reduced and clustered representation for the FEL. Our parametric method, called Information Distilling of Metastability (IDM), is end-to-end differentiable thus scalable to ultra-large dataset. IDM is also a clustering algorithm and is able to cluster the samples in the meantime of reducing the dimensions. Besides, as an unsupervised learning method, IDM differs from many existing dimensionality reduction and clustering methods in that it neither requires a cherry-picked distance metric nor the ground-true number of clusters, and that it can be used to unroll and zoom-in the hierarchical FEL with respect to different timescales. Through multiple experiments, we show that IDM can achieve physically meaningful representations which partition the FEL into well-defined metastable states hence are amenable for downstream tasks such as mechanism analysis and kinetic modeling.

* 6 figures, 1 table in the main text 

  Click for Model/Code and Paper
Spatial-Temporal Transformer Networks for Traffic Flow Forecasting

Jan 09, 2020
Mingxing Xu, Wenrui Dai, Chunmiao Liu, Xing Gao, Weiyao Lin, Guo-Jun Qi, Hongkai Xiong

Traffic forecasting has emerged as a core component of intelligent transportation systems. However, timely accurate traffic forecasting, especially long-term forecasting, still remains an open challenge due to the highly nonlinear and dynamic spatial-temporal dependencies of traffic flows. In this paper, we propose a novel paradigm of Spatial-Temporal Transformer Networks (STTNs) that leverages dynamical directed spatial dependencies and long-range temporal dependencies to improve the accuracy of long-term traffic forecasting. Specifically, we present a new variant of graph neural networks, named spatial transformer, by dynamically modeling directed spatial dependencies with self-attention mechanism to capture realtime traffic conditions as well as the directionality of traffic flows. Furthermore, different spatial dependency patterns can be jointly modeled with multi-heads attention mechanism to consider diverse relationships related to different factors (e.g. similarity, connectivity and covariance). On the other hand, the temporal transformer is utilized to model long-range bidirectional temporal dependencies across multiple time steps. Finally, they are composed as a block to jointly model the spatial-temporal dependencies for accurate traffic prediction. Compared to existing works, the proposed model enables fast and scalable training over a long range spatial-temporal dependencies. Experiment results demonstrate that the proposed model achieves competitive results compared with the state-of-the-arts, especially forecasting long-term traffic flows on real-world PeMS-Bay and PeMSD7(M) datasets.


  Click for Model/Code and Paper
Structured Generative Adversarial Networks

Nov 02, 2017
Zhijie Deng, Hao Zhang, Xiaodan Liang, Luona Yang, Shizhen Xu, Jun Zhu, Eric P. Xing

We study the problem of conditional generative modeling based on designated semantics or structures. Existing models that build conditional generators either require massive labeled instances as supervision or are unable to accurately control the semantics of generated samples. We propose structured generative adversarial networks (SGANs) for semi-supervised conditional generative modeling. SGAN assumes the data x is generated conditioned on two independent latent variables: y that encodes the designated semantics, and z that contains other factors of variation. To ensure disentangled semantics in y and z, SGAN builds two collaborative games in the hidden space to minimize the reconstruction error of y and z, respectively. Training SGAN also involves solving two adversarial games that have their equilibrium concentrating at the true joint data distributions p(x, z) and p(x, y), avoiding distributing the probability mass diffusely over data space that MLE-based methods may suffer. We assess SGAN by evaluating its trained networks, and its performance on downstream tasks. We show that SGAN delivers a highly controllable generator, and disentangled representations; it also establishes start-of-the-art results across multiple datasets when applied for semi-supervised image classification (1.27%, 5.73%, 17.26% error rates on MNIST, SVHN and CIFAR-10 using 50, 1000 and 4000 labels, respectively). Benefiting from the separate modeling of y and z, SGAN can generate images with high visual quality and strictly following the designated semantic, and can be extended to a wide spectrum of applications, such as style transfer.

* To appear in NIPS 2017 

  Click for Model/Code and Paper
Quantization Networks

Nov 28, 2019
Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network into a low-bitwidth integer version, has been an active and promising research topic. Existing methods formulate the low-bit quantization of networks as an approximation or optimization problem. Approximation-based methods confront the gradient mismatch problem, while optimization-based methods are only suitable for quantizing weights and could introduce high computational cost in the training stage. In this paper, we propose a novel perspective of interpreting and implementing neural network quantization by formulating low-bit quantization as a differentiable non-linear function (termed quantization function). The proposed quantization function can be learned in a lossless and end-to-end manner and works for any weights and activations of neural networks in a simple and uniform way. Extensive experiments on image classification and object detection tasks show that our quantization networks outperform the state-of-the-art methods. We believe that the proposed method will shed new insights on the interpretation of neural network quantization. Our code is available at https://github.com/aliyun/alibabacloud-quantization-networks.

* 10 pages, CVPR2019 

  Click for Model/Code and Paper