Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost for big data. In this paper, we propose a new Bayesian approach, EigenGP, that learns both basis dictionary elements--eigenfunctions of a GP prior--and prior precisions in a sparse finite model. It is well known that, among all orthogonal basis functions, eigenfunctions can provide the most compact representation. Unlike other sparse Bayesian finite models where the basis function has a fixed form, our eigenfunctions live in a reproducing kernel Hilbert space as a finite linear combination of kernel functions. We learn the dictionary elements--eigenfunctions--and the prior precisions over these elements as well as all the other hyperparameters from data by maximizing the model marginal likelihood. We explore computational linear algebra to simplify the gradient computation significantly. Our experimental results demonstrate improved predictive performance of EigenGP over alternative sparse GP methods as well as relevance vector machine. Click to Read Paper
Bayesian learning is often hampered by large computational expense. As a powerful generalization of popular belief propagation, expectation propagation (EP) efficiently approximates the exact Bayesian computation. Nevertheless, EP can be sensitive to outliers and suffer from divergence for difficult cases. To address this issue, we propose a new approximate inference approach, relaxed expectation propagation (REP). It relaxes the moment matching requirement of expectation propagation by adding a relaxation factor into the KL minimization. We penalize this relaxation with a $l_1$ penalty. As a result, when two distributions in the relaxed KL divergence are similar, the relaxation factor will be penalized to zero and, therefore, we obtain the original moment matching; In the presence of outliers, these two distributions are significantly different and the relaxation factor will be used to reduce the contribution of the outlier. Based on this penalized KL minimization, REP is robust to outliers and can greatly improve the posterior approximation quality over EP. To examine the effectiveness of REP, we apply it to Gaussian process classification, a task known to be suitable to EP. Our classification results on synthetic and UCI benchmark datasets demonstrate significant improvement of REP over EP and Power EP--in terms of algorithmic stability, estimation accuracy and predictive performance. Click to Read Paper
It is a challenging task to select correlated variables in a high dimensional space. To address this challenge, the elastic net has been developed and successfully applied to many applications. Despite its great success, the elastic net does not explicitly use correlation information embedded in data to select correlated variables. To overcome this limitation, we present a novel Bayesian hybrid model, the EigenNet, that uses the eigenstructures of data to guide variable selection. Specifically, it integrates a sparse conditional classification model with a generative model capturing variable correlations in a principled Bayesian framework. We reparameterize the hybrid model in the eigenspace to avoid overfiting and to increase the computational efficiency of its MCMC sampler. Furthermore, we provide an alternative view to the EigenNet from a regularization perspective: the EigenNet has an adaptive eigenspace-based composite regularizer, which naturally generalizes the $l_{1/2}$ regularizer used by the elastic net. Experiments on synthetic and real data show that the EigenNet significantly outperforms the lasso, the elastic net, and the Bayesian lasso in terms of prediction accuracy, especially when the number of training samples is smaller than the number of variables. Click to Read Paper
It is known that the inconsistent distribution and representation of different modalities, such as image and text, cause the heterogeneity gap that makes it challenging to correlate such heterogeneous data. Generative adversarial networks (GANs) have shown its strong ability of modeling data distribution and learning discriminative representation, existing GANs-based works mainly focus on generative problem to generate new data. We have different goal, aim to correlate heterogeneous data, by utilizing the power of GANs to model cross-modal joint distribution. Thus, we propose Cross-modal GANs to learn discriminative common representation for bridging heterogeneity gap. The main contributions are: (1) Cross-modal GANs architecture is proposed to model joint distribution over data of different modalities. The inter-modality and intra-modality correlation can be explored simultaneously in generative and discriminative models. Both of them beat each other to promote cross-modal correlation learning. (2) Cross-modal convolutional autoencoders with weight-sharing constraint are proposed to form generative model. They can not only exploit cross-modal correlation for learning common representation, but also preserve reconstruction information for capturing semantic consistency within each modality. (3) Cross-modal adversarial mechanism is proposed, which utilizes two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. They can mutually boost to make common representation more discriminative by adversarial training process. To the best of our knowledge, our proposed CM-GANs approach is the first to utilize GANs to perform cross-modal common representation learning. Experiments are conducted to verify the performance of our proposed approach on cross-modal retrieval paradigm, compared with 10 methods on 3 cross-modal datasets. Click to Read Paper
Nowadays, cross-modal retrieval plays an indispensable role to flexibly find information across different modalities of data. Effectively measuring the similarity between different modalities of data is the key of cross-modal retrieval. Different modalities such as image and text have imbalanced and complementary relationships, which contain unequal amount of information when describing the same semantics. For example, images often contain more details that cannot be demonstrated by textual descriptions and vice versa. Existing works based on Deep Neural Network (DNN) mostly construct one common space for different modalities to find the latent alignments between them, which lose their exclusive modality-specific characteristics. Different from the existing works, we propose modality-specific cross-modal similarity measurement (MCSM) approach by constructing independent semantic space for each modality, which adopts end-to-end framework to directly generate modality-specific cross-modal similarity without explicit common representation. For each semantic space, modality-specific characteristics within one modality are fully exploited by recurrent attention network, while the data of another modality is projected into this space with attention based joint embedding to utilize the learned attention weights for guiding the fine-grained cross-modal correlation learning, which can capture the imbalanced and complementary relationships between different modalities. Finally, the complementarity between the semantic spaces for different modalities is explored by adaptive fusion of the modality-specific cross-modal similarities to perform cross-modal retrieval. Experiments on the widely-used Wikipedia and Pascal Sentence datasets as well as our constructed large-scale XMediaNet dataset verify the effectiveness of our proposed approach, outperforming 9 state-of-the-art methods. Click to Read Paper
Gaussian processes (GPs) are powerful non-parametric function estimators. However, their applications are largely limited by the expensive computational cost of the inference procedures. Existing stochastic or distributed synchronous variational inferences, although have alleviated this issue by scaling up GPs to millions of samples, are still far from satisfactory for real-world large applications, where the data sizes are often orders of magnitudes larger, say, billions. To solve this problem, we propose ADVGP, the first Asynchronous Distributed Variational Gaussian Process inference for regression, on the recent large-scale machine learning platform, PARAMETERSERVER. ADVGP uses a novel, flexible variational framework based on a weight space augmentation, and implements the highly efficient, asynchronous proximal gradient optimization. While maintaining comparable or better predictive performance, ADVGP greatly improves upon the efficiency of the existing variational methods. With ADVGP, we effortlessly scale up GP regression to a real-world application with billions of samples and demonstrate an excellent, superior prediction accuracy to the popular linear models. Click to Read Paper
Given genetic variations and various phenotypical traits, such as Magnetic Resonance Imaging (MRI) features, we consider two important and related tasks in biomedical research: i)to select genetic and phenotypical markers for disease diagnosis and ii) to identify associations between genetic and phenotypical data. These two tasks are tightly coupled because underlying associations between genetic variations and phenotypical features contain the biological basis for a disease. While a variety of sparse models have been applied for disease diagnosis and canonical correlation analysis and its extensions have bee widely used in association studies (e.g., eQTL analysis), these two tasks have been treated separately. To unify these two tasks, we present a new sparse Bayesian approach for joint association study and disease diagnosis. In this approach, common latent features are extracted from different data sources based on sparse projection matrices and used to predict multiple disease severity levels based on Gaussian process ordinal regression; in return, the disease status is used to guide the discovery of relationships between the data sources. The sparse projection matrices not only reveal interactions between data sources but also select groups of biomarkers related to the disease. To learn the model from data, we develop an efficient variational expectation maximization algorithm. Simulation results demonstrate that our approach achieves higher accuracy in both predicting ordinal labels and discovering associations between data sources than alternative methods. We apply our approach to an imaging genetics dataset for the study of Alzheimer's Disease (AD). Our method identifies biologically meaningful relationships between genetic variations, MRI features, and AD status, and achieves significantly higher accuracy for predicting ordinal AD stages than the competing methods. Click to Read Paper
Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost and it is difficult to design nonstationary GP priors in practice. In this paper, we propose a sparse Gaussian process model, EigenGP, based on the Karhunen-Loeve (KL) expansion of a GP prior. We use the Nystrom approximation to obtain data dependent eigenfunctions and select these eigenfunctions by evidence maximization. This selection reduces the number of eigenfunctions in our model and provides a nonstationary covariance function. To handle nonlinear likelihoods, we develop an efficient expectation propagation (EP) inference algorithm, and couple it with expectation maximization for eigenfunction selection. Because the eigenfunctions of a Gaussian kernel are associated with clusters of samples - including both the labeled and unlabeled - selecting relevant eigenfunctions enables EigenGP to conduct semi-supervised learning. Our experimental results demonstrate improved predictive performance of EigenGP over alternative state-of-the-art sparse GP and semisupervised learning methods for regression, classification, and semisupervised classification. Click to Read Paper
We face network data from various sources, such as protein interactions and online social networks. A critical problem is to model network interactions and identify latent groups of network nodes. This problem is challenging due to many reasons. For example, the network nodes are interdependent instead of independent of each other, and the data are known to be very noisy (e.g., missing edges). To address these challenges, we propose a new relational model for network data, Sparse Matrix-variate Gaussian process Blockmodel (SMGB). Our model generalizes popular bilinear generative models and captures nonlinear network interactions using a matrix-variate Gaussian process with latent membership variables. We also assign sparse prior distributions on the latent membership variables to learn sparse group assignments for individual network nodes. To estimate the latent variables efficiently from data, we develop an efficient variational expectation maximization method. We compared our approaches with several state-of-the-art network models on both synthetic and real-world network datasets. Experimental results demonstrate SMGBs outperform the alternative approaches in terms of discovering latent classes or predicting unknown interactions. Click to Read Paper
Tensor decomposition is a powerful computational tool for multiway data analysis. Many popular tensor decomposition approaches---such as the Tucker decomposition and CANDECOMP/PARAFAC (CP)---amount to multi-linear factorization. They are insufficient to model (i) complex interactions between data entities, (ii) various data types (e.g. missing data and binary data), and (iii) noisy observations and outliers. To address these issues, we propose tensor-variate latent nonparametric Bayesian models, coupled with efficient inference methods, for multiway data analysis. We name these models InfTucker. Using these InfTucker, we conduct Tucker decomposition in an infinite feature space. Unlike classical tensor decomposition models, our new approaches handle both continuous and binary data in a probabilistic framework. Unlike previous Bayesian models on matrices and tensors, our models are based on latent Gaussian or $t$ processes with nonlinear covariance functions. To efficiently learn the InfTucker from data, we develop a variational inference technique on tensors. Compared with classical implementation, the new technique reduces both time and space complexities by several orders of magnitude. Our experimental results on chemometrics and social network datasets demonstrate that our new models achieved significantly higher prediction accuracy than the most state-of-art tensor decomposition Click to Read Paper
Multi-index fusion has demonstrated impressive performances in retrieval task by integrating different visual representations in a unified framework. However, previous works mainly consider propagating similarities via neighbor structure, ignoring the high order information among different visual representations. In this paper, we propose a new multi-index fusion scheme for image retrieval. By formulating this procedure as a multilinear based optimization problem, the complementary information hidden in different indexes can be explored more thoroughly. Specially, we first build our multiple indexes from various visual representations. Then a so-called index-specific functional matrix, which aims to propagate similarities, is introduced for updating the original index. The functional matrices are then optimized in a unified tensor space to achieve a refinement, such that the relevant images can be pushed more closer. The optimization problem can be efficiently solved by the augmented Lagrangian method with theoretical convergence guarantee. Unlike the traditional multi-index fusion scheme, our approach embeds the multi-index subspace structure into the new indexes with sparse constraint, thus it has little additional memory consumption in online query stage. Experimental evaluation on three benchmark datasets reveals that the proposed approach achieves the state-of-the-art performance, i.e., N-score 3.94 on UKBench, mAP 94.1\% on Holiday and 62.39\% on Market-1501. Click to Read Paper
Discriminative methods often generate hand poses kinematically implausible, then generative methods are used to correct (or verify) these results in a hybrid method. Estimating 3D hand pose in a hierarchy, where the high-dimensional output space is decomposed into smaller ones, has been shown effective. Existing hierarchical methods mainly focus on the decomposition of the output space while the input space remains almost the same along the hierarchy. In this paper, a hybrid hand pose estimation method is proposed by applying the kinematic hierarchy strategy to the input space (as well as the output space) of the discriminative method by a spatial attention mechanism and to the optimization of the generative method by hierarchical Particle Swarm Optimization (PSO). The spatial attention mechanism integrates cascaded and hierarchical regression into a CNN framework by transforming both the input(and feature space) and the output space, which greatly reduces the viewpoint and articulation variations. Between the levels in the hierarchy, the hierarchical PSO forces the kinematic constraints to the results of the CNNs. The experimental results show that our method significantly outperforms four state-of-the-art methods and three baselines on three public benchmarks. Click to Read Paper
In a typical online learning scenario, a learner is required to process a large data stream using a small memory buffer. Such a requirement is usually in conflict with a learner's primary pursuit of prediction accuracy. To address this dilemma, we introduce a novel Bayesian online classi cation algorithm, called the Virtual Vector Machine. The virtual vector machine allows you to smoothly trade-off prediction accuracy with memory size. The virtual vector machine summarizes the information contained in the preceding data stream by a Gaussian distribution over the classi cation weights plus a constant number of virtual data points. The virtual data points are designed to add extra non-Gaussian information about the classi cation weights. To maintain the constant number of virtual points, the virtual vector machine adds the current real data point into the virtual point set, merges two most similar virtual points into a new virtual point or deletes a virtual point that is far from the decision boundary. The information lost in this process is absorbed into the Gaussian distribution. The extra information provided by the virtual points leads to improved predictive accuracy over previous online classification algorithms. Click to Read Paper
Despite its simplicity, bag-of-n-grams sen- tence representation has been found to excel in some NLP tasks. However, it has not re- ceived much attention in recent years and fur- ther analysis on its properties is necessary. We propose a framework to investigate the amount and type of information captured in a general- purposed bag-of-n-grams sentence represen- tation. We first use sentence reconstruction as a tool to obtain bag-of-n-grams representa- tion that contains general information of the sentence. We then run prediction tasks (sen- tence length, word content, phrase content and word order) using the obtained representation to look into the specific type of information captured in the representation. Our analysis demonstrates that bag-of-n-grams representa- tion does contain sentence structure level in- formation. However, incorporating n-grams with higher order n empirically helps little with encoding more information in general, except for phrase content information. Click to Read Paper
Low-Rank Matrix Recovery (LRMR) has recently been applied to saliency detection by decomposing image features into a low-rank component associated with background and a sparse component associated with visual salient regions. Despite its great potential, existing LRMR-based saliency detection methods seldom consider the inter-relationship among elements within these two components, thus are prone to generating scattered or incomplete saliency maps. In this paper, we introduce a novel and efficient LRMR-based saliency detection model under a coarse-to-fine framework to circumvent this limitation. First, we roughly measure the saliency of image regions with a baseline LRMR model that integrates a $\ell_1$-norm sparsity constraint and a Laplacian regularization smooth term. Given samples from the coarse saliency map, we then learn a projection that maps image features to refined saliency values, to significantly sharpen the object boundaries and to preserve the object entirety. We evaluate our framework against existing LRMR based methods on three benchmark datasets. Experimental results validate the superiority of our method as well as the effectiveness of our suggested coarse-to-fine framework, especially for images containing multiple objects. Click to Read Paper
Gaussian processes (GPs) provide a probabilistic nonparametric representation of functions in regression, classification, and other problems. Unfortunately, exact learning with GPs is intractable for large datasets. A variety of approximate GP methods have been proposed that essentially map the large dataset into a small set of basis points. Among them, two state-of-the-art methods are sparse pseudo-input Gaussian process (SPGP) (Snelson and Ghahramani, 2006) and variablesigma GP (VSGP) Walder et al. (2008), which generalizes SPGP and allows each basis point to have its own length scale. However, VSGP was only derived for regression. In this paper, we propose a new sparse GP framework that uses expectation propagation to directly approximate general GP likelihoods using a sparse and smooth basis. It includes both SPGP and VSGP for regression as special cases. Plus as an EP algorithm, it inherits the ability to process data online. As a particular choice of approximating family, we blur each basis point with a Gaussian distribution that has a full covariance matrix representing the data distribution around that basis point; as a result, we can summarize local data manifold information with a small set of basis points. Our experiments demonstrate that this framework outperforms previous GP classification methods on benchmark datasets in terms of minimizing divergence to the non-sparse GP solution as well as lower misclassification rate. Click to Read Paper
Gaussian processes (GPs) provide a probabilistic nonparametric representation of functions in regression, classification, and other problems. Unfortunately, exact learning with GPs is intractable for large datasets. A variety of approximate GP methods have been proposed that essentially map the large dataset into a small set of basis points. The most advanced of these, the variable-sigma GP (VSGP) (Walder et al., 2008), allows each basis point to have its own length scale. However, VSGP was only derived for regression. We describe how VSGP can be applied to classification and other problems, by deriving it as an expectation propagation algorithm. In this view, sparse GP approximations correspond to a KL-projection of the true posterior onto a compact exponential family of GPs. VSGP constitutes one such family, and we show how to enlarge this family to get additional accuracy. In particular, we show that endowing each basis point with its own full covariance matrix provides a significant increase in approximation power. Click to Read Paper
Currently, most top-performing text detection networks tend to employ fixed-size anchor boxes to guide the search for text instances. They usually rely on a large amount of anchors with different scales to discover texts in scene images, thus leading to high computational cost. In this paper, we propose an end-to-end box-based text detector with scale-adaptive anchors, which can dynamically adjust the scales of anchors according to the sizes of underlying texts by introducing an additional scale regression layer. The proposed scale-adaptive anchors allow us to use a few number of anchors to handle multi-scale texts and therefore significantly improve the computational efficiency. Moreover, compared to discrete scales used in previous methods, the learned continuous scales are more reliable, especially for small texts detection. Additionally, we propose Anchor convolution to better exploit necessary feature information by dynamically adjusting the sizes of receptive fields according to the learned scales. Extensive experiments demonstrate that the proposed detector is fast, taking only $0.28$ second per image, while outperforming most state-of-the-art methods in accuracy. Click to Read Paper
We present the 2017 Hands in the Million Challenge, a public competition designed for the evaluation of the task of 3D hand pose estimation. The goal of this challenge is to assess how far is the state of the art in terms of solving the problem of 3D hand pose estimation as well as detect major failure and strength modes of both systems and evaluation metrics that can help to identify future research directions. The challenge follows up the recent publication of BigHand2.2M and First-Person Hand Action datasets, which have been designed to exhaustively cover multiple hand, viewpoint, hand articulation, and occlusion. The challenge consists of a standardized dataset, an evaluation protocol for two different tasks, and a public competition. In this document we describe the different aspects of the challenge and, jointly with the results of the participants, it will be presented at the 3rd International Workshop on Observing and Understanding Hands in Action, HANDS 2017, with ICCV 2017. Click to Read Paper
Infinite Tucker Decomposition (InfTucker) and random function prior models, as nonparametric Bayesian models on infinite exchangeable arrays, are more powerful models than widely-used multilinear factorization methods including Tucker and PARAFAC decomposition, (partly) due to their capability of modeling nonlinear relationships between array elements. Despite their great predictive performance and sound theoretical foundations, they cannot handle massive data due to a prohibitively high training time. To overcome this limitation, we present Distributed Infinite Tucker (DINTUCKER), a large-scale nonlinear tensor decomposition algorithm on MAPREDUCE. While maintaining the predictive accuracy of InfTucker, it is scalable on massive data. DINTUCKER is based on a new hierarchical Bayesian model that enables local training of InfTucker on subarrays and information integration from all local training results. We use distributed stochastic gradient descent, coupled with variational inference, to train this model. We apply DINTUCKER to multidimensional arrays with billions of elements from applications in the "Read the Web" project (Carlson et al., 2010) and in information security and compare it with the state-of-the-art large-scale tensor decomposition method, GigaTensor. On both datasets, DINTUCKER achieves significantly higher prediction accuracy with less computational time. Click to Read Paper