Research papers and code for "Reshad Hosseini":
We consider maximum likelihood estimation for Gaussian Mixture Models (GMMs). This task is almost invariably solved (in theory and practice) via the Expectation Maximization (EM) algorithm. EM owes its success to various factors, of which its ability to satisfy positive definiteness constraints in closed form is of key importance. We propose an alternative to EM by appealing to the rich Riemannian geometry of positive definite matrices, using which we cast GMM parameter estimation as a Riemannian optimization problem. Surprisingly, such an out-of-the-box Riemannian formulation completely fails and proves much inferior to EM. This motivates us to take a closer look at the problem geometry, and derive a better formulation that is much more amenable to Riemannian optimization. We then develop (Riemannian) batch and stochastic gradient algorithms that outperform EM, often substantially. We provide a non-asymptotic convergence analysis for our stochastic method, which is also the first (to our knowledge) such global analysis for Riemannian stochastic gradient. Numerous empirical results are included to demonstrate the effectiveness of our methods.
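A minimal sketch of the kind of Riemannian stochastic gradient step this involves, assuming the affine-invariant metric on the manifold of symmetric positive definite (SPD) matrices and, for simplicity, a single zero-mean Gaussian rather than a full mixture; the step size, batch size, and function names are illustrative and not the paper's algorithm.

```python
# Sketch: one Riemannian SGD step for the covariance of a zero-mean Gaussian
# on the SPD manifold with the affine-invariant metric.  Illustrative only;
# the paper's reparameterized mixture formulation is more involved.
import numpy as np
from scipy.linalg import expm, sqrtm

def nll_grad(S, X):
    """Euclidean gradient of the average negative log-likelihood of N(0, S) on batch X."""
    n = X.shape[0]
    S_inv = np.linalg.inv(S)
    scatter = X.T @ X / n
    # d/dS [ 0.5*logdet(S) + 0.5*tr(S^{-1} scatter) ]
    return 0.5 * (S_inv - S_inv @ scatter @ S_inv)

def riemannian_sgd_step(S, X, lr=0.05):
    egrad = nll_grad(S, X)
    rgrad = S @ egrad @ S                  # Riemannian gradient under the affine-invariant metric
    xi = -lr * rgrad                       # descent direction (a symmetric tangent vector)
    S_half = np.real(sqrtm(S))
    S_half_inv = np.linalg.inv(S_half)
    # Exponential-map retraction keeps the iterate positive definite by construction.
    return S_half @ expm(S_half_inv @ xi @ S_half_inv) @ S_half

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.diag([1.0, 2.0, 0.5]), size=500)
S = np.eye(3)
for _ in range(200):
    batch = X[rng.choice(len(X), size=64, replace=False)]
    S = riemannian_sgd_step(S, batch)
print(np.round(S, 2))                      # should approach diag(1, 2, 0.5)
```

The point of the sketch is that positive definiteness is preserved by the retraction itself, so no explicit constraint handling is needed during the updates.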

* 21 pages, 6 figures
Mixture models are powerful statistical models used in many applications ranging from density estimation to clustering and classification. When dealing with mixture models, there are many issues that the experimenter should be aware of and needs to solve. The MixEst toolbox is a powerful and user-friendly package for MATLAB that implements several state-of-the-art approaches to address these problems. Additionally, MixEst offers the possibility of using manifold optimization for fitting the density model, a feature specific to this toolbox. MixEst simplifies the use and integration of mixture models in statistical models and applications. For developing mixture models of new densities, the user only needs to provide a few functions for that statistical distribution, and the toolbox takes care of all the issues regarding mixture models. MixEst is available at visionlab.ut.ac.ir/mixest, is fully documented, and is licensed under the GPL.
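As a rough Python analogue of the design described above (MixEst itself is MATLAB, and this is not its actual API), the sketch below shows a mixture fitter that only needs two functions from each component distribution: a log-density and a weighted maximum-likelihood fit.

```python
# Illustrative sketch of the "plug in a few functions per distribution" pattern.
import numpy as np

class GaussianComponent:
    def __init__(self, mean, var):
        self.mean, self.var = mean, var
    def logpdf(self, x):
        return -0.5 * (np.log(2 * np.pi * self.var) + (x - self.mean) ** 2 / self.var)
    @classmethod
    def fit(cls, x, weights):
        m = np.average(x, weights=weights)
        v = np.average((x - m) ** 2, weights=weights)
        return cls(m, max(v, 1e-6))

def em_mixture(x, components, n_iter=100):
    weights = np.full(len(components), 1.0 / len(components))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        log_r = np.log(weights) + np.stack([c.logpdf(x) for c in components], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: refit each component with its responsibilities.
        components = [type(c).fit(x, r[:, k]) for k, c in enumerate(components)]
        weights = r.mean(axis=0)
    return weights, components

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])
w, comps = em_mixture(data, [GaussianComponent(-1, 1), GaussianComponent(1, 1)])
print(w, [(c.mean, c.var) for c in comps])
```

Swapping in another distribution only requires implementing the same two functions for that distribution.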

* 5 pages
We take a new look at parameter estimation for Gaussian Mixture Models (GMMs). In particular, we propose using Riemannian manifold optimization as a powerful counterpart to Expectation Maximization (EM). An out-of-the-box invocation of manifold optimization, however, fails spectacularly: it converges to the same solution but vastly slower. Driven by intuition from manifold convexity, we then propose a reparameterization that has remarkable empirical consequences. It makes manifold optimization not only match EM (a highly encouraging result in itself, given the poor record nonlinear programming methods have had against EM so far) but also outperform EM in many practical settings, while displaying much less variability in running times. We further highlight the strengths of manifold optimization by developing a somewhat tuned manifold LBFGS method that proves even more competitive and reliable than existing manifold optimization tools. We hope that our results encourage a wider consideration of manifold optimization for parameter estimation problems.
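One simple way to see the flavor of such a reparameterization (a sketch; the paper's mixture-level construction and exact constants are not reproduced here) is the augmentation trick for a single Gaussian: lifting each point x to y = [x; 1] and fitting a zero-mean Gaussian with an unconstrained SPD covariance recovers both the mean and the covariance of x.

```python
# Augmentation-style reparameterization for a single Gaussian.
import numpy as np

rng = np.random.default_rng(0)
mu_true = np.array([1.0, -2.0])
cov_true = np.array([[2.0, 0.3], [0.3, 0.5]])
X = rng.multivariate_normal(mu_true, cov_true, size=20000)

Y = np.hstack([X, np.ones((len(X), 1))])          # lift: y = [x; 1]
S = Y.T @ Y / len(Y)                              # MLE of a zero-mean Gaussian on y

mu_hat = S[:-1, -1]                               # off-diagonal block recovers the mean
cov_hat = S[:-1, :-1] - np.outer(mu_hat, mu_hat)  # top-left block minus mu mu^T recovers the covariance
print(np.round(mu_hat, 2))
print(np.round(cov_hat, 2))
print(S[-1, -1])                                  # last entry is identically 1, so nothing is lost
```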

* 19 pages
We present a probabilistic model for natural images which is based on Gaussian scale mixtures and a simple multiscale representation. In contrast to the dominant approach to modeling whole images, which focuses on Markov random fields, we formulate our model in terms of a directed graphical model. We show that it is able to generate images with interesting higher-order correlations when trained on natural images or samples from an occlusion-based model. More importantly, the directed model enables us to perform a principled evaluation. While it is easy to generate visually appealing images, we demonstrate that our model also yields the best performance reported to date when evaluated with respect to the cross-entropy rate, a measure tightly linked to the average log-likelihood.
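For reference, the sketch below samples from a generic Gaussian scale mixture, the basic ingredient named above; the choice of scale distribution and covariance is illustrative and unrelated to the paper's multiscale model.

```python
# Minimal Gaussian scale mixture (GSM): x = sqrt(z) * g with g ~ N(0, C) and a random scale z.
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[1.0, 0.8], [0.8, 1.0]])          # covariance shared by all scales
L = np.linalg.cholesky(C)

def sample_gsm(n):
    z = rng.gamma(shape=1.0, scale=1.0, size=n)  # scale variable (choice is illustrative)
    g = rng.standard_normal((n, 2)) @ L.T
    return np.sqrt(z)[:, None] * g

x = sample_gsm(100000)
# GSM marginals are heavy-tailed: excess kurtosis is clearly positive,
# unlike a plain Gaussian (which would give roughly 0).
kurtosis = np.mean(x[:, 0] ** 4) / np.mean(x[:, 0] ** 2) ** 2 - 3
print(round(kurtosis, 2))
```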

One of the main drawbacks of deep neural networks, like many other classifiers, is their vulnerability to adversarial attacks. An important reason for this vulnerability is that they assign high confidence to regions with few or even no feature points. By feature points, we mean a nonlinear transformation of the input space that extracts a meaningful representation of the input data. Deep-RBF networks, on the other hand, assign high confidence only to regions containing enough feature points, but they have been discounted due to the widely held belief that they suffer from the vanishing gradient problem. In this paper, we revisit deep-RBF networks by first giving a general formulation for them, and then proposing a family of cost functions inspired by metric learning. In the proposed deep-RBF learning algorithm, the vanishing gradient problem does not occur. We make these networks robust to adversarial attacks by adding a reject option to their output layer. Through several experiments on the MNIST dataset, we demonstrate that our proposed method not only achieves high classification accuracy but is also highly resistant to various adversarial attacks.
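A hedged sketch of what a deep-RBF output layer with a reject option can look like in PyTorch; the feature extractor, kernel width, and threshold are placeholders, and the metric-learning cost functions proposed in the paper are not reproduced here.

```python
# Class scores are Gaussian kernels of the distance between the learned feature
# phi(x) and per-class centers; inputs whose best score is below a threshold are rejected.
import torch
import torch.nn as nn

class DeepRBF(nn.Module):
    def __init__(self, n_classes=10, feat_dim=64, gamma=1.0):
        super().__init__()
        self.features = nn.Sequential(             # any feature extractor works here
            nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, feat_dim)
        )
        self.centers = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.gamma = gamma

    def forward(self, x):
        phi = self.features(x)                          # (batch, feat_dim)
        d2 = torch.cdist(phi, self.centers) ** 2        # squared distances to class centers
        return torch.exp(-self.gamma * d2)              # per-class confidence in [0, 1]

def predict_with_reject(model, x, threshold=0.5, reject_label=-1):
    scores = model(x)
    conf, label = scores.max(dim=1)
    label[conf < threshold] = reject_label              # low confidence everywhere -> reject
    return label

model = DeepRBF()
x = torch.randn(8, 1, 28, 28)
print(predict_with_reject(model, x))
```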

Pose Graph Optimization (PGO) is an important non-convex optimization problem and the state-of-the-art formulation for SLAM in robotics. It also has applications such as camera motion estimation, structure from motion, and 3D reconstruction in machine vision. Recent research has shown the importance of good initialization for bootstrapping well-known iterative PGO solvers so that they converge to good solutions. The state-of-the-art initialization methods, however, work only in low-noise or at most moderate-noise problems, and they fail in challenging problems with high measurement noise. Consequently, iterative methods may get trapped in local minima in high-noise scenarios. In this paper we present an initialization method that uses orientation measurements, and we then present a convergence analysis of our iterative algorithm. We show how the algorithm converges to the global optimum in noise-free cases, and we obtain a bound on the difference between our result and the optimal solution in scenarios with noisy measurements. We then present a second algorithm that uses both relative orientation and position measurements to obtain a more accurate least-squares approximation of the problem, which is again solved iteratively. In the convergence proof, a structural coefficient arises that has an important influence on the basin of convergence. Interestingly, simulation results show that this coefficient also affects the performance of other solvers, so it can indicate the complexity of the problem. Experimental results show the excellent performance of the proposed initialization algorithm, especially in high-noise scenarios.
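As a toy illustration of initializing orientations from relative measurements (not the paper's algorithm or its convergence analysis), the 2D sketch below encodes each orientation as a unit complex number and solves a linear least-squares problem before projecting back to the circle.

```python
# Chordal-style 2D orientation initialization: r_k = exp(i*theta_k),
# relative measurements give linear constraints r_j ≈ r_i * exp(i*delta_ij).
import numpy as np

def init_orientations(n_nodes, edges):
    """edges: list of (i, j, delta_ij) relative-orientation measurements; node 0 is anchored."""
    A = np.zeros((len(edges), n_nodes - 1), dtype=complex)
    b = np.zeros(len(edges), dtype=complex)
    for row, (i, j, delta) in enumerate(edges):
        coeff_i, coeff_j = -np.exp(1j * delta), 1.0
        for node, coeff in ((i, coeff_i), (j, coeff_j)):
            if node == 0:
                b[row] -= coeff              # anchor r_0 = 1, move its term to the right-hand side
            else:
                A[row, node - 1] = coeff
    r, *_ = np.linalg.lstsq(A, b, rcond=None)
    r = np.concatenate([[1.0 + 0j], r])
    return np.angle(r)                       # project back to the unit circle by taking the phase

# Toy 4-node loop with noisy relative orientations (true angles 0, 0.5, 1.0, 1.5).
edges = [(0, 1, 0.52), (1, 2, 0.48), (2, 3, 0.51), (3, 0, -1.49), (0, 2, 1.02)]
print(np.round(init_orientations(4, edges), 2))
```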

Due to a lack of generalization across the state space, common Reinforcement Learning (RL) methods suffer from slow learning, especially in the early learning trials. This paper introduces a model-based method for discrete state spaces that increases learning speed in terms of required experience (but not required computational time) by exploiting generalization across the experiences of subspaces. A subspace is formed by choosing a subset of features in the original state representation (the full-space). Generalization and faster learning in a subspace are due to the many-to-one mapping of experiences from the full-space to each state of the subspace. Nevertheless, due to the inherent perceptual aliasing in the subspaces, the policy suggested by each subspace does not in general converge to the optimal policy. Our approach, called Model Based Learning with Subspaces (MoBLeS), calculates confidence intervals of the estimated Q-values in the full-space and in the subspaces. These confidence intervals are used in decision making, so that the agent benefits as much as possible from the available generalization while avoiding the detriment of perceptual aliasing in the subspaces. Convergence of MoBLeS to the optimal policy is investigated theoretically. Additionally, we show through several experiments that MoBLeS improves the learning speed in the early trials.
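A loose sketch of the mechanism, with Hoeffding-style intervals standing in for whatever confidence bounds the paper derives; the actual MoBLeS decision rule and its convergence argument differ in their details.

```python
# Each estimator keeps visit counts; a count-based interval shrinks as counts grow,
# and the agent trusts the (faster-shrinking) subspace estimate only while the
# full-space interval is still wide.
import numpy as np

class CountingQ:
    """Tabular Q estimate with a count-based confidence half-width."""
    def __init__(self, n_states, n_actions, q_max=1.0):
        self.q = np.zeros((n_states, n_actions))
        self.n = np.zeros((n_states, n_actions))
        self.q_max = q_max
    def update(self, s, a, target, lr=0.1):
        self.n[s, a] += 1
        self.q[s, a] += lr * (target - self.q[s, a])
    def half_width(self, s, a):
        return self.q_max * np.sqrt(2.0 / max(self.n[s, a], 1.0))

def select_action(full_q, sub_q, s_full, s_sub, n_actions):
    values = np.empty(n_actions)
    for a in range(n_actions):
        w_full = full_q.half_width(s_full, a)
        w_sub = sub_q.half_width(s_sub, a)
        # Prefer whichever estimate currently has the tighter interval; early on the
        # aggregated subspace counts typically make that the subspace estimate.
        values[a] = sub_q.q[s_sub, a] if w_sub < w_full else full_q.q[s_full, a]
    return int(np.argmax(values))
```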

We revisit the task of learning a Euclidean metric from data. We approach this problem from first principles and formulate it as a surprisingly simple optimization problem. Indeed, our formulation even admits a closed form solution. This solution possesses several very attractive properties: (i) an innate geometric appeal through the Riemannian geometry of positive definite matrices; (ii) ease of interpretability; and (iii) computational speed several orders of magnitude faster than the widely used LMNN and ITML methods. Furthermore, on standard benchmark datasets, our closed-form solution consistently attains higher classification accuracy.
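The sketch below shows the kind of closed-form computation such a geometric formulation suggests, assuming (as an illustration only, not a statement of the paper's formula) that the metric is taken as a point on the SPD geodesic between the inverse of a similarity scatter matrix and a dissimilarity scatter matrix.

```python
# Geodesic-mean style closed form on the SPD manifold.
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def scatter(pairs):
    """Sum of (xi - xj)(xi - xj)^T over the given pairs of points."""
    return sum(np.outer(xi - xj, xi - xj) for xi, xj in pairs)

def geodesic(A, B, t=0.5):
    """Point at parameter t on the affine-invariant geodesic from A to B (both SPD)."""
    A_half, A_neg_half = mpow(A, 0.5), mpow(A, -0.5)
    return A_half @ mpow(A_neg_half @ B @ A_neg_half, t) @ A_half

def learn_metric(similar_pairs, dissimilar_pairs, t=0.5, reg=1e-6):
    d = similar_pairs[0][0].shape[0]
    S = scatter(similar_pairs) + reg * np.eye(d)
    D = scatter(dissimilar_pairs) + reg * np.eye(d)
    return geodesic(np.linalg.inv(S), D, t)       # Mahalanobis matrix M

def mahalanobis(M, x, y):
    diff = x - y
    return float(diff @ M @ diff)
```

The entire computation is a handful of matrix factorizations, which is consistent with the claimed speed advantage over iterative solvers such as LMNN and ITML.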

* 7 pages, 4 figures
We study modeling and inference with the Elliptical Gamma Distribution (EGD). We consider maximum likelihood (ML) estimation for EGD scatter matrices, a task for which we develop new fixed-point algorithms. Our algorithms are efficient and converge to global optima despite nonconvexity. Moreover, they turn out to be much faster than both a well-known iterative algorithm of Kent & Tyler (1991) and sophisticated manifold optimization algorithms. Subsequently, we invoke our ML algorithms as subroutines for estimating parameters of a mixture of EGDs. We illustrate our methods by applying them to model natural image statistics---the proposed EGD mixture model yields the most parsimonious model among several competing approaches.
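For a sense of what a fixed-point scatter iteration looks like, here is a sketch using Tyler's classical weights for concreteness; the EGD-specific weight function, scale handling, and convergence guarantees are the paper's contribution and are not reproduced here.

```python
# Generic fixed-point scatter-matrix iteration (Tyler-style weights, illustrative).
import numpy as np

def fixed_point_scatter(X, n_iter=100, tol=1e-8):
    n, d = X.shape
    S = np.eye(d)
    for _ in range(n_iter):
        q = np.einsum("ij,jk,ik->i", X, np.linalg.inv(S), X)   # x_i^T S^{-1} x_i
        w = d / np.maximum(q, 1e-12)                           # Tyler-style weights
        S_new = (X * w[:, None]).T @ X / n
        S_new *= d / np.trace(S_new)                           # fix the scale (Tyler's estimator is scale-free)
        if np.linalg.norm(S_new - S, "fro") < tol:
            return S_new
        S = S_new
    return S

rng = np.random.default_rng(0)
true_scatter = np.array([[2.0, 0.5], [0.5, 1.0]])
# Heavy-tailed elliptical samples: Gaussian directions with random radial scales.
Z = rng.multivariate_normal(np.zeros(2), true_scatter, size=5000)
X = Z * rng.standard_t(df=3, size=(5000, 1))
S_hat = fixed_point_scatter(X)
print(np.round(S_hat / S_hat[1, 1] * true_scatter[1, 1], 2))   # compare up to scale
```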

* Computational Statistics & Data Analysis 2016, Vol. 101, 29-43
* 23 pages, 11 figures
Offline Signature Verification (OSV) remains a challenging pattern recognition task, especially in the presence of skilled forgeries that are not available during training. This challenge is aggravated when only a small amount of labeled training data is available, together with large intra-personal variations. In this study, we address this issue by employing an active learning approach, which selects the most informative instances to label and therefore reduces the human labeling effort significantly. Our proposed OSV approach includes three steps: feature learning, active learning, and final verification. We benefit from transfer learning, using a pre-trained CNN for feature learning. We also propose SVM-based active learning for each user to separate his genuine signatures from random forgeries. We finally use the SVMs to verify the authenticity of the questioned signature. We examined our proposed active transfer learning method on UTSig, a Persian offline signature dataset. We achieved nearly 13% improvement compared to the random selection of instances. Our results also showed a 1% improvement over the state-of-the-art method, which used a fully supervised setting with five more labeled instances per user.
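A small sketch of the per-user SVM-based uncertainty-sampling loop described above, with the CNN features abstracted into a plain array; the kernel, query budget, and seed labels are illustrative choices, not the paper's configuration.

```python
# Train an SVM on the currently labeled genuine vs. random-forgery samples,
# then query the unlabeled feature vectors closest to the decision boundary.
import numpy as np
from sklearn.svm import SVC

def active_learning_round(features, labels, labeled_idx, n_queries=5):
    """Return the fitted SVM and the indices of the unlabeled samples it is least certain about."""
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(features[labeled_idx], labels[labeled_idx])
    unlabeled_idx = np.setdiff1d(np.arange(len(features)), labeled_idx)
    margins = np.abs(clf.decision_function(features[unlabeled_idx]))
    query = unlabeled_idx[np.argsort(margins)[:n_queries]]     # smallest margin = most informative
    return clf, query

# Toy example with random "features"; in practice these would come from the pre-trained CNN.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0, 1, (50, 16)), rng.normal(2, 1, (50, 16))])
labels = np.array([1] * 50 + [0] * 50)           # 1 = genuine, 0 = random forgery
labeled_idx = np.array([0, 1, 2, 50, 51, 52])    # start from a few labels per class
clf, query = active_learning_round(features, labels, labeled_idx)
print(query)                                      # indices the oracle would be asked to label next
```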
