The Boltzmann machine, as a fundamental building block of deep belief networks and deep Boltzmann machines, is widely used in the deep learning community, where it has achieved great success. However, the theoretical understanding of many of its aspects remains far from clear. In this paper, we study the Rademacher complexity of both the asymptotic restricted Boltzmann machine and its practical implementation with the single-step contrastive divergence (CD-1) procedure. Our results show that the practical training procedure does increase the Rademacher complexity of restricted Boltzmann machines. A further research direction is the investigation of the VC dimension of the compositional function used in the CD-1 procedure.
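
As a concrete reference for the CD-1 procedure the abstract refers to, here is a minimal sketch of a single CD-1 parameter update for a binary RBM. The learning rate, shapes, and variable names are illustrative choices, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b, c, v0, lr=0.1):
    """One CD-1 update for a binary RBM.

    W: (n_hidden, n_visible) weights; b: visible bias; c: hidden bias.
    v0: a batch of binary visible vectors, shape (batch, n_visible).
    """
    # Positive phase: hidden probabilities given the data, then a sample.
    ph0 = sigmoid(v0 @ W.T + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    pv1 = sigmoid(h0 @ W + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W.T + c)
    # CD-1 gradient estimate: difference of data and reconstruction correlations.
    n = v0.shape[0]
    W += lr * (ph0.T @ v0 - ph1.T @ v1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

The single Gibbs step is exactly what distinguishes CD-1 from the asymptotic model: the gradient is estimated from a one-step reconstruction rather than from the model's equilibrium distribution.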

* accepted for publication by Neural Networks

Deep Distributed Random Samplings for Supervised Learning: An Alternative to Random Forests?

Jan 28, 2015

Xiao-Lei Zhang

In (\cite{zhang2014nonlinear,zhang2014nonlinear2}), we viewed machine learning as a coding and dimensionality reduction problem and proposed a simple unsupervised dimensionality reduction method, called deep distributed random samplings (DDRS). In this paper, we extend it incrementally to supervised learning. The key idea is to incorporate label information into the coding process by reformulating DDRS so that each center has multiple output units indicating which class the center belongs to. Although this supervised method may seem similar to random forests (\cite{breiman2001random}), the two differ in the following ways. (i) Each layer of our method relates a subset of the training data points to all training data points, while random forests build each decision tree independently on only a subset of the training data. (ii) Our method builds a gradually narrowed network by sampling fewer and fewer data points, while random forests build a gradually narrowed network by merging subclasses. (iii) Our method is trained straightforwardly from the bottom layer to the top layer, while random forests build each tree from the top layer to the bottom layer by splitting. (iv) Our method encodes output targets implicitly in sparse codes, while random forests encode output targets by remembering the class attributes of the activated nodes. Therefore, although both methods rest on the same two basic elements---randomization and nearest-neighbor optimization---our method may be a simpler and more direct alternative. This preprint is intended to protect the incremental idea beyond (\cite{zhang2014nonlinear,zhang2014nonlinear2}); a full empirical evaluation will be announced later.
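
The label-augmented coding idea can be illustrated with a small sketch. The reading below---uniformly sampled centers carrying class labels, with each sample's sparse nearest-neighbor code accumulating into the class units of its activated centers---is one plausible interpretation of the description, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def supervised_ddrs_layer(X, y, n_classes, k=10, n_nearest=3):
    """One supervised DDRS-style layer (an illustrative sketch).

    X: (n_samples, n_features) data; y: integer class labels.
    Returns a sparse code over k sampled centers and per-class
    'output units' accumulated from the activated centers' labels.
    """
    # Uniformly sample k centers, each carrying its class label.
    idx = rng.choice(len(X), size=k, replace=False)
    centers, center_labels = X[idx], y[idx]
    # Similarity of every sample to every center (inner product, assumed).
    sim = X @ centers.T
    # Keep only the n_nearest most similar centers per sample.
    top = np.argpartition(sim, -n_nearest, axis=1)[:, -n_nearest:]
    rows = np.arange(len(X))[:, None]
    code = np.zeros_like(sim)
    code[rows, top] = sim[rows, top]
    # Each activated center votes its similarity into its class's output unit.
    votes = np.zeros((len(X), n_classes))
    for c in range(n_classes):
        votes[:, c] = code[:, center_labels == c].sum(axis=1)
    return code, votes
```

Under this reading, the class information rides along with the sparse code rather than being stored in a separate tree structure, which is the contrast with random forests drawn in point (iv).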

* This paper has been withdrawn by the author. The idea is wrong and will no longer be posted on this site. The paper will no longer be updated.

Heuristic Ternary Error-Correcting Output Codes Via Weight Optimization and Layered Clustering-Based Approach

Apr 23, 2014

Xiao-Lei Zhang


Learning Deep Representation Without Parameter Inference for Nonlinear Dimensionality Reduction

Jan 02, 2014

Xiao-Lei Zhang

Unsupervised deep learning is one of the most powerful representation learning techniques. Restricted Boltzmann machines, sparse coding, regularized auto-encoders, and convolutional neural networks are pioneering building blocks of deep learning. In this paper, we propose a new building block -- distributed random models. The proposed method is a special full implementation of the product of experts: (i) each expert owns multiple hidden units, and different experts have different numbers of hidden units; (ii) the model of each expert is a k-center clustering whose k centers are simply uniformly sampled examples and whose output (i.e., the hidden units) is a sparse code in which only the similarity values from a few nearest neighbors are retained. The relationship between the pioneering building blocks, several notable research branches, and the proposed method is analyzed. Experimental results show that the proposed deep model learns better representations than deep belief networks and, at the same time, can train a much larger network in much less time.
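
A minimal sketch of one such expert, and of a layer formed by concatenating several experts, under the description above. The similarity measure (plain inner product) and all parameter values are assumptions for illustration, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_expert_code(X, k=10, n_nearest=3):
    """One expert: k uniformly sampled centers, sparse nearest-neighbor code.

    X: (n_samples, n_features). Returns an (n_samples, k) code in which
    only the n_nearest largest similarities per sample are retained.
    """
    centers = X[rng.choice(len(X), size=k, replace=False)]
    # Similarity of every sample to every sampled center (assumed: inner product).
    sim = X @ centers.T
    # Zero out everything but the n_nearest highest similarities per row.
    top = np.argpartition(sim, -n_nearest, axis=1)[:, -n_nearest:]
    rows = np.arange(len(X))[:, None]
    code = np.zeros_like(sim)
    code[rows, top] = sim[rows, top]
    return code

def layer(X, n_experts=5, k=10, n_nearest=3):
    """Product-of-experts style layer: concatenate independent experts' codes."""
    return np.hstack([random_expert_code(X, k, n_nearest) for _ in range(n_experts)])
```

Since each expert is only a uniform sample plus a nearest-neighbor lookup, there is no parameter inference at all, which is the source of the claimed training-time advantage over deep belief networks.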

* This paper has been withdrawn by the author due to a lack of full empirical evaluation

* 16 pages, 5 figures, 3 tables

Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization

Sep 09, 2015

Yuchen Zhang, Lin Xiao

We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle-point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variable. An extrapolation step on the primal variable is performed to obtain an accelerated convergence rate. We also develop a mini-batch version of the SPDC method, which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.
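
A sketch of the basic SPDC loop for the special case of ridge-regularized least squares, where the squared loss gives a closed-form dual coordinate update and the ridge term a closed-form proximal step. The step sizes and extrapolation parameter below are hand-tuned placeholders rather than the theory-based choices of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def spdc_ridge(A, b, lam, sigma=0.1, tau=0.1, theta=0.9, iters=5000):
    """SPDC sketch for  min_x (1/2n)||Ax - b||^2 + (lam/2)||x||^2."""
    n, d = A.shape
    x = np.zeros(d)
    x_bar = x.copy()                 # extrapolated primal point
    y = np.zeros(n)                  # dual variables, one per example
    u = np.zeros(d)                  # running average (1/n) sum_i y_i * A[i]
    for _ in range(iters):
        i = rng.integers(n)
        # Dual ascent on coordinate i (closed form for the squared loss).
        y_new = (y[i] + sigma * (A[i] @ x_bar - b[i])) / (1.0 + sigma)
        # Primal descent using the freshly updated dual coordinate.
        g = u + (y_new - y[i]) * A[i]
        x_next = (x - tau * g) / (1.0 + tau * lam)   # prox of (lam/2)||x||^2
        # Extrapolation step for acceleration, then bookkeeping.
        x_bar = x_next + theta * (x_next - x)
        u += (y_new - y[i]) * A[i] / n
        y[i] = y_new
        x = x_next
    return x
```

The alternation described in the abstract is visible directly: one randomly chosen dual coordinate is maximized, then the primal variable is updated, with the extrapolated point x_bar feeding the next dual step.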

Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss

Jan 01, 2015

Yuchen Zhang, Lin Xiao


A Proximal Stochastic Gradient Method with Progressive Variance Reduction

Mar 19, 2014

Lin Xiao, Tong Zhang

We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole objective function is strongly convex. Such problems often arise in machine learning, known as regularized empirical risk minimization. We propose and analyze a new proximal stochastic gradient method, which uses a multi-stage scheme to progressively reduce the variance of the stochastic gradient. While each iteration of this algorithm has a cost similar to that of the classical stochastic gradient method (or incremental gradient method), we show that the expected objective value converges to the optimum at a geometric rate. The overall complexity of this method is much lower than that of both the proximal full gradient method and the standard proximal stochastic gradient method.
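
The multi-stage scheme can be sketched for l1-regularized least squares, with soft-thresholding as the proximal mapping. Parameter values are illustrative, and note that the paper's geometric-rate analysis assumes a strongly convex objective; the l1 example here just illustrates the proximal variance-reduced step:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(z, t):
    """Proximal mapping of t*||.||_1 (elementwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_svrg(A, b, lam, eta=0.2, stages=30, inner=None):
    """Prox-SVRG sketch for  min_x (1/2n)||Ax - b||^2 + lam*||x||_1."""
    n, d = A.shape
    inner = inner or 2 * n
    x_tilde = np.zeros(d)
    for _ in range(stages):
        # Full gradient at the snapshot, recomputed once per stage.
        mu = A.T @ (A @ x_tilde - b) / n
        x = x_tilde.copy()
        for _ in range(inner):
            i = rng.integers(n)
            gi = A[i] * (A[i] @ x - b[i])           # stochastic gradient at x
            gi_snap = A[i] * (A[i] @ x_tilde - b[i])  # same component at snapshot
            v = gi - gi_snap + mu                   # variance-reduced direction
            x = soft_threshold(x - eta * v, eta * lam)
        x_tilde = x                                 # last iterate becomes snapshot
    return x_tilde
```

The correction term `gi - gi_snap + mu` is the progressive variance reduction: as the snapshot approaches the optimum, the variance of the direction `v` vanishes, which is what allows a constant step size and a geometric rate.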

A Proximal-Gradient Homotopy Method for the Sparse Least-Squares Problem

Mar 14, 2012

Lin Xiao, Tong Zhang


* This paper has been withdrawn by the author due to a lack of full empirical evaluation. A more advanced method has been developed, and this method is now fully out of date.

Transfer Learning for Voice Activity Detection: A Denoising Deep Neural Network Perspective

Mar 08, 2013

Xiao-Lei Zhang, Ji Wu


* This paper was submitted to the conference INTERSPEECH 2013 on March 4, 2013 for review

* This paper has been accepted by IEEE ICASSP 2013 and will be published online after May 2013

* 32 pages, 5 figures, 4 tables
