Models, code, and papers for "Stéphane Mallat":
Deep convolutional networks provide state of the art classifications and regressions results over many high-dimensional problems. We review their architecture, which scatters data with a cascade of linear filter weights and non-linearities. A mathematical framework is introduced to analyze their properties. Computations of invariants involve multiscale contractions, the linearization of hierarchical symmetries, and sparse separations. Applications are discussed.
This paper constructs translation invariant operators on L2(R^d), which are Lipschitz continuous to the action of diffeomorphisms. A scattering propagator is a path ordered product of non-linear and non-commuting operators, each of which computes the modulus of a wavelet transform. A local integration defines a windowed scattering transform, which is proved to be Lipschitz continuous to the action of diffeomorphisms. As the window size increases, it converges to a wavelet scattering transform which is translation invariant. Scattering coefficients also provide representations of stationary processes. Expected values depend upon high order moments and can discriminate processes having the same power spectrum. Scattering operators are extended on L2 (G), where G is a compact Lie group, and are invariant under the action of G. Combining a scattering on L2(R^d) and on Ld (SO(d)) defines a translation and rotation invariant scattering on L2(R^d).
We define maximum entropy models of non-Gaussian stationary random vectors from covariances of non-linear representations. These representations are calculated by multiplying the phase of Fourier or wavelet coefficients with harmonic integers, which amounts to compute a windowed Fourier transform along their phase. Rectifiers in neural networks compute such phase windowing. The covariance of these harmonic coefficients capture dependencies of Fourier and wavelet coefficients across frequencies, by canceling their random phase. We introduce maximum entropy models conditioned by such covariances over a graph of local interactions. These models are approximated by transporting an initial maximum entropy measure with a gradient descent. The precision of wavelet phase harmonic models is numerically evaluated over turbulent flows and other non-Gaussian stationary processes.
Generative Adversarial Nets (GANs) and Variational Auto-Encoders (VAEs) provide impressive image generations from Gaussian white noise, but the underlying mathematics are not well understood. We compute deep convolutional network generators by inverting a fixed embedding operator. Therefore, they do not require to be optimized with a discriminator or an encoder. The embedding is Lipschitz continuous to deformations so that generators transform linear interpolations between input white noise vectors into deformations between output images. This embedding is computed with a wavelet Scattering transform. Numerical experiments demonstrate that the resulting Scattering generators have similar properties as GANs or VAEs, without learning a discriminative network or an encoder.
We present a new representation of harmonic sounds that linearizes the dynamics of pitch and spectral envelope, while remaining stable to deformations in the time-frequency plane. It is an instance of the scattering transform, a generic operator which cascades wavelet convolutions and modulus nonlinearities. It is derived from the pitch spiral, in that convolutions are successively performed in time, log-frequency, and octave index. We give a closed-form approximation of spiral scattering coefficients for a nonstationary generalization of the harmonic source-filter model.
We introduce general scattering transforms as mathematical models of deep neural networks with l2 pooling. Scattering networks iteratively apply complex valued unitary operators, and the pooling is performed by a complex modulus. An expected scattering defines a contractive representation of a high-dimensional probability distribution, which preserves its mean-square norm. We show that unsupervised learning can be casted as an optimization of the space contraction to preserve the volume occupied by unlabeled examples, at each layer of the network. Supervised learning and classification are performed with an averaged scattering, which provides scattering estimations for multiple classes.
Dictionary learning algorithms or supervised deep convolution networks have considerably improved the efficiency of predefined feature representations such as SIFT. We introduce a deep scattering convolution network, with predefined wavelet filters over spatial and angular variables. This representation brings an important improvement to results previously obtained with predefined features over object image databases such as Caltech and CIFAR. The resulting accuracy is comparable to results obtained with unsupervised deep learning and dictionary based representations. This shows that refining image representations by using geometric priors is a promising direction to improve image classification and its understanding.
A rigid-motion scattering computes adaptive invariants along translations and rotations, with a deep convolutional network. Convolutions are calculated on the rigid-motion group, with wavelets defined on the translation and rotation variables. It preserves joint rotation and translation information, while providing global invariants at any desired scale. Texture classification is studied, through the characterization of stationary processes from a single realization. State-of-the-art results are obtained on multiple texture data bases, with important rotation and scaling variabilities.
A scattering vector is a local descriptor including multiscale and multi-direction co-occurrence information. It is computed with a cascade of wavelet decompositions and complex modulus. This scattering representation is locally translation invariant and linearizes deformations. A supervised classification algorithm is computed with a PCA model selection on scattering vectors. State of the art results are obtained for handwritten digit recognition and texture classification.
A wavelet scattering network computes a translation invariant image representation, which is stable to deformations and preserves high frequency information for classification. It cascades wavelet transform convolutions with non-linear modulus and averaging operators. The first network layer outputs SIFT-type descriptors whereas the next layers provide complementary invariant information which improves classification. The mathematical analysis of wavelet scattering networks explains important properties of deep convolution networks for classification. A scattering representation of stationary processes incorporates higher order moments and can thus discriminate textures having the same Fourier power spectrum. State of the art classification results are obtained for handwritten digits and texture discrimination, using a Gaussian kernel SVM and a generative PCA classifier.
A scattering transform defines a signal representation which is invariant to translations and Lipschitz continuous relatively to deformations. It is implemented with a non-linear convolution network that iterates over wavelet and modulus operators. Lipschitz continuity locally linearizes deformations. Complex classes of signals and textures can be modeled with low-dimensional affine spaces, computed with a PCA in the scattering domain. Classification is performed with a penalized model selection. State of the art results are obtained for handwritten digit recognition over small training sets, and for texture classification.
A geometric model of sparse signal representations is introduced for classes of signals. It is computed by optimizing co-occurrence groups with a maximum likelihood estimate calculated with a Bernoulli mixture model. Applications to face image compression and MNIST digit classification illustrate the applicability of this model.
We prove that linear rectifiers act as phase transformations on complex analytic extensions of convolutional network coefficients. These phase transformations are linearized over a set of phase harmonics, computed with a Fourier transform. The correlation matrix of one-layer convolutional network coefficients is a translation invariant representation, which is used to build statistical models of stationary processes. We prove that it is Lipschitz continuous and that it has a sparse representation over phase harmonics. When network filters are wavelets, phase harmonic correlations provide important information about phase alignments across scales. We demonstrate numerically that large classes of one-dimensional signals and images are precisely reconstructed with a small fraction of phase harmonic correlations.
We introduce multiscale invariant dictionaries to estimate quantum chemical energies of organic molecules, from training databases. Molecular energies are invariant to isometric atomic displacements, and are Lipschitz continuous to molecular deformations. Similarly to density functional theory (DFT), the molecule is represented by an electronic density function. A multiscale invariant dictionary is calculated with wavelet scattering invariants. It cascades a first wavelet transform which separates scales, with a second wavelet transform which computes interactions across scales. Sparse scattering regressions give state of the art results over two databases of organic planar molecules. On these databases, the regression error is of the order of the error produced by DFT codes, but at a fraction of the computational cost.
We present a novel approach to the regression of quantum mechanical energies based on a scattering transform of an intermediate electron density representation. A scattering transform is a deep convolution network computed with a cascade of multiscale wavelet transforms. It possesses appropriate invariant and stability properties for quantum energy regression. This new framework removes fundamental limitations of Coulomb matrix based energy regressions, and numerical experiments give state-of-the-art accuracy over planar molecules.
The classification of high-dimensional data defined on graphs is particularly difficult when the graph geometry is unknown. We introduce a Haar scattering transform on graphs, which computes invariant signal descriptors. It is implemented with a deep cascade of additions, subtractions and absolute values, which iteratively compute orthogonal Haar wavelet transforms. Multiscale neighborhoods of unknown graphs are estimated by minimizing an average total variation, with a pair matching algorithm of polynomial complexity. Supervised classification with dimension reduction is tested on data bases of scrambled images, and for signals sampled on unknown irregular grids on a sphere.
We introduce a two-layer wavelet scattering network, for object classification. This scattering transform computes a spatial wavelet transform on the first layer and a new joint wavelet transform along spatial, angular and scale variables in the second layer. Numerical experiments demonstrate that this two layer convolution network, which involves no learning and no max pooling, performs efficiently on complex image data sets such as CalTech, with structural objects variability and clutter. It opens the possibility to simplify deep neural network learning by initializing the first layers with wavelet filters.
A general framework for solving image inverse problems is introduced in this paper. The approach is based on Gaussian mixture models, estimated via a computationally efficient MAP-EM algorithm. A dual mathematical interpretation of the proposed framework with structured sparse estimation is described, which shows that the resulting piecewise linear estimate stabilizes the estimation when compared to traditional sparse inverse problem techniques. This interpretation also suggests an effective dictionary motivated initialization for the MAP-EM algorithm. We demonstrate that in a number of image inverse problems, including inpainting, zooming, and deblurring, the same algorithm produces either equal, often significantly better, or very small margin worse results than the best published ones, at a lower computational cost.
We introduce a sparse scattering deep convolutional neural network, which provides a simple model to analyze properties of deep representation learning for classification. Learning a single dictionary matrix with a classifier yields a higher classification accuracy than AlexNet over the ImageNet ILSVRC2012 dataset. The network first applies a scattering transform which linearizes variabilities due to geometric transformations such as translations and small deformations. A sparse l1 dictionary coding reduces intra-class variability while preserving class separation through projections over unions of linear spaces. It is implemented in a deep convolutional network with a homotopy algorithm having an exponential convergence. A convergence proof is given in a general framework including ALISTA. Classification results are analyzed over ImageNet.
Motivated by the prediction of cell loads in cellular networks, we formulate the following new, fundamental problem of statistical learning of geometric marks of point processes: An unknown marking function, depending on the geometry of point patterns, produces characteristics (marks) of the points. One aims at learning this function from the examples of marked point patterns in order to predict the marks of new point patterns. To approximate (interpolate) the marking function, in our baseline approach, we build a statistical regression model of the marks with respect some local point distance representation. In a more advanced approach, we use a global data representation via the scattering moments of random measures, which build informative and stable to deformations data representation, already proven useful in image analysis and related application domains. In this case, the regression of the scattering moments of the marked point patterns with respect to the non-marked ones is combined with the numerical solution of the inverse problem, where the marks are recovered from the estimated scattering moments. Considering some simple, generic marks, often appearing in the modeling of wireless networks, such as the shot-noise values, nearest neighbour distance, and some characteristics of the Voronoi cells, we show that the scattering moments can capture similar geometry information as the baseline approach, and can reach even better performance, especially for non-local marking functions. Our results motivate further development of statistical learning tools for stochastic geometry and analysis of wireless networks, in particular to predict cell loads in cellular networks from the locations of base stations and traffic demand.