Learning with invariances in random features and kernel models
- URL: http://arxiv.org/abs/2102.13219v1
- Date: Thu, 25 Feb 2021 23:06:21 GMT
- Title: Learning with invariances in random features and kernel models
- Authors: Song Mei, Theodor Misiakiewicz, Andrea Montanari
- Abstract summary: We introduce two classes of models: invariant random features and invariant kernel methods.
We characterize the test error of invariant methods in a high-dimensional regime in which the sample size and number of hidden units scale as polynomials in the dimension.
We show that exploiting invariance in the architecture saves a $d^\alpha$ factor ($d$ stands for the dimension) in sample size and number of hidden units to achieve the same test error as for unstructured architectures.
- Score: 19.78800773518545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of machine learning tasks entail a high degree of invariance: the
data distribution does not change if we act on the data with a certain group of
transformations. For instance, labels of images are invariant under
translations of the images. Certain neural network architectures -- for
instance, convolutional networks -- are believed to owe their success to the
fact that they exploit such invariance properties. With the objective of
quantifying the gain achieved by invariant architectures, we introduce two
classes of models: invariant random features and invariant kernel methods. The
latter includes, as a special case, the neural tangent kernel for convolutional
networks with global average pooling. We consider uniform covariates
distributions on the sphere and hypercube and a general invariant target
function. We characterize the test error of invariant methods in a
high-dimensional regime in which the sample size and number of hidden units
scale as polynomials in the dimension, for a class of groups that we call
`degeneracy $\alpha$', with $\alpha \leq 1$. We show that exploiting invariance
in the architecture saves a $d^\alpha$ factor ($d$ stands for the dimension) in
sample size and number of hidden units to achieve the same test error as for
unstructured architectures.
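To make the invariant random-features model concrete, here is a minimal numpy sketch (not code from the paper): each random feature is symmetrized by averaging over the cyclic shift group acting on hypercube inputs, and ridge regression is fit on the resulting invariant features. The ReLU activation, choice of group, ridge level, toy sizes, and target function are illustrative assumptions.

```python
# Minimal sketch of invariant random features (illustrative assumptions:
# cyclic shift group, ReLU activation, toy sizes; not the paper's exact setup).
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 8, 400, 300                      # dimension, hidden units, sample size

W = rng.normal(size=(N, d)) / np.sqrt(d)   # random first-layer weights

def invariant_features(X):
    # symmetrize each feature by averaging over all cyclic shifts of the input,
    # so x -> mean_g relu(<w_j, g.x>) is invariant to shifts by construction
    feats = [np.maximum(np.roll(X, s, axis=1) @ W.T, 0.0) for s in range(X.shape[1])]
    return np.mean(feats, axis=0)          # shape (n_samples, N)

# covariates on the hypercube {-1, +1}^d with a shift-invariant target
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.mean(X * np.roll(X, 1, axis=1), axis=1)

# ridge regression on the symmetrized features
Phi, lam = invariant_features(X), 1e-3
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)

# the fitted predictor is invariant: cyclically shifting a test point changes nothing
Xtest = rng.choice([-1.0, 1.0], size=(50, d))
assert np.allclose(invariant_features(Xtest) @ a,
                   invariant_features(np.roll(Xtest, 3, axis=1)) @ a)
```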
Finally, we show that output symmetrization of an unstructured kernel
estimator does not give a significant statistical improvement; on the other
hand, data augmentation with an unstructured kernel estimator is equivalent to
an invariant kernel estimator and enjoys the same improvement in statistical
efficiency.
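On the kernel side, the following sketch (again illustrative, not the paper's code) builds an invariant kernel by averaging a base inner-product kernel over the same cyclic shift group and uses it in kernel ridge regression; this is the kind of invariant kernel estimator that, per the abstract, data augmentation with the base kernel matches in statistical efficiency. The base kernel, ridge level, and target are assumptions for the example.

```python
# Minimal sketch of an invariant kernel estimator: average a base kernel over
# the cyclic shift group and run kernel ridge regression (illustrative choices).
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 200

def base_kernel(X, Z):
    # a simple inner-product kernel k(<x, z>) = exp(<x, z> / d)
    return np.exp(X @ Z.T / d)

def invariant_kernel(X, Z):
    # average the base kernel over cyclic shifts of the second argument;
    # since shifts preserve inner products, the result is invariant in both arguments
    return np.mean([base_kernel(X, np.roll(Z, s, axis=1)) for s in range(Z.shape[1])],
                   axis=0)

# covariates on the hypercube {-1, +1}^d with a shift-invariant target
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.mean(X * np.roll(X, 1, axis=1), axis=1)

# kernel ridge regression with the invariant kernel
lam = 1e-3
alpha = np.linalg.solve(invariant_kernel(X, X) + lam * np.eye(n), y)

# predictions are unchanged if a test point is cyclically shifted
Xtest = rng.choice([-1.0, 1.0], size=(50, d))
assert np.allclose(invariant_kernel(Xtest, X) @ alpha,
                   invariant_kernel(np.roll(Xtest, 3, axis=1), X) @ alpha)
```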
Related papers
- Probabilistic Invariant Learning with Randomized Linear Classifiers [24.485477981244593]
We show how to leverage randomness and design models that are both expressive and invariant but use fewer resources.
Inspired by randomized algorithms, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs).
arXiv Detail & Related papers (2023-08-08T17:18:04Z)
- Deep Neural Networks with Efficient Guaranteed Invariances [77.99182201815763]
We address the problem of improving the performance and in particular the sample complexity of deep neural networks.
Group-equivariant convolutions are a popular approach to obtain equivariant representations.
We propose a multi-stream architecture, where each stream is invariant to a different transformation.
arXiv Detail & Related papers (2023-03-02T20:44:45Z)
- The Lie Derivative for Measuring Learned Equivariance [84.29366874540217]
We study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures.
We find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities.
Notably, transformers can be more equivariant than convolutional neural networks after training.
arXiv Detail & Related papers (2022-10-06T15:20:55Z)
- On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions [16.704246627541103]
We show that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions.
Our results apply to commonly used translation-invariant kernels such as Gaussian, Laplace, and Cauchy.
arXiv Detail & Related papers (2022-05-26T17:43:20Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Frame Averaging for Invariant and Equivariant Network Design [50.87023773850824]
We introduce Frame Averaging (FA), a framework for adapting known (backbone) architectures to become invariant or equivariant to new symmetry types.
We show that FA-based models have maximal expressive power in a broad setting.
We propose a new class of universal Graph Neural Networks (GNNs), universal Euclidean motion invariant point cloud networks, and Euclidean motion invariant Message Passing (MP) GNNs.
arXiv Detail & Related papers (2021-10-07T11:05:23Z)
- Locality defeats the curse of dimensionality in convolutional teacher-student scenarios [69.2027612631023]
We show that locality is key in determining the learning curve exponent $\beta$.
We conclude by proving, using a natural assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
arXiv Detail & Related papers (2021-06-16T08:27:31Z)
- Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
- tvGP-VAE: Tensor-variate Gaussian Process Prior Variational Autoencoder [0.0]
tvGP-VAE is able to explicitly model correlation via the use of kernel functions.
We show that the choice of which correlation structures to explicitly represent in the latent space has a significant impact on model performance.
arXiv Detail & Related papers (2020-06-08T17:59:13Z)