Learning with invariances in random features and kernel models
- URL: http://arxiv.org/abs/2102.13219v1
- Date: Thu, 25 Feb 2021 23:06:21 GMT
- Title: Learning with invariances in random features and kernel models
- Authors: Song Mei, Theodor Misiakiewicz, Andrea Montanari
- Abstract summary: We introduce two classes of models: invariant random features and invariant kernel methods.
We characterize the test error of invariant methods in a high-dimensional regime in which the sample size and number of hidden units scale as polynomials in the dimension.
We show that exploiting invariance in the architecture saves a $d^\alpha$ factor ($d$ stands for the dimension) in sample size and number of hidden units to achieve the same test error as for unstructured architectures.
- Score: 19.78800773518545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of machine learning tasks entail a high degree of invariance: the
data distribution does not change if we act on the data with a certain group of
transformations. For instance, labels of images are invariant under
translations of the images. Certain neural network architectures -- for
instance, convolutional networks -- are believed to owe their success to the
fact that they exploit such invariance properties. With the objective of
quantifying the gain achieved by invariant architectures, we introduce two
classes of models: invariant random features and invariant kernel methods. The
latter includes, as a special case, the neural tangent kernel for convolutional
networks with global average pooling. We consider uniform covariates
distributions on the sphere and hypercube and a general invariant target
function. We characterize the test error of invariant methods in a
high-dimensional regime in which the sample size and number of hidden units
scale as polynomials in the dimension, for a class of groups that we call
`degeneracy $\alpha$', with $\alpha \leq 1$. We show that exploiting invariance
in the architecture saves a $d^\alpha$ factor ($d$ stands for the dimension) in
sample size and number of hidden units to achieve the same test error as for
unstructured architectures.
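To make the invariant random-features model concrete, here is a minimal numpy sketch (not code from the paper): each random feature is symmetrized by averaging over the cyclic shift group acting on hypercube inputs, and ridge regression is fit on the resulting invariant features. The ReLU activation, choice of group, ridge level, toy sizes, and target function are illustrative assumptions.

```python
# Minimal sketch of invariant random features (illustrative assumptions:
# cyclic shift group, ReLU activation, toy sizes; not the paper's exact setup).
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 8, 400, 300                      # dimension, hidden units, sample size

W = rng.normal(size=(N, d)) / np.sqrt(d)   # random first-layer weights

def invariant_features(X):
    # symmetrize each feature by averaging over all cyclic shifts of the input,
    # so x -> mean_g relu(<w_j, g.x>) is invariant to shifts by construction
    feats = [np.maximum(np.roll(X, s, axis=1) @ W.T, 0.0) for s in range(X.shape[1])]
    return np.mean(feats, axis=0)          # shape (n_samples, N)

# covariates on the hypercube {-1, +1}^d with a shift-invariant target
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.mean(X * np.roll(X, 1, axis=1), axis=1)

# ridge regression on the symmetrized features
Phi, lam = invariant_features(X), 1e-3
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)

# the fitted predictor is invariant: cyclically shifting a test point changes nothing
Xtest = rng.choice([-1.0, 1.0], size=(50, d))
assert np.allclose(invariant_features(Xtest) @ a,
                   invariant_features(np.roll(Xtest, 3, axis=1)) @ a)
```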
Finally, we show that output symmetrization of an unstructured kernel
estimator does not give a significant statistical improvement; on the other
hand, data augmentation with an unstructured kernel estimator is equivalent to
an invariant kernel estimator and enjoys the same improvement in statistical
efficiency.
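On the kernel side, the following sketch (again illustrative, not the paper's code) builds an invariant kernel by averaging a base inner-product kernel over the same cyclic shift group and uses it in kernel ridge regression; this is the kind of invariant kernel estimator that, per the abstract, data augmentation with the base kernel matches in statistical efficiency. The base kernel, ridge level, and target are assumptions for the example.

```python
# Minimal sketch of an invariant kernel estimator: average a base kernel over
# the cyclic shift group and run kernel ridge regression (illustrative choices).
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 200

def base_kernel(X, Z):
    # a simple inner-product kernel k(<x, z>) = exp(<x, z> / d)
    return np.exp(X @ Z.T / d)

def invariant_kernel(X, Z):
    # average the base kernel over cyclic shifts of the second argument;
    # since shifts preserve inner products, the result is invariant in both arguments
    return np.mean([base_kernel(X, np.roll(Z, s, axis=1)) for s in range(Z.shape[1])],
                   axis=0)

# covariates on the hypercube {-1, +1}^d with a shift-invariant target
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.mean(X * np.roll(X, 1, axis=1), axis=1)

# kernel ridge regression with the invariant kernel
lam = 1e-3
alpha = np.linalg.solve(invariant_kernel(X, X) + lam * np.eye(n), y)

# predictions are unchanged if a test point is cyclically shifted
Xtest = rng.choice([-1.0, 1.0], size=(50, d))
assert np.allclose(invariant_kernel(Xtest, X) @ alpha,
                   invariant_kernel(np.roll(Xtest, 3, axis=1), X) @ alpha)
```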
Related papers
- Probabilistic Invariant Learning with Randomized Linear Classifiers [24.485477981244593]
We show how to leverage randomness and design models that are both expressive and invariant but use fewer resources.
Inspired by randomized algorithms, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs).
arXiv Detail & Related papers (2023-08-08T17:18:04Z)
- Deep Neural Networks with Efficient Guaranteed Invariances [77.99182201815763]
We address the problem of improving the performance and in particular the sample complexity of deep neural networks.
Group-equivariant convolutions are a popular approach to obtain equivariant representations.
We propose a multi-stream architecture, where each stream is invariant to a different transformation.
arXiv Detail & Related papers (2023-03-02T20:44:45Z)
- The Lie Derivative for Measuring Learned Equivariance [84.29366874540217]
We study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures.
We find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities.
Notably, transformers can be more equivariant than convolutional neural networks after training.
arXiv Detail & Related papers (2022-10-06T15:20:55Z)
- On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions [16.704246627541103]
We show that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions.
Our results apply to commonly used translation-invariant kernels such as Gaussian, Laplace, and Cauchy.
arXiv Detail & Related papers (2022-05-26T17:43:20Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Frame Averaging for Invariant and Equivariant Network Design [50.87023773850824]
We introduce Frame Averaging (FA), a framework for adapting known (backbone) architectures to become invariant or equivariant to new symmetry types.
We show that FA-based models have maximal expressive power in a broad setting.
We propose a new class of universal Graph Neural Networks (GNNs), universal Euclidean motion invariant point cloud networks, and Euclidean motion invariant Message Passing (MP) GNNs.
arXiv Detail & Related papers (2021-10-07T11:05:23Z)
- Locality defeats the curse of dimensionality in convolutional teacher-student scenarios [69.2027612631023]
We show that locality is key in determining the learning curve exponent $\beta$.
We conclude by proving, using a natural assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
arXiv Detail & Related papers (2021-06-16T08:27:31Z)
- Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
- tvGP-VAE: Tensor-variate Gaussian Process Prior Variational Autoencoder [0.0]
tvGP-VAE is able to explicitly model correlation via the use of kernel functions.
We show that the choice of which correlation structures to explicitly represent in the latent space has a significant impact on model performance.
arXiv Detail & Related papers (2020-06-08T17:59:13Z)