Approximation-Generalization Trade-offs under (Approximate) Group
Equivariance
- URL: http://arxiv.org/abs/2305.17592v1
- Date: Sat, 27 May 2023 22:53:37 GMT
- Title: Approximation-Generalization Trade-offs under (Approximate) Group
Equivariance
- Authors: Mircea Petrache, Shubhendu Trivedi
- Abstract summary: Group equivariant neural networks have demonstrated impressive performance across various domains and applications such as protein and drug design.
We show how models capturing task-specific symmetries lead to improved generalization.
We examine the more general question of model mis-specification when the model symmetries don't align with the data symmetries.
- Score: 3.0458514384586395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The explicit incorporation of task-specific inductive biases through symmetry
has emerged as a general design precept in the development of high-performance
machine learning models. For example, group equivariant neural networks have
demonstrated impressive performance across various domains and applications
such as protein and drug design. A prevalent intuition about such models is
that the integration of relevant symmetry results in enhanced generalization.
Moreover, it is posited that when the data and/or the model may only exhibit
$\textit{approximate}$ or $\textit{partial}$ symmetry, the optimal or
best-performing model is one where the model symmetry aligns with the data
symmetry. In this paper, we conduct a formal unified investigation of these
intuitions. To begin, we present general quantitative bounds that demonstrate
how models capturing task-specific symmetries lead to improved generalization.
In fact, our results do not require the transformations to be finite or even
form a group and can work with partial or approximate equivariance. Utilizing
this quantification, we examine the more general question of model
mis-specification, i.e., when the model symmetries do not align with the data
symmetries. We establish, for a given symmetry group, a quantitative comparison
between the approximate/partial equivariance of the model and that of the data
distribution, precisely connecting model equivariance error and data
equivariance error. Our result delineates conditions under which the model
equivariance error is optimal, thereby yielding the best-performing model for
the given task and data.
Related papers
- Equivariant score-based generative models provably learn distributions with symmetries efficiently [7.90752151686317]
Empirical studies have demonstrated that incorporating symmetries into generative models can provide better generalization and sampling efficiency.
We provide the first theoretical analysis and guarantees of score-based generative models (SGMs) for learning distributions that are invariant with respect to some group symmetry.
arXiv Detail & Related papers (2024-10-02T05:14:28Z) - A Generative Model of Symmetry Transformations [44.87295754993983]
We build a generative model that explicitly aims to capture the data's approximate symmetries.
We empirically demonstrate its ability to capture symmetries under affine and color transformations.
arXiv Detail & Related papers (2024-03-04T11:32:18Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z) - Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z) - Nonparametric Functional Analysis of Generalized Linear Models Under
Nonlinear Constraints [0.0]
This article introduces a novel nonparametric methodology for Generalized Linear Models.
It combines the strengths of the binary regression and latent variable formulations for categorical data.
It extends and generalizes recently published parametric versions of the methodology.
arXiv Detail & Related papers (2021-10-11T04:49:59Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, wherein "scale" metrics perform well overall but poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Memorizing without overfitting: Bias, variance, and interpolation in
over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning.
Modern deep learning methods flout this dogma, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-10-26T22:31:04Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.