On genuine invariance learning without weight-tying
- URL: http://arxiv.org/abs/2308.03904v1
- Date: Mon, 7 Aug 2023 20:41:19 GMT
- Title: On genuine invariance learning without weight-tying
- Authors: Artem Moskalev and Anna Sepliarskaia and Erik J. Bekkers and Arnold
Smeulders
- Abstract summary: We analyze invariance learning in neural networks without weight-tying constraints.
We show that learned invariance is strongly conditioned on the input data, rendering it unreliable if the input distribution shifts.
- Score: 6.308539010172309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate properties and limitations of invariance
learned by neural networks from the data compared to the genuine invariance
achieved through invariant weight-tying. To do so, we adopt a group theoretical
perspective and analyze invariance learning in neural networks without
weight-tying constraints. We demonstrate that even when a network learns to
correctly classify samples on a group orbit, the underlying decision-making in
such a model does not attain genuine invariance. Instead, learned invariance is
strongly conditioned on the input data, rendering it unreliable if the input
distribution shifts. We next demonstrate how to guide invariance learning
toward genuine invariance by regularizing the invariance of a model during
training. To this end, we propose several metrics to quantify learned
invariance: (i) predictive distribution invariance, (ii) logit invariance, and
(iii) saliency invariance similarity. We show that the invariance learned with
the invariance error regularization closely resembles the genuine invariance
of weight-tying models and reliably holds even under a severe input
distribution shift. Closer analysis of the learned invariance also reveals the
spectral decay phenomenon, whereby a network achieves invariance to a specific
transformation group by reducing its sensitivity to any input perturbation.
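The abstract names the three metrics and the regularizer but does not spell out their formulas. The following minimal PyTorch sketch shows one plausible instantiation, assuming KL divergence for predictive distribution invariance, squared logit distance for logit invariance, and cosine similarity of input gradients for saliency invariance similarity; the function names, the choice of penalty, and the weight lam are illustrative assumptions rather than the paper's implementation. Here x_t stands for a group-transformed copy of the batch x, e.g. x_t = g(x) for a sampled group element g.

import torch
import torch.nn.functional as F

def predictive_distribution_invariance(model, x, x_t):
    # KL divergence between the predictive distributions on a batch x and
    # its group-transformed copy x_t (lower = more invariant).
    log_p = F.log_softmax(model(x), dim=-1)
    log_q = F.log_softmax(model(x_t), dim=-1)
    return F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)

def logit_invariance(model, x, x_t):
    # Mean squared distance between logits before and after the transform.
    return (model(x) - model(x_t)).pow(2).sum(dim=-1).mean()

def saliency_invariance_similarity(model, x, x_t):
    # Cosine similarity between input-gradient saliency maps
    # (higher = more invariant decision-making).
    def saliency(inp):
        inp = inp.clone().requires_grad_(True)
        score = model(inp).max(dim=-1).values.sum()
        (grad,) = torch.autograd.grad(score, inp)
        return grad.flatten(1)
    return F.cosine_similarity(saliency(x), saliency(x_t), dim=-1).mean()

def regularized_loss(model, x, x_t, y, lam=1.0):
    # Invariance error regularization (sketch): task loss plus an
    # invariance penalty on the transformed pair.
    return F.cross_entropy(model(x), y) + lam * logit_invariance(model, x, x_t)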
Related papers
- A Probabilistic Approach to Learning the Degree of Equivariance in Steerable CNNs [5.141137421503899]
Steerable convolutional neural networks (SCNNs) enhance task performance by modelling geometric symmetries.
Yet, unknown or varying symmetries can lead to overconstrained weights and decreased performance.
This paper introduces a probabilistic method to learn the degree of equivariance in SCNNs.
arXiv Detail & Related papers (2024-06-06T10:45:19Z)
- What Affects Learned Equivariance in Deep Image Recognition Models? [10.590129221143222]
We find evidence for a correlation between learned translation equivariance and validation accuracy on ImageNet.
Data augmentation, reduced model capacity and inductive bias in the form of convolutions induce higher learned equivariance in neural networks.
arXiv Detail & Related papers (2023-04-05T17:54:25Z)
- In What Ways Are Deep Neural Networks Invariant and How Should We Measure This? [5.757836174655293]
We introduce a family of invariance and equivariance metrics that allows us to quantify these properties in a way that disentangles them from other metrics such as loss or accuracy.
We draw a range of conclusions about invariance and equivariance in deep learning models, ranging from whether initializing a model with pretrained weights has an effect on a trained model's invariance, to the extent to which invariance learned via training can generalize to out-of-distribution data.
arXiv Detail & Related papers (2022-10-07T18:43:21Z)
- Equivariant Disentangled Transformation for Domain Generalization under Combination Shift [91.38796390449504]
Combinations of domains and labels are not observed during training but appear in the test environment.
We provide a unique formulation of the combination shift problem based on the concepts of homomorphism, equivariance, and a refined definition of disentanglement.
arXiv Detail & Related papers (2022-08-03T12:31:31Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes; a toy sketch of this idea follows the list below.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization [77.24152933825238]
We show that for linear classification tasks, stronger restrictions on the distribution shifts are needed; otherwise, OOD generalization is impossible.
We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not.
arXiv Detail & Related papers (2021-06-11T20:42:27Z)
- Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations; a sketch of this idea also appears after the list.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
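As a toy illustration of the parameter-sharing entry above (Equivariance Discovery by Learned Parameter-Sharing), the layer below mixes an untied weight matrix with a translation-tied circulant matrix through a learnable gate, so the degree of sharing is itself optimized from data. The class name, the single-gate relaxation, and the circulant construction are assumptions for illustration, not the paper's formulation.

import torch

class SharedOrFree(torch.nn.Module):
    # A linear layer whose weight interpolates between a free matrix and a
    # circulant (translation-equivariant, weight-tied) matrix.
    def __init__(self, n):
        super().__init__()
        self.free = torch.nn.Parameter(torch.randn(n, n) / n ** 0.5)  # untied weights
        self.kernel = torch.nn.Parameter(torch.randn(n) / n ** 0.5)   # tied weights
        self.gate = torch.nn.Parameter(torch.zeros(()))                # sharing logit

    def forward(self, x):
        n = self.kernel.numel()
        # Row i, column j of a circulant matrix reads kernel[(i - j) mod n].
        idx = (torch.arange(n).unsqueeze(1) - torch.arange(n)) % n
        circulant = self.kernel[idx]
        a = torch.sigmoid(self.gate)  # a -> 1: fully tied; a -> 0: fully free
        return x @ (a * circulant + (1 - a) * self.free).T

Training the gate jointly with the task loss (optionally with a penalty that favors sharing) lets the data decide how much equivariance to keep.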
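The last entry (Learning Invariances in Neural Networks) is sketched below under similar caveats: the half-width of a uniform rotation range is a trainable parameter, logits are averaged over sampled rotations, and a small bonus keeps the range from collapsing. The rotation-only parameterization, the differentiable-rotation helper, and the width bonus are illustrative assumptions, not the paper's exact objective.

import torch
import torch.nn.functional as F

def diff_rotate(x, angle):
    # Differentiable rotation of an (N, C, H, W) batch by a scalar angle
    # (radians), via an affine sampling grid.
    cos, sin, zero = torch.cos(angle), torch.sin(angle), torch.zeros_like(angle)
    theta = torch.stack([
        torch.stack([cos, -sin, zero]),
        torch.stack([sin, cos, zero]),
    ]).unsqueeze(0).expand(x.size(0), -1, -1)
    grid = F.affine_grid(theta, list(x.shape), align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)

class LearnedRotationInvariance(torch.nn.Module):
    def __init__(self, net, init_width_deg=10.0):
        super().__init__()
        self.net = net
        # Trainable half-width of a uniform rotation range, in radians.
        self.width = torch.nn.Parameter(torch.tensor(init_width_deg * torch.pi / 180))

    def forward(self, x, n_samples=4):
        # Average logits over rotations sampled from U(-width, width); the
        # reparameterized angle keeps gradients flowing into width.
        logits = 0.0
        for _ in range(n_samples):
            angle = (torch.rand(()) * 2 - 1) * self.width
            logits = logits + self.net(diff_rotate(x, angle))
        return logits / n_samples

def augmentation_loss(model, x, y, lam=0.01):
    # Cross-entropy minus a small bonus for a wider range, so the range
    # shrinks only when the task genuinely breaks the symmetry.
    return F.cross_entropy(model(x), y) - lam * model.width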