Extrapolatable Relational Reasoning With Comparators in Low-Dimensional
Manifolds
- URL: http://arxiv.org/abs/2006.08698v2
- Date: Sat, 3 Oct 2020 16:00:19 GMT
- Title: Extrapolatable Relational Reasoning With Comparators in Low-Dimensional
Manifolds
- Authors: Duo Wang, Mateja Jamnik, Pietro Lio
- Abstract summary: We propose a neuroscience-inspired inductive-biased module that can be readily amalgamated with current neural network architectures.
We show that neural nets with this inductive bias achieve considerably better o.o.d generalisation performance for a range of relational reasoning tasks.
- Score: 7.769102711230249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While modern deep neural architectures generalise well when test data is
sampled from the same distribution as training data, they fail badly for cases
when the test data distribution differs from the training distribution even
along a few dimensions. This lack of out-of-distribution generalisation is
increasingly manifested when the tasks become more abstract and complex, such
as in relational reasoning. In this paper we propose a neuroscience-inspired
inductive-biased module that can be readily amalgamated with current neural
network architectures to improve out-of-distribution (o.o.d) generalisation
performance on relational reasoning tasks. This module learns to project
high-dimensional object representations to low-dimensional manifolds for more
efficient and generalisable relational comparisons. We show that neural nets
with this inductive bias achieve considerably better o.o.d generalisation
performance for a range of relational reasoning tasks. We finally analyse the
proposed inductive bias module to understand the importance of lower dimension
projection, and propose an augmentation to the algorithmic alignment theory to
better measure algorithmic alignment with generalisation.
Related papers
- Towards Exact Computation of Inductive Bias [8.988109761916379]
We propose a novel method for efficiently computing the inductive bias required for generalization on a task.
We show that higher dimensional tasks require greater inductive bias.
Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures.
arXiv Detail & Related papers (2024-06-22T21:14:24Z) - Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions [2.3020018305241337]
We analyze the behavior of a deep learning system trained on inputs modeled as Gaussian mixtures to better simulate more general structured inputs.
Under certain standardization schemes, the deep learning model converges toward Gaussian setting behavior, even when the input data follow more complex or real-world distributions.
arXiv Detail & Related papers (2024-05-01T17:10:55Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
We then exploit higher-order statistics only later during training.
We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Network generalization (CoRelNet)
We find that simple architectural choices can outperform existing models in out-of-distribution generalizations.
arXiv Detail & Related papers (2022-06-09T16:24:01Z) - Intrinsic Dimension, Persistent Homology and Generalization in Neural
Networks [19.99615698375829]
We show that generalization error can be equivalently bounded in terms of a notion called the 'persistent homology dimension' (PHD)
We develop an efficient algorithm to estimate PHD in the scale of modern deep neural networks.
Our experiments show that the proposed approach can efficiently compute a network's intrinsic dimension in a variety of settings.
arXiv Detail & Related papers (2021-11-25T17:06:15Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Out-of-Distribution Generalization in Kernel Regression [21.958028127426196]
We study generalization in kernel regression when the training and test distributions are different.
We identify an overlap matrix that quantifies the mismatch between distributions for a given kernel.
We develop procedures for optimizing training and test distributions for a given data budget to find best and worst case generalizations under the shift.
arXiv Detail & Related papers (2021-06-04T04:54:25Z) - Linear Regression with Distributed Learning: A Generalization Error
Perspective [0.0]
We investigate the performance of distributed learning for large-scale linear regression.
We focus on the generalization error, i.e., the performance on unseen data.
Our results show that the generalization error of the distributed solution can be substantially higher than that of the centralized solution.
arXiv Detail & Related papers (2021-01-22T08:43:28Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.