On a Built-in Conflict between Deep Learning and Systematic
Generalization
- URL: http://arxiv.org/abs/2208.11633v1
- Date: Wed, 24 Aug 2022 16:06:36 GMT
- Title: On a Built-in Conflict between Deep Learning and Systematic
Generalization
- Authors: Yuanpeng Li
- Abstract summary: Internal function sharing is one of the reasons that weaken o.o.d. or systematic generalization in deep learning.
We show such phenomena in standard deep learning models, such as fully connected, convolutional, residual networks, LSTMs, and (Vision) Transformers.
- Score: 2.588973722689844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we hypothesize that internal function sharing is one of the
reasons that weaken o.o.d. or systematic generalization in deep learning for
classification tasks. Under equivalent prediction, a model partitions an input
space into multiple parts separated by boundaries. Function sharing prefers to
reuse existing boundaries, leading to fewer parts for new outputs, which conflicts
with systematic generalization. We show such phenomena in standard deep
learning models, such as fully connected, convolutional, residual networks,
LSTMs, and (Vision) Transformers. We hope this study provides novel insights
into systematic generalization and forms a basis for new research directions.
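A minimal sketch of the boundary-sharing intuition (NumPy only; the two-layer ReLU network, random weights, and 2-D grid below are illustrative assumptions, not the paper's experimental setup): every output unit reads the same hidden ReLU features, so the hyperplane boundaries that partition the input space are fixed by the shared hidden layer, and adding further outputs reuses those boundaries rather than creating new partitions.

import numpy as np

# Randomly initialized fully connected network: 2-D input -> shared ReLU layer -> 3 outputs.
rng = np.random.default_rng(0)
n_hidden, n_outputs = 8, 3
W1 = rng.normal(size=(2, n_hidden))          # each hidden unit defines one hyperplane boundary in 2-D
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_hidden, n_outputs))  # every output unit reuses the same hidden features

# Sample a grid over the input square and record each point's ReLU on/off pattern.
xs = np.linspace(-3.0, 3.0, 200)
grid = np.array([[x, y] for x in xs for y in xs])
pre_act = grid @ W1 + b1
patterns = pre_act > 0                       # which side of each shared boundary the input falls on

# Each distinct pattern is one region carved out by the shared boundaries; the count
# depends only on the hidden layer, no matter how many output units sit on top.
n_regions = len({tuple(p) for p in patterns})
pred = np.maximum(pre_act, 0.0) @ W2         # every output is linear within each shared region
print(f"{n_regions} input regions from {n_hidden} shared boundaries, "
      f"reused by all {n_outputs} outputs; predictions shape {pred.argmax(axis=1).shape}")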
Related papers
- Generalization Through the Lens of Learning Dynamics [11.009483845261958]
A machine learning (ML) system must learn to generalize to novel situations in order to yield accurate predictions at deployment.
The impressive generalization performance of deep neural networks has stymied theoreticians.
This thesis will study the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.
arXiv Detail & Related papers (2022-12-11T00:07:24Z)
- On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity distribution scores, which we name the Compositional Relational Network (CoRelNet).
We find that simple architectural choices can outperform existing models in out-of-distribution generalization.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)
- Learning Dynamics and Structure of Complex Systems Using Graph Neural Networks [13.509027957413409]
We trained graph neural networks to fit time series from an example nonlinear dynamical system.
We found simple interpretations of the learned representation and model components.
We successfully identified a 'graph translator' between the statistical interactions in belief propagation and parameters of the corresponding trained network.
arXiv Detail & Related papers (2022-02-22T15:58:16Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and are transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Distinguishing rule- and exemplar-based generalization in learning systems [10.396761067379195]
We investigate two distinct inductive biases: feature-level bias and exemplar-vs-rule bias.
We find that most standard neural network models have a propensity towards exemplar-based extrapolation.
We discuss the implications of these findings for research on data augmentation, fairness, and systematic generalization.
arXiv Detail & Related papers (2021-10-08T18:37:59Z)
- On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications [13.823089111538128]
We present new and tighter information-theoretic upper bounds for the generalization error of machine learning models, such as neural networks, trained with SGD.
An experimental study based on these bounds provides some insights into the SGD training of neural networks.
arXiv Detail & Related papers (2021-10-07T00:53:33Z)
- f-Domain-Adversarial Learning: Theory and Algorithms [82.97698406515667]
Unsupervised domain adaptation is used in many machine learning applications where, during training, a model has access to unlabeled data in the target domain.
We derive a novel generalization bound for domain adaptation that exploits a new measure of discrepancy between distributions based on a variational characterization of f-divergences.
arXiv Detail & Related papers (2021-06-21T18:21:09Z)
- Deep Archimedean Copulas [98.96141706464425]
ACNet is a novel differentiable neural network architecture that enforces the structural properties of Archimedean copulas.
We show that ACNet is able to both approximate common Archimedean Copulas and generate new copulas which may provide better fits to data.
arXiv Detail & Related papers (2020-12-05T22:58:37Z)
- Vulnerability Under Adversarial Machine Learning: Bias or Variance? [77.30759061082085]
We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network.
Our analysis sheds light on why the deep neural networks have poor performance under adversarial perturbation.
We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
arXiv Detail & Related papers (2020-08-01T00:58:54Z)
- Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems [71.14339738190202]
Democratized learning (Dem-AI) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems.
Inspired by Dem-AI philosophy, a novel distributed learning approach is proposed in this paper.
The proposed algorithms achieve better generalization performance for the agents' learning models than conventional federated learning (FL) algorithms.
arXiv Detail & Related papers (2020-07-07T08:34:48Z)
- Unsupervised Domain Adaptation in Semantic Segmentation: a Review [22.366638308792734]
The aim of this paper is to give an overview of the recent advancements in the Unsupervised Domain Adaptation (UDA) of deep networks for semantic segmentation.
arXiv Detail & Related papers (2020-05-21T20:10:38Z)