Neural Networks and the Chomsky Hierarchy
- URL: http://arxiv.org/abs/2207.02098v1
- Date: Tue, 5 Jul 2022 15:06:11 GMT
- Title: Neural Networks and the Chomsky Hierarchy
- Authors: Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Marcus Hutter, Shane Legg, Pedro A. Ortega
- Abstract summary: We study whether insights from the theory of computation can predict the limits of neural network generalization in practice.
We report negative results in which even extensive amounts of data and training time never led to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
- Score: 27.470857324448136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliable generalization lies at the heart of safe ML and AI. However,
understanding when and how neural networks generalize remains one of the most
important unsolved problems in the field. In this work, we conduct an extensive
empirical study (2200 models, 16 tasks) to investigate whether insights from
the theory of computation can predict the limits of neural network
generalization in practice. We demonstrate that grouping tasks according to the
Chomsky hierarchy allows us to forecast whether certain architectures will be
able to generalize to out-of-distribution inputs. This includes negative
results where even extensive amounts of data and training time never led to any
non-trivial generalization, despite models having sufficient capacity to
perfectly fit the training data. Our results show that, for our subset of
tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can
solve regular and counter-language tasks, and only networks augmented with
structured memory (such as a stack or memory tape) can successfully generalize
on context-free and context-sensitive tasks.
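To make the evaluation protocol concrete, here is a minimal sketch of the length-generalization setup the abstract describes. The three tasks mirror ones from the paper (Parity Check is regular, Reverse String is deterministic context-free, Duplicate String is context-sensitive), but the generators and length cutoffs below are simplified stand-ins, not the authors' code:

```python
import random

# Illustrative task functions, one per level of the hierarchy studied:
def parity_check(bits):        # regular: solvable by a finite-state machine
    return [sum(bits) % 2]

def reverse_string(bits):      # deterministic context-free: needs a stack
    return bits[::-1]

def duplicate_string(bits):    # context-sensitive: needs tape-like memory
    return bits + bits

def length_split(task, train_max=40, test_max=500, n=1000):
    """Sample (input, target) pairs with a hard length cutoff: train on short
    strings, test only on strictly longer ones, so high test accuracy
    requires out-of-distribution length generalization."""
    def sample(lo, hi):
        pairs = []
        for _ in range(n):
            length = random.randint(lo, hi)
            x = [random.randint(0, 1) for _ in range(length)]
            pairs.append((x, task(x)))
        return pairs
    return sample(1, train_max), sample(train_max + 1, test_max)

train_set, test_set = length_split(reverse_string)
```

A model that merely fits the training distribution scores near chance on the longer test split, which is exactly the failure mode the negative results describe.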
Related papers
- On Logical Extrapolation for Mazes with Recurrent and Implicit Networks [2.0037131645168396]
We show that the capacity for extrapolation is less robust than previously suggested.
We show that while INNs are capable of generalizing to larger maze instances, they fail to generalize along axes of difficulty other than maze size.
arXiv Detail & Related papers (2024-10-03T22:07:51Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
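As an illustration of the task (not necessarily the paper's data pipeline), one hypothetical way to generate labeled regular-versus-chaotic series is the logistic map, which is periodic or chaotic depending on its parameter:

```python
import numpy as np

def logistic_series(r, length=500, burn_in=100, x0=0.5):
    """Iterate the logistic map x <- r*x*(1-x), discarding transients."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    out = np.empty(length)
    for i in range(length):
        x = r * x * (1 - x)
        out[i] = x
    return out

# r = 3.5 gives a periodic (regular) orbit; r = 4.0 is fully chaotic.
X = np.stack([logistic_series(3.5), logistic_series(4.0)])
y = np.array([0, 1])  # labels for a binary regular-vs-chaotic classifier
```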
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that enable the construction of model-based networks with guaranteed high generalization.
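Model-based networks for sparse recovery are typically unrolled iterative solvers; as a generic illustration (the specific architecture analyzed in the paper may differ), here is a LISTA-style unrolled ISTA forward pass, where the matrices and threshold are the learned parameters:

```python
import numpy as np

def soft_threshold(z, theta):
    """Proximal operator of the l1 norm (the ISTA shrinkage step)."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def lista_forward(y, W_e, W_s, theta, n_layers=5):
    """Unrolled ISTA: each 'layer' is one learned linear update
    followed by soft-thresholding."""
    x = soft_threshold(W_e @ y, theta)
    for _ in range(n_layers - 1):
        x = soft_threshold(W_e @ y + W_s @ x, theta)
    return x

# Toy usage: recover a sparse code x from a measurement y = A @ x.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))
W_e = A.T / np.linalg.norm(A, 2) ** 2   # classic ISTA initialization
W_s = np.eye(50) - W_e @ A
x_hat = lista_forward(rng.normal(size=20), W_e, W_s, theta=0.1)
```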
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Generalization Through the Lens of Learning Dynamics [11.009483845261958]
A machine learning (ML) system must learn to generalize to novel situations in order to yield accurate predictions at deployment.
The impressive generalization performance of deep neural networks has stymied theoreticians.
This thesis will study the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.
arXiv Detail & Related papers (2022-12-11T00:07:24Z)
- Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference.
We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
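For context, transductive Rademacher complexity is usually defined following El-Yaniv and Pechyony; with m labeled and u unlabeled points (the notation here is the generic one, not necessarily the paper's):

```latex
\[
  \mathfrak{R}_{m+u}(\mathcal{V})
  = \Big(\tfrac{1}{m} + \tfrac{1}{u}\Big)\,
    \mathbb{E}_{\boldsymbol{\sigma}}
    \Big[\sup_{\mathbf{v}\in\mathcal{V}} \boldsymbol{\sigma}^{\top}\mathbf{v}\Big],
  \qquad
  \sigma_i =
  \begin{cases}
    +1 & \text{with probability } p,\\
    -1 & \text{with probability } p,\\
    0  & \text{with probability } 1-2p,
  \end{cases}
  \quad p = \frac{mu}{(m+u)^2}.
\]
```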
arXiv Detail & Related papers (2021-12-07T20:06:23Z)
- Generalization in Multimodal Language Learning from Simulation [20.751952728808153]
We investigate the influence of the underlying training data distribution on generalization in a minimal LSTM-based network trained in a supervised, time continuous setting.
We find that compositional generalization fails in simple setups but improves with the number of objects and actions, and in particular with greater color overlap between objects.
arXiv Detail & Related papers (2021-08-03T12:55:18Z)
- Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization [40.21297628440919]
We introduce a novel benchmark, Pointer Value Retrieval (PVR) tasks, that explore the limits of neural network generalization.
PVR tasks can consist of visual as well as symbolic inputs, each with varying levels of difficulty.
We demonstrate that this task structure provides a rich testbed for understanding generalization.
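In the basic symbolic form of a PVR task, the first digit of the input acts as a pointer selecting which of the remaining digits to output; a minimal sketch of that structure (the paper's visual variants and window-aggregation functions add difficulty on top of this):

```python
import random

def pvr_example(n_values=10):
    """One symbolic Pointer Value Retrieval example: the first digit
    points into the remaining digits; the label is the selected value."""
    pointer = random.randint(0, n_values - 1)
    values = [random.randint(0, 9) for _ in range(n_values)]
    x = [pointer] + values
    y = values[pointer]
    return x, y

x, y = pvr_example()  # e.g. x = [3, 7, 1, 4, 9, ...] -> y = x[1:][3]
```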
arXiv Detail & Related papers (2021-07-27T03:50:31Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Exploiting Heterogeneity in Operational Neural Networks by Synaptic Plasticity [87.32169414230822]
The recently proposed Operational Neural Networks (ONNs) generalize conventional Convolutional Neural Networks (CNNs).
This study focuses on searching for the best-possible operator set(s) for the hidden neurons of the network, based on the Synaptic Plasticity paradigm that underlies learning in biological neurons.
Experimental results over highly challenging problems demonstrate that even elite ONNs with few neurons and layers achieve superior learning performance to GIS-based ONNs.
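For intuition, an ONN neuron replaces the fixed multiply-and-sum of a convolutional neuron with learnable nodal and pooling operators; the toy sketch below shows that structure, with the ordinary linear neuron recovered as the special case nodal = multiply, pool = sum (the alternative operators shown are purely illustrative):

```python
import numpy as np

def operational_neuron(x, w, nodal=np.multiply, pool=np.sum):
    """Generalized neuron: y = pool_i( nodal(w_i, x_i) ).
    nodal=np.multiply with pool=np.sum recovers the linear neuron."""
    return pool(nodal(w, x))

x = np.array([0.2, -0.5, 0.9])
w = np.array([1.0, 0.3, -0.7])
linear_out = operational_neuron(x, w)  # ordinary CNN-style neuron
alt_out = operational_neuron(x, w, nodal=lambda w, x: np.sin(w * x),
                             pool=np.max)  # one alternative operator set
```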
arXiv Detail & Related papers (2020-08-21T19:03:23Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
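Read schematically, NC is a regressor meta-trained across many tasks to map features of a trained model to its generalization gap; the sketch below is a deliberately simplified stand-in for that idea (a linear regressor on synthetic features), not the paper's architecture:

```python
import numpy as np

class NeuralComplexity:
    """Schematic stand-in: a linear regressor from model features to the
    train-test gap, meta-trained across many tasks."""
    def __init__(self, n_features, lr=1e-2):
        self.w = np.zeros(n_features)
        self.lr = lr

    def predict(self, feats):
        return float(self.w @ feats)

    def update(self, feats, observed_gap):
        # One SGD step on the squared regression error.
        err = self.predict(feats) - observed_gap
        self.w -= self.lr * err * feats

# Meta-training loop: each episode pairs features of a trained task model
# with its measured generalization gap (synthetic placeholders here).
nc = NeuralComplexity(n_features=4)
for feats, gap in [(np.random.randn(4), np.random.rand()) for _ in range(100)]:
    nc.update(feats, gap)
```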
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.