Beyond Transformers for Function Learning
- URL: http://arxiv.org/abs/2304.09979v1
- Date: Wed, 19 Apr 2023 21:33:06 GMT
- Title: Beyond Transformers for Function Learning
- Authors: Simon Segert, Jonathan Cohen
- Abstract summary: The ability to learn and predict simple functions is a key aspect of human intelligence.
Recent works have started to explore this ability using transformer architectures, but it remains unclear whether these models match human extrapolation.
We propose to address this gap by augmenting the transformer architecture with two simple inductive learning biases.
- Score: 0.6768558752130311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to learn and predict simple functions is a key aspect of human
intelligence. Recent works have started to explore this ability using
transformer architectures; however, it remains unclear whether this is
sufficient to recapitulate the extrapolation abilities of people in this
domain. Here, we propose to address this gap by augmenting the transformer
architecture with two simple inductive learning biases that are directly
adapted from recent models of abstract reasoning in cognitive science. The
results we report demonstrate that these biases are helpful in the context of
large neural network models, and they also shed light on the types of inductive
learning biases that may contribute to human abilities in extrapolation.
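The abstract does not specify the two inductive biases, and no code accompanies this summary, so the following is only a minimal sketch of the baseline task setup it describes: an autoregressive transformer trained to predict the next value of a simple function from its prefix. All module names and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of transformer-based function learning (illustrative only;
# not the authors' architecture, and without their two inductive biases).
import torch
import torch.nn as nn

class FunctionLearner(nn.Module):
    """Autoregressively predicts the next value of a numeric sequence."""
    def __init__(self, d_model=64, nhead=4, num_layers=2, max_len=32):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)        # scalar value -> embedding
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)              # embedding -> next value

    def forward(self, x):                              # x: (batch, seq, 1)
        seq_len = x.size(1)
        pos = torch.arange(seq_len, device=x.device)
        h = self.input_proj(x) + self.pos_emb(pos)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), 1)
        h = self.encoder(h, mask=causal)               # causal self-attention
        return self.head(h)                            # one prediction per step

# Train on random affine functions f(t) = a*t + b; human-like extrapolation
# would require predicting well beyond the fitted range of t.
model = FunctionLearner()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
t = torch.arange(10, dtype=torch.float32).view(1, 10, 1)
for step in range(200):
    a, b = torch.randn(16, 1, 1), torch.randn(16, 1, 1)
    seq = a * t + b                                    # (16, 10, 1)
    loss = ((model(seq[:, :-1]) - seq[:, 1:]) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```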
Related papers
- Object-Oriented Transition Modeling with Inductive Logic Programming [4.560623715441945]
We develop a novel learning algorithm that is substantially more powerful than previous methods.
Our thorough experiments, including ablation tests and comparison with neural baselines, demonstrate a significant improvement over the state-of-the-art.
arXiv Detail & Related papers (2026-02-07T16:11:53Z)
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- A distributional simplicity bias in the learning dynamics of transformers [50.91742043564049]
We show that transformers, trained on natural language data, also display a simplicity bias.
Specifically, they sequentially learn many-body interactions among input tokens, reaching a saturation point in the prediction error for low-degree interactions.
This approach opens up the possibility of studying how interactions of different orders in the data affect learning, in natural language processing and beyond; a toy illustration of degree-truncated interactions appears below.
arXiv Detail & Related papers (2024-10-25T15:39:34Z)
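To make "many-body interactions" concrete: a function of ±1-valued tokens decomposes into one-body, two-body, three-body (and higher) monomial terms, and a model that has learned all interactions up to degree k can at best reach the error of the degree-k truncation. The toy example below is my illustration of that decomposition, not the authors' analysis; the target function and sizes are arbitrary.

```python
# Toy illustration of interactions of increasing degree: fit least-squares
# models on monomial features truncated at degree k and watch the residual
# error drop degree by degree (NOT the paper's transformer analysis).
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d = 4096, 8
X = rng.choice([-1.0, 1.0], size=(n, d))

# Target mixing one-body, two-body, and three-body interaction terms.
y = X[:, 0] + 0.7 * X[:, 1] * X[:, 2] + 0.4 * X[:, 3] * X[:, 4] * X[:, 5]

def features_up_to_degree(X, k):
    """All monomials x_i1 * ... * x_ij with j <= k (the 'j-body' terms)."""
    cols = [np.ones(len(X))]
    for j in range(1, k + 1):
        for idx in combinations(range(X.shape[1]), j):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.stack(cols, axis=1)

for k in (1, 2, 3):
    F = features_up_to_degree(X, k)
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    mse = np.mean((F @ coef - y) ** 2)
    print(f"best approximation up to degree {k}: MSE = {mse:.3f}")
# The MSE plateaus at the variance of the omitted higher-degree terms -
# the analogue of the saturation points seen during transformer training.
```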
- Transcendence: Generative Models Can Outperform The Experts That Train Them [55.885802048647655]
We study the phenomenon of transcendence: when a generative model achieves capabilities that surpass the abilities of the experts generating its data.
We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset (a toy simulation of the underlying voting effect appears below).
arXiv Detail & Related papers (2024-06-17T17:00:52Z)
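The intuition in this line of work, as I read it, is that low-temperature sampling lets the model act like a majority vote over the experts in its training data, averaging out their uncorrelated mistakes. The simulation below illustrates only that voting effect with hypothetical numbers; it is not the paper's chess experiment.

```python
# Toy simulation of transcendence via vote aggregation (illustrative only).
# Each "expert" picks the right move with probability p; pooling their votes -
# roughly what low-temperature sampling from a model trained on all of them
# approximates - is right more often than any single expert.
import numpy as np

rng = np.random.default_rng(0)
p, n_experts, n_positions, n_moves = 0.6, 25, 10_000, 5

# Each expert votes for the correct move (index 0) with probability p,
# otherwise for a uniformly random wrong move.
correct = rng.random((n_positions, n_experts)) < p
wrong = rng.integers(1, n_moves, size=(n_positions, n_experts))
votes = np.where(correct, 0, wrong)

# Aggregate: pick the modal move for each position.
counts = np.apply_along_axis(
    lambda v: np.bincount(v, minlength=n_moves), 1, votes)
majority_acc = (counts.argmax(axis=1) == 0).mean()

print(f"single expert accuracy : {p:.2f}")
print(f"majority-vote accuracy : {majority_acc:.3f}")  # well above p
```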
- From Neurons to Neutrons: A Case Study in Interpretability [5.242869847419834]
We argue that high-dimensional neural networks can learn low-dimensional representations of their training data that are useful beyond simply making good predictions.
This indicates that such approaches to interpretability can be useful for deriving a new understanding of a problem from models trained to solve it.
arXiv Detail & Related papers (2024-05-27T17:59:35Z)
- The Generative AI Paradox: "What It Can Create, It May Not Understand" [81.89252713236746]
The recent wave of generative AI has sparked excitement and concern over potentially superhuman levels of artificial intelligence.
At the same time, models still show basic errors in understanding that would not be expected even in non-expert humans.
This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make?
arXiv Detail & Related papers (2023-10-31T18:07:07Z)
- Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines [27.79626958016208]
We show that agents trained by meta-learning may acquire very different strategies from humans.
We show that co-training these agents on predicting representations from natural language task descriptions and from programs induced to generate such tasks guides them toward human-like inductive biases.
arXiv Detail & Related papers (2022-05-23T18:17:58Z)
- Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from past trajectories.
This trait vector then multiplicatively modulates the prediction mechanism via a fast-weights scheme in the prediction neural network.
We empirically show that the fast weights provide a good inductive bias for modeling the character traits of agents and hence improve mindreading ability (a minimal sketch of this modulation appears below).
arXiv Detail & Related papers (2022-04-17T11:21:18Z)
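The summary above describes a concrete mechanism: a trait vector, inferred from past behavior, multiplicatively rescales activity in the prediction network. Below is a minimal hypothetical sketch of that pattern; it is my reading of the summary, not the authors' code, and every dimension and layer choice is an assumption.

```python
# Hypothetical sketch of a trait vector multiplicatively modulating a
# prediction network via fast weights (illustrative; not the authors' model).
import torch
import torch.nn as nn

class TraitModulatedPredictor(nn.Module):
    def __init__(self, obs_dim=16, trait_dim=8, hidden=64):
        super().__init__()
        # Slow weights: encode past trajectories into a latent trait vector.
        self.trait_encoder = nn.GRU(obs_dim, trait_dim, batch_first=True)
        # Prediction network whose hidden layer the trait will modulate.
        self.fc_in = nn.Linear(obs_dim, hidden)
        self.fast_gain = nn.Linear(trait_dim, hidden)  # trait -> per-unit gains
        self.fc_out = nn.Linear(hidden, obs_dim)       # predict next observation

    def forward(self, past_traj, current_obs):
        # past_traj: (batch, time, obs_dim); current_obs: (batch, obs_dim)
        _, trait = self.trait_encoder(past_traj)       # (1, batch, trait_dim)
        gains = torch.sigmoid(self.fast_gain(trait.squeeze(0)))
        h = torch.relu(self.fc_in(current_obs))
        h = gains * h                                  # multiplicative fast weights
        return self.fc_out(h)

# Usage: infer an actor's traits from 20 past steps, predict the next state.
model = TraitModulatedPredictor()
pred = model(torch.randn(4, 20, 16), torch.randn(4, 16))
print(pred.shape)  # torch.Size([4, 16])
```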
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- On the Bias Against Inductive Biases [34.10348216388905]
Self-supervised feature learning for visual tasks has seen state-of-the-art success using extremely deep, isotropic networks.
In this work, we analyze the effect of inductive biases on small to moderately-sized isotropic networks used for unsupervised visual feature learning.
arXiv Detail & Related papers (2021-05-28T19:41:48Z)
- Malicious Network Traffic Detection via Deep Learning: An Information Theoretic View [0.0]
We study how homeomorphism affects the learned representation of a malware traffic dataset.
Our results suggest that although the details of learned representations and the specific coordinate system defined over the manifold of all parameters differ slightly, the functional approximations are the same (a small illustration follows below).
arXiv Detail & Related papers (2020-09-16T15:37:44Z)
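The invariance claim in this last entry can be illustrated in a few lines: train the same architecture on raw inputs and on a homeomorphically warped copy of them, then compare the two input-output functions. Everything below (the synthetic data, the arcsinh warp, the scikit-learn models) is an assumed stand-in for the paper's malware-traffic setting, not its actual experiment.

```python
# Sketch of the invariance claim (my illustration, not the paper's experiment):
# two networks trained on homeomorphically related inputs learn different
# coordinates but approximately the same input-output function.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 3))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]   # stand-in for traffic labels

def warp(x):
    return np.arcsinh(3 * x)              # smooth, invertible: a homeomorphism

net_raw = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                       random_state=0).fit(X, y)
net_warp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                        random_state=1).fit(warp(X), y)

X_test = rng.uniform(-1, 1, size=(500, 3))
f_raw = net_raw.predict(X_test)
f_warp = net_warp.predict(warp(X_test))   # compose with the same warp
print("mean |f_raw - f_warp|:", np.abs(f_raw - f_warp).mean())  # small
```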