Curriculum effects and compositionality emerge with in-context learning in neural networks
- URL: http://arxiv.org/abs/2402.08674v3
- Date: Tue, 15 Oct 2024 17:29:13 GMT
- Title: Curriculum effects and compositionality emerge with in-context learning in neural networks
- Authors: Jacob Russin, Ellie Pavlick, Michael J. Frank
- Abstract summary: We show that networks capable of "in-context learning" (ICL) can reproduce human-like learning and compositional behavior on rule-governed tasks.
Our work shows how emergent ICL can equip neural networks with fundamentally different learning properties than those traditionally attributed to them.
- Score: 15.744573869783972
- Abstract: Human learning embodies a striking duality: sometimes, we appear capable of following logical, compositional rules and benefit from structured curricula (e.g., in formal education), while other times, we rely on an incremental approach or trial-and-error, learning better from curricula that are unstructured or randomly interleaved. Influential psychological theories explain this seemingly disparate behavioral evidence by positing two qualitatively different learning systems -- one for rapid, rule-based inferences and another for slow, incremental adaptation. It remains unclear how to reconcile such theories with neural networks, which learn via incremental weight updates and are thus a natural model for the latter type of learning, but are not obviously compatible with the former. However, recent evidence suggests that both metalearning neural networks and large language models are capable of "in-context learning" (ICL) -- the ability to flexibly grasp the structure of a new task from a few examples given at inference time. Here, we show that networks capable of ICL can reproduce human-like learning and compositional behavior on rule-governed tasks, while at the same time replicating human behavioral phenomena in tasks lacking rule-like structure via their usual in-weight learning (IWL). Our work shows how emergent ICL can equip neural networks with fundamentally different learning properties than those traditionally attributed to them, and that these can coexist with the properties of their native IWL, thus offering a novel perspective on dual-process theories and human cognitive flexibility.
Related papers
- From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks [47.13391046553908]
In artificial networks, the effectiveness of these models relies on their ability to build task-specific representations.
Prior studies highlight that different initializations can place networks in either a lazy regime, where representations remain static, or a rich/feature learning regime, where representations evolve dynamically.
These solutions capture the evolution of representations and the Neural Kernel across the spectrum from the rich to the lazy regimes.
arXiv Detail & Related papers (2024-09-22T23:19:04Z) - From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks [0.0]
We review recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience.
In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities.
We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition.
arXiv Detail & Related papers (2024-05-24T02:36:07Z) - Contrastive-Signal-Dependent Plasticity: Self-Supervised Learning in Spiking Neural Circuits [61.94533459151743]
This work addresses the challenge of designing neurobiologically-motivated schemes for adjusting the synapses of spiking networks.
Our experimental simulations demonstrate a consistent advantage over other biologically-plausible approaches when training recurrent spiking networks.
arXiv Detail & Related papers (2023-03-30T02:40:28Z) - Abrupt and spontaneous strategy switches emerge in simple regularised neural networks [8.737068885923348]
We study whether insight-like behaviour can occur in simple artificial neural networks.
Analyses of network architectures and learning dynamics revealed that insight-like behaviour crucially depended on a regularised gating mechanism.
This suggests that insight-like behaviour can arise naturally from gradual learning in simple neural networks.
arXiv Detail & Related papers (2023-02-22T12:48:45Z) - Continual Learning, Fast and Slow [75.53144246169346]
According to the Complementary Learning Systems theory, humans do effective continual learning through two complementary systems.
We propose DualNets (for Dual Networks), a general continual learning framework comprising a fast learning system for supervised learning of specific tasks and a slow learning system for representation learning of task-agnostic general representation via Self-Supervised Learning (SSL).
We demonstrate the promising results of DualNets on a wide range of continual learning protocols, ranging from the standard offline, task-aware setting to the challenging online, task-free scenario.
arXiv Detail & Related papers (2022-09-06T10:48:45Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Compositional Processing Emerges in Neural Networks Solving Math Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations.
We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed.
Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z) - Complementary Structure-Learning Neural Networks for Relational Reasoning [3.528645587678267]
We show that pattern separation in the hippocampus allows rapid learning in novel environments.
Slower learning in neocortex accumulates small weight changes to extract systematic structure from well-learned environments.
arXiv Detail & Related papers (2021-05-19T06:25:21Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z) - Equilibrium Propagation for Complete Directed Neural Networks [0.0]
The most successful learning algorithm for artificial neural networks, backpropagation, is considered biologically implausible.
We contribute to the topic of biologically plausible neuronal learning by building upon and extending the equilibrium propagation learning framework.
arXiv Detail & Related papers (2020-06-15T22:12:30Z)