Related papers: Curriculum effects and compositionality emerge with in-context learning in neural networks

Curriculum effects and compositionality emerge with in-context learning in neural networks

URL: http://arxiv.org/abs/2402.08674v3
Date: Tue, 15 Oct 2024 17:29:13 GMT
Title: Curriculum effects and compositionality emerge with in-context learning in neural networks
Authors: Jacob Russin, Ellie Pavlick, Michael J. Frank,
Abstract summary: We show that networks capable of "in-context learning" (ICL) can reproduce human-like learning and compositional behavior on rule-governed tasks. Our work shows how emergent ICL can equip neural networks with fundamentally different learning properties than those traditionally attributed to them.
Score: 15.744573869783972
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Human learning embodies a striking duality: sometimes, we appear capable of following logical, compositional rules and benefit from structured curricula (e.g., in formal education), while other times, we rely on an incremental approach or trial-and-error, learning better from curricula that are unstructured or randomly interleaved. Influential psychological theories explain this seemingly disparate behavioral evidence by positing two qualitatively different learning systems -- one for rapid, rule-based inferences and another for slow, incremental adaptation. It remains unclear how to reconcile such theories with neural networks, which learn via incremental weight updates and are thus a natural model for the latter type of learning, but are not obviously compatible with the former. However, recent evidence suggests that both metalearning neural networks and large language models are capable of "in-context learning" (ICL) -- the ability to flexibly grasp the structure of a new task from a few examples given at inference time. Here, we show that networks capable of ICL can reproduce human-like learning and compositional behavior on rule-governed tasks, while at the same time replicating human behavioral phenomena in tasks lacking rule-like structure via their usual in-weight learning (IWL). Our work shows how emergent ICL can equip neural networks with fundamentally different learning properties than those traditionally attributed to them, and that these can coexist with the properties of their native IWL, thus offering a novel perspective on dual-process theories and human cognitive flexibility.

Related papers

The Importance of Being Lazy: Scaling Limits of Continual Learning [60.97756735877614]
We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness.<n>We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks.
arXiv Detail & Related papers (2025-06-20T10:12:38Z)
Concept-Guided Interpretability via Neural Chunking [54.73787666584143]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data.<n>We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality.<n>Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
Evolution imposes an inductive bias that alters and accelerates learning dynamics [49.1574468325115]
We investigate the effect of evolutionary optimization on the learning dynamics of neural networks.<n>We combined algorithms natural selection and online learning to produce a method for evolutionarily conditioning artificial neural networks.<n>Results suggest evolution constitutes an inductive bias that tunes neural systems to enable rapid learning.
arXiv Detail & Related papers (2025-05-15T18:50:57Z)
Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry [7.517013801971377]
We introduce an analysis framework based on representational geometry to study feature learning. We find that when a network learns features useful for solving a task, the task-relevant manifold become increasingly untangled. By tracking changes in the underlying manifold geometry, we uncover distinct learning stages throughout training.
arXiv Detail & Related papers (2025-03-23T15:39:56Z)
Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
Network Dynamics-Based Framework for Understanding Deep Neural Networks [11.44947569206928]
We propose a theoretical framework to analyze learning dynamics through the lens of dynamical systems theory.<n>We redefine the notions of linearity and nonlinearity in neural networks by introducing two fundamental transformation units at the neuron level.<n>Different transformation modes lead to distinct collective behaviors in weight vector organization, different modes of information extraction, and the emergence of qualitatively different learning phases.
arXiv Detail & Related papers (2025-01-05T04:23:21Z)
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks [47.13391046553908]
In artificial networks, the effectiveness of these models relies on their ability to build task specific representation. Prior studies highlight that different initializations can place networks in either a lazy regime, where representations remain static, or a rich/feature learning regime, where representations evolve dynamically. These solutions capture the evolution of representations and the Neural Kernel across the spectrum from the rich to the lazy regimes.
arXiv Detail & Related papers (2024-09-22T23:19:04Z)
Enhancing learning in spiking neural networks through neuronal heterogeneity and neuromodulatory signaling [52.06722364186432]
We propose a biologically-informed framework for enhancing artificial neural networks (ANNs) Our proposed dual-framework approach highlights the potential of spiking neural networks (SNNs) for emulating diverse spiking behaviors. We outline how the proposed approach integrates brain-inspired compartmental models and task-driven SNNs, bioinspiration and complexity.
arXiv Detail & Related papers (2024-07-05T14:11:28Z)
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks [0.0]
We review recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience. In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities. We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition.
arXiv Detail & Related papers (2024-05-24T02:36:07Z)
Learning the Plasticity: Plasticity-Driven Learning Framework in Spiking Neural Networks [9.25919593660244]
New paradigm for Spiking Neural Networks (SNNs) Plasticity-Driven Learning Framework (PDLF) PDLF redefines concepts of functional and Presynaptic-Dependent Plasticity.
arXiv Detail & Related papers (2023-08-23T11:11:31Z)
Contrastive-Signal-Dependent Plasticity: Self-Supervised Learning in Spiking Neural Circuits [61.94533459151743]
This work addresses the challenge of designing neurobiologically-motivated schemes for adjusting the synapses of spiking networks. Our experimental simulations demonstrate a consistent advantage over other biologically-plausible approaches when training recurrent spiking networks.
arXiv Detail & Related papers (2023-03-30T02:40:28Z)
Abrupt and spontaneous strategy switches emerge in simple regularised neural networks [8.737068885923348]
We study whether insight-like behaviour can occur in simple artificial neural networks. Analyses of network architectures and learning dynamics revealed that insight-like behaviour crucially depended on a regularised gating mechanism. This suggests that insight-like behaviour can arise naturally from gradual learning in simple neural networks.
arXiv Detail & Related papers (2023-02-22T12:48:45Z)
Continual Learning, Fast and Slow [75.53144246169346]
According to the Complementary Learning Systems theory, humans do effective emphcontinual learning through two complementary systems. We propose emphDualNets (for Dual Networks), a general continual learning framework comprising a fast learning system for supervised learning of specific tasks and a slow learning system for representation learning of task-agnostic general representation via Self-Supervised Learning (SSL) We demonstrate the promising results of DualNets on a wide range of continual learning protocols, ranging from the standard offline, task-aware setting to the challenging online, task-free scenario.
arXiv Detail & Related papers (2022-09-06T10:48:45Z)
What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization. We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks. Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
Compositional Processing Emerges in Neural Networks Solving Math Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed. Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z)
Complementary Structure-Learning Neural Networks for Relational Reasoning [3.528645587678267]
We show that pattern separation in the hippocampus allows rapid learning in novel environments. slower learning in neocortex accumulates small weight changes to extract systematic structure from well-learned environments.
arXiv Detail & Related papers (2021-05-19T06:25:21Z)
Learning Contact Dynamics using Physically Structured Neural Networks [81.73947303886753]
We use connections between deep neural networks and differential equations to design a family of deep network architectures for representing contact dynamics between objects. We show that these networks can learn discontinuous contact events in a data-efficient manner from noisy observations. Our results indicate that an idealised form of touch feedback is a key component of making this learning problem tractable.
arXiv Detail & Related papers (2021-02-22T17:33:51Z)
Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches. The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data. The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)
Equilibrium Propagation for Complete Directed Neural Networks [0.0]
Most successful learning algorithm for artificial neural networks, backpropagation, is considered biologically implausible. We contribute to the topic of biologically plausible neuronal learning by building upon and extending the equilibrium propagation learning framework.
arXiv Detail & Related papers (2020-06-15T22:12:30Z)
The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics. We find good agreement between our model's predictions and training dynamics in realistic deep learning settings. We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.