Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers
- URL: http://arxiv.org/abs/2505.09855v1
- Date: Wed, 14 May 2025 23:31:17 GMT
- Title: Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers
- Authors: Alexander Y. Ku, Thomas L. Griffiths, Stephanie C. Y. Chan,
- Abstract summary: Transformer models learn in two distinct modes: in-weights learning (IWL) and in-context learning (ICL). We draw inspiration from evolutionary biology's analogous adaptive strategies: genetic encoding and phenotypic plasticity. We experimentally operationalize these dimensions of predictability and investigate their influence on the ICL/IWL balance in Transformers.
- Score: 51.992454203752686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models learn in two distinct modes: in-weights learning (IWL), encoding knowledge into model weights, and in-context learning (ICL), adapting flexibly to context without weight modification. To better understand the interplay between these learning modes, we draw inspiration from evolutionary biology's analogous adaptive strategies: genetic encoding (akin to IWL, adapting over generations and fixed within an individual's lifetime) and phenotypic plasticity (akin to ICL, enabling flexible behavioral responses to environmental cues). In evolutionary biology, environmental predictability dictates the balance between these strategies: stability favors genetic encoding, while reliable predictive cues promote phenotypic plasticity. We experimentally operationalize these dimensions of predictability and systematically investigate their influence on the ICL/IWL balance in Transformers. Using regression and classification tasks, we show that high environmental stability decisively favors IWL, as predicted, with a sharp transition at maximal stability. Conversely, high cue reliability enhances ICL efficacy, particularly when stability is low. Furthermore, learning dynamics reveal task-contingent temporal evolution: while a canonical ICL-to-IWL shift occurs in some settings (e.g., classification with many classes), we demonstrate that scenarios with easier IWL (e.g., fewer classes) or slower ICL acquisition (e.g., regression) can exhibit an initial IWL phase later yielding to ICL dominance. These findings support a relative-cost hypothesis for explaining these learning mode transitions, establishing predictability as a critical factor governing adaptive strategies in Transformers, and offering novel insights for understanding ICL and guiding training methodologies.
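To make the two predictability dimensions concrete, the sketch below shows one way they could be operationalized in a synthetic few-shot classification setting: a `stability` parameter controls how often the canonical class-to-label mapping persists across training episodes, and a `cue_reliability` parameter controls how often context exemplars carry the episode's true label. The generator, its parameter names, and the probing idea in the closing comment are illustrative assumptions for exposition, not the paper's actual experimental code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(n_classes=8, n_context=8, dim=16,
                   stability=0.9, cue_reliability=0.9):
    """Build one synthetic few-shot classification episode (illustrative only).

    stability:       probability that the canonical class->label mapping holds
                     in this episode (high values let IWL memorize the mapping).
    cue_reliability: probability that a context exemplar carries the episode's
                     true label (high values make the context a trustworthy cue,
                     favoring ICL).
    """
    prototypes = rng.normal(size=(n_classes, dim))   # fixed class prototypes
    canonical = np.arange(n_classes)                 # stable label assignment
    # With probability (1 - stability) the labels are re-shuffled for this
    # episode, so weights alone cannot predict them and the context must be read.
    labels = canonical if rng.random() < stability else rng.permutation(n_classes)

    ctx_classes = rng.integers(0, n_classes, size=n_context)
    ctx_x = prototypes[ctx_classes] + 0.1 * rng.normal(size=(n_context, dim))
    ctx_y = labels[ctx_classes].copy()
    # Corrupt a fraction of the context labels when cues are unreliable.
    noisy = rng.random(n_context) > cue_reliability
    ctx_y[noisy] = rng.integers(0, n_classes, size=int(noisy.sum()))

    query_class = int(rng.integers(0, n_classes))
    query_x = prototypes[query_class] + 0.1 * rng.normal(size=dim)
    query_y = int(labels[query_class])
    return ctx_x, ctx_y, query_x, query_y

# One way to probe which strategy a trained model relies on: evaluate it with an
# informative context (ICL can succeed) and again with a label-shuffled context
# (only memorized, in-weights knowledge can succeed), then compare accuracies.
if __name__ == "__main__":
    ctx_x, ctx_y, qx, qy = sample_episode(stability=0.5, cue_reliability=1.0)
    print(ctx_x.shape, ctx_y, qx.shape, qy)
```

Under a setup of this kind, comparing accuracy with an informative context against accuracy with a shuffled or withheld context gives a simple behavioral readout of whether the model is leaning on ICL or on knowledge stored in its weights.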
Related papers
- Provable In-Context Learning of Nonlinear Regression with Transformers [58.018629320233174]
In-context learning (ICL) is the ability to perform unseen tasks using task-specific prompts without updating parameters. Recent research has actively explored the training dynamics behind ICL. This paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities.
arXiv Detail & Related papers (2025-07-28T00:09:28Z) - Adapting to Fragmented and Evolving Data: A Fisher Information Perspective [0.0]
FADE is a lightweight framework for robust learning in dynamic environments. It employs a shift-aware regularization mechanism anchored in Fisher information geometry. FADE operates online with fixed memory and no access to target labels.
arXiv Detail & Related papers (2025-07-25T06:50:09Z) - In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory [54.92893355284945]
Deep learning-based wireless receivers offer the potential to dynamically adapt to varying channel environments. Current adaptation strategies, including joint training, hypernetwork-based methods, and meta-learning, either demonstrate limited flexibility or necessitate explicit optimization through gradient descent. This paper presents gradient-free adaptation techniques rooted in the emerging paradigm of in-context learning (ICL).
arXiv Detail & Related papers (2025-06-18T06:43:55Z) - Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers [1.7034813545878589]
Transformer models exhibit remarkable in-context learning (ICL). Our work offers an exact dynamical model for ICL and theoretically grounded tools for analyzing complex transformer training.
arXiv Detail & Related papers (2025-04-17T13:05:33Z) - Strategy Coopetition Explains the Emergence and Transience of In-Context Learning [24.63934469340368]
In-context learning (ICL) is a powerful ability that emerges in transformer models, enabling them to learn from context without weight updates. Recent work has established emergent ICL as a transient phenomenon that can sometimes disappear after long training times. We propose a minimal mathematical model that reproduces these key dynamics and interactions.
arXiv Detail & Related papers (2025-03-07T17:54:05Z) - Contrastive Learning Via Equivariant Representation [19.112460889771423]
We propose CLeVER, a novel equivariant contrastive learning framework compatible with augmentation strategies of arbitrary complexity.
Experimental results demonstrate that CLeVER effectively extracts and incorporates equivariant information from practical natural images.
arXiv Detail & Related papers (2024-06-01T01:53:51Z) - How Do Nonlinear Transformers Learn and Generalize in In-Context Learning? [82.51626700527837]
Transformer-based large language models have displayed impressive in-context learning capabilities, where a pre-trained model can handle new tasks without fine-tuning.
We analyze how the mechanics by which Transformers achieve ICL contribute to the technical challenges of the training problem in Transformers.
arXiv Detail & Related papers (2024-02-23T21:07:20Z) - How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps toward understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z) - Incorporating Neuro-Inspired Adaptability for Continual Learning in Artificial Intelligence [59.11038175596807]
Continual learning aims to empower artificial intelligence with strong adaptability to the real world.
Existing advances mainly focus on preserving memory stability to overcome catastrophic forgetting.
We propose a generic approach that appropriately attenuates old memories in parameter distributions to improve learning plasticity.
arXiv Detail & Related papers (2023-08-29T02:43:58Z) - ArCL: Enhancing Contrastive Learning with Augmentation-Robust Representations [30.745749133759304]
We develop a theoretical framework to analyze the transferability of self-supervised contrastive learning.
We show that contrastive learning fails to learn domain-invariant features, which limits its transferability.
Based on these theoretical insights, we propose a novel method called Augmentation-robust Contrastive Learning (ArCL).
arXiv Detail & Related papers (2023-03-02T09:26:20Z)