Understanding Transformers through the Lens of Pavlovian Conditioning
- URL: http://arxiv.org/abs/2508.08289v1
- Date: Tue, 05 Aug 2025 05:00:00 GMT
- Title: Understanding Transformers through the Lens of Pavlovian Conditioning
- Authors: Mu Qiao
- Abstract summary: We present a novel theoretical framework that reinterprets the core computation of attention as Pavlovian conditioning. We demonstrate that attention's queries, keys, and values can be mapped to the three elements of classical conditioning. Our framework yields several theoretical insights grounded in this linearized model.
- Score: 0.5076419064097734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer architectures have revolutionized artificial intelligence (AI) through their attention mechanisms, yet the computational principles underlying their success remain opaque. We present a novel theoretical framework that reinterprets the core computation of attention as Pavlovian conditioning. Our model finds a direct mathematical analogue in linear attention, which simplifies the analysis of the underlying associative process. We demonstrate that attention's queries, keys, and values can be mapped to the three elements of classical conditioning: test stimuli that probe associations, conditional stimuli (CS) that serve as retrieval cues, and unconditional stimuli (US) that contain response information. Through this lens, we suggest that each attention operation constructs a transient associative memory via a Hebbian rule, where CS-US pairs form dynamic associations that test stimuli can later retrieve. Our framework yields several theoretical insights grounded in this linearized model: (1) a capacity theorem showing that attention heads can store O($\sqrt{d_k}$) associations before interference degrades retrieval; (2) an error propagation analysis revealing fundamental architectural trade-offs of balancing model depth, width, and head redundancy to maintain reliability; and (3) an understanding of how biologically plausible learning rules could enhance transformer architectures. By establishing this deep connection, we suggest that the success of modern AI may stem not from architectural novelty alone, but from implementing computational principles that biology optimized over millions of years of evolution.
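The linear-attention-as-Hebbian-memory framing described in the abstract can be made concrete in a few lines. The following NumPy sketch is an illustration of that framing, not code from the paper: CS-US pairs (keys and values) are written into an associative memory via a sum of outer products, and a test stimulus (query) retrieves the stored response, with cross-talk between stored pairs playing the role of the interference bounded by the capacity theorem.

```python
# Illustrative sketch: linear attention as a Hebbian associative memory.
# Keys act as conditional stimuli (CS), values as unconditional stimuli (US),
# and queries as test stimuli that probe the stored associations.
import numpy as np

rng = np.random.default_rng(0)
d_k, d_v, n_pairs = 64, 32, 4

# Random unit-norm CS vectors (keys) and US vectors (values).
K = rng.standard_normal((n_pairs, d_k))
K /= np.linalg.norm(K, axis=1, keepdims=True)
V = rng.standard_normal((n_pairs, d_v))

# Hebbian write: accumulate one outer product per CS-US pair,
# i.e. the unnormalized linear-attention memory M = sum_i v_i k_i^T.
M = V.T @ K                      # shape (d_v, d_k)

# Retrieval: probe with the first key as a test stimulus. Because k_0 is
# unit-norm, the result is v_0 plus cross-talk from the other stored pairs.
q = K[0]
retrieved = M @ q

# With few pairs relative to d_k, interference stays small and retrieval
# is close to the stored value v_0.
err = np.linalg.norm(retrieved - V[0]) / np.linalg.norm(V[0])
print(f"relative retrieval error: {err:.3f}")
```

Increasing `n_pairs` toward and beyond the O($\sqrt{d_k}$) regime makes the cross-talk terms dominate, which is the interference-driven capacity limit the abstract describes.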
Related papers
- A Dynamical Theory of Sequential Retrieval in Input-Driven Hopfield Networks [0.0]
This work develops a dynamical theory of sequential retrieval in input-driven Hopfield networks. We derive explicit conditions for self-sustained memory transitions, including gain thresholds, escape times, and collapse regimes.
arXiv Detail & Related papers (2026-03-03T17:54:36Z) - Quantum LEGO Learning: A Modular Design Principle for Hybrid Artificial Intelligence [63.39968536637762]
We introduce Quantum LEGO Learning, a learning framework that treats classical and quantum components as reusable, composable learning blocks. Within this framework, a pre-trained classical neural network serves as a frozen feature block, while a VQC acts as a trainable adaptive module. We develop a block-wise generalization theory that decomposes learning error into approximation and estimation components.
arXiv Detail & Related papers (2026-01-29T14:29:21Z) - PISA: A Pragmatic Psych-Inspired Unified Memory System for Enhanced AI Agency [50.712873697511206]
Existing work often lacks adaptability to diverse tasks and overlooks the constructive and task-oriented role of AI agent memory. We propose PISA, a pragmatic, psych-inspired unified memory system that treats memory as a constructive and adaptive process. Our empirical evaluation, conducted on the existing LOCOMO benchmark and our newly proposed AggQA benchmark for data analysis tasks, confirms that PISA sets a new state-of-the-art by significantly enhancing adaptability and long-term knowledge retention.
arXiv Detail & Related papers (2025-10-12T10:34:35Z) - Bidirectional Representations Augmented Autoregressive Biological Sequence Generation:Application in De Novo Peptide Sequencing [51.12821379640881]
Non-autoregressive (NAR) models offer holistic, bidirectional representations but face challenges with generative coherence and scalability. We propose a hybrid framework enhancing AR generation by dynamically integrating rich contextual information from non-autoregressive mechanisms. A novel cross-decoder attention module enables the AR decoder to iteratively query and integrate these bidirectional features.
arXiv Detail & Related papers (2025-10-09T12:52:55Z) - Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact [27.722167796617114]
This paper offers a cross-disciplinary synthesis of artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. We identify key scientific, technical, and ethical challenges on the path to Artificial General Intelligence.
arXiv Detail & Related papers (2025-07-01T16:52:25Z) - Modeling Arbitrarily Applicable Relational Responding with the Non-Axiomatic Reasoning System: A Machine Psychology Approach [0.0]
We present a novel theoretical approach for modeling AARR within an artificial intelligence framework using the Non-Axiomatic Reasoning System (NARS). We show how key properties of AARR can emerge from the inference rules and memory structures of NARS. Results suggest that AARR can be conceptually captured by suitably designed AI systems.
arXiv Detail & Related papers (2025-03-01T20:37:11Z) - Interactive Symbolic Regression through Offline Reinforcement Learning: A Co-Design Framework [11.804368618793273]
Symbolic Regression holds great potential for uncovering underlying mathematical and physical relationships from observed data. Current state-of-the-art approaches typically do not consider the integration of domain experts' prior knowledge. We propose the Symbolic Q-network (Sym-Q), an advanced interactive framework for large-scale symbolic regression.
arXiv Detail & Related papers (2025-02-05T06:26:49Z) - Test-time regression: a unifying framework for designing sequence models with associative memory [24.915262407519876]
We introduce a unifying framework to understand and derive sequence models. We formalize associative recall as a two-step process, memorization and retrieval, casting it as a regression problem. Our work bridges sequence modeling with classic regression methods, paving the way for developing more powerful and theoretically principled architectures.
arXiv Detail & Related papers (2025-01-21T18:32:31Z) - The Buffer Mechanism for Multi-Step Information Reasoning in Language Models [52.77133661679439]
Investigating internal reasoning mechanisms of large language models can help us design better model architectures and training strategies.
In this study, we constructed a symbolic dataset to investigate the mechanisms by which Transformer models employ a vertical thinking strategy.
We proposed a random matrix-based algorithm to enhance the model's reasoning ability, resulting in a 75% reduction in the training time required for the GPT-2 model.
arXiv Detail & Related papers (2024-05-24T07:41:26Z) - Closing the Loop: How Semantic Closure Enables Open-Ended Evolution [0.5755004576310334]
This manuscript explores the evolutionary emergence of semantic closure. It integrates concepts from relational biology, physical biosemiotics, and ecological psychology into a unified computational enactivism framework.
arXiv Detail & Related papers (2024-04-05T19:35:38Z) - A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z) - Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements and misconceptions in the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z) - Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms.
Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.