Related papers: Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models

URL: http://arxiv.org/abs/2410.01280v1
Date: Wed, 2 Oct 2024 06:51:12 GMT
Title: Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Authors: Can Demircan, Tankred Saanum, Akshay K. Jagadish, Marcel Binz, Eric Schulz,
Abstract summary: In-context learning, the ability to adapt based on a few examples in the input prompt, is a ubiquitous feature of large language models (LLMs) We first show that Llama $3$ $70$B can solve simple RL problems in-context. We then analyze the residual stream of Llama using Sparse Autoencoders (SAEs) and find representations that closely match temporal difference (TD) errors.
Score: 7.115323364355489
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In-context learning, the ability to adapt based on a few examples in the input prompt, is a ubiquitous feature of large language models (LLMs). However, as LLMs' in-context learning abilities continue to improve, understanding this phenomenon mechanistically becomes increasingly important. In particular, it is not well-understood how LLMs learn to solve specific classes of problems, such as reinforcement learning (RL) problems, in-context. Through three different tasks, we first show that Llama $3$ $70$B can solve simple RL problems in-context. We then analyze the residual stream of Llama using Sparse Autoencoders (SAEs) and find representations that closely match temporal difference (TD) errors. Notably, these representations emerge despite the model only being trained to predict the next token. We verify that these representations are indeed causally involved in the computation of TD errors and $Q$-values by performing carefully designed interventions on them. Taken together, our work establishes a methodology for studying and manipulating in-context learning with SAEs, paving the way for a more mechanistic understanding.

Related papers

Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning [52.32193550674408]
We aim to improve the reasoning capabilities of language models via reinforcement learning (RL)<n>We propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually.<n>E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B)
arXiv Detail & Related papers (2025-06-07T02:41:54Z)
How LLMs Learn: Tracing Internal Representations with Sparse Autoencoders [30.36521888592164]
Large Language Models (LLMs) demonstrate remarkable multilingual capabilities and broad knowledge. We analyze how the information encoded in LLMs' internal representations evolves during the training process.
arXiv Detail & Related papers (2025-03-09T02:13:44Z)
Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding [71.01099784480597]
Large language models (LLMs) excel at a range of tasks through in-context learning (ICL) We introduce In-Context Contrastive Decoding (ICCD), a novel method that emphasizes input-label mapping. ICCD emphasizes input-label mapping by contrasting the output distributions between positive and negative in-context examples.
arXiv Detail & Related papers (2025-02-19T14:04:46Z)
The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples. We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts. Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z)
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
Language Models are Symbolic Learners in Arithmetic [8.34588487873447]
Large Language Models (LLMs) are thought to struggle with arithmetic learning due to inherent differences between language modeling and numerical computation. We first investigate whether LLMs leverage partial products during arithmetic learning. We find that although LLMs can identify some partial products after learning, they fail to leverage them for arithmetic tasks, conversely.
arXiv Detail & Related papers (2024-10-21T01:57:16Z)
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LM) are distributions over strings. We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs. We find that the complexity of the RLM rank is strong and significant predictors of learnability for both RNNs and Transformers.
arXiv Detail & Related papers (2024-06-06T17:34:24Z)
Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs [58.09253149867228]
This paper assesses the domain knowledge of LLMs through its understanding of different mathematical skills required to solve problems. Motivated by the use of LLMs as a general scientific assistant, we propose textitNTKEval to assess changes in LLM's probability distribution. Our systematic analysis finds evidence of domain understanding during in-context learning. Certain instruction-tuning leads to similar performance changes irrespective of training on different data, suggesting a lack of domain understanding across different skills.
arXiv Detail & Related papers (2024-05-24T12:04:54Z)
Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models [42.16047343029512]
Large Language Models (LLMs) have been widely used as general-purpose AI agents. We propose a framework, Learning to Reduce, that fine-tunes a language model to generate a reduced version of an input context. We show that our model achieves comparable accuracies in selecting the relevant evidence from an input context.
arXiv Detail & Related papers (2024-02-22T00:41:23Z)
C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations. Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
Identifying and Analyzing Task-Encoding Tokens in Large Language Models [55.03191279766383]
In this paper, we identify and analyze task-encoding tokens on whose representations the task performance depends. We show that template and stopword tokens are the most prone to be task-encoding. Our work sheds light on how large language models (LLMs) learn to perform a task from demonstrations, deepens our understanding of the varied roles different types of tokens play in LLMs, and provides insights for avoiding instability from improperly utilizing task-encoding tokens.
arXiv Detail & Related papers (2024-01-20T20:55:21Z)
The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models [10.414206635385632]
Large Language Models (LLMs) have demonstrated remarkable generalization across diverse tasks. A notable obstacle emerges when feeding numerical/temporal data into these models, such as data sourced from wearables or electronic health records. We discuss recent works that employ LLMs for human-centric tasks such as in mobile health sensing and present a case study showing that popular LLMs tokenize temporal data incorrectly.
arXiv Detail & Related papers (2023-09-12T13:51:29Z)
Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners. We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting. Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
IERL: Interpretable Ensemble Representation Learning -- Combining CrowdSourced Knowledge and Distributed Semantic Representations [11.008412414253662]
Large Language Models (LLMs) encode meanings of words in the form of distributed semantics. Recent studies have shown that LLMs tend to generate unintended, inconsistent, or wrong texts as outputs. We propose a novel ensemble learning method, Interpretable Ensemble Representation Learning (IERL), that systematically combines LLM and crowdsourced knowledge representations.
arXiv Detail & Related papers (2023-06-24T05:02:34Z)
EXnet: Efficient In-context Learning for Data-less Text classification [0.0]
We present EXnet, a model specifically designed to perform in-context learning without limitations on the number of examples. We argue that in-context learning is an effective method to increase task accuracy, and providing examples facilitates cross-task generalization. With extensive experiments, we show that even our smallest model (15M parameters) generalizes to several unseen classification tasks and domains.
arXiv Detail & Related papers (2023-05-24T01:40:57Z)
Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.