Related papers: ReasonCACHE: Teaching LLMs To Reason Without Weight Updates

URL: http://arxiv.org/abs/2602.02366v1
Date: Mon, 02 Feb 2026 17:24:23 GMT
Title: ReasonCACHE: Teaching LLMs To Reason Without Weight Updates
Authors: Sharut Gupta, Phillip Isola, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja, Mark Ibrahim, Mohammad Pezeshki
Abstract summary: We show that large language models (LLMs) can learn to reason without overloading the context window and without any weight updates. We introduce ReasonCACHE, an instantiation of this mechanism that distills demonstrations into a fixed key-value cache. Empirically, ReasonCACHE outperforms standard ICL and matches or surpasses IWL approaches.
Score: 75.2707292367514
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Can large language models (LLMs) learn to reason without any weight updates, through in-context learning (ICL) alone? ICL is strikingly sample-efficient, often learning from only a handful of demonstrations, but complex reasoning tasks typically demand many training examples. Naively scaling ICL by adding more demonstrations breaks down at this scale: attention costs grow quadratically, performance saturates or degrades with longer contexts, and the approach remains a shallow form of learning. Because of these limitations, practitioners predominantly rely on in-weight learning (IWL) to induce reasoning. In this work, we show that by using Prefix Tuning, LLMs can learn to reason without overloading the context window and without any weight updates. We introduce ReasonCACHE, an instantiation of this mechanism that distills demonstrations into a fixed key-value cache. Empirically, across challenging reasoning benchmarks, including GPQA-Diamond, ReasonCACHE outperforms standard ICL and matches or surpasses IWL approaches. Further, it achieves all this while being more efficient across three key axes: data, inference cost, and trainable parameters. We also theoretically prove that ReasonCACHE can be strictly more expressive than low-rank weight updates, since the latter tie expressivity to input rank, whereas ReasonCACHE bypasses this constraint by directly injecting key-values into the attention mechanism. Together, our findings identify ReasonCACHE as a middle path between in-context and in-weight learning, providing a scalable algorithm for learning reasoning skills beyond the context window without modifying parameters. Our project page: https://reasoncache.github.io/
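
The abstract describes injecting a fixed key-value cache directly into the attention mechanism while leaving model weights untouched. The following is a minimal NumPy sketch of that general idea (prefix-style key-values prepended to a single causal attention head), not the authors' implementation. The function name `attention_with_prefix`, the shapes, and the random stand-in values for the prefix are all assumptions made for illustration; in a prefix-tuning setup the prefix key-values would be trained by gradient descent while `w_q`, `w_k`, and `w_v` stay frozen.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_prefix(x, w_q, w_k, w_v, prefix_k=None, prefix_v=None):
    """Single-head causal self-attention with an optional prefix KV cache.

    x:        (n, d) input token representations
    w_q/k/v:  (d, d) frozen projection matrices (never updated here)
    prefix_k: (m, d) hypothetical learned keys, visible to every position
    prefix_v: (m, d) hypothetical learned values
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    n = x.shape[0]
    # Causal mask over the real tokens: position i sees positions <= i.
    mask = np.tril(np.ones((n, n), dtype=bool))
    if prefix_k is not None:
        m = prefix_k.shape[0]
        # Prepend the prefix entries; they do not depend on the input x.
        k = np.concatenate([prefix_k, k], axis=0)
        v = np.concatenate([prefix_v, v], axis=0)
        mask = np.concatenate([np.ones((n, m), dtype=bool), mask], axis=1)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, m, d = 5, 3, 8
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
# Random stand-ins for a trained prefix cache (assumption for the demo).
prefix_k = rng.normal(size=(m, d))
prefix_v = rng.normal(size=(m, d))

out_plain = attention_with_prefix(x, w_q, w_k, w_v)
out_prefix = attention_with_prefix(x, w_q, w_k, w_v, prefix_k, prefix_v)
print(out_plain.shape, out_prefix.shape)
```

Because the prefix keys and values are free parameters rather than projections of the input, they can change the attention output without touching any weight matrix. The cache has a fixed length `m`, so the per-token attention cost grows with `m` rather than with the number of demonstrations that were distilled into it.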

Related papers

Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information [41.10866361182172]
Focused Chain-of-Thought (F-CoT) separates information extraction from the reasoning process. On arithmetic word problems, F-CoT reduces generated tokens by 2-3x while maintaining accuracy comparable to standard zero-shot CoT.
arXiv Detail & Related papers (2025-11-27T07:31:52Z)
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank [71.09032766271493]
Large language models (LLMs) are prone to errors and hallucinations. How to check their outputs effectively and efficiently has become a critical problem in their applications.
arXiv Detail & Related papers (2025-10-28T11:01:10Z)
Informed Routing in LLMs: Smarter Token-Level Computation for Faster Inference [7.690958366125321]
This paper introduces informed routing, a new paradigm that proactively addresses these issues. We propose the Lightweight Feature Forecaster (LFF), a small predictive module that estimates a unit's output before routing decisions are made. Experiments on both language modeling and reasoning tasks show that informed routing achieves state-of-the-art efficiency-performance trade-offs.
arXiv Detail & Related papers (2025-10-10T09:59:36Z)
Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning [58.62311540316617]
We aim to improve the reasoning capabilities of language models via reinforcement learning (RL). We propose to schedule tasks from easy to hard (E2H), allowing LLMs to build reasoning skills gradually. E2H Reasoner significantly improves the reasoning ability of small LLMs (1.5B to 3B).
arXiv Detail & Related papers (2025-06-07T02:41:54Z)
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster [51.89995713333108]
Chain-of-thought (CoT) distillation allows a large language model (LLM) to guide a small language model (SLM) in reasoning tasks. Existing methods train the SLM to learn the long rationale in one iteration. We propose chunk-wise training (CWT), which uses a search to divide the rationale into internal semantically coherent chunks.
arXiv Detail & Related papers (2025-05-24T11:04:52Z)
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks. We propose a novel approach for continuous-space reasoning that does not require modifying the LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z)
Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks. We modify specific context tokens, considering the unique structure of input and output formats. Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
Larger Language Models Don't Care How You Think: Why Chain-of-Thought Prompting Fails in Subjective Tasks [25.562937159039038]
In-Context Learning (ICL) in Large Language Models (LLMs) has emerged as the dominant technique for performing natural language tasks. We show that ICL relies mostly on the retrieval of task priors and less so on "learning" to perform tasks. We find that, surprisingly, Chain-of-Thought (CoT) indeed suffers from the same posterior collapse as ICL for larger language models.
arXiv Detail & Related papers (2024-09-10T03:06:17Z)
Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective [21.361946399192195]
In this paper, we show an exciting phenomenon that SVD-based weight pruning can enhance ICL performance. We propose a simple, model-compression and derivative-free algorithm for downstream tasks in enhancing ICL inference.
arXiv Detail & Related papers (2024-06-06T06:15:35Z)
LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning [61.7853049843921]
Chain-of-thought (CoT) prompting is a popular in-context learning approach for large language models (LLMs). This paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales.
arXiv Detail & Related papers (2023-12-07T20:36:10Z)
Understanding Emergent In-Context Learning from a Kernel Regression Perspective [55.95455089638838]
Large language models (LLMs) have initiated a paradigm shift in transfer learning. This paper proposes a kernel-regression perspective for understanding LLMs' ICL behaviors when faced with in-context examples. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.