Fugu-MT Paper Translation (Abstract): ReasonCACHE: Teaching LLMs To Reason Without Weight Updates

Paper overview: ReasonCACHE: Teaching LLMs To Reason Without Weight Updates

arxiv url: http://arxiv.org/abs/2602.02366v1
Date: Mon, 02 Feb 2026 17:24:23 GMT
Status: Translation complete
Last updated in system: 2026-02-03 19:28:34.324782
Title: ReasonCACHE: Teaching LLMs To Reason Without Weight Updates
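Title (reference translation): ReasonCACHE: Teaching LLMs to Reason Without Weight Updates
Authors: Sharut Gupta, Phillip Isola, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja, Mark Ibrahim, Mohammad Pezeshki
Abstract summary: We show that large language models (LLMs) can learn to reason without overloading the context window and without any weight updates. We introduce ReasonCACHE, which distills demonstrations into a fixed key-value cache. Empirically, ReasonCACHE outperforms standard ICL and matches or surpasses IWL approaches.
Reference score (independently computed attention metric): 75.2707292367514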
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Can large language models (LLMs) learn to reason without any weight updates and only through in-context learning (ICL)? ICL is strikingly sample-efficient, often learning from only a handful of demonstrations, but complex reasoning tasks typically demand many training examples to learn from. However, naively scaling ICL by adding more demonstrations breaks down at this scale: attention costs grow quadratically, performance saturates or degrades with longer contexts, and the approach remains a shallow form of learning. Due to these limitations, practitioners predominantly rely on in-weight learning (IWL) to induce reasoning. In this work, we show that by using Prefix Tuning, LLMs can learn to reason without overloading the context window and without any weight updates. We introduce ReasonCACHE, an instantiation of this mechanism that distills demonstrations into a fixed key-value cache. Empirically, across challenging reasoning benchmarks, including GPQA-Diamond, ReasonCACHE outperforms standard ICL and matches or surpasses IWL approaches. Further, it achieves all of this while being more efficient across three key axes: data, inference cost, and trainable parameters. We also theoretically prove that ReasonCACHE can be strictly more expressive than low-rank weight updates, since the latter tie expressivity to input rank, whereas ReasonCACHE bypasses this constraint by directly injecting key-values into the attention mechanism. Together, our findings identify ReasonCACHE as a middle path between in-context and in-weight learning, providing a scalable algorithm for learning reasoning skills beyond the context window without modifying parameters. Our project page: https://reasoncache.github.io/
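Abstract (reference translation): Can large language models (LLMs) learn to reason through in-context learning (ICL) alone, with no weight updates? ICL is highly sample-efficient and often learns from only a few demonstrations, but complex reasoning tasks usually require many training examples. Naively scaling ICL by adding more demonstrations fails at this scale: attention cost grows quadratically, performance saturates or degrades as contexts get longer, and the approach remains a shallow form of learning. Because of these limitations, practitioners rely mainly on in-weight learning (IWL) to induce reasoning. This work shows that, with Prefix Tuning, LLMs can learn to reason without overloading the context window and without any weight updates. It introduces ReasonCACHE, an instantiation of this mechanism that distills demonstrations into a fixed key-value cache. On challenging reasoning benchmarks, including GPQA-Diamond, ReasonCACHE outperforms standard ICL and matches or surpasses IWL approaches. It does so while being more efficient along three key axes: data, inference cost, and trainable parameters. The authors also prove theoretically that ReasonCACHE can be strictly more expressive than low-rank weight updates: low-rank updates tie expressivity to input rank, whereas ReasonCACHE avoids this constraint by injecting key-values directly into the attention mechanism. Taken together, the findings position ReasonCACHE as a middle path between in-context and in-weight learning, providing a scalable algorithm for learning reasoning skills beyond the context window without modifying parameters. Project page: https://reasoncache.github.io/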
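
The abstract describes injecting key-values directly into the attention mechanism while the model's weights stay unchanged. The following is a minimal sketch of that general idea (a key-value prefix prepended to the attention cache), not the authors' implementation. All names (`attend`, `attend_with_prefix`, `prefix_keys`, `prefix_values`) and all numbers are hypothetical; in an actual prefix-tuning setup the prefix would presumably be trained by gradient descent with the model frozen, whereas here it is hand-picked for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attend_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """Attention with extra key-value pairs prepended to the cache.

    The query, keys, and values come from the frozen model; only the
    prefix pairs are new. No projection weights are modified.
    """
    return attend(query, prefix_keys + keys, prefix_values + values)


# Keys and values as a frozen model might produce them for two tokens
# (illustrative numbers only).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]

# Hypothetical prefix: a single key-value pair, hand-picked here.
prefix_keys = [[5.0, 0.0]]
prefix_values = [[0.0, 10.0]]

base = attend(query, keys, values)
steered = attend_with_prefix(query, keys, values, prefix_keys, prefix_values)

print("without prefix:", base)
print("with prefix:   ", steered)
```

In this toy setting the prefix key aligns strongly with the query, so most of the attention mass moves to the prefix value and the output shifts toward it, even though nothing about the model's own keys, values, or weights has changed. With an empty prefix, `attend_with_prefix` reduces to plain attention.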

Related papers