Fugu-MT Paper Translation (Summary): In-Place Test-Time Training

Paper Summary: In-Place Test-Time Training

arxiv url: http://arxiv.org/abs/2604.06169v1
Date: Tue, 07 Apr 2026 17:59:44 GMT
Status: Translation complete
Last updated (in system): 2026-04-08 17:42:09.991314
Title: In-Place Test-Time Training
Authors: Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai,
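Abstract summary: In-Place Test-Time Training (In-Place TTT) is a framework that seamlessly endows large language models with test-time training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights. We replace TTT's generic reconstruction objective with a tailored, theoretically grounded objective aligned with the Next-Token-Prediction task.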
Reference score (site-computed attention score): 32.521599123691026
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers including architectural incompatibility, computational inefficiency and misaligned fast weight objectives for language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with Test-Time Training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a "drop-in" enhancement for LLMs without costly retraining from scratch. Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically-grounded objective explicitly aligned with the Next-Token-Prediction task governing autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, results in a highly scalable algorithm compatible with context parallelism. Extensive experiments validate our framework's effectiveness: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts up to 128k, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation study results further provide deeper insights on our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.
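The abstract describes the mechanism only at a high level: the final projection of each MLP block acts as a fast weight that is updated chunk by chunk during inference. The following is a hypothetical, minimal illustration of that general idea, not the authors' implementation. It assumes a ReLU activation, a single plain SGD step per chunk, and a squared-error loss against a hypothetical per-token target vector; the paper's NTP-aligned objective is not specified in the abstract and is deliberately not reproduced here. The names `chunkwise_ttt_mlp`, `matvec`, and `relu` are illustrative only.

```python
"""Hypothetical sketch: chunk-wise fast-weight updates on an MLP down-projection.

ASSUMPTIONS (not taken from the paper): ReLU activation, one plain SGD step
per chunk, and a squared-error loss against a hypothetical per-token target.
The paper's actual NTP-aligned objective is NOT reproduced here.
"""
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(v: Vector, m: Matrix) -> Vector:
    """Row vector v (length n) times matrix m (n x k) -> vector of length k."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def chunkwise_ttt_mlp(xs: List[Vector], targets: List[Vector],
                      w_up: Matrix, w_down: Matrix,
                      chunk_size: int, lr: float) -> List[Vector]:
    """Run an MLP over a token stream, updating w_down between chunks.

    Tokens in chunk c are processed with the fast weights produced by
    chunks 0..c-1 only, so outputs stay causal at chunk granularity.
    w_up is frozen (a "slow" weight); w_down is the fast weight.
    """
    w_down = [row[:] for row in w_down]  # copy so the caller's weights are untouched
    outputs: List[Vector] = []
    for start in range(0, len(xs), chunk_size):
        chunk_x = xs[start:start + chunk_size]
        chunk_t = targets[start:start + chunk_size]
        hiddens = [relu(matvec(x, w_up)) for x in chunk_x]
        # 1) Forward pass for the whole chunk with the current fast weights.
        chunk_out = [matvec(h, w_down) for h in hiddens]
        outputs.extend(chunk_out)
        # 2) One SGD step on sum_t 0.5 * ||h_t W_down - target_t||^2.
        #    dL/dW_down[i][j] = sum_t h_t[i] * (y_t[j] - target_t[j]).
        grad = [[0.0] * len(w_down[0]) for _ in w_down]
        for h, y, t in zip(hiddens, chunk_out, chunk_t):
            for i, h_i in enumerate(h):
                for j in range(len(y)):
                    grad[i][j] += h_i * (y[j] - t[j])
        for i in range(len(w_down)):
            for j in range(len(w_down[0])):
                w_down[i][j] -= lr * grad[i][j]
    return outputs
```

For example, with scalar weights `w_up=[[1.0]]` and `w_down=[[0.0]]`, four inputs `[1.0]` with targets `[1.0]`, `chunk_size=2` and `lr=0.25`, the first chunk is computed with the initial weight (outputs 0.0) and the second chunk with the weight after one update (outputs 0.5). Larger chunks mean fewer, batched updates, which is the general reason chunk-wise schemes tend to be cheaper than per-token updates.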
