Fugu-MT Paper Translation (Summary): Doc-to-LoRA: Learning to Instantly Internalize Contexts

Paper Summary: Doc-to-LoRA: Learning to Instantly Internalize Contexts

arxiv url: http://arxiv.org/abs/2602.15902v1
Date: Fri, 13 Feb 2026 06:54:20 GMT
Status: Translation complete
In-system update date: 2026-02-19 15:58:30.379916
Title: Doc-to-LoRA: Learning to Instantly Internalize Contexts
Authors: Rujikorn Charakorn, Edoardo Cetin, Shinnosuke Uesaka, Robert Tjarko Lange,
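Abstract summary: Doc-to-LoRA (D2L) is a lightweight hypernetwork that meta-learns to perform approximate context distillation (CD) within a single forward pass. Given an unseen prompt, D2L generates a LoRA adapter for the target LLM, enabling subsequent queries to be answered without re-consuming the original context. On real-world QA datasets with limited compute, D2L outperforms standard CD while significantly reducing peak memory consumption and update latency.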
Reference score (site-computed attention score): 21.3099441873994
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Long input sequences are central to in-context learning, document understanding, and multi-step reasoning of Large Language Models (LLMs). However, the quadratic attention cost of Transformers makes inference memory-intensive and slow. While context distillation (CD) can transfer information into model parameters, per-prompt distillation is impractical due to training costs and latency. To address these limitations, we propose Doc-to-LoRA (D2L), a lightweight hypernetwork that meta-learns to perform approximate CD within a single forward pass. Given an unseen prompt, D2L generates a LoRA adapter for a target LLM, enabling subsequent queries to be answered without re-consuming the original context, reducing latency and KV-cache memory consumption during inference of the target LLM. On a long-context needle-in-a-haystack task, D2L successfully learns to map contexts into adapters that store the needle information, achieving near-perfect zero-shot accuracy at sequence lengths exceeding the target LLM's native context window by more than 4x. On real-world QA datasets with limited compute, D2L outperforms standard CD while significantly reducing peak memory consumption and update latency. We envision that D2L can facilitate rapid adaptation of LLMs, opening up the possibility of frequent knowledge updates and personalized chat behavior.
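The abstract describes D2L as a hypernetwork that, in one forward pass, turns a context into a LoRA adapter for a frozen target LLM, so later queries can be answered without the context. The following is a minimal NumPy sketch of that data flow only, under loudly stated assumptions: the `ToyHypernetwork` class, the mean-pooling of context embeddings, the linear output heads, all dimensions, and the `alpha / r` scaling are hypothetical illustration choices, not the paper's architecture, and the weights are random rather than meta-trained. It shows context in, low-rank factors A and B out, then a query answered with W x + (alpha / r) B A x.

```python
import numpy as np


class ToyHypernetwork:
    """Hypothetical, untrained stand-in for a context-to-LoRA hypernetwork.

    Maps a context (a sequence of token embeddings) to LoRA factors A and B
    for one linear layer of a frozen target model, in a single forward pass.
    """

    def __init__(self, d_ctx, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Assumed linear heads that emit the flattened LoRA factors.
        self.head_a = rng.normal(0.0, 0.02, size=(d_ctx, rank * d_in))
        self.head_b = rng.normal(0.0, 0.02, size=(d_ctx, d_out * rank))

    def __call__(self, context):
        # context: (ctx_len, d_ctx). Mean-pooling is an assumption for brevity.
        pooled = context.mean(axis=0)
        a = (pooled @ self.head_a).reshape(self.rank, self.d_in)   # A: (r, d_in)
        b = (pooled @ self.head_b).reshape(self.d_out, self.rank)  # B: (d_out, r)
        return a, b


def lora_forward(w, a, b, x, alpha=8.0):
    """Frozen weight plus low-rank update: y = W x + (alpha / r) * B (A x)."""
    rank = a.shape[0]
    return w @ x + (alpha / rank) * (b @ (a @ x))


rng = np.random.default_rng(1)
d_ctx, d_in, d_out, rank, ctx_len = 32, 16, 16, 4, 128

w_frozen = rng.normal(size=(d_out, d_in))    # target-layer weight, never updated
context = rng.normal(size=(ctx_len, d_ctx))  # stand-in for document embeddings

hyper = ToyHypernetwork(d_ctx, d_in, d_out, rank)
a, b = hyper(context)  # one forward pass, no per-document gradient steps
del context            # later queries no longer need the raw context

query = rng.normal(size=d_in)
y = lora_forward(w_frozen, a, b, query)
print(a.shape, b.shape, y.shape)  # (4, 16) (16, 4) (16,)
```

In this toy, the per-document cost is a single hypernetwork call, whereas per-prompt context distillation would run gradient updates on each new document; the abstract's point is that D2L's hypernetwork is meta-trained so that this one call approximates such distillation.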


Related papers