Fugu-MT Paper Translation (Abstract): Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules

Paper overview: Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules

arxiv url: http://arxiv.org/abs/2605.29075v1
Date: Wed, 27 May 2026 20:29:31 GMT
Status: translation complete
Last updated in system: 2026-05-30 02:45:55.447215
Title: Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules
Authors: Karim Galliamov, Rochelle Choenni, Ivan Titov
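Abstract summary: This paper proposes a framework for decomposing a pretrained LLM into a sparse shared backbone and domain-specific memories. Across Llama and Qwen models from 3B to 8B, the authors find that non-trivial capacity can be moved out of the shared backbone without a large loss in model ability.
Reference score (attention score computed independently by this site): 15.262187270149582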
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: LLMs encode both general capabilities and domain-specific knowledge in a single set of parameters. We ask whether this capacity can be reorganized: keeping broadly useful computation in a shared backbone, while moving specialized knowledge into external memory modules. We propose knowledge offloading (KOFF), a framework for decomposing a pretrained LLM into a sparse shared backbone and domain-specific memories. Starting from a frozen base model, we jointly learn a structured pruning mask and lightweight recovery modules, implemented as LoRA adapters and learned key-value caches. Across Llama and Qwen models from 3B to 8B, we find that non-trivial capacity can be moved out of the shared backbone without a large loss in model ability. At around 12% global sparsity, KOFF preserves much of the unpruned model's performance, while pruning the same frozen model without memories degrades sharply. Ablations show that LoRA and learned KV memories are complementary, and specialization analyses suggest that the learned decomposition is meaningful: language-specific neurons are preferentially removed while language-general neurons largely remain in the backbone. These results suggest that knowledge can be reallocated between a shared core and swappable external memories.
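To make the idea of "structured pruning mask plus LoRA recovery" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy shapes, the fixed (not learned) mask, and the definition of global sparsity as the fraction of masked units are all assumptions made for illustration, and the learned key-value memories mentioned in the abstract are omitted entirely.

```python
# Illustrative sketch only: names, shapes, and the sparsity definition are
# hypothetical and not taken from the paper. The mask is fixed here, whereas
# the abstract describes learning it jointly with the recovery modules.

def matvec(matrix, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def masked_lora_forward(W, mask, A, B, x):
    """Frozen weight W with a per-row structured mask plus a low-rank update.

    Computes y = mask * (W x) + B (A x).
    """
    base = [m * h for m, h in zip(mask, matvec(W, x))]  # pruned rows give 0
    delta = matvec(B, matvec(A, x))                     # rank-r recovery term
    return [b + d for b, d in zip(base, delta)]

def global_sparsity(masks):
    """Fraction of maskable units removed across all layers."""
    total = sum(len(m) for m in masks)
    kept = sum(sum(m) for m in masks)
    return 1 - kept / total

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # frozen 3x2 weight
mask = [1, 0, 1]                           # output row 1 is pruned
A = [[1.0, 1.0]]                           # rank-1 down-projection (1x2)
B = [[0.0], [0.5], [0.0]]                  # rank-1 up-projection (3x1)
x = [1.0, 1.0]

y = masked_lora_forward(W, mask, A, B, x)
sparsity = global_sparsity([mask, [1, 1, 1, 1, 1]])
print(y)         # [3.0, 1.0, 11.0]
print(sparsity)  # 0.125
```

In this toy setup the pruned row's output is replaced by whatever the low-rank term supplies, which is the sense in which an adapter can act as a recovery module for a sparsified frozen backbone.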

Related papers