Fugu-MT Paper Translation (Abstract): MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling

Paper overview: MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling

arxiv url: http://arxiv.org/abs/2602.03359v1
Date: Tue, 03 Feb 2026 10:32:04 GMT
Status: Translation complete
Last updated in system: 2026-02-04 18:37:15.393411
Title: MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling
Title (reference translation): MeKi: Memory-Based Expert Knowledge Injection for Efficient LLM Scaling
Authors: Ning Ding, Fangcheng Liu, Kyungrae Kim, Linji Hao, Kyeng-Hun Lee, Hyeonmok Ko, Yehui Tang
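Abstract summary: Scaling large language models (LLMs) typically relies on increasing the number of parameters or the amount of test-time computation to boost performance. MeKi (Memory-based Expert Knowledge Injection) is a novel system that scales LLM capacity via storage space rather than FLOPs. MeKi significantly outperforms dense LLM baselines at identical inference speed.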
Reference score (independently computed attention score): 29.784396745475835
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Scaling Large Language Models (LLMs) typically relies on increasing the number of parameters or test-time computations to boost performance. However, these strategies are impractical for edge device deployment due to limited RAM and NPU resources. Despite hardware constraints, deploying performant LLMs on edge devices such as smartphones remains crucial for user experience. To address this, we propose MeKi (Memory-based Expert Knowledge Injection), a novel system that scales LLM capacity via storage space rather than FLOPs. MeKi equips each Transformer layer with token-level memory experts that inject pre-stored semantic knowledge into the generation process. To bridge the gap between training capacity and inference efficiency, we employ a re-parameterization strategy to fold the parameter matrices used during training into a compact static lookup table. By offloading the knowledge to ROM, MeKi decouples model capacity from computational cost, introducing zero inference latency overhead. Extensive experiments demonstrate that MeKi significantly outperforms dense LLM baselines with identical inference speed, validating the effectiveness of the memory-based scaling paradigm for on-device LLMs. The project homepage is at https://github.com/ningding-o/MeKi.
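Abstract (reference translation): Scaling large language models (LLMs) usually depends on increasing the number of parameters or the amount of test-time computation to improve performance. However, these strategies are not practical for deployment on edge devices because of limited RAM and NPU resources. Despite the hardware constraints, deploying high-performing LLMs on edge devices such as smartphones is essential for user experience. This work therefore proposes MeKi (Memory-based Expert Knowledge Injection), a new system that expands LLM capacity through storage space rather than FLOPs. MeKi equips each Transformer layer with token-level memory experts and injects pre-stored semantic knowledge into the generation process. To close the gap between training capacity and inference efficiency, a re-parameterization strategy folds the parameter matrices used during training into a compact static lookup table. By offloading knowledge to ROM, MeKi separates model capacity from computational cost and adds zero inference latency overhead. Extensive experiments show that MeKi significantly outperforms dense LLM baselines at identical inference speed, validating the effectiveness of the memory-based scaling paradigm for on-device LLMs. The project homepage is at https://github.com/ningding-o/MeKi.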
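
The re-parameterization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the sketch assumes a token-level memory expert is an embedding row followed by a projection matrix during training, and that the injected vector is simply added to the hidden state. All names and sizes (`embed`, `proj`, `fold_to_lookup_table`, `inject`, `VOCAB`, `K`, `D`) are hypothetical. Under those assumptions, the projection can be applied to every token's row once, offline, so that inference needs only a table lookup.

```python
import random

def matmul_row(row, matrix):
    """Multiply a 1 x k row vector by a k x d matrix (plain Python lists)."""
    return [sum(r * matrix[i][j] for i, r in enumerate(row))
            for j in range(len(matrix[0]))]

def training_time_memory(token_id, embed, proj):
    """Assumed training-time path: embedding lookup, then a projection (one matmul)."""
    return matmul_row(embed[token_id], proj)

def fold_to_lookup_table(embed, proj):
    """Re-parameterization: precompute the projected vector for every token once."""
    return [matmul_row(row, proj) for row in embed]

def inject(hidden, token_id, table):
    """Inference-time path: add the pre-stored vector (lookup only, no matmul).

    Additive injection is an assumption made for this sketch.
    """
    return [h + m for h, m in zip(hidden, table[token_id])]

random.seed(0)
VOCAB, K, D = 8, 4, 3  # hypothetical toy sizes
embed = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(VOCAB)]
proj = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(K)]

# Offline step: fold the two training-time matrices into one static table.
table = fold_to_lookup_table(embed, proj)

token_id = 5
hidden = [0.1, -0.2, 0.3]
memory = training_time_memory(token_id, embed, proj)
via_training_path = [h + m for h, m in zip(hidden, memory)]
via_lookup = inject(hidden, token_id, table)
```

In this toy setting both paths give the same hidden state, while the lookup path performs no matrix multiplication at inference time. The table holds `VOCAB * D` values, which reflects the trade of storage for compute that the abstract describes.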


Related papers