Fugu-MT Paper Translation (Abstract): Lizard: An Efficient Linearization Framework for Large Language Models

Paper summary: Lizard: An Efficient Linearization Framework for Large Language Models

arxiv url: http://arxiv.org/abs/2507.09025v1
Date: Fri, 11 Jul 2025 21:19:18 GMT
Status: Translation complete
Last updated in system: 2025-07-15 18:48:22.157236
Title: Lizard: An Efficient Linearization Framework for Large Language Models
Authors: Chien Van Nguyen, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Viet Dac Lai, Haoliang Wang, Jayakumar Subramanian, Ryan A. Rossi, Trung Bui, Nikos Vlassis, Franck Dernoncourt, Thien Huu Nguyen
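Abstract summary: We propose Lizard, a linearization framework that converts pretrained Transformer-based large language models (LLMs) into flexible subquadratic architectures for infinite-context generation. Lizard addresses the memory and computational bottlenecks of softmax attention by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving output quality. Experiments show that Lizard achieves near-lossless recovery of the teacher model's performance on standard language modeling tasks while significantly outperforming previous linearization methods.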
Reference score (independently computed attention metric): 100.63879229649581
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into flexible, subquadratic architectures for infinite-context generation. Transformer-based LLMs face significant memory and computational bottlenecks as context lengths increase, due to the quadratic complexity of softmax attention and the growing key-value (KV) cache. Lizard addresses these limitations by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving the output quality. Unlike previous linearization methods, which are often limited by fixed model structures and therefore exclude gating mechanisms, Lizard incorporates a gating module inspired by recent state-of-the-art linear models. This enables adaptive memory control, supports constant-memory inference, offers strong length generalization, and allows more flexible model design. Lizard combines gated linear attention for global context compression with sliding window attention enhanced by meta memory, forming a hybrid mechanism that captures both long-range dependencies and fine-grained local interactions. Moreover, we introduce a hardware-aware algorithm that accelerates the training speed of our models. Extensive experiments show that Lizard achieves near-lossless recovery of the teacher model's performance across standard language modeling tasks, while significantly outperforming previous linearization methods. On the 5-shot MMLU benchmark, Lizard improves over prior models by 18 points and shows significant improvements on associative recall tasks.
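The abstract gives no equations, so the following is only a minimal sketch of how a generic gated linear attention recurrence can run with constant memory. It is not Lizard's actual formulation. The function name `gated_linear_attention`, the scalar per-step gate, and the omission of normalization, the sliding window, and meta memory are all simplifying assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def gated_linear_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    gates: List[float],
) -> List[Vector]:
    """Recurrent form of a simplified gated linear attention (hypothetical sketch).

    State update:  S_t = g_t * S_{t-1} + k_t v_t^T
    Output:        o_t = q_t^T S_t

    The state S has a fixed shape (d_k x d_v), so memory does not grow with
    sequence length, unlike a key-value cache.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    # Fixed-size state matrix: the only memory carried across time steps.
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs: List[Vector] = []
    for q, k, v, g in zip(queries, keys, values, gates):
        # Decay the old state by the gate, then add the new key-value outer product.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = g * state[i][j] + k[i] * v[j]
        # Read out with the query: o_t[j] = sum_i q[i] * S[i][j].
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


# Toy example: two steps, with the second gate halving the remembered state.
example = gated_linear_attention(
    queries=[[1.0, 0.0], [1.0, 1.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[2.0], [3.0]],
    gates=[1.0, 0.5],
)
print(example)  # [[2.0], [4.0]]
```

In this toy run the second gate of 0.5 halves the contribution of the first token before the second is added, which is the kind of adaptive memory control that gating provides. With every gate set to 1.0 the recurrence reduces to plain unnormalized linear attention.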
