Fugu-MT Paper Translation (Abstract): Kimi Linear: An Expressive, Efficient Attention Architecture

Paper overview: Kimi Linear: An Expressive, Efficient Attention Architecture

arxiv url: http://arxiv.org/abs/2510.26692v1
Date: Thu, 30 Oct 2025 16:59:43 GMT
Status: Translation complete
Last updated in system: 2025-10-31 16:05:09.913486
Title: Kimi Linear: An Expressive, Efficient Attention Architecture
Title (reference translation): Kimi Linear: An Expressive, Efficient Attention Architecture
Authors: Kimi Team, Yu Zhang, Zongyu Lin, Xingcheng Yao, Jiaxi Hu, Fanqing Meng, Chengyin Liu, Xin Men, Songlin Yang, Zhiyuan Li, Wentao Li, Enzhe Lu, Weizhou Liu, Yanru Chen, Weixin Xu, Longhui Yu, Yejie Wang, Yu Fan, Longguang Zhong, Enming Yuan, Dehao Zhang, Yizhi Zhang, T. Y. Liu, Haiming Wang, Shengjun Fang, Weiran He, Shaowei Liu, Yiwei Li, Jianlin Su, Jiezhong Qiu, Bo Pang, Junjie Yan, Zhejun Jiang, Weixiao Huang, Bohong Yin, Jiacheng You, Chu Wei, Zhengtao Wang, Chao Hong, Yutian Chen, Guanduo Chen, Yucheng Wang, Huabin Zheng, Feng Wang, Yibo Liu, Mengnan Dong, Zheng Zhang, Siyuan Pan, Wenhao Wu, Yuhao Wu, Longyu Guan, Jiawen Tao, Guohong Fu, Xinran Xu, Yuzhi Wang, Guokun Lai, Yuxin Wu, Xinyu Zhou, Zhilin Yang, Yulun Du
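Abstract summary: Kimi Linear is a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons. At its core, Kimi Delta Attention (KDA) is an expressive linear attention module that extends Gated DeltaNet. The authors show that Kimi Linear can serve as a drop-in replacement for full attention with superior performance and efficiency.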
Reference score (independently computed notability): 75.89211364086309
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We introduce Kimi Linear, a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons across various scenarios -- including short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core lies Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet with a finer-grained gating mechanism, enabling more effective use of limited finite-state RNN memory. Our bespoke chunkwise algorithm achieves high hardware efficiency through a specialized variant of the Diagonal-Plus-Low-Rank (DPLR) transition matrices, which substantially reduces computation compared to the general DPLR formulation while remaining more consistent with the classical delta rule. We pretrain a Kimi Linear model with 3B activated parameters and 48B total parameters, based on a layerwise hybrid of KDA and Multi-Head Latent Attention (MLA). Our experiments show that with an identical training recipe, Kimi Linear outperforms full MLA with a sizeable margin across all evaluated tasks, while reducing KV cache usage by up to 75% and achieving up to 6 times decoding throughput for a 1M context. These results demonstrate that Kimi Linear can be a drop-in replacement for full attention architectures with superior performance and efficiency, including tasks with longer input and output lengths. To support further research, we open-source the KDA kernel and vLLM implementations, and release the pre-trained and instruction-tuned model checkpoints.
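Abstract (reference translation): We introduce Kimi Linear, a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons across various scenarios, including short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core is Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet with a finer-grained gating mechanism, enabling more effective use of limited finite-state RNN memory. The proposed chunkwise algorithm achieves high hardware efficiency through a specialized variant of the Diagonal-Plus-Low-Rank (DPLR) transition matrices, which substantially reduces computation compared to the general DPLR formulation while remaining more consistent with the classical delta rule. We pretrain a Kimi Linear model with 3B activated parameters and 48B total parameters, based on a layerwise hybrid of KDA and Multi-Head Latent Attention (MLA). Our experiments show that, with an identical training recipe, Kimi Linear outperforms full MLA by a sizeable margin across all evaluated tasks, while reducing KV cache usage by up to 75% and achieving up to 6 times the decoding throughput at a 1M context. These results demonstrate that Kimi Linear can be a drop-in replacement for full attention architectures with superior performance and efficiency, including on tasks with longer input and output lengths. To support further research, we open-source the KDA kernel and vLLM implementations, and release the pre-trained and instruction-tuned model checkpoints.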
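
The abstract describes KDA as a delta-rule memory with a finer-grained gate than Gated DeltaNet. The following is a minimal sketch of what one recurrent step of such a module could look like. It is not the released KDA kernel or the paper's chunkwise algorithm. The function name `kda_like_step`, the per-key-channel gate `alpha`, the scalar write strength `beta`, and the exact update form are assumptions made for illustration only.

```python
def kda_like_step(S, q, k, v, alpha, beta):
    """One recurrent step of a gated delta rule with per-channel decay.

    Hypothetical illustration, not the authors' implementation.
    S     : d_k x d_v state matrix, given as a list of rows
    q, k  : length-d_k query and key
    v     : length-d_v value
    alpha : length-d_k decay gates in [0, 1], one per key channel (assumed)
    beta  : scalar write strength in [0, 1] (assumed)
    """
    d_k, d_v = len(k), len(v)
    # 1) Channel-wise forgetting: row i of the state is scaled by alpha[i].
    decayed = [[alpha[i] * S[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2) What the decayed memory currently returns for key k.
    pred = [sum(k[i] * decayed[i][j] for i in range(d_k)) for j in range(d_v)]
    # 3) Delta-rule write: correct the stored value by the prediction error.
    new_S = [[decayed[i][j] + beta * k[i] * (v[j] - pred[j])
              for j in range(d_v)] for i in range(d_k)]
    # 4) Read out with the query.
    out = [sum(q[i] * new_S[i][j] for i in range(d_k)) for j in range(d_v)]
    return new_S, out


if __name__ == "__main__":
    e0 = [1.0, 0.0]
    no_decay = [1.0, 1.0]
    S = [[0.0, 0.0], [0.0, 0.0]]
    S, _ = kda_like_step(S, e0, e0, [2.0, 3.0], no_decay, 1.0)    # store
    S, out = kda_like_step(S, e0, e0, [5.0, 7.0], no_decay, 1.0)  # overwrite
    print(out)  # [5.0, 7.0]
```

With a unit-norm key and `beta = 1.0`, writing a second value under the same key replaces the first instead of adding to it, which is the behavior the classical delta rule is known for. Setting different entries of `alpha` lets each key channel forget at its own rate, whereas a single scalar gate would decay the whole state uniformly.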


Related papers