Fugu-MT Paper Translation (Overview): CLEX: Continuous Length Extrapolation for Large Language Models

Paper overview: CLEX: Continuous Length Extrapolation for Large Language Models

arxiv url: http://arxiv.org/abs/2310.16450v3
Date: Sun, 24 Mar 2024 17:14:11 GMT
Status: Translation complete
Last updated in system: 2024-03-27 02:35:50.874501
Title: CLEX: Continuous Length Extrapolation for Large Language Models
Authors: Guanzheng Chen, Xin Li, Zaiqiao Meng, Shangsong Liang, Lidong Bing
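Abstract summary: We propose CLEX (Continuous Length Extrapolation) for large language models (LLMs). CLEX extends the context window to over 4x or almost 8x the training length with no performance degradation. On LongBench, our model trained on a 4k length performs competitively against state-of-the-art open-source models trained on context lengths up to 32k.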
Reference score (site-computed attention score): 68.43814043853347
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformer-based Large Language Models (LLMs) are pioneering advances in many natural language processing tasks; however, their exceptional capabilities are restricted within the preset context window of the Transformer. Position Embedding (PE) scaling methods, while effective in extending the context window to a specific length, either demonstrate notable limitations in their extrapolation abilities or sacrifice partial performance within the context window. Length extrapolation methods, although theoretically capable of extending the context window beyond the training sequence length, often underperform in practical long-context applications. To address these challenges, we propose Continuous Length EXtrapolation (CLEX) for LLMs. We generalise the PE scaling approaches to model the continuous dynamics by ordinary differential equations over the length scaling factor, thereby overcoming the constraints of current PE scaling methods designed for specific lengths. Moreover, by extending the dynamics to desired context lengths beyond the training sequence length, CLEX facilitates length extrapolation with impressive performance in practical tasks. We demonstrate that CLEX can be seamlessly incorporated into LLMs equipped with Rotary Position Embedding, such as LLaMA and GPT-NeoX, with negligible impact on training and inference latency. Experimental results reveal that CLEX can effectively extend the context window to over 4x or almost 8x the training length, with no deterioration in performance. Furthermore, when evaluated on the practical LongBench benchmark, our model trained on a 4k length exhibits competitive performance against state-of-the-art open-source models trained on context lengths up to 32k. Our code is available at https://github.com/DAMO-NLP-SG/CLEX.
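The abstract's core idea treats RoPE frequency scaling as the solution of an ordinary differential equation over the length scaling factor t. A minimal plain-Python sketch can illustrate this idea. It is not the CLEX implementation. All function names are hypothetical, and the RoPE base of 10000 is the common default assumed here. The dynamics g(θ, t) = -θ/t is a hand-picked illustrative choice, not taken from the paper. That choice has the closed-form solution θ_t = θ_1 / t, which is plain linear position interpolation. Integrating it numerically from t = 1 to t = 4 should therefore rescale every frequency by 1/4. This maps a position far beyond a 4k training window back onto the rotation angles of a position inside it.

```python
import math
from typing import Callable, List


def rope_frequencies(dim: int, base: float = 10000.0) -> List[float]:
    """RoPE frequency basis: theta_i = base^(-2i/dim) for i = 0 .. dim/2 - 1."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rotate(vec: List[float], pos: float, freqs: List[float]) -> List[float]:
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle pos * theta_i."""
    out: List[float] = []
    for i, theta in enumerate(freqs):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


def g_interpolation(theta: float, t: float) -> float:
    """Hypothetical dynamics d(theta)/dt = -theta / t.

    Closed-form solution: theta_t = theta_1 / t (linear position interpolation).
    Illustrative only; not the dynamics used by CLEX.
    """
    return -theta / t


def evolve_frequencies(
    freqs: List[float],
    t_target: float,
    g: Callable[[float, float], float],
    steps: int = 1000,
) -> List[float]:
    """Integrate d(theta)/dt = g(theta, t) from t = 1 to t_target with classic RK4."""
    h = (t_target - 1.0) / steps
    thetas = list(freqs)
    for n in range(steps):
        t = 1.0 + n * h  # recompute t each step instead of accumulating it
        nxt = []
        for th in thetas:
            k1 = g(th, t)
            k2 = g(th + 0.5 * h * k1, t + 0.5 * h)
            k3 = g(th + 0.5 * h * k2, t + 0.5 * h)
            k4 = g(th + h * k3, t + h)
            nxt.append(th + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
        thetas = nxt
    return thetas


if __name__ == "__main__":
    dim = 8
    freqs = rope_frequencies(dim)
    scaled = evolve_frequencies(freqs, t_target=4.0, g=g_interpolation)
    q = [1.0, 0.0] * (dim // 2)
    far = rotate(q, 12000, scaled)  # position beyond a 4k training window
    near = rotate(q, 3000, freqs)   # position inside it, at 12000 / 4
    print("max |far - near|:", max(abs(a - b) for a, b in zip(far, near)))
```

You can replace `g_interpolation` with another function of (θ, t) to change the scaling rule without touching `rotate`. In that sense, a continuous view can cover fixed, length-specific PE scaling rules as special cases. Per the abstract, CLEX models these dynamics with ODEs rather than fixing one rule for one target length.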
