Fugu-MT paper translation (summary): Landmark Attention: Random-Access Infinite Context Length for Transformers

Paper summary: Landmark Attention: Random-Access Infinite Context Length for Transformers

arxiv url: http://arxiv.org/abs/2305.16300v1
Date: Thu, 25 May 2023 17:53:42 GMT
Status: translation complete
Last updated in system: 2023-05-26 13:11:56.870636
Title: Landmark Attention: Random-Access Infinite Context Length for Transformers
Authors: Amirkeivan Mohtashami, Martin Jaggi
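Abstract summary: We propose a new approach that allows access to the complete context while retaining random-access flexibility. The method represents each block of the input with a landmark token and trains the attention to use it for selecting relevant blocks. The approach integrates seamlessly with specialized data structures and the system's memory hierarchy, enabling it to process contexts of arbitrary length.
Reference score (attention score computed by this site): 57.202540419700135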
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While transformers have shown remarkable success in natural language processing, their attention mechanism's large memory requirements have limited their ability to handle longer contexts. Prior approaches, such as recurrent memory or retrieval-based augmentation, have either compromised the random-access flexibility of attention (i.e., the capability to select any token in the entire context) or relied on separate mechanisms for relevant context retrieval, which may not be compatible with the model's attention. In this paper, we present a novel approach that allows access to the complete context while retaining random-access flexibility, closely resembling running attention on the entire context. Our method uses a landmark token to represent each block of the input and trains the attention to use it for selecting relevant blocks, enabling retrieval of blocks directly through the attention mechanism instead of by relying on a separate mechanism. Our approach seamlessly integrates with specialized data structures and the system's memory hierarchy, enabling processing of arbitrarily long context lengths. We demonstrate that our method can obtain comparable performance with Transformer-XL while significantly reducing the number of retrieved tokens in each step. Finally, we show that fine-tuning LLaMA 7B with our method successfully extends its context length capacity up to 32k tokens, allowing for inference at the context lengths of GPT-4.
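The block-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it skips training entirely, uses hand-picked vectors in place of learned landmark keys, and applies plain top-k block selection followed by ordinary softmax attention over the retrieved tokens. The names `landmark_retrieve_attend`, `landmark_keys`, `block_size`, and `top_k` are hypothetical and chosen only for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def landmark_retrieve_attend(query, keys, values, landmark_keys, block_size, top_k):
    """Attend only to tokens in the top_k blocks ranked by landmark score.

    keys / values: one vector per context token.
    landmark_keys: one vector per block (assumption: a stand-in for the key
    of a trained landmark token; here it is supplied by the caller).
    """
    scale = 1.0 / math.sqrt(len(query))
    # Step 1: score every block through its landmark key only.
    block_scores = [dot(query, lk) * scale for lk in landmark_keys]
    ranked = sorted(range(len(block_scores)),
                    key=lambda b: block_scores[b], reverse=True)
    chosen = sorted(ranked[:top_k])
    # Step 2: collect the token positions that belong to the chosen blocks.
    token_ids = [i for b in chosen
                 for i in range(b * block_size,
                                min((b + 1) * block_size, len(keys)))]
    # Step 3: ordinary softmax attention restricted to the retrieved tokens.
    weights = softmax([dot(query, keys[i]) * scale for i in token_ids])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, token_ids))
              for d in range(dim)]
    return chosen, output

# Toy context: 6 tokens in 3 blocks of 2; each value stores the token position.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0], [-0.9, -0.1]]
values = [[float(i)] for i in range(6)]
landmark_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical, hand-picked

chosen, out = landmark_retrieve_attend([0.0, 2.0], keys, values,
                                       landmark_keys, block_size=2, top_k=1)
print(chosen)  # [1]
```

Only the landmark keys are compared against the query in the first step, so the per-token keys and values of unselected blocks never need to be loaded. That is the property the abstract relies on when it mentions integration with the system's memory hierarchy.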

Related papers

Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons [0.0]
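We propose a novel non-attention architecture for large language models (LLMs) that efficiently handles very long context windows. Unlike conventional Transformer designs, which suffer from quadratic memory and computation overhead due to the nature of self-attention, our model avoids token-to-token attention entirely.
Paper reference translation (metadata) (2025-05-09T00:25:46Z)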
Quantifying Memory Utilization with Effective State-Size [73.52115209375343]
We formulate a measure of memory utilization. The metric applies to the fundamental class of systems with input-invariant and input-variant linear operators.
Paper reference translation (metadata) (2025-04-28T08:12:30Z)
Efficient Length-Generalizable Attention via Causal Retrieval for Long-Context Language Modeling [42.67141329779589]
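Grouped Cross Attention (GCA) can generalize to 1000 times the pre-training context length. Experiments show that GCA-based models achieve near-perfect accuracy in passkey retrieval at a 16M context length.
Paper reference translation (metadata) (2024-10-02T15:18:34Z)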
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention [6.713196608291278]
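We propose an efficient method for scaling Transformer-based large language models to infinitely long inputs with bounded memory and computation. A key component of the approach is a new attention technique called Infini-attention.
Paper reference translation (metadata) (2024-04-10T16:18:42Z)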
Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention [17.48544285026157]
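We introduce Fovea Transformer, a transformer focused on long contexts. As the distance to the query token increases, it uses context-token representations of progressively coarser granularity in a tree. We evaluate the model on three long-context summarization tasks.
Paper reference translation (metadata) (2023-11-13T06:24:27Z)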
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading [63.93888816206071]
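We introduce MemWalker, a method that processes a long context into a tree of summary nodes. Upon receiving a query, the model navigates this tree to search for relevant information and responds once it has gathered enough. The results show that MemWalker improves explainability by highlighting its reasoning steps as it interactively reads the text and by pinpointing the text segments relevant to the query.
Paper reference translation (metadata) (2023-10-08T06:18:14Z)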
Ring Attention with Blockwise Transformers for Near-Infinite Context [88.61687950039662]
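We present Ring Attention with Blockwise Transformers (Ring Attention), which uses blockwise computation of self-attention and feedforward to distribute long sequences across multiple devices. The approach enables training and inference on sequences up to device-count times longer than those achievable by prior memory-efficient Transformers.
Paper reference translation (metadata) (2023-10-03T08:44:50Z)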
Constant Memory Attention Block [74.38724530521277]
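Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation. We show that the method achieves results competitive with the state of the art while being significantly more memory efficient.
Paper reference translation (metadata) (2023-06-21T22:41:58Z)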
ABC: Attention with Bounded-memory Control [67.40631793251997]
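We show that efficient-attention approaches can be subsumed into a single abstraction, attention with bounded-memory control (ABC). ABC reveals new possibilities. First, it connects efficient attention variants that would otherwise appear unrelated. Finally, we present a new instance of ABC inspired by existing ABC approaches.
Paper reference translation (metadata) (2021-10-06T03:53:25Z)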
Learning Hard Retrieval Decoder Attention for Transformers [69.40942736249397]
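Transformer translation models are based on the multi-head attention mechanism, which can be parallelized easily. We show that the hard retrieval attention mechanism is 1.43 times faster in decoding.
Paper reference translation (metadata) (2020-09-30T13:18:57Z)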

The related papers list is generated automatically from the titles and abstracts of papers on this site.
