Fugu-MT Paper Translation (Abstract): LooGLE: Can Long-Context Language Models Understand Long Contexts?

Paper overview: LooGLE: Can Long-Context Language Models Understand Long Contexts?

arxiv url: http://arxiv.org/abs/2311.04939v1
Date: Wed, 8 Nov 2023 01:45:37 GMT
Status: Translation complete
Last updated in system: 2023-11-10 17:09:37.594192
Title: LooGLE: Can Long-Context Language Models Understand Long Contexts?
Title (reference translation): LooGLE: Can long-context language models understand long contexts?
Authors: Jiaqi Li, Mengmeng Wang, Zilong Zheng, Muhan Zhang
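Abstract summary: LooGLE is a benchmark for long-context understanding in large language models. It features relatively new documents from after 2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains. An evaluation of eight state-of-the-art LLMs on LooGLE yielded key findings.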
Reference score (independently computed attention metric): 50.408957515411096
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs), despite their impressive performance in various language tasks, are typically limited to processing texts within context-window size. This limitation has spurred significant research efforts to enhance LLMs' long-context understanding with high-quality long-sequence benchmarks. However, prior datasets in this regard suffer from shortcomings, such as short context length compared to the context window of modern LLMs; outdated documents that have data leakage problems; and an emphasis on short dependency tasks rather than long dependency tasks. In this paper, we present LooGLE, a Long Context Generic Language Evaluation benchmark for LLMs' long context understanding. LooGLE features relatively new documents post-2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains. Human annotators meticulously crafted more than 1,100 high-quality question-answer pairs to meet the long dependency requirements. These pairs underwent thorough cross-validation, yielding the most precise assessment of LLMs' long dependency capabilities. The evaluation of eight state-of-the-art LLMs on LooGLE revealed key findings: (i) commercial models outperformed open-sourced models; (ii) LLMs excelled in short dependency tasks like short question-answering and cloze tasks but struggled with more intricate long dependency tasks; (iii) in-context learning and chaining thoughts offered only marginal improvements; (iv) retrieval-based techniques demonstrated substantial benefits for short question-answering, while strategies for extending context window length had limited impact on long context understanding. As such, LooGLE not only provides a systematic and comprehensive evaluation schema on long-context LLMs, but also sheds light on future development of enhanced models towards "true long-context understanding".
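Abstract (reference translation): Large language models (LLMs), despite their strong performance on a variety of language tasks, are typically limited to processing text within the size of their context window. This limitation has driven significant research on improving LLMs' long-context understanding using high-quality long-sequence benchmarks. However, earlier datasets in this area have shortcomings: context lengths that are short compared to the context windows of modern LLMs, outdated documents with data leakage problems, and an emphasis on short dependency tasks over long dependency tasks. This paper presents LooGLE, a Long Context Generic Language Evaluation benchmark for LLMs' long-context understanding. LooGLE contains relatively new documents from after 2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains. Human annotators carefully crafted more than 1,100 high-quality question-answer pairs to meet the long dependency requirements. These pairs underwent thorough cross-validation, yielding the most precise assessment of LLMs' long dependency capabilities. The evaluation of eight state-of-the-art LLMs on LooGLE revealed key findings: (i) commercial models outperformed open-source models; (ii) LLMs excelled at short dependency tasks such as short question-answering and cloze tasks but struggled with more intricate long dependency tasks; (iii) in-context learning and chain-of-thought prompting offered only marginal improvements; (iv) retrieval-based techniques showed substantial benefits for short question-answering, while strategies for extending context window length had limited impact on long-context understanding. LooGLE therefore not only provides a systematic and comprehensive evaluation schema for long-context LLMs, but also sheds light on the future development of enhanced models towards "true long-context understanding".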


Related papers