Fugu-MT paper translation (summary): Evaluating List Construction and Temporal Understanding capabilities of Large Language Models


arxiv url: http://arxiv.org/abs/2506.21783v1
Date: Thu, 26 Jun 2025 21:40:58 GMT
Status: Translation complete
Last updated in system: 2025-06-30 21:12:23.017005
Title: Evaluating List Construction and Temporal Understanding capabilities of Large Language Models
Authors: Alexandru Dumitru, V Venktesh, Adam Jatowt, Avishek Anand
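Abstract summary: Large language models (LLMs) are susceptible to hallucinations and errors, particularly in temporal understanding tasks. This paper proposes the TLQA (Time Referenced List based Question Answering) benchmark, which requires structured answers in list format aligned with the corresponding time periods. The authors investigate the temporal understanding and list construction capabilities of state-of-the-art generative models on TLQA in closed-book and open-domain settings.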
Reference score (attention score computed in-house): 54.39278049092508
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated immense advances in a wide range of natural language tasks. However, these models are susceptible to hallucinations and errors, particularly on temporal understanding tasks involving multiple entities in answers. In such tasks, they fail to associate entities with accurate time intervals, to generate a complete list of entities in answers, or to reason about events associated with specific temporal bounds. Existing works do not extensively evaluate the ability of models to perform implicit and explicit temporal understanding in a list answer construction setup. To bridge this gap, we propose the Time referenced List based Question Answering (TLQA) benchmark, which requires structured answers in list format aligned with corresponding time periods. Our TLQA benchmark requires both list construction and temporal understanding simultaneously, which, to the best of our knowledge, has not been explored in prior benchmarks. We investigate the temporal understanding and list construction capabilities of state-of-the-art generative models on TLQA in closed-book and open-domain settings. Our findings reveal significant shortcomings in current models, particularly their inability to provide complete answers and temporally align facts in a closed-book setup, and the need to improve retrieval in an open-domain setup, providing clear future directions for research on TLQA. The benchmark and code are available at https://github.com/elixir-research-group/TLQA.
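To make the idea of a time-referenced list answer more concrete, here is a minimal Python sketch. It assumes each answer item is an entity paired with an inclusive year interval, and it scores "completeness" as entity recall and "temporal alignment" as mean interval IoU. These are illustrative choices only, not the metrics defined by TLQA; the class name `TimedEntity`, the helper functions, and the example entities ("Club A", etc.) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedEntity:
    """Hypothetical list item: an entity with an inclusive year interval."""
    name: str
    start: int
    end: int

    def __post_init__(self):
        # Reject malformed intervals so the IoU below is well defined.
        if self.start > self.end:
            raise ValueError(f"start {self.start} is after end {self.end}")


def normalize(name: str) -> str:
    # Assumed normalization: trim whitespace and case-fold.
    return name.strip().casefold()


def list_recall(gold, pred):
    """Completeness: fraction of gold entities present in the predicted list."""
    gold_names = {normalize(g.name) for g in gold}
    pred_names = {normalize(p.name) for p in pred}
    if not gold_names:
        return 0.0
    return len(gold_names & pred_names) / len(gold_names)


def interval_iou(a, b):
    """Overlap of two inclusive year intervals divided by their union length."""
    inter = max(0, min(a.end, b.end) - max(a.start, b.start) + 1)
    union = (a.end - a.start + 1) + (b.end - b.start + 1) - inter
    return inter / union


def temporal_alignment(gold, pred):
    """Mean interval IoU over gold entities; missing entities score 0.

    If a name appears twice in pred, the last occurrence wins.
    """
    pred_by_name = {normalize(p.name): p for p in pred}
    scores = []
    for g in gold:
        p = pred_by_name.get(normalize(g.name))
        scores.append(interval_iou(g, p) if p is not None else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical gold answer and model prediction for one question.
gold = [
    TimedEntity("Club A", 2001, 2004),
    TimedEntity("Club B", 2005, 2009),
    TimedEntity("Club C", 2010, 2012),
]
pred = [
    TimedEntity("club a", 2001, 2004),  # correct entity and interval
    TimedEntity("Club B", 2006, 2009),  # correct entity, interval off by a year
]  # "Club C" is missing: an incomplete list

print(f"recall={list_recall(gold, pred):.3f}")
print(f"alignment={temporal_alignment(gold, pred):.3f}")
```

In this toy case the prediction recovers two of three entities (recall about 0.667) and, because one interval is shifted and one entity is missing, the mean alignment drops to 0.6. Separating the two scores mirrors the abstract's distinction between incomplete lists and temporally misaligned facts.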

Related papers

TimeLogic: A Temporal Logic Benchmark for Video QA [64.32208175236323]
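We introduce the TimeLogic QA (TLQA) framework, which automatically generates temporal logic questions. We leverage four datasets (STAR, Breakfast, AGQA, and CrossTask) and generate 2k and 10k QA pairs per category. We evaluate the temporal reasoning performance of video QA models on 16 categories of temporal logic with varying temporal complexity.
Reference translation (metadata) (2025-01-13T11:12:59Z)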
ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering [24.046966640011124]
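ComplexTempQA is a large-scale dataset consisting of over 100 million question-answer pairs. The dataset covers questions spanning over two decades and offers an unmatched breadth of topics.
Reference translation (metadata) (2024-06-07T12:01:59Z)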
Self-Improvement Programming for Temporal Knowledge Graph Question Answering [31.33908040172437]
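Temporal Knowledge Graph Question Answering (TKGQA) aims to answer questions with temporal intent over temporal knowledge graphs (TKGs). Existing end-to-end methods model temporal constraints implicitly by learning embeddings of questions and candidate answers. We propose a novel self-improvement programming method for TKGQA (Prog-TQA).
Reference translation (metadata) (2024-04-02T08:14:27Z)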
Multi-hop Question Answering under Temporal Knowledge Editing [9.356343796845662]
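Multi-hop question answering (MQA) under knowledge editing (KE) has attracted significant attention in the era of large language models. Existing models for MQA under KE perform poorly on questions that contain explicit temporal context. We propose TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE-MQA).
Reference translation (metadata) (2024-03-30T23:22:51Z)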
Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
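Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents. The task becomes notably harder in the zero-shot setting, where no data is available to train tailored retriever-reader models. We propose a Self-Prompting framework that explicitly exploits the massive knowledge encoded in the parameters of large language models.
Reference translation (metadata) (2022-12-16T18:23:43Z)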
A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases [67.33560134350427]
TempQA-WD is a benchmark dataset for temporal reasoning over Wikidata, the most frequently curated, openly available knowledge base.
Reference translation (metadata) (2022-01-15T08:49:09Z)
Temporal and Object Quantification Networks [95.64650820186706]
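We propose a new class of neuro-symbolic networks with a structural bias that enables them to recognize complex relational-temporal events. We demonstrate that TOQ-Nets can generalize from small amounts of data to scenarios containing more objects than were present during training, and to temporal warpings of input sequences.
Reference translation (metadata) (2021-06-10T16:18:21Z)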
KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
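We present a benchmark for Knowledge Intensive Language Tasks (KILT). All tasks in KILT are grounded in the same snapshot of Wikipedia. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
Reference translation (metadata) (2020-09-04T15:32:19Z)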

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
