Fugu-MT Paper Translation (Abstract): FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering

Paper summary: FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering

arxiv url: http://arxiv.org/abs/2508.14052v3
Date: Sat, 06 Sep 2025 08:43:21 GMT
Status: Translation complete
Last updated in system: 2025-09-09 14:07:03.282716
Title: FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
Title (reference translation): FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
Authors: Chanyeol Choi, Jihoon Kwon, Alejandro Lopez-Lira, Chaewoon Kim, Minjae Kim, Juneha Hwang, Jaeseon Ha, Hojun Choi, Suyeol Yun, Yongjin Kim, Yongjae Lee
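Abstract summary: FinAgentBench is the first large-scale benchmark for evaluating retrieval with multi-step reasoning in finance. The benchmark consists of 3,429 expert-annotated examples on S&P-100 listed firms. We evaluate a suite of state-of-the-art models and show that targeted fine-tuning significantly improves agentic retrieval performance.
Reference score (independently computed attention metric): 57.18367828883773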
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate information retrieval (IR) is critical in the financial domain, where investors must identify relevant information from large collections of documents. Traditional IR methods, whether sparse or dense, often fall short in retrieval accuracy, because the task requires not only capturing semantic similarity but also performing fine-grained reasoning over document structure and domain-specific knowledge. Recent advances in large language models (LLMs) have opened up new opportunities for retrieval with multi-step reasoning, where the model ranks passages through iterative reasoning about which information is most relevant to a given query. However, there exists no benchmark to evaluate such capabilities in the financial domain. To address this gap, we introduce FinAgentBench, the first large-scale benchmark for evaluating retrieval with multi-step reasoning in finance, a setting we term agentic retrieval. The benchmark consists of 3,429 expert-annotated examples on S&P-100 listed firms and assesses whether LLM agents can (1) identify the most relevant document type among candidates, and (2) pinpoint the key passage within the selected document. Our evaluation framework explicitly separates these two reasoning steps to address context limitations. This design provides a quantitative basis for understanding retrieval-centric LLM behavior in finance. We evaluate a suite of state-of-the-art models and further demonstrate how targeted fine-tuning can significantly improve agentic retrieval performance. Our benchmark provides a foundation for studying retrieval-centric LLM behavior in complex, domain-specific tasks for finance.
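Abstract (reference translation): Accurate information retrieval (IR) is important in the financial domain, where investors need to identify relevant information from large volumes of documents. Traditional IR methods fall short in retrieval accuracy, because the task requires not only capturing semantic similarity but also fine-grained reasoning over document structure and domain-specific knowledge. Recent advances in large language models (LLMs) are opening up new opportunities for retrieval with multi-step reasoning, in which the model ranks passages by reasoning iteratively about which information is most relevant to a given query. However, no benchmark exists to evaluate such capabilities in the financial domain. To address this gap, we introduce FinAgentBench, the first large-scale benchmark for evaluating retrieval with multi-step reasoning in finance. The benchmark consists of 3,429 expert-annotated examples on S&P-100 listed firms and assesses whether LLM agents can (1) identify the most relevant document type among the candidates and (2) pinpoint the key passage within the selected document. The evaluation framework explicitly separates these two reasoning steps to address context limitations. This design provides a quantitative basis for understanding retrieval-centric LLM behavior in finance. We evaluate a suite of state-of-the-art models and show that targeted fine-tuning significantly improves agentic retrieval performance. Our benchmark provides a foundation for studying retrieval-centric LLM behavior in complex, domain-specific financial tasks.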
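The two-step evaluation described in the abstract (rank document types first, then rank passages within the chosen document, and score each step separately) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Example` fields, the `Ranker` interface, the choice of mean reciprocal rank as the metric, and the toy `keyword_overlap_ranker` that stands in for an LLM agent. The abstract does not specify the dataset schema or the metrics the authors use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class Example:
    """One hypothetical benchmark record; the field names are illustrative only."""
    query: str
    doc_types: Sequence[str]   # candidate document types for step 1
    gold_doc_type: str         # expert-annotated relevant document type
    passages: Sequence[str]    # passages of the selected document for step 2
    gold_passage_idx: int      # index of the expert-annotated key passage


# Assumed interface: a ranker returns candidate indices ordered best first.
Ranker = Callable[[str, Sequence[str]], Sequence[int]]


def reciprocal_rank(ranking: Sequence[int], gold: int) -> float:
    """Return 1/rank of the gold index, or 0.0 if it is not in the ranking."""
    for position, idx in enumerate(ranking, start=1):
        if idx == gold:
            return 1.0 / position
    return 0.0


def evaluate(examples: Sequence[Example],
             rank_doc_types: Ranker,
             rank_passages: Ranker) -> Dict[str, float]:
    """Score the two steps independently, each against its own gold label."""
    if not examples:
        raise ValueError("need at least one example")
    doc_scores, passage_scores = [], []
    for ex in examples:
        gold_type_idx = list(ex.doc_types).index(ex.gold_doc_type)
        doc_scores.append(
            reciprocal_rank(rank_doc_types(ex.query, ex.doc_types), gold_type_idx))
        passage_scores.append(
            reciprocal_rank(rank_passages(ex.query, ex.passages), ex.gold_passage_idx))
    n = len(examples)
    return {"doc_type_mrr": sum(doc_scores) / n,
            "passage_mrr": sum(passage_scores) / n}


def keyword_overlap_ranker(query: str, candidates: Sequence[str]) -> Sequence[int]:
    """Toy stand-in for an LLM agent: rank by number of shared lowercase tokens."""
    query_tokens = set(query.lower().split())
    scores = [len(query_tokens & set(c.lower().split())) for c in candidates]
    # sorted() is stable, so ties keep their original candidate order.
    return sorted(range(len(candidates)), key=lambda i: -scores[i])
```

Scoring passage ranking against the gold document's passages, rather than against whatever document step 1 selected, keeps the two numbers independent, which is one way to read the abstract's statement that the framework explicitly separates the two reasoning steps.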
