Fugu-MT Paper Translation (Abstract): A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning

Paper overview: A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning

arxiv url: http://arxiv.org/abs/2510.07958v1
Date: Thu, 09 Oct 2025 08:53:31 GMT
Status: Translation complete
System update date: 2025-10-10 17:54:14.965594
Title: A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
Title (reference translation): A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
Authors: Fengji Zhang, Xinyao Niu, Chengyang Ying, Guancheng Lin, Zhongkai Hao, Zhou Fan, Chengen Huang, Jacky Keung, Bei Chen, Junyang Lin
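Abstract summary: A$^2$Search is an annotation-free, end-to-end training framework that recognizes and handles ambiguity. Experiments on eight open-domain QA benchmarks show that A$^2$Search achieves new state-of-the-art performance.
Reference score (independently computed attention metric): 46.81869577197105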
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in Large Language Models (LLMs) and Reinforcement Learning (RL) have led to strong performance in open-domain question answering (QA). However, existing models still struggle with questions that admit multiple valid answers. Standard QA benchmarks, which typically assume a single gold answer, overlook this reality and thus produce inappropriate training signals. Existing attempts to handle ambiguity often rely on costly manual annotation, which is difficult to scale to multi-hop datasets such as HotpotQA and MuSiQue. In this paper, we present A$^2$Search, an annotation-free, end-to-end training framework to recognize and handle ambiguity. At its core is an automated pipeline that detects ambiguous questions and gathers alternative answers via trajectory sampling and evidence verification. The model is then optimized with RL using a carefully designed $\mathrm{AnsF1}$ reward, which naturally accommodates multiple answers. Experiments on eight open-domain QA benchmarks demonstrate that A$^2$Search achieves new state-of-the-art performance. With only a single rollout, A$^2$Search-7B yields an average $\mathrm{AnsF1}@1$ score of $48.4\%$ across four multi-hop benchmarks, outperforming all strong baselines, including the substantially larger ReSearch-32B ($46.2\%$). Extensive analyses further show that A$^2$Search resolves ambiguity and generalizes across benchmarks, highlighting that embracing ambiguity is essential for building more reliable QA systems. Our code, data, and model weights can be found at https://github.com/zfj1998/A2Search
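Abstract (reference translation): Recent advances in Large Language Models (LLMs) and Reinforcement Learning (RL) have brought strong performance in open-domain question answering (QA). However, existing models still struggle with questions that admit multiple valid answers. Standard QA benchmarks typically assume a single gold answer, overlook this reality, and therefore produce inappropriate training signals. Existing attempts to handle ambiguity often depend on costly manual annotation, which is hard to scale to multi-hop datasets such as HotpotQA and MuSiQue. This paper introduces A$^2$Search, an annotation-free, end-to-end training framework that recognizes and handles ambiguity. At its core is an automated pipeline that detects ambiguous questions and collects alternative answers through trajectory sampling and evidence verification. The model is then optimized with RL using a carefully designed $\mathrm{AnsF1}$ reward, which naturally accommodates multiple answers. Experiments on eight open-domain QA benchmarks show that A$^2$Search achieves new state-of-the-art performance. With only a single rollout, A$^2$Search-7B obtains an average $\mathrm{AnsF1}@1$ score of $48.4\%$ across four multi-hop benchmarks, outperforming all strong baselines, including the substantially larger ReSearch-32B ($46.2\%$). Extensive analyses further show that A$^2$Search resolves ambiguity and generalizes across benchmarks, and that embracing ambiguity is essential for building more reliable QA systems. The code, data, and model weights are available at https://github.com/zfj1998/A2Search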
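
The abstract names an $\mathrm{AnsF1}$ reward that accommodates multiple valid answers but does not define it here. The following is a minimal sketch of one plausible reading: a set-level F1 between the answers a model returns and the set of accepted reference answers, after SQuAD-style normalization. The function names `normalize` and `ans_f1`, the normalization rules, and the exact-match comparison are all assumptions made for illustration, not the authors' implementation; the repository linked above is the authoritative source.

```python
import re
import string


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the paper may use different rules.
    """
    text = answer.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def ans_f1(predicted: list[str], references: list[str]) -> float:
    """Hypothetical set-level F1 between predicted and reference answers."""
    pred = {normalize(a) for a in predicted} - {""}
    gold = {normalize(a) for a in references} - {""}
    if not pred or not gold:
        return 0.0
    hits = len(pred & gold)  # answers that match a reference exactly
    if hits == 0:
        return 0.0
    precision = hits / len(pred)  # penalizes extra, unsupported answers
    recall = hits / len(gold)     # penalizes missing valid answers
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a model that returns every valid answer and nothing else scores 1.0, while padding the output with wrong guesses lowers precision and omitting valid alternatives lowers recall. A single-answer exact-match reward has no way to express either trade-off.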

Related papers