Fugu-MT Paper Translation (Abstract): All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection

Paper overview: All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection

arxiv url: http://arxiv.org/abs/2601.04160v1
Date: Wed, 07 Jan 2026 18:18:28 GMT
Status: Translation complete
Last updated in system: 2026-01-09 02:15:23.71404
Title: All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection
Title (reference translation): All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection
Authors: Yuechen Jiang, Zhiwei Liu, Yupeng Cao, Yueru He, Ziyang Xu, Chen Xu, Zhiyang Deng, Prayag Tiwari, Xi Chen, Alejandro Lopez-Lira, Jimin Huang, Junichi Tsujii, Sophia Ananiadou
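Abstract summary: RFC Bench is a benchmark for evaluating large language models on financial misinformation in realistic news contexts. The benchmark defines two complementary tasks.
Reference score (independently computed attention score): 67.89888669159899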
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce RFC Bench, a benchmark for evaluating large language models on financial misinformation under realistic news. RFC Bench operates at the paragraph level and captures the contextual complexity of financial news, where meaning emerges from dispersed cues. The benchmark defines two complementary tasks: reference-free misinformation detection and comparison-based diagnosis using paired original and perturbed inputs. Experiments reveal a consistent pattern: performance is substantially stronger when comparative context is available, while reference-free settings expose significant weaknesses, including unstable predictions and elevated invalid outputs. These results indicate that current models struggle to maintain coherent belief states without external grounding. By highlighting this gap, RFC Bench provides a structured testbed for studying reference-free reasoning and advancing more reliable financial misinformation detection in real-world settings.
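Abstract (reference translation): RFC Bench is a benchmark for evaluating large language models on financial misinformation under realistic news. RFC Bench operates at the paragraph level and captures the contextual complexity of financial news, where meaning emerges from dispersed cues. The benchmark defines two complementary tasks. Performance improves substantially when comparative context is available, whereas reference-free settings expose significant weaknesses, including unstable predictions and an increase in invalid outputs. These results indicate that current models struggle to maintain coherent belief states without external grounding. By highlighting this gap, RFC Bench provides a structured testbed for studying reference-free reasoning and advancing more reliable financial misinformation detection in real-world settings.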
