Fugu-MT Paper Translation (Summary): Think Right, Not More: Test-Time Scaling for Numerical Claim Verification

Paper summary: Think Right, Not More: Test-Time Scaling for Numerical Claim Verification

arXiv URL: http://arxiv.org/abs/2509.22101v1
Date: Fri, 26 Sep 2025 09:23:35 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.328586
Title: Think Right, Not More: Test-Time Scaling for Numerical Claim Verification
Title (reference translation): Test-Time Scaling for Numerical Claim Verification
Authors: Primakov Chungkham, V Venktesh, Vinay Setty, Avishek Anand,
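Abstract summary: We show that test-time compute is effective for verifying complex numerical claims. We propose an adaptive mechanism that performs TTS selectively based on the perceived complexity of the claim. This approach is 1.8x more efficient than standard TTS and improves performance by 18.8% over single-shot claim verification methods.
Reference score (attention score computed by this site): 14.07771397213171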
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fact-checking real-world claims, particularly numerical claims, is inherently complex, as it requires multistep reasoning and numerical reasoning to verify diverse aspects of the claim. Although large language models (LLMs), including reasoning models, have made tremendous advances, they still fall short on fact-checking real-world claims that require a combination of compositional and numerical reasoning. They are unable to understand the nuances of numerical aspects, and are also susceptible to the reasoning drift issue, where the model is unable to contextualize diverse information, resulting in misinterpretation and backtracking of the reasoning process. In this work, we systematically explore scaling test-time compute (TTS) for LLMs on the task of fact-checking complex numerical claims, which entails eliciting multiple reasoning paths from an LLM. We train a verifier model (VERIFIERFC) to navigate this space of possible reasoning paths and select one that could lead to the correct verdict. We observe that TTS helps mitigate the reasoning drift issue, leading to significant performance gains for fact-checking numerical claims. To improve compute efficiency in TTS, we introduce an adaptive mechanism that performs TTS selectively based on the perceived complexity of the claim. This approach achieves 1.8x higher efficiency than standard TTS, while delivering a notable 18.8% performance improvement over single-shot claim verification methods. Our code and data can be found at https://github.com/VenkteshV/VerifierFC
Abstract (reference translation): Fact-checking real-world claims, especially numerical claims, is inherently complex and requires multistep and numerical reasoning to verify the various aspects of a claim. Although large language models (LLMs), including reasoning models, have made great progress, they still fall short on real-world claims that require a combination of compositional and numerical reasoning. They cannot grasp the nuances of numerical aspects and are also susceptible to the reasoning drift problem. In this work, we systematically study scaling test-time compute (TTS) for LLMs on the task of fact-checking complex numerical claims, which involves eliciting multiple reasoning paths from an LLM. We train a verifier model (VERIFIERFC) to navigate this space of reasoning paths and select one that is likely to lead to the correct verdict. We observe that TTS mitigates the reasoning drift problem and yields substantial performance gains in fact-checking numerical claims. To improve the compute efficiency of TTS, we introduce an adaptive mechanism that performs TTS selectively based on the complexity of the claim. This approach is 1.8x more efficient than standard TTS while improving performance by 18.8% over single-shot claim verification methods. Our code and data are available at https://github.com/VenkteshV/VerifierFC.
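The abstract combines two ideas: sample several reasoning paths and let a trained verifier (VERIFIERFC) pick one, and spend that extra compute only on claims that look complex. The following is a minimal sketch of that control flow under stated assumptions, not the authors' implementation (their repository has the real code). Everything named here is hypothetical: `best_of_n`, `adaptive_verify`, the threshold gate, and the toy sampler, verifier score, and digit-count complexity proxy stand in for an LLM, VERIFIERFC, and whatever complexity estimate the paper actually uses, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ReasoningPath:
    text: str     # one sampled chain of reasoning
    verdict: str  # label reached by that chain (label set is hypothetical)


Sampler = Callable[[str, int], ReasoningPath]   # (claim, seed) -> path; stands in for an LLM
Scorer = Callable[[str, ReasoningPath], float]  # verifier score, higher is better
Complexity = Callable[[str], float]             # perceived claim complexity (hypothetical)


def best_of_n(claim: str, sample: Sampler, score: Scorer, n: int) -> ReasoningPath:
    """Test-time scaling: sample n paths, keep the one the verifier scores highest."""
    if n < 1:
        raise ValueError("n must be >= 1")
    paths = [sample(claim, seed) for seed in range(n)]
    # max() returns the first path among equally scored ones
    return max(paths, key=lambda p: score(claim, p))


def adaptive_verify(claim: str, sample: Sampler, score: Scorer,
                    complexity: Complexity, threshold: float,
                    n: int) -> Tuple[ReasoningPath, int]:
    """Spend n samples only on claims judged complex; returns (path, samples used)."""
    if complexity(claim) < threshold:
        return sample(claim, 0), 1  # single-shot verification
    return best_of_n(claim, sample, score, n), n


# --- Toy stand-ins (hypothetical) for the LLM, the verifier, and the complexity estimate ---
def toy_sample(claim: str, seed: int) -> ReasoningPath:
    return ReasoningPath(f"path {seed}", "True" if seed % 2 == 0 else "False")


def toy_score(claim: str, path: ReasoningPath) -> float:
    return 1.0 if path.text == "path 3" else 0.0  # pretend path 3 is the best one


def toy_complexity(claim: str) -> float:
    return float(sum(ch.isdigit() for ch in claim))  # digit count as a crude proxy


if __name__ == "__main__":
    print(adaptive_verify("The sky is blue", toy_sample, toy_score,
                          toy_complexity, threshold=2.0, n=4))
    # -> (ReasoningPath(text='path 0', verdict='True'), 1)
    print(adaptive_verify("GDP grew 3.5% in 2021", toy_sample, toy_score,
                          toy_complexity, threshold=2.0, n=4))
    # -> (ReasoningPath(text='path 3', verdict='False'), 4)
```

With a gate like this, simple claims cost one sample and only complex ones pay for n, which is the general shape of the compute saving the abstract reports; the paper's actual gate, sample budget, and verifier may differ.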


List of related papers