Fugu-MT Paper Translation (Summary): Let's Verify Math Questions Step by Step

Paper summary: Let's Verify Math Questions Step by Step

arxiv url: http://arxiv.org/abs/2505.13903v1
Date: Tue, 20 May 2025 04:07:29 GMT
Status: Translation complete
Last updated in system: 2025-05-21 14:49:52.674795
Title: Let's Verify Math Questions Step by Step
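Title (reference translation): Let's Verify Math Questions Step by Step
Authors: Chengyu Shen, Zhen Hao Wong, Runming He, Hao Liang, Meiyi Qiang, Zimo Meng, Zhengyang Zhao, Bohan Zeng, Zhengzhou Zhu, Bin Cui, Wentao Zhang
Abstract summary: MathQ-Verify is a novel pipeline designed to rigorously filter ill-posed or under-specified math problems. It first performs format-level validation to remove redundant instructions. It then formalizes each question, decomposes it into atomic conditions, and verifies them against mathematical definitions.
Reference score (independently computed attention metric): 29.69769942300042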
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have recently achieved remarkable progress in mathematical reasoning. To enable such capabilities, many existing works distill strong reasoning models into long chains of thought or design algorithms to construct high-quality math QA data for training. However, these efforts primarily focus on generating correct reasoning paths and answers, while largely overlooking the validity of the questions themselves. In this work, we propose Math Question Verification (MathQ-Verify), a novel five-stage pipeline designed to rigorously filter ill-posed or under-specified math problems. MathQ-Verify first performs format-level validation to remove redundant instructions and ensure that each question is syntactically well-formed. It then formalizes each question, decomposes it into atomic conditions, and verifies them against mathematical definitions. Next, it detects logical contradictions among these conditions, followed by a goal-oriented completeness check to ensure the question provides sufficient information for solving. To evaluate this task, we use existing benchmarks along with an additional dataset we construct, containing 2,147 math questions with diverse error types, each manually double-validated. Experiments show that MathQ-Verify achieves state-of-the-art performance across multiple benchmarks, improving the F1 score by up to 25 percentage points over the direct verification baseline. It further attains approximately 90% precision and 63% recall through a lightweight model voting scheme. MathQ-Verify offers a scalable and accurate solution for curating reliable mathematical datasets, reducing label noise and avoiding unnecessary computation on invalid questions. Our code and data are available at https://github.com/scuuy/MathQ-Verify.
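Abstract (reference translation): Large language models (LLMs) have made remarkable recent progress in mathematical reasoning. To achieve this, many existing studies distill strong reasoning models into long chains of thought, or design algorithms that build high-quality math QA data for training. These efforts, however, concentrate on producing correct reasoning paths and answers and largely overlook whether the questions themselves are valid. This work proposes MathQ-Verify, a new five-stage pipeline that rigorously filters ill-posed or under-specified math problems. MathQ-Verify first runs format-level validation, removing redundant instructions and ensuring that each question is syntactically well-formed. It then formalizes each question, decomposes it into atomic conditions, and verifies those conditions against mathematical definitions. Next, it detects logical contradictions among the conditions, and then applies a goal-oriented completeness check to confirm that the question supplies enough information to be solved. The task is evaluated on existing benchmarks plus a newly constructed dataset of 2,147 math questions with diverse error types, each manually double-validated. In experiments, MathQ-Verify achieves state-of-the-art performance on multiple benchmarks, improving the F1 score by up to 25 percentage points over the direct verification baseline. With a lightweight model voting scheme, it further reaches approximately 90% precision and 63% recall. MathQ-Verify provides a scalable and accurate way to curate reliable mathematical datasets, reducing label noise and avoiding unnecessary computation on invalid questions. The code and data are available at https://github.com/scuuy/MathQ-Verify.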
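
The staged filtering and model voting described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Stage` interface, the toy stage functions `check_format` and `check_has_goal`, and the strict-majority rule in `majority_vote` are all hypothetical stand-ins, because the abstract does not say how each stage or the voting scheme is implemented. `precision_recall_f1` uses the standard definitions of the metrics the abstract reports.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stage interface: a stage returns None when the question
# passes, or a short rejection reason when it fails.
Stage = Callable[[str], Optional[str]]


def check_format(question: str) -> Optional[str]:
    """Toy format check (assumption): reject empty or whitespace-only text."""
    if not question.strip():
        return "empty question"
    return None


def check_has_goal(question: str) -> Optional[str]:
    """Toy completeness check (assumption): the text must ask for something."""
    lowered = question.lower()
    if "?" in question or any(w in lowered for w in ("find", "compute", "prove")):
        return None
    return "no stated goal"


def run_pipeline(question: str, stages: List[Stage]) -> Optional[str]:
    """Run stages in order; return the first rejection reason, or None."""
    for stage in stages:
        reason = stage(question)
        if reason is not None:
            return reason
    return None


def majority_vote(verdicts: List[bool]) -> bool:
    """Accept only if a strict majority of verifiers accept (assumed rule)."""
    return sum(verdicts) * 2 > len(verdicts)


def precision_recall_f1(
    predicted: List[bool], actual: List[bool]
) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1 for boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    stages = [check_format, check_has_goal]
    for q in ["Find x if 2x = 4.", "   ", "x is a number."]:
        print(repr(q), "->", run_pipeline(q, stages) or "accepted")
```

Stopping at the first failing stage means later, presumably more expensive checks never run on questions that an earlier stage has already rejected, which matches the abstract's stated aim of avoiding unnecessary computation on invalid questions.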
