Fugu-MT Paper Translation (Summary): $V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

Paper Summary: $V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

arxiv url: http://arxiv.org/abs/2603.04304v1
Date: Wed, 04 Mar 2026 17:22:16 GMT
Status: Translation complete
Last updated in system: 2026-03-05 21:29:15.422809
Title: $V_1$: Unifying Generation and Self-Verification for Parallel Reasoners
Authors: Harman Singh, Xiuyu Li, Kusha Sareen, Monishwaran Maheswaran, Sijun Tan, Xiaoxia Wu, Junxiong Wang, Alpay Ariyak, Qingyang Wu, Samir Khaki, Rishabh Tiwari, Long Lian, Yucheng Lu, Boyi Li, Alane Suhr, Ben Athiwaratkun, Kurt Keutzer,
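Abstract Summary: $V_1$ is a framework that unifies generation and verification through efficient pairwise ranking. $V_1$-Infer improves Pass@1 by up to 10% over pointwise verification. $V_1$-PairRL improves test-time scaling by 7-9% over standard RL and pointwise joint training.
Reference score (in-house attention score): 69.66089681814013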
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Test-time scaling for complex reasoning tasks shows that leveraging inference-time compute, by methods such as independently sampling and aggregating multiple solutions, results in significantly better task outcomes. However, a critical bottleneck is verification: sampling is only effective if correct solutions can be reliably identified among candidates. While existing approaches typically evaluate candidates independently via scalar scoring, we demonstrate that models are substantially stronger at pairwise self-verification. Leveraging this insight, we introduce $V_1$, a framework that unifies generation and verification through efficient pairwise ranking. $V_1$ comprises two components: $V_1$-Infer, an uncertainty-guided algorithm using a tournament-based ranking that dynamically allocates self-verification compute to candidate pairs whose relative correctness is most uncertain; and $V_1$-PairRL, an RL framework that jointly trains a single model as both generator and pairwise self-verifier, ensuring the verifier adapts to the generator's evolving distribution. On code generation (LiveCodeBench, CodeContests, SWE-Bench) and math reasoning (AIME, HMMT) benchmarks, $V_1$-Infer improves Pass@1 by up to 10% over pointwise verification and outperforms recent test-time scaling methods while being significantly more efficient. Furthermore, $V_1$-PairRL achieves 7-9% test-time scaling gains over standard RL and pointwise joint training, and improves base Pass@1 by up to 8.7% over standard RL in a code-generation setting.
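The idea of spending pairwise verification calls where relative correctness is most uncertain can be illustrated with a minimal Python sketch. This is not the authors' $V_1$-Infer algorithm: it assumes a hypothetical `judge(a, b)` callable that returns True when the verifier prefers candidate `a` over `b`, models each pair's win probability with a Beta(wins + 1, losses + 1) posterior, gives each call to the pair with the largest posterior variance, and ranks candidates round-robin style by their mean estimated win probability instead of using the paper's tournament structure. The names `select_best`, `beta_variance`, and `noisy_judge` are illustrative only.

```python
import itertools
import random
from typing import Callable, Dict, Tuple

# Hypothetical verifier interface: judge(a, b) -> True if it prefers candidate a over b.
Judge = Callable[[int, int], bool]


def beta_variance(wins: int, losses: int) -> float:
    """Variance of the Beta(wins + 1, losses + 1) posterior over P(first beats second)."""
    a, b = wins + 1, losses + 1
    s = a + b
    return a * b / (s * s * (s + 1))


def select_best(n: int, judge: Judge, budget: int) -> int:
    """Return the index of the top-ranked candidate using at most `budget` judge calls."""
    if n == 1:
        return 0  # a single candidate needs no comparisons
    pairs = list(itertools.combinations(range(n), 2))
    wins: Dict[Tuple[int, int], int] = {p: 0 for p in pairs}    # first index preferred
    losses: Dict[Tuple[int, int], int] = {p: 0 for p in pairs}  # second index preferred

    for _ in range(budget):
        # Spend the next verification call on the pair whose outcome is most uncertain.
        i, j = max(pairs, key=lambda p: beta_variance(wins[p], losses[p]))
        if judge(i, j):
            wins[(i, j)] += 1
        else:
            losses[(i, j)] += 1

    def score(k: int) -> float:
        # Average posterior-mean probability that candidate k beats each rival.
        total = 0.0
        for i, j in pairs:
            p_first = (wins[(i, j)] + 1) / (wins[(i, j)] + losses[(i, j)] + 2)
            if k == i:
                total += p_first
            elif k == j:
                total += 1.0 - p_first
        return total / (n - 1)

    return max(range(n), key=score)


# Usage with a simulated verifier that is right 80% of the time (hypothetical numbers).
quality = [0.2, 0.9, 0.5, 0.4]
rng = random.Random(0)


def noisy_judge(a: int, b: int) -> bool:
    better = quality[a] > quality[b]
    return better if rng.random() < 0.8 else not better


best = select_best(len(quality), noisy_judge, budget=30)
print("selected candidate:", best)
```

Every pair starts at the same variance (1/12), so the first calls are spread evenly; after that, pairs whose verdicts keep disagreeing retain a higher variance and attract more calls, while clear-cut pairs stop consuming budget.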
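The abstract describes $V_1$-PairRL only at a high level: one model is trained as both generator and pairwise self-verifier. The Python sketch below shows one plausible way such joint rewards could be computed; the binary outcome reward, the rule of skipping pairs whose two solutions are equally correct, and the function names `generator_reward`, `verifier_reward`, and `mixed_pairs` are all assumptions for illustration, not the paper's formulation. What it does show is that the verifier's training pairs are drawn from the current generator's own samples, so the verifier's data shifts as the generator changes.

```python
import itertools
from typing import List, Optional, Tuple


def generator_reward(correct: bool) -> float:
    """Assumed binary outcome reward for a generated solution (e.g. tests pass)."""
    return 1.0 if correct else 0.0


def verifier_reward(first_correct: bool, second_correct: bool,
                    says_first_better: bool) -> Optional[float]:
    """Assumed reward for one pairwise verdict; None when the pair has no true winner."""
    if first_correct == second_correct:
        return None  # both right or both wrong: no ground-truth preference
    return 1.0 if says_first_better == first_correct else 0.0


def mixed_pairs(correct: List[bool]) -> List[Tuple[int, int]]:
    """Index pairs from the current generator's samples whose correctness differs."""
    return [(i, j) for i, j in itertools.combinations(range(len(correct)), 2)
            if correct[i] != correct[j]]


# Four samples from the current policy for one prompt (hypothetical outcomes).
correct = [True, False, False, True]
gen_rewards = [generator_reward(c) for c in correct]
pairs = mixed_pairs(correct)

# Pretend the same model, acting as verifier, always preferred the first solution.
verdicts = {pair: True for pair in pairs}
ver_rewards = [verifier_reward(correct[i], correct[j], verdicts[(i, j)]) for i, j in pairs]

print(gen_rewards)  # [1.0, 0.0, 0.0, 1.0]
print(pairs)        # [(0, 1), (0, 2), (1, 3), (2, 3)]
print(ver_rewards)  # [1.0, 1.0, 0.0, 0.0]
```

A verifier that blindly favors the first slot earns reward only when the correct solution happens to be listed first, which is why position bias would show up directly in this kind of signal.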


Related Papers