Fugu-MT Paper Translation (Summary): Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus Score

Paper summary: Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus Score

arxiv url: http://arxiv.org/abs/2604.12196v1
Date: Tue, 14 Apr 2026 02:02:20 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.195024
Title: Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus Score
Authors: Manh Nguyen, Sunil Gupta, Hung Le
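Abstract summary: Radial Consensus Score (RCS) is a simple, efficient, training-free method for best-of-N selection. RCS models semantic consensus by computing a weighted Fréchet mean (semantic center) of answer embeddings and ranking candidates by their radial distance to it.
Reference score (attention score computed independently by this site): 13.41454380481593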
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) frequently generate multiple candidate responses for a given prompt, yet selecting the most reliable one remains challenging, especially when correctness diverges from surface-level majority agreement. Existing approaches, such as self-consistency, rely on discrete voting, while probability-based methods often fail to capture relationships among candidate answers or tend to underweight high-quality but less frequent responses, and do not fully leverage the geometric structure of answer representations. To address these limitations, we introduce Radial Consensus Score (RCS), a simple, efficient, and training-free method for best-of-N selection. RCS models semantic consensus by computing a weighted Fréchet mean (semantic center) of answer embeddings and ranking candidates by their radial distance to this center. Importantly, RCS provides a general framework that supports multiple weighting schemes, including uniform, frequency-based, and probability-based variants, enabling flexible integration of agreement signals and model confidence while remaining fully applicable in black-box settings. Extensive experiments across seven benchmarks covering short-form QA and long-form reasoning tasks, and five open-weight models, demonstrate that RCS variants consistently outperform strong baselines, with gains becoming more pronounced as the sampling budget increases. RCS also serves as an effective drop-in replacement for majority voting in multi-agent debate and exhibits strong robustness in black-box scenarios. Overall, these results highlight geometric consensus as a scalable and broadly applicable principle for reliable answer selection, extending beyond majority voting to more expressive and robust aggregation in LLM inference.
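Sketch: The selection rule described in the abstract (weighted center of answer embeddings, then ranking by distance to that center) can be illustrated with a minimal example. This is a sketch under stated assumptions, not the authors' implementation: it assumes Euclidean distance, for which the weighted Fréchet mean reduces to the weighted arithmetic mean, and it uses made-up 2-D vectors in place of real answer embeddings. The function names `weighted_center`, `radial_consensus_select`, and `frequency_weights` are hypothetical, and the paper may use a different metric or embedding model.

```python
from collections import Counter
import math


def weighted_center(embeddings, weights):
    """Weighted arithmetic mean of the vectors.

    Assumption: squared Euclidean distance, under which this mean is the
    weighted Fréchet mean. Other metrics would need a different solver.
    """
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * e[d] for e, w in zip(embeddings, weights)) / total
        for d in range(dim)
    ]


def radial_consensus_select(answers, embeddings, weights=None):
    """Return (best_index, distances); a smaller distance means closer to the center."""
    if weights is None:
        weights = [1.0] * len(answers)  # uniform variant
    center = weighted_center(embeddings, weights)
    distances = [math.dist(e, center) for e in embeddings]
    best = min(range(len(answers)), key=distances.__getitem__)
    return best, distances


def frequency_weights(answers):
    """Frequency-based variant: weight each candidate by how often its answer string occurs."""
    counts = Counter(answers)
    return [float(counts[a]) for a in answers]


# Toy candidates; the vectors are illustrative, not real embeddings.
answers = ["Paris", "Paris", "Lyon", "Paris"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1]]

best_uniform, _ = radial_consensus_select(answers, embeddings)
best_freq, _ = radial_consensus_select(answers, embeddings, frequency_weights(answers))
print(answers[best_uniform], answers[best_freq])
```

A probability-based variant would pass per-candidate model probabilities as `weights`; how those probabilities are obtained and normalized is not specified in the abstract, so it is left to the caller here.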


Related papers