Fugu-MT Paper Translation (Abstract): Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey

Paper summary: Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey

arxiv url: http://arxiv.org/abs/2510.01925v2
Date: Fri, 03 Oct 2025 10:55:18 GMT
Status: Translation complete
Last updated in system: 2025-10-06 12:05:48.08037
Title: Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Authors: Qiyuan Liu, Hao Xu, Xuhong Chen, Wei Chen, Yee Whye Teh, Ning Miao
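Abstract summary: Reward models (RMs) play a critical role in enhancing the reasoning performance of LLMs. This paper presents a systematic introduction to RMs and a comprehensive survey of their applications in LLM reasoning.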
Reference score (independently calculated attention measure): 30.86011404499129
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reward models (RMs) play a critical role in enhancing the reasoning performance of LLMs. For example, they can provide training signals to finetune LLMs during reinforcement learning (RL) and help select the best answer from multiple candidates during inference. In this paper, we provide a systematic introduction to RMs, along with a comprehensive survey of their applications in LLM reasoning. We first review fundamental concepts of RMs, including their architectures, training methodologies, and evaluation techniques. Then, we explore their key applications: (1) guiding generation and selecting optimal outputs during LLM inference, (2) facilitating data synthesis and iterative self-improvement for LLMs, and (3) providing training signals in RL-based finetuning. Finally, we discuss critical open questions regarding the selection, generalization, evaluation, and enhancement of RMs, based on existing research and our own empirical findings. Our analysis aims to provide actionable insights for the effective deployment and advancement of RMs for LLM reasoning.
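The abstract mentions that RMs can help select the best answer from multiple candidates during inference. A minimal sketch of that idea (often called best-of-N selection) might look like the following. The names `best_of_n` and `toy_reward` are hypothetical and do not come from the paper, and `toy_reward` is only a stand-in for a trained reward model.

```python
from typing import Callable, Sequence


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate with the highest reward for the prompt.

    Ties are resolved in favor of the earliest candidate.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")
    return max(candidates, key=lambda answer: reward_fn(prompt, answer))


def toy_reward(prompt: str, answer: str) -> float:
    # Hypothetical stand-in for a trained reward model (assumption):
    # it scores an answer 1.0 if it contains the digit "4", else 0.0.
    return 1.0 if "4" in answer else 0.0


if __name__ == "__main__":
    chosen = best_of_n("What is 2 + 2?", ["5", "4", "22"], toy_reward)
    print(chosen)
```

In practice `reward_fn` would wrap a learned model that scores a prompt and response pair; the selection logic stays the same.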

Related papers