Fugu-MT Paper Translation (Summary): CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents

Paper Summary: CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents

arxiv url: http://arxiv.org/abs/2605.25511v1
Date: Mon, 25 May 2026 07:15:38 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:19.434603
Title: CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents
Title (reference translation): CRPO: Character-centric group relative policy optimization for role-aware reasoning in role-playing agents
Authors: Yihong Tang, Kehai Chen, Liang Yue, Benyou Wang, Min Zhang
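Abstract summary: We propose CRPO, a framework that realigns RL objectives with the role-playing task. CRPO relies on three mechanisms: it decouples task logic from stylistic rewards to resolve gradient conflicts, dynamically adapts optimization constraints based on character complexity, and uses generic responses as negative baselines to prevent the model from reverting to a common distribution.
Reference score (independently computed attention score): 53.765941044015854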
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in Reinforcement Learning (RL), particularly Group Relative Policy Optimization (GRPO), have significantly enhanced the reasoning capabilities of Large Language Models. However, applying these problem-centric optimization methods to role-playing agents often leads to a loss of character fidelity and style collapse, as they prioritize context-specific utility over persona alignment. To address this, we propose Character-Centric Group Relative Policy Optimization (CRPO), a framework designed to realign RL objectives with the role-playing task. CRPO improves character distinctiveness through three mechanisms: decoupling task logic from stylistic rewards to resolve gradient conflicts, dynamically adapting optimization constraints based on character complexity, and utilizing generic responses as negative baselines to prevent the model from reverting to a common distribution. Extensive experiments demonstrate that CRPO outperforms existing methods in consistency, emotion and others.
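Abstract (reference translation): Recent advances in reinforcement learning (RL), particularly Group Relative Policy Optimization (GRPO), have substantially improved the reasoning capabilities of large language models. However, applying these problem-centric optimization methods to role-playing agents often causes a loss of character fidelity and style collapse, because they prioritize context-specific utility over persona alignment. We therefore propose CRPO, a framework that realigns RL objectives with the role-playing task. CRPO relies on three mechanisms: it decouples task logic from stylistic rewards to resolve gradient conflicts, dynamically adapts optimization constraints based on character complexity, and uses generic responses as negative baselines to keep the model from reverting to a common distribution. CRPO outperforms existing methods in consistency, emotion, and other aspects.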
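The abstract does not give CRPO's equations, so the following is only a minimal sketch, assuming NumPy, of how the ideas it names could look in code. `group_relative_advantages` and `clipped_surrogate` follow the standard GRPO/PPO pattern (simplified to one scalar per sampled response, with no KL term). The other three functions, `decoupled_advantages`, `generic_baseline_advantages`, and `adaptive_clip_range`, together with their weights and the assumption that character complexity lies in [0, 1], are hypothetical illustrations of the three mechanisms, not the authors' method.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def decoupled_advantages(task_rewards, style_rewards, w_task=0.5, w_style=0.5):
    """HYPOTHETICAL decoupling: standardize task and style rewards separately,
    then mix, so a large-scale task reward cannot drown out the style signal."""
    return (w_task * group_relative_advantages(task_rewards)
            + w_style * group_relative_advantages(style_rewards))


def generic_baseline_advantages(style_rewards, generic_reward, eps=1e-8):
    """HYPOTHETICAL negative baseline: compare each sample with the score of a
    generic, persona-free response instead of the group mean, so samples that
    are no more in-character than the generic answer get a non-positive advantage."""
    r = np.asarray(style_rewards, dtype=float)
    return (r - generic_reward) / (r.std() + eps)


def adaptive_clip_range(complexity, base=0.2, scale=0.1, max_clip=0.4):
    """HYPOTHETICAL: widen the clip range for more complex characters
    (complexity assumed to lie in [0, 1]), capped at max_clip."""
    return min(base + scale * complexity, max_clip)


def clipped_surrogate(ratio, advantages, clip):
    """Standard PPO/GRPO clipped objective (to be maximized), averaged over the group."""
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * adv
    return float(np.minimum(unclipped, clipped).mean())


if __name__ == "__main__":
    task = [1.0, 0.0, 1.0, 0.0]    # hypothetical: did each reply solve the user's request
    style = [0.9, 0.8, 0.2, 0.1]   # hypothetical: persona-consistency score from a judge
    adv = decoupled_advantages(task, style)
    clip = adaptive_clip_range(complexity=0.5)
    print(adv, clip, clipped_surrogate([1.1, 0.9, 1.0, 1.3], adv, clip))
```

Standardizing each reward stream before mixing is one simple way to keep a high-variance task score from dominating the gradient; the generic-response baseline replaces the group mean so that "average but bland" replies are not rewarded merely for beating weaker samples.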


Related Papers