Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR
- URL: http://arxiv.org/abs/2601.04611v1
- Date: Thu, 08 Jan 2026 05:33:37 GMT
- Title: Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR
- Authors: Yihong Tang, Kehai Chen, Xuefeng Bai, Benyou Wang, Zeming Liu, Haifeng Wang, Min Zhang
- Abstract summary: Character-R1 is a framework designed to provide verifiable reward signals for effective role-aware reasoning. Our framework comprises three core designs: Cognitive Focus Reward, Reference-Guided Reward, and Character-Conditioned Reward Normalization.
- Score: 67.66592867046229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current role-playing agents (RPAs) are typically constructed by imitating surface-level behaviors, but this approach lacks internal cognitive consistency, often causing out-of-character errors in complex situations. To address this, we propose Character-R1, a framework designed to provide comprehensive verifiable reward signals for effective role-aware reasoning, which are missing in recent studies. Specifically, our framework comprises three core designs: (1) Cognitive Focus Reward, which enforces explicit label-based analysis of 10 character elements (e.g., worldview) to structure internal cognition; (2) Reference-Guided Reward, which utilizes overlap-based metrics with reference responses as optimization anchors to enhance exploration and performance; and (3) Character-Conditioned Reward Normalization, which adjusts reward distributions based on character categories to ensure robust optimization across heterogeneous roles. Extensive experiments demonstrate that Character-R1 significantly outperforms existing methods in knowledge, memory, and other aspects.
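As a rough illustration of designs (2) and (3) in the abstract, the sketch below scores a response by word overlap with a reference and then rescales rewards within each character category. The abstract names neither the overlap metric nor the normalization rule, so unigram F1 and per-category z-scoring are assumptions made here for illustration, and the function names `overlap_f1` and `normalize_by_category` are hypothetical. It should not be read as the paper's implementation.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def overlap_f1(response: str, reference: str) -> float:
    """Unigram F1 between response and reference (assumed stand-in overlap metric)."""
    resp, ref = response.lower().split(), reference.lower().split()
    if not resp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    common = sum((Counter(resp) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(resp), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def normalize_by_category(rewards, categories, eps=1e-8):
    """Z-score each reward against rewards from the same character category (assumption)."""
    groups = defaultdict(list)
    for reward, category in zip(rewards, categories):
        groups[category].append(reward)
    # Population mean and standard deviation per category.
    stats = {c: (mean(v), pstdev(v)) for c, v in groups.items()}
    return [
        (reward - stats[category][0]) / (stats[category][1] + eps)
        for reward, category in zip(rewards, categories)
    ]
```

Under these assumptions, rewards from a category whose raw scores run high and rewards from one whose scores run low end up on a comparable scale, which is the stated motivation for conditioning the normalization on character category.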
Related papers
- Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration [49.9937230730202]
We propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention. Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories. We show that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales.
arXiv Detail & Related papers (2026-02-03T15:32:09Z) - CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards [53.36917093757101]
Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). We introduce CogDual, a novel RPLA adopting a cognize-then-respond reasoning paradigm. By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment.
arXiv Detail & Related papers (2025-07-23T02:26:33Z) - Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks [6.881699020319577]
We propose Direct Reasoning Optimization (DRO), a reinforcement learning framework for fine-tuning Large Language Models (LLMs). DRO is guided by a new reward signal: the Reasoning Reflection Reward (R3). DRO consistently outperforms strong baselines while remaining broadly applicable across both open-ended and structured domains.
arXiv Detail & Related papers (2025-06-16T10:43:38Z) - RAG-Zeval: Towards Robust and Interpretable Evaluation on RAG Responses through End-to-End Rule-Guided Reasoning [64.46921169261852]
RAG-Zeval is a novel end-to-end framework that formulates faithfulness and correctness evaluation as a rule-guided reasoning task. Our approach trains evaluators with reinforcement learning, facilitating compact models to generate comprehensive and sound assessments. Experiments demonstrate RAG-Zeval's superior performance, achieving the strongest correlation with human judgments.
arXiv Detail & Related papers (2025-05-28T14:55:33Z) - Reward-Aware Proto-Representations in Reinforcement Learning [6.855996110012974]
In recent years, the successor representation (SR) has attracted increasing attention in reinforcement learning (RL). In this paper, we discuss a similar representation that also takes into account the reward dynamics of the problem. Our results show that, compared to the SR, the DR gives rise to qualitatively different, reward-aware behaviour and quantitatively better performance in several settings.
arXiv Detail & Related papers (2025-05-22T04:33:00Z) - RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward [7.9399136525335585]
RAIDEN-R1 is a novel reinforcement learning framework that integrates Verifiable Role-Awareness Reward (VRAR). We construct a high-quality, role-aware Chain-of-Thought dataset through multi-LLM collaboration. Experiments on the RAIDEN benchmark demonstrate RAIDEN-R1's superiority.
arXiv Detail & Related papers (2025-05-15T12:22:10Z) - Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs). We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
arXiv Detail & Related papers (2025-03-31T08:22:49Z) - ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning [22.825527641316192]
This paper introduces ARR, an intuitive, effective, and general QA solving method. It explicitly incorporates three key steps: analyzing the intent of the question, retrieving relevant information, and reasoning step by step. It is the first to introduce intent analysis in QA, which plays a vital role in ARR.
arXiv Detail & Related papers (2025-02-07T06:30:33Z) - Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy [66.95501113584541]
We propose an Iterative utiliTy judgmEnt fraMework (ITEM) to promote each step in Retrieval-Augmented Generation (RAG). RAG's three core components -- relevance ranking derived from retrieval models, utility judgments, and answer generation -- align with Schutz's philosophical system of relevances. Experimental results demonstrate significant improvements of ITEM in utility judgments, ranking, and answer generation upon representative baselines.
arXiv Detail & Related papers (2024-06-17T07:52:42Z) - ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.