Related papers: Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR

URL: http://arxiv.org/abs/2601.04611v1
Date: Thu, 08 Jan 2026 05:33:37 GMT
Title: Character-R1: Enhancing Role-Aware Reasoning in Role-Playing Agents via RLVR
Authors: Yihong Tang, Kehai Chen, Xuefeng Bai, Benyou Wang, Zeming Liu, Haifeng Wang, Min Zhang
Abstract summary: Character-R1 is a framework designed to provide verifiable reward signals for effective role-aware reasoning. Our framework comprises three core designs: Cognitive Focus Reward, Reference-Guided Reward and Character-Conditioned Reward Normalization.
Score: 67.66592867046229
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current role-playing agents (RPAs) are typically constructed by imitating surface-level behaviors, but this approach lacks internal cognitive consistency, often causing out-of-character errors in complex situations. To address this, we propose Character-R1, a framework designed to provide comprehensive verifiable reward signals for effective role-aware reasoning, which are missing in recent studies. Specifically, our framework comprises three core designs: (1) Cognitive Focus Reward, which enforces explicit label-based analysis of 10 character elements (e.g., worldview) to structure internal cognition; (2) Reference-Guided Reward, which utilizes overlap-based metrics with reference responses as optimization anchors to enhance exploration and performance; and (3) Character-Conditioned Reward Normalization, which adjusts reward distributions based on character categories to ensure robust optimization across heterogeneous roles. Extensive experiments demonstrate that Character-R1 significantly outperforms existing methods in knowledge, memory and others.
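
A minimal Python sketch of how designs (2) and (3) from the abstract might look in practice. The abstract only says "overlap-based metrics" and "adjusts reward distributions based on character categories"; the unigram F1 metric, the per-category z-score, and the names overlap_f1 and normalize_by_category are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def overlap_f1(response: str, reference: str) -> float:
    """Unigram F1 between a response and a reference.

    Assumption: stands in for the paper's unspecified overlap-based metric.
    """
    resp_tokens = response.lower().split()
    ref_tokens = reference.lower().split()
    if not resp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    common = sum((Counter(resp_tokens) & Counter(ref_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(resp_tokens)
    recall = common / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def normalize_by_category(rewards, categories, eps=1e-6):
    """Z-score each reward within its character category.

    Assumption: one plausible reading of character-conditioned normalization.
    """
    groups = defaultdict(list)
    for reward, category in zip(rewards, categories):
        groups[category].append(reward)
    stats = {c: (mean(v), pstdev(v)) for c, v in groups.items()}
    return [
        (reward - stats[category][0]) / (stats[category][1] + eps)
        for reward, category in zip(rewards, categories)
    ]
```

Normalizing within each category puts rewards for characters with very different raw score ranges on a comparable scale, which is the kind of robustness across heterogeneous roles the abstract describes.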

Related papers

Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration [49.9937230730202]
We propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention. Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories. We show that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales.
arXiv Detail & Related papers (2026-02-03T15:32:09Z)
CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards [53.36917093757101]
Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). We introduce CogDual, a novel RPLA adopting a cognize-then-respond reasoning paradigm. By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment.
arXiv Detail & Related papers (2025-07-23T02:26:33Z)
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks [6.881699020319577]
We propose Direct Reasoning Optimization (DRO), a reinforcement learning framework for fine-tuning Large Language Models (LLMs). DRO is guided by a new reward signal: the Reasoning Reflection Reward (R3). DRO consistently outperforms strong baselines while remaining broadly applicable across both open-ended and structured domains.
arXiv Detail & Related papers (2025-06-16T10:43:38Z)
RAG-Zeval: Towards Robust and Interpretable Evaluation on RAG Responses through End-to-End Rule-Guided Reasoning [64.46921169261852]
RAG-Zeval is a novel end-to-end framework that formulates faithfulness and correctness evaluation as a rule-guided reasoning task. Our approach trains evaluators with reinforcement learning, facilitating compact models to generate comprehensive and sound assessments. Experiments demonstrate RAG-Zeval's superior performance, achieving the strongest correlation with human judgments.
arXiv Detail & Related papers (2025-05-28T14:55:33Z)
Reward-Aware Proto-Representations in Reinforcement Learning [6.855996110012974]
In recent years, the successor representation (SR) has attracted increasing attention in reinforcement learning (RL). In this paper, we discuss a similar representation that also takes into account the reward dynamics of the problem. Our results show that, compared to the SR, the DR gives rise to qualitatively different, reward-aware behaviour and quantitatively better performance in several settings.
arXiv Detail & Related papers (2025-05-22T04:33:00Z)
RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward [7.9399136525335585]
RAIDEN-R1 is a novel reinforcement learning framework that integrates Verifiable Role-Awareness Reward (VRAR). We construct a high-quality, role-aware Chain-of-Thought dataset through multi-LLM collaboration. Experiments on the RAIDEN benchmark demonstrate RAIDEN-R1's superiority.
arXiv Detail & Related papers (2025-05-15T12:22:10Z)
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs). We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
arXiv Detail & Related papers (2025-03-31T08:22:49Z)
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning [22.825527641316192]
This paper introduces ARR, an intuitive, effective, and general QA solving method. It explicitly incorporates three key steps: analyzing the intent of the question, retrieving relevant information, and reasoning step by step. It is the first to introduce intent analysis in QA, which plays a vital role in ARR.
arXiv Detail & Related papers (2025-02-07T06:30:33Z)
Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy [66.95501113584541]
We propose an Iterative utiliTy judgmEnt fraMework (ITEM) to promote each step in Retrieval-Augmented Generation (RAG). RAG's three core components -- relevance ranking derived from retrieval models, utility judgments, and answer generation -- align with Schutz's philosophical system of relevances. Experimental results demonstrate significant improvements of ITEM in utility judgments, ranking, and answer generation upon representative baselines.
arXiv Detail & Related papers (2024-06-17T07:52:42Z)
ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.