ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents
- URL: http://arxiv.org/abs/2505.23923v1
- Date: Thu, 29 May 2025 18:15:18 GMT
- Title: ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents
- Authors: Feiteng Fang, Ting-En Lin, Yuchuan Wu, Xiong Liu, Xiang Huang, Dingwei Chen, Jing Ye, Haonan Zhang, Liang Zhu, Hamid Alinejad-Rokny, Min Yang, Fei Huang, Yongbin Li
- Abstract summary: Role-Playing Language Agents (RPLAs) aim to simulate characters for realistic and engaging human-computer interactions. We propose ChARM, a Character-based Act-adaptive Reward Model. We introduce RoleplayPref, the first large-scale preference dataset specifically for RPLAs.
- Score: 60.325553329946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Role-Playing Language Agents (RPLAs) aim to simulate characters for realistic and engaging human-computer interactions. However, traditional reward models often struggle with scalability and adapting to subjective conversational preferences. We propose ChARM, a Character-based Act-adaptive Reward Model, addressing these challenges through two innovations: (1) an act-adaptive margin that significantly enhances learning efficiency and generalizability, and (2) a self-evolution mechanism leveraging large-scale unlabeled data to improve training coverage. Additionally, we introduce RoleplayPref, the first large-scale preference dataset specifically for RPLAs, featuring 1,108 characters, 13 subcategories, and 16,888 bilingual dialogues, alongside RoleplayEval, a dedicated evaluation benchmark. Experimental results show a 13% improvement over the conventional Bradley-Terry model in preference rankings. Furthermore, applying ChARM-generated rewards to preference learning techniques (e.g., direct preference optimization) achieves state-of-the-art results on CharacterEval and RoleplayEval. Code and dataset are available at https://github.com/calubkk/ChARM.
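For readers unfamiliar with margin-based reward modeling, the sketch below illustrates how an additive, per-pair margin modifies the standard Bradley-Terry pairwise objective that ChARM builds on. This is a minimal PyTorch sketch under the assumption that each response is scored with a single scalar and that the margin is supplied per pair (e.g., derived from act or character metadata); the function name `bt_loss_with_margin` and the toy tensors are illustrative only, and the authors' actual act-adaptive margin construction is defined in the paper and the linked repository.

```python
# Minimal sketch (not ChARM's actual implementation) of an additive,
# per-pair margin in a Bradley-Terry-style reward-modeling objective.
import torch
import torch.nn.functional as F

def bt_loss_with_margin(r_chosen: torch.Tensor,
                        r_rejected: torch.Tensor,
                        margin: torch.Tensor) -> torch.Tensor:
    """Pairwise Bradley-Terry loss with a per-pair additive margin.

    r_chosen / r_rejected: scalar reward-model scores for the preferred
    and dispreferred responses, shape (batch,).
    margin: per-pair margin, shape (batch,); a larger value requires the
    chosen score to exceed the rejected score by a wider gap before the
    loss saturates. A margin of zero recovers the plain Bradley-Terry loss.
    """
    return -F.logsigmoid(r_chosen - r_rejected - margin).mean()

# Toy usage with a batch of three preference pairs; margins are placeholders
# standing in for values that would come from act/character metadata.
r_chosen = torch.tensor([1.2, 0.4, 2.0])
r_rejected = torch.tensor([0.9, 0.5, 1.0])
margin = torch.tensor([0.5, 0.1, 0.3])
print(bt_loss_with_margin(r_chosen, r_rejected, margin))
```

In a pipeline like the one the abstract describes, scores from such a reward model can be used to rank candidate responses into chosen/rejected pairs, which are then fed to a preference-learning objective such as DPO.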
Related papers
- FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions [14.26977110112456]
Preference-based reinforcement learning is a suitable approach for style adaptation of pre-trained robotic behavior. Recent adaptation approaches suffer from catastrophic reward forgetting (CRF), where the updated reward model overfits to the new preferences. We show that our method can efficiently and effectively adjust robotic behavior to human preferences across simulation benchmark tasks and multiple real-world robotic tasks.
arXiv Detail & Related papers (2025-04-14T09:04:14Z) - PILAF: Optimal Human Preference Sampling for Reward Modeling [14.336058926701432]
We propose Policy-Interpolated Learning for Aligned Feedback (PILAF), a novel response sampling strategy for preference labeling. PILAF explicitly aligns preference learning with maximizing the underlying oracle reward.
arXiv Detail & Related papers (2025-02-06T18:09:00Z) - OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas [65.83634577897564]
This study explores a large-scale data synthesis approach to equip large language models with character generalization capabilities. We begin by synthesizing large-scale character profiles using personas from Persona Hub. We then explore two strategies, response rewriting and response generation, to create character-aligned instructional responses.
arXiv Detail & Related papers (2025-01-26T07:07:01Z) - Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment [51.14207112118503]
We introduce preference embedding, an approach that embeds responses into a latent space to capture preferences efficiently. We also propose preference score-based General Preference Optimization (GPO), which generalizes reward-based reinforcement learning from human feedback. Our method may enhance the alignment of foundation models with nuanced human values.
arXiv Detail & Related papers (2024-10-03T04:22:55Z) - Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning [12.742158403867002]
Reinforcement Learning from Human Feedback is a powerful paradigm for aligning foundation models to human values and preferences.
Current RLHF techniques cannot account for the naturally occurring differences in individual human preferences across a diverse population.
We develop a class of multimodal RLHF methods to address the need for pluralistic alignment.
arXiv Detail & Related papers (2024-08-19T15:18:30Z) - Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
arXiv Detail & Related papers (2024-01-11T17:56:59Z) - TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest [17.247452803197362]
This paper presents Pinterest's ranking architecture for Homefeed.
We propose TransAct, a sequential model that extracts users' short-term preferences from their realtime activities.
We describe the results of ablation studies, the challenges we faced during productionization, and the outcome of an online A/B experiment.
arXiv Detail & Related papers (2023-05-31T23:45:29Z) - Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning [53.92465205531759]
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general purpose preference model.
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
arXiv Detail & Related papers (2022-10-14T13:21:33Z) - Leveraging Historical Interaction Data for Improving Conversational Recommender System [105.90963882850265]
We propose a novel pre-training approach to integrate item- and attribute-based preference sequences.
Experiment results on two real-world datasets have demonstrated the effectiveness of our approach.
arXiv Detail & Related papers (2020-08-19T03:43:50Z)