Diversity-Enhanced Reasoning for Subjective Questions
- URL: http://arxiv.org/abs/2507.20187v1
- Date: Sun, 27 Jul 2025 09:07:42 GMT
- Title: Diversity-Enhanced Reasoning for Subjective Questions
- Authors: Yumeng Wang, Zhiyuan Fan, Jiayu Liu, Yi R. Fung
- Abstract summary: We propose MultiRole-R1, a diversity-enhanced framework with multiple role perspectives, to improve the accuracy and diversity in subjective reasoning tasks. With specially designed reward functions, we successfully promote perspective diversity and lexical diversity. Our experiments on six benchmarks demonstrate MultiRole-R1's effectiveness and generalizability in enhancing both subjective and objective reasoning.
- Score: 6.898139210272096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large reasoning models (LRMs) with long chain-of-thought (CoT) capabilities have shown strong performance on objective tasks, such as math reasoning and coding. However, their effectiveness on subjective questions that may have different responses from different perspectives is still limited by a tendency towards homogeneous reasoning, introduced by the reliance on a single ground truth in supervised fine-tuning and verifiable reward in reinforcement learning. Motivated by the finding that increasing role perspectives consistently improves performance, we propose MultiRole-R1, a diversity-enhanced framework with multiple role perspectives, to improve the accuracy and diversity in subjective reasoning tasks. MultiRole-R1 features an unsupervised data construction pipeline that generates reasoning chains incorporating diverse role perspectives. We further employ reinforcement learning via Group Relative Policy Optimization (GRPO) with reward shaping, taking diversity as a reward signal in addition to the verifiable reward. With specially designed reward functions, we successfully promote perspective diversity and lexical diversity, uncovering a positive relation between reasoning diversity and accuracy. Our experiments on six benchmarks demonstrate MultiRole-R1's effectiveness and generalizability in enhancing both subjective and objective reasoning, showcasing the potential of diversity-enhanced training in LRMs.
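To make the reward shaping concrete, here is a minimal sketch of how a verifiable correctness reward could be combined with diversity bonuses before computing GRPO-style group-relative advantages. The perspective and lexical diversity measures, the role list, and the weights below are illustrative assumptions, not the paper's actual reward functions.

```python
import re

def lexical_diversity(text: str) -> float:
    """Type-token ratio as a simple stand-in for lexical diversity."""
    tokens = re.findall(r"\w+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def perspective_diversity(text: str, roles: list[str]) -> float:
    """Fraction of candidate role perspectives mentioned in the reasoning.
    The role list and the string-matching heuristic are illustrative assumptions."""
    hits = sum(1 for role in roles if role.lower() in text.lower())
    return hits / len(roles) if roles else 0.0

def shaped_reward(completion: str, is_correct: bool, roles: list[str],
                  w_persp: float = 0.3, w_lex: float = 0.2) -> float:
    """Verifiable reward plus diversity shaping; the weights are hypothetical."""
    verifiable = 1.0 if is_correct else 0.0
    return verifiable + w_persp * perspective_diversity(completion, roles) \
                      + w_lex * lexical_diversity(completion)

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0
    return [(r - mean) / std for r in rewards]

# Toy usage: score a group of sampled completions for one prompt.
roles = ["economist", "ethicist", "engineer"]
group = [("As an economist and an ethicist, I would argue ...", True),
         ("The answer is yes.", True),
         ("No idea.", False)]
rewards = [shaped_reward(text, ok, roles) for text, ok in group]
print(group_relative_advantages(rewards))
```

In this reading, a completion that is both correct and covers more role perspectives with richer vocabulary receives a larger advantage within its sampled group.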
Related papers
- VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning [69.44871115752055]
We propose an advanced multimodal reasoning model trained via a novel Progressive Curriculum Reinforcement Learning (PCuRL) framework. PCuRL systematically guides the model through tasks of gradually increasing difficulty, substantially improving its reasoning abilities across diverse multimodal contexts. The framework introduces two key innovations: (1) an online difficulty soft weighting mechanism, dynamically adjusting training difficulty across successive RL training stages; and (2) a dynamic length reward mechanism, which encourages the model to adaptively regulate its reasoning path length according to task complexity.
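The dynamic length reward can be pictured as a target reasoning length that grows with estimated task difficulty. The triangular shape, target lengths, and difficulty scale below are assumptions for illustration, not PCuRL's actual formulation.

```python
def length_reward(num_tokens: int, difficulty: float,
                  base_target: int = 200, max_extra: int = 800) -> float:
    """Reward peaks when reasoning length matches a difficulty-scaled target.

    difficulty is assumed to lie in [0, 1]; easy prompts get short targets,
    hard prompts get long ones. The linear/triangular shape is illustrative.
    """
    target = base_target + difficulty * max_extra
    # Penalize relative deviation from the target length, clipped to [0, 1].
    deviation = abs(num_tokens - target) / target
    return max(0.0, 1.0 - deviation)

# Toy usage: the same 300-token answer is rewarded differently by difficulty.
print(length_reward(300, difficulty=0.1))  # near the easy target -> high reward
print(length_reward(300, difficulty=0.9))  # far below the hard target -> low reward
```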
arXiv Detail & Related papers (2025-07-30T12:23:21Z)
- CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards [53.36917093757101]
Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). We introduce CogDual, a novel RPLA adopting a cognize-then-respond reasoning paradigm. By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment.
arXiv Detail & Related papers (2025-07-23T02:26:33Z)
- PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning [50.21619363035618]
We propose PeRL, a general reinforcement learning approach tailored for interleaved multimodal tasks. We introduce permutation of image sequences to simulate varied positional relationships and explore more spatial and positional diversity. Our experiments confirm that the PeRL-trained model consistently surpasses R1-related and interleaved VLM baselines by a large margin.
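A rough sketch of the permutation idea on the data side: shuffle the images of an interleaved example and remap the text placeholders so each variant stays internally consistent. The example format, field names, and placeholder syntax are invented for illustration.

```python
import random

def permute_images(example: dict, num_variants: int = 4, seed: int = 0) -> list[dict]:
    """Create variants of an interleaved example with shuffled image order.

    `example` is assumed to look like {"images": [...], "text": "... <img0> ... <img1> ..."}
    (field names and placeholder format are invented); the text placeholders are
    remapped so each variant still points at the right image.
    """
    rng = random.Random(seed)
    n = len(example["images"])
    variants = []
    for _ in range(num_variants):
        order = list(range(n))          # order[new_pos] = original index
        rng.shuffle(order)
        new_pos = {orig: new for new, orig in enumerate(order)}
        text = example["text"]
        # Two-pass replacement so <img0> -> <tmp2> -> <img2> substitutions never collide.
        for orig, new in new_pos.items():
            text = text.replace(f"<img{orig}>", f"<tmp{new}>")
        text = text.replace("<tmp", "<img")
        variants.append({
            "images": [example["images"][orig] for orig in order],
            "text": text,
        })
    return variants

# Toy usage with two images.
ex = {"images": ["cat.png", "dog.png"], "text": "First <img0>, then <img1>."}
for v in permute_images(ex, num_variants=2):
    print(v["images"], v["text"])
```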
arXiv Detail & Related papers (2025-06-17T18:25:56Z)
- Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning [87.7836502955847]
We propose a novel self-rewarding reinforcement learning framework to enhance Large Language Model (LLM) reasoning. Our key insight is that correct responses often exhibit consistent trajectory patterns in terms of model likelihood. We introduce CoVo, an intrinsic reward mechanism that integrates Consistency and Volatility via a robust vector-space aggregation strategy.
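A hedged reading of the CoVo signal: given per-step likelihood curves for several sampled responses to the same prompt, treat consistency as closeness to a robust group aggregate and volatility as the jitter of a curve. Both interpretations and the aggregation below are assumptions, not the paper's exact definitions.

```python
import numpy as np

def covo_style_intrinsic_reward(likelihood_curves: list[list[float]]) -> list[float]:
    """Illustrative intrinsic reward from per-step likelihoods of sampled responses.

    `likelihood_curves[i]` is assumed to hold per-step token log-likelihoods of the
    i-th response, resampled to a common length. 'Consistency' is read here as
    closeness to the group's median curve and 'volatility' as step-to-step variance;
    both readings are assumptions rather than the paper's formulation.
    """
    curves = np.array(likelihood_curves)        # shape: (num_responses, num_steps)
    median_curve = np.median(curves, axis=0)    # robust vector-space aggregate
    rewards = []
    for curve in curves:
        consistency = -np.linalg.norm(curve - median_curve)  # closer to aggregate -> higher
        volatility = np.var(np.diff(curve))                   # jumpier curve -> lower reward
        rewards.append(float(consistency - volatility))
    return rewards

# Toy usage: three responses, four "steps" each; the erratic one scores lowest.
print(covo_style_intrinsic_reward([
    [-1.0, -1.1, -0.9, -1.0],
    [-1.0, -1.0, -1.1, -0.9],
    [-3.0, -0.2, -2.5, -0.1],
]))
```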
arXiv Detail & Related papers (2025-06-10T12:40:39Z)
- Diversity-Aware Policy Optimization for Large Language Model Reasoning [30.460540027658173]
We investigate the impact of diversity in RL-based training for large language models. We propose a novel diversity-aware policy optimization method. Our method achieves a 3.5 percent average improvement across four mathematical reasoning benchmarks.
arXiv Detail & Related papers (2025-05-29T13:27:44Z)
- RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward [7.9399136525335585]
RAIDEN-R1 is a novel reinforcement learning framework that integrates Verifiable Role-Awareness Reward (VRAR). We construct a high-quality, role-aware Chain-of-Thought dataset through multi-LLM collaboration. Experiments on the RAIDEN benchmark demonstrate RAIDEN-R1's superiority.
arXiv Detail & Related papers (2025-05-15T12:22:10Z)
- Curiosity-Driven Reinforcement Learning from Human Feedback [56.45486828254951]
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models with human preferences, but often at the cost of reduced output diversity. We introduce curiosity-driven RLHF (CD-RLHF), a framework that incorporates intrinsic rewards for novel states alongside traditional sparse extrinsic rewards. We demonstrate the effectiveness of CD-RLHF through extensive experiments on a range of tasks, including text summarization and instruction following.
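One common way to realize such an intrinsic bonus is a count-based novelty term that decays as a state is revisited; the sketch below uses that placeholder, not CD-RLHF's actual novelty estimator or mixing scheme.

```python
from collections import Counter

class CuriosityShapedReward:
    """Mixes a sparse extrinsic reward with a count-based novelty bonus.

    Hashing states by their text prefix and the 1/sqrt(count) bonus are simple
    placeholder choices, not the method's actual novelty estimator.
    """

    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.visit_counts = Counter()

    def __call__(self, state_text: str, extrinsic: float) -> float:
        key = hash(state_text[:256])            # coarse state signature
        self.visit_counts[key] += 1
        intrinsic = 1.0 / (self.visit_counts[key] ** 0.5)
        return extrinsic + self.beta * intrinsic

# Toy usage: the same state becomes less rewarding as it is revisited.
shaper = CuriosityShapedReward()
print(shaper("The report concludes that ...", extrinsic=0.0))  # novel -> bonus 0.1
print(shaper("The report concludes that ...", extrinsic=0.0))  # repeated -> ~0.07
```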
arXiv Detail & Related papers (2025-01-20T12:51:40Z)
- Demonstration Selection for In-Context Learning via Reinforcement Learning [16.103533806505403]
Relevance-Diversity Enhanced Selection (RDES) is an innovative approach to optimize the selection of diverse reference demonstrations. RDES employs frameworks like Q-learning and a PPO-based variant to dynamically identify demonstrations that maximize diversity. We demonstrate that RDES significantly enhances performance compared to ten established baselines.
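As a toy illustration of RL-based demonstration selection, the sketch below runs tabular Q-learning over which demonstration to add next, rewarding relevance plus a bonus for labels not yet covered. The state encoding, reward, and hyperparameters are assumptions rather than RDES's actual design.

```python
import random
from collections import defaultdict

def select_reward(selected: list[int], new: int, relevance: list[float],
                  labels: list[str]) -> float:
    """Toy reward: relevance of the new demo plus a bonus if its label is unseen."""
    diversity_bonus = 0.5 if labels[new] not in {labels[i] for i in selected} else 0.0
    return relevance[new] + diversity_bonus

def q_learning_selection(relevance, labels, k=3, episodes=500,
                         alpha=0.3, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning over which demonstration to add next (illustrative only)."""
    rng = random.Random(seed)
    q = defaultdict(float)                      # (state, action) -> value
    n = len(relevance)
    for _ in range(episodes):
        selected = []
        while len(selected) < k:
            state = tuple(sorted(selected))
            candidates = [a for a in range(n) if a not in selected]
            # Epsilon-greedy choice of the next demonstration to include.
            if rng.random() < eps:
                action = rng.choice(candidates)
            else:
                action = max(candidates, key=lambda a: q[(state, a)])
            reward = select_reward(selected, action, relevance, labels)
            next_state = tuple(sorted(selected + [action]))
            next_best = max((q[(next_state, a)] for a in range(n)
                             if a not in next_state), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * next_best - q[(state, action)])
            selected.append(action)
    # Greedy rollout with the learned Q-table.
    selected = []
    while len(selected) < k:
        state = tuple(sorted(selected))
        action = max((a for a in range(n) if a not in selected),
                     key=lambda a: q[(state, a)])
        selected.append(action)
    return selected

# Toy pool: five demos with relevance scores and class labels.
print(q_learning_selection(relevance=[0.9, 0.85, 0.4, 0.8, 0.3],
                           labels=["A", "A", "B", "C", "B"]))
```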
arXiv Detail & Related papers (2024-12-05T08:33:52Z)
- Evolution of Thought: Diverse and High-Quality Reasoning via Multi-Objective Optimization [14.346638764967357]
Multi-modal large language models (MLLMs) are increasingly applied to complex reasoning tasks. We propose Evolution of Thought (EoT) to foster both high-quality and diverse reasoning paths. We show EoT achieves superior reasoning performance and efficiency compared to other competitive baselines.
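One generic building block for such multi-objective selection is a Pareto (non-dominated) filter over candidate reasoning paths scored for quality and diversity; the sketch below shows only that generic step, not EoT's actual procedure, and the score fields are assumed.

```python
def pareto_front(paths: list[dict]) -> list[dict]:
    """Keep reasoning paths not dominated on (quality, diversity).

    A path dominates another if it is at least as good on both objectives and
    strictly better on one; the scores are assumed to come from upstream
    evaluators and are placeholders here.
    """
    front = []
    for p in paths:
        dominated = any(
            q["quality"] >= p["quality"] and q["diversity"] >= p["diversity"]
            and (q["quality"] > p["quality"] or q["diversity"] > p["diversity"])
            for q in paths
        )
        if not dominated:
            front.append(p)
    return front

# Toy usage: the second path is dominated by the first and filtered out.
print(pareto_front([
    {"id": 1, "quality": 0.9, "diversity": 0.2},
    {"id": 2, "quality": 0.5, "diversity": 0.1},
    {"id": 3, "quality": 0.4, "diversity": 0.8},
]))
```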
arXiv Detail & Related papers (2024-11-24T14:59:30Z)
- DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints [68.82294911302579]
We introduce DiveR-CT, which relaxes conventional constraints on the objective and semantic reward, granting greater freedom for the policy to enhance diversity. Our experiments demonstrate DiveR-CT's marked superiority over baselines by 1) generating data that perform better on various diversity metrics across different attack success rate levels, 2) better enhancing resiliency in blue-team models through safety tuning based on collected data, 3) allowing dynamic control of objective weights for reliable and controllable attack success rates, and 4) reducing susceptibility to reward overoptimization.
arXiv Detail & Related papers (2024-05-29T12:12:09Z)
- Choosing the Best of Both Worlds: Diverse and Novel Recommendations through Multi-Objective Reinforcement Learning [68.45370492516531]
We introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the Recommender Systems (RS) setting.
The SMORL agent augments standard recommendation models with additional RL layers that require it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations.
Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, reduced repetitiveness of recommendations, and demonstrate the importance of reinforcing diversity and novelty as complementary objectives.
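The scalarization step can be sketched as a weighted sum of per-objective rewards; the objective definitions and weights below are placeholders, not the paper's formulation.

```python
def smorl_style_reward(item_id: int, clicked: bool, session_items: list[int],
                       global_popularity: dict[int, float],
                       weights: tuple[float, float, float] = (1.0, 0.5, 0.5)) -> float:
    """Scalarized reward = w_acc * accuracy + w_div * diversity + w_nov * novelty.

    Accuracy is a click signal, diversity rewards items unlike the current session,
    and novelty rewards unpopular (long-tail) items; all three definitions and the
    weights are illustrative assumptions.
    """
    w_acc, w_div, w_nov = weights
    accuracy = 1.0 if clicked else 0.0
    diversity = 0.0 if item_id in session_items else 1.0
    novelty = 1.0 - global_popularity.get(item_id, 0.0)
    return w_acc * accuracy + w_div * diversity + w_nov * novelty

# Toy usage: a clicked long-tail item outside the session scores highest.
popularity = {1: 0.9, 2: 0.1}
print(smorl_style_reward(2, clicked=True, session_items=[1], global_popularity=popularity))
```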
arXiv Detail & Related papers (2021-10-28T13:22:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.