DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing
- URL: http://arxiv.org/abs/2601.09609v1
- Date: Wed, 14 Jan 2026 16:30:20 GMT
- Title: DPWriter: Reinforcement Learning with Diverse Planning Branching for Creative Writing
- Authors: Qian Cao, Yahui Liu, Wei Bi, Yi Zhao, Ruihua Song, Xiting Wang, Ruiming Tang, Guorui Zhou, Han Li
- Abstract summary: Reinforcement learning (RL)-based enhancement of large language models (LLMs) often leads to reduced output diversity. This paper proposes an RL framework structured around a semi-structured long Chain-of-Thought (CoT). We introduce a Diverse Planning Branching method that strategically introduces divergence at the planning phase based on diversity variation.
- Score: 78.70918589095639
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement learning (RL)-based enhancement of large language models (LLMs) often leads to reduced output diversity, undermining their utility in open-ended tasks like creative writing. Current methods lack explicit mechanisms for guiding diverse exploration and instead prioritize optimization efficiency and performance over diversity. This paper proposes an RL framework structured around a semi-structured long Chain-of-Thought (CoT), in which the generation process is decomposed into explicitly planned intermediate steps. We introduce a Diverse Planning Branching method that strategically introduces divergence at the planning phase based on diversity variation, alongside a group-aware diversity reward to encourage distinct trajectories. Experimental results on creative writing benchmarks demonstrate that our approach significantly improves output diversity without compromising generation quality, consistently outperforming existing baselines.
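The abstract names two mechanisms: branching at the planning step and a group-aware diversity reward. As a minimal sketch, the reward term could look like the following, assuming cosine similarity between trajectory embeddings as the diversity signal (the function name, the weighting factor `lam`, and the embedding choice are illustrative assumptions, not details from the paper):

```python
import numpy as np

def group_aware_diversity_reward(embeddings: np.ndarray,
                                 quality: np.ndarray,
                                 lam: float = 0.5) -> np.ndarray:
    """Reward each trajectory in a sampled group by its quality score plus
    a bonus for being dissimilar to the other trajectories in the group.

    embeddings: (G, d) L2-normalized trajectory embeddings, G >= 2.
    quality:    (G,) per-trajectory quality scores.
    """
    sims = embeddings @ embeddings.T                          # (G, G) cosine similarities
    G = sims.shape[0]
    mean_sim = (sims.sum(axis=1) - np.diag(sims)) / (G - 1)   # mean similarity to others
    diversity_bonus = 1.0 - mean_sim                          # higher for distinct outputs
    return quality + lam * diversity_bonus
```

In the paper's framework the divergence is injected earlier, at the planning phase of the semi-structured CoT; the sketch above only shows how a group-level reward can favor distinct trajectories.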
Related papers
- SetPO: Set-Level Policy Optimization for Diversity-Preserving LLM Reasoning [50.93295951454092]
We introduce a set-level diversity objective defined over sampled trajectories using kernelized similarity. Our approach derives a leave-one-out marginal contribution for each sampled trajectory and integrates this objective as a plug-in advantage-shaping term for policy optimization. Experiments across a range of model scales demonstrate the effectiveness of our proposed algorithm, consistently outperforming strong baselines in both Pass@1 and Pass@K across various benchmarks.
arXiv Detail & Related papers (2026-02-01T07:13:20Z)
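The abstract specifies a kernelized set objective and a leave-one-out marginal contribution used for advantage shaping. A sketch under stated assumptions, using an RBF kernel over trajectory embeddings and negative mean pairwise similarity as the set objective (the paper's exact kernel and objective may differ):

```python
import numpy as np

def rbf_kernel(X: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Pairwise RBF similarities between trajectory embeddings, shape (n, n)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def set_diversity(K: np.ndarray) -> float:
    """Set objective: negative mean pairwise similarity (higher = more diverse)."""
    n = len(K)
    if n < 2:
        return 0.0
    return -(K.sum() - np.trace(K)) / (n * (n - 1))

def loo_advantage_shaping(X: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Leave-one-out marginal contribution D(S) - D(S without i) for each
    sampled trajectory, usable as a plug-in advantage-shaping term."""
    K = rbf_kernel(X, gamma)
    full = set_diversity(K)
    contribs = np.empty(len(X))
    for i in range(len(X)):
        rest = np.delete(np.arange(len(X)), i)
        contribs[i] = full - set_diversity(K[np.ix_(rest, rest)])
    return contribs
```

A trajectory whose removal lowers set diversity gets a positive contribution and hence a boosted advantage.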
- DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO [50.89703227426486]
Reinforcement learning (RL) improves image generation quality significantly by comparing the relative performance of images generated within the same group. In the later stages of training, the model tends to produce homogenized outputs, lacking creativity and visual diversity. This issue can be analyzed from both reward modeling and generation dynamics perspectives.
arXiv Detail & Related papers (2025-12-25T05:37:37Z)
- A Unified Multi-Task Learning Framework for Generative Auto-Bidding with Validation-Aligned Optimization [51.27959658504722]
Multi-task learning offers a principled framework to train auto-bidding tasks jointly through shared representations. Existing multi-task optimization strategies are primarily guided by training dynamics and often generalize poorly in volatile bidding environments. We present Validation-Aligned Multi-task Optimization (VAMO), which adaptively assigns task weights based on the alignment between per-task training gradients and a held-out validation gradient.
arXiv Detail & Related papers (2025-10-09T03:59:51Z)
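The described mechanism, weighting tasks by the alignment between per-task training gradients and a held-out validation gradient, translates into a short sketch; the softmax normalization and temperature below are assumptions, not details from the abstract:

```python
import torch
import torch.nn.functional as F

def vamo_style_weights(task_grads: list[torch.Tensor],
                       val_grad: torch.Tensor,
                       temperature: float = 1.0) -> torch.Tensor:
    """Assign each task a weight from the cosine alignment between its
    training gradient and a held-out validation gradient."""
    alignments = torch.stack([
        F.cosine_similarity(g.flatten(), val_grad.flatten(), dim=0)
        for g in task_grads
    ])
    return torch.softmax(alignments / temperature, dim=0)  # weights sum to 1
```

Tasks whose gradients point in the same direction as the validation gradient are upweighted, the intent being better generalization in volatile bidding environments.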
- Post-training Large Language Models for Diverse High-Quality Responses [32.92680825196664]
Reinforcement learning (RL) has emerged as a popular method for post-training large language models (LLMs). We propose a novel training method named DQO (Diversity Quality Optimization) based on determinantal point processes (DPPs). Our approach samples and embeds a group of responses for each prompt, then uses the determinant of a kernel-based similarity matrix to measure diversity as the volume spanned by the embeddings of these responses.
arXiv Detail & Related papers (2025-09-05T03:47:06Z)
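The determinant-as-volume measure in the abstract is concrete enough to sketch: embed the group of responses, form a kernel (Gram) matrix, and take its log-determinant (the cosine kernel and jitter term below are assumptions):

```python
import numpy as np

def dpp_diversity(embeddings: np.ndarray, eps: float = 1e-6) -> float:
    """Diversity of a response group as the log-volume spanned by its
    embeddings: the log-determinant of a kernel similarity matrix (DPP-style)."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    K = X @ X.T + eps * np.eye(len(X))    # cosine kernel + jitter for stability
    _, logdet = np.linalg.slogdet(K)
    return logdet  # larger volume => more mutually distinct responses
```

Near-duplicate responses make the matrix nearly singular, so the log-determinant drops sharply, which is what lets the measure penalize redundancy inside a sampled group.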
- Jointly Reinforcing Diversity and Quality in Language Model Generations [64.72289248044514]
Post-training of Large Language Models (LMs) often prioritizes accuracy and helpfulness at the expense of diversity. We address this challenge with Diversity-Aware Reinforcement Learning (DARLING), a framework that jointly optimizes for response quality and semantic diversity.
arXiv Detail & Related papers (2025-09-02T17:38:47Z)
- Mind the Gap: Conformative Decoding to Improve Output Diversity of Instruction-Tuned Large Language Models [0.0]
This paper investigates the "diversity gap" for a writing prompt narrative generation task. Results show significant decreases in diversity due to instruction-tuning. We present a new decoding strategy, conformative decoding, which guides an instruct model using its more diverse base model to reintroduce output diversity.
arXiv Detail & Related papers (2025-07-28T16:04:25Z)
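The abstract states that the base model guides the instruct model but not how. Purely as an assumed illustration, one simple way a base model could reintroduce diversity is by blending the two models' next-token distributions at decode time (this mixing scheme is a guess, not the paper's definition of conformative decoding):

```python
import torch

def blended_next_token_logprobs(instruct_logits: torch.Tensor,
                                base_logits: torch.Tensor,
                                alpha: float = 0.5) -> torch.Tensor:
    """Blend next-token log-probabilities of an instruct model with those of
    its more diverse base model (shared vocabulary assumed); alpha=0 recovers
    the instruct model alone. Apply softmax to renormalize before sampling."""
    instruct_logp = torch.log_softmax(instruct_logits, dim=-1)
    base_logp = torch.log_softmax(base_logits, dim=-1)
    return (1 - alpha) * instruct_logp + alpha * base_logp
```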
- Evaluating the Diversity and Quality of LLM Generated Content [72.84945252821908]
We introduce a framework for measuring effective semantic diversity: diversity among outputs that meet quality thresholds. Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models. These findings have important implications for applications that require diverse yet high-quality outputs.
arXiv Detail & Related papers (2025-04-16T23:02:23Z)
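The framework's core definition, diversity computed only over outputs that clear a quality bar, translates directly into a sketch; mean pairwise cosine distance is an assumed choice of diversity metric:

```python
import numpy as np

def effective_semantic_diversity(embeddings: np.ndarray,
                                 quality: np.ndarray,
                                 threshold: float) -> float:
    """Mean pairwise cosine distance among outputs whose quality score
    meets the threshold; 0.0 if fewer than two outputs survive."""
    X = embeddings[quality >= threshold]
    if len(X) < 2:
        return 0.0
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    n = len(X)
    mean_sim = (sims.sum() - np.trace(sims)) / (n * (n - 1))
    return 1.0 - mean_sim
```

Filtering first is the point: outputs that fail the quality bar contribute nothing, so a model can show low lexical variety yet high effective semantic diversity, as the abstract reports for preference-tuned models.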
- Modifying Large Language Model Post-Training for Diverse Creative Writing [12.872333448726595]
In creative writing generation, we investigate post-training approaches to promote both output diversity and quality. Our core idea is to include deviation in the training objective to facilitate learning from rare high-quality instances. Our best model, with 8B parameters, achieves diversity on par with a human-created dataset while matching the output quality of the best instruction-tuned models.
arXiv Detail & Related papers (2025-03-21T13:21:45Z)
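The abstract says deviation enters the training objective to favor rare high-quality instances, without giving its form. As a loosely assumed sketch, one could weight each sample's loss by quality times deviation, with deviation measured against the other samples for the same prompt (the whole weighting scheme below is hypothetical):

```python
import torch
import torch.nn.functional as F

def deviation_weighted_loss(nll: torch.Tensor,
                            embeddings: torch.Tensor,
                            quality: torch.Tensor) -> torch.Tensor:
    """Upweight rare, high-quality samples: weight = quality * deviation,
    with deviation taken as a sample's mean embedding distance to the
    other samples generated for the same prompt.

    nll:        (G,) per-sample negative log-likelihoods.
    embeddings: (G, d) sample embeddings for one prompt group, G >= 2.
    quality:    (G,) quality scores in [0, 1].
    """
    X = F.normalize(embeddings, dim=1)
    sims = X @ X.T
    G = sims.shape[0]
    mean_sim = (sims.sum(dim=1) - sims.diagonal()) / (G - 1)
    weights = quality * (1.0 - mean_sim)      # rare AND good => large weight
    return (weights.detach() * nll).mean()    # no gradient through weights
```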
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Informed Sampling for Diversity in Concept-to-Text NLG [8.883733362171034]
We propose an Imitation Learning approach to explore the level of diversity that a language generation model can reliably produce.
Specifically, we augment the decoding process with a meta-classifier trained to distinguish which words at any given timestep will lead to high-quality output.
arXiv Detail & Related papers (2020-04-29T17:43:24Z)
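The meta-classifier mechanism is described concretely enough for a decoding-step sketch; how the per-token quality scores are produced is left to the classifier, and the top-k restriction is an assumption for tractability:

```python
import torch

def informed_sampling_step(logits: torch.Tensor,
                           meta_scores: torch.Tensor,
                           k: int = 50) -> int:
    """One decoding step: restrict to the generator's top-k next tokens,
    reweight their probabilities by the meta-classifier's estimate that
    each token leads to high-quality output, then sample.

    logits:      (V,) generator logits for the next token.
    meta_scores: (V,) per-token quality probabilities from the classifier.
    """
    topk = torch.topk(logits, k)
    probs = torch.softmax(topk.values, dim=-1) * meta_scores[topk.indices]
    probs = probs / probs.sum()               # renormalize after reweighting
    choice = torch.multinomial(probs, 1)
    return int(topk.indices[choice])
```

Sampling from the reweighted distribution, rather than greedily taking the best-scored token, is what preserves diversity while steering toward quality.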