LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
- URL: http://arxiv.org/abs/2506.18841v1
- Date: Mon, 23 Jun 2025 16:59:02 GMT
- Title: LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
- Authors: Yuhao Wu, Yushi Bai, Zhiqiang Hu, Roy Ka-Wei Lee, Juanzi Li
- Abstract summary: We propose an incentivization-based approach that leverages reinforcement learning (RL) to foster the emergence of ultra-long, high-quality text generation capabilities. Our LongWriter-Zero model, trained from Qwen2.5-32B, consistently outperforms traditional SFT methods on long-form writing tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ultra-long generation by large language models (LLMs) is a widely demanded scenario, yet it remains a significant challenge due to their maximum generation length limit and overall quality degradation as sequence length increases. Previous approaches, exemplified by LongWriter, typically rely on "teaching", which involves supervised fine-tuning (SFT) on synthetic long-form outputs. However, this strategy heavily depends on synthetic SFT data, which is difficult and costly to construct, often lacks coherence and consistency, and tends to be overly artificial and structurally monotonous. In this work, we propose an incentivization-based approach that, starting entirely from scratch and without relying on any annotated or synthetic data, leverages reinforcement learning (RL) to foster the emergence of ultra-long, high-quality text generation capabilities in LLMs. We perform RL training starting from a base model, similar to R1-Zero, guiding it to engage in reasoning that facilitates planning and refinement during the writing process. To support this, we employ specialized reward models that steer the LLM towards improved length control, writing quality, and structural formatting. Experimental evaluations show that our LongWriter-Zero model, trained from Qwen2.5-32B, consistently outperforms traditional SFT methods on long-form writing tasks, achieving state-of-the-art results across all metrics on WritingBench and Arena-Write, and even surpassing 100B+ models such as DeepSeek R1 and Qwen3-235B. We open-source our data and model checkpoints at https://huggingface.co/THU-KEG/LongWriter-Zero-32B
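The specialized reward models described in the abstract are learned models whose internals are not given here. As a rough, rule-based illustration of the length-control component alone, a reward could score an output against a requested target length; the function name, tolerance band, and linear decay below are assumptions for illustration, not the paper's actual reward design.

```python
def length_reward(output_tokens: int, target_tokens: int, tolerance: float = 0.1) -> float:
    """Piecewise-linear length reward: 1.0 inside a +/- tolerance band
    around the target length, decaying linearly to 0.0 outside it."""
    lower = target_tokens * (1 - tolerance)
    upper = target_tokens * (1 + tolerance)
    if lower <= output_tokens <= upper:
        return 1.0
    # Penalize by relative deviation from the nearest band edge
    edge = lower if output_tokens < lower else upper
    deviation = abs(output_tokens - edge) / target_tokens
    return max(0.0, 1.0 - deviation)
```

In an RL setup, a signal like this would be combined with learned quality and formatting rewards so that the policy cannot satisfy the length constraint with degenerate text.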
Related papers
- Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning [55.41828729623907]
We present Writing-RL: an Adaptive Curriculum Reinforcement Learning framework to advance long-form writing capabilities beyond supervised fine-tuning. The framework consists of three key components: a Margin-aware Data Selection strategy that prioritizes samples with high learning potential, a Pairwise Comparison Reward mechanism that provides discriminative learning signals, and a Dynamic Reference Scheduling approach.
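The summary does not specify how the Pairwise Comparison Reward is computed. A common way to turn pairwise judgments into a scalar training signal is a Bradley-Terry-style win probability; the function names and the mapping to [-1, 1] below are illustrative assumptions, not Writing-RL's actual mechanism.

```python
import math

def pairwise_preference_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B,
    given scalar scores from a judge or reward model."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def pairwise_reward(score_policy: float, score_reference: float) -> float:
    """Map the win probability against a reference response to [-1, 1],
    so beating the reference yields positive reward and losing negative."""
    return 2.0 * pairwise_preference_prob(score_policy, score_reference) - 1.0
```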
arXiv Detail & Related papers (2025-06-06T05:40:39Z)
- SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models [34.723917246316205]
SuperWriter-Agent is a framework designed to enhance the quality and consistency of long-form text generation. Based on this framework, we construct a supervised fine-tuning dataset to train a 7B SuperWriter-LM. Empirical results across diverse benchmarks demonstrate that SuperWriter-LM achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-06-04T17:27:42Z)
- Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels [3.537369004801589]
We release the Too Long, Didn't Model benchmark. It tests a model's ability to report plot summary, storyworld configuration, and elapsed narrative time. We find that none of seven tested frontier LLMs retain stable understanding beyond 64k tokens.
arXiv Detail & Related papers (2025-05-20T21:21:09Z)
- LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information [76.26257306813899]
Long-form generation is crucial for academic paper writing and repo-level code generation. Existing methods that utilize preference learning with outcome supervision often fail to provide detailed feedback for extended contexts. We propose enhancing long-form generation by incorporating process supervision.
arXiv Detail & Related papers (2025-02-04T08:25:17Z)
- Language Models can Self-Lengthen to Generate Long Texts [74.96074422345806]
This paper introduces an innovative iterative training framework called Self-Lengthen.
It leverages only the intrinsic knowledge and skills of Large Language Models without the need for auxiliary data or proprietary models.
Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation.
arXiv Detail & Related papers (2024-10-31T13:47:10Z)
- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs [57.23637303451716]
Long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words.
We introduce AgentWrite, an agent-based pipeline that decomposes ultra-long generation tasks into subtasks.
We construct LongWriter-6k, a dataset containing 6,000 SFT samples with output lengths ranging from 2k to 32k words.
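The decomposition idea behind AgentWrite can be sketched as a plan-then-write loop: one call drafts a section plan with per-section word budgets, then each section is generated conditioned on what has been written so far. The `plan_fn` and `write_fn` callables below are hypothetical stand-ins for LLM calls, not the paper's actual interfaces.

```python
def agent_write(prompt: str, target_words: int, plan_fn, write_fn) -> str:
    """Two-stage long-form pipeline: plan sections with word budgets,
    then write each section in order, conditioned on prior sections."""
    # plan_fn returns e.g. [("Introduction", 500), ("Background", 1500), ...]
    plan = plan_fn(prompt, target_words)
    sections = []
    for title, budget in plan:
        context = "\n\n".join(sections)  # text written so far
        sections.append(write_fn(prompt, title, budget, context))
    return "\n\n".join(sections)
```

Feeding the accumulated context back into each section call is what lets a short-output model compose a document far longer than any single generation.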
arXiv Detail & Related papers (2024-08-13T17:46:12Z)
- Unlock the Correlation between Supervised Fine-Tuning and Reinforcement Learning in Training Code Large Language Models [12.656574142412484]
We make an attempt to understand the correlation between supervised fine-tuning and reinforcement learning. We find that both atomic and synthetic functions are indispensable for SFT's generalization.
arXiv Detail & Related papers (2024-06-14T03:39:01Z)
- LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models [61.12177317970258]
LongSkywork is a long-context Large Language Model capable of processing up to 200,000 tokens.
We develop two novel methods for creating synthetic data.
LongSkywork achieves outstanding performance on a variety of long-context benchmarks.
arXiv Detail & Related papers (2024-06-02T03:34:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.