LongR: Unleashing Long-Context Reasoning via Reinforcement Learning with Dense Utility Rewards
- URL: http://arxiv.org/abs/2602.05758v1
- Date: Thu, 05 Feb 2026 15:26:47 GMT
- Title: LongR: Unleashing Long-Context Reasoning via Reinforcement Learning with Dense Utility Rewards
- Authors: Bowen Ping, Zijun Chen, Yiyao Yu, Tingfeng Hui, Junchi Yan, Baobao Chang
- Abstract summary: LongR is a framework that enhances long-context performance by integrating a dynamic "Think-and-Read" mechanism. LongR consistently enhances performance across diverse RL algorithms.
- Score: 57.993003392037174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning has emerged as a key driver for LLM reasoning. This capability is equally pivotal in long-context scenarios, such as long-dialogue understanding and structured data analysis, where the challenge extends beyond consuming tokens to performing rigorous deduction. While existing efforts focus on data synthesis or architectural changes, recent work points out that relying solely on sparse, outcome-only rewards yields limited gains, as such coarse signals are often insufficient to effectively guide complex long-context reasoning. To address this, we propose LongR, a unified framework that enhances long-context performance by integrating a dynamic "Think-and-Read" mechanism, which interleaves reasoning with document consultation, with a contextual density reward based on relative information gain to quantify the utility of the relevant documents. Empirically, LongR achieves a 9% gain on LongBench v2 and consistent improvements on RULER and InfiniteBench, demonstrating robust efficiency in navigating extensive contexts. Furthermore, LongR consistently enhances performance across diverse RL algorithms (e.g., DAPO, GSPO). Finally, we conduct in-depth analyses to investigate the impact of reasoning chain length on efficiency and the model's robustness against distractors.
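The abstract does not give the reward's exact form, but "a contextual density reward based on relative information gain" suggests scoring a consulted document by how much it improves the model's belief in the correct answer, per token read. A minimal sketch of one plausible formulation (the function name, arguments, and normalization are hypothetical, not taken from the paper):

```python
import math

def contextual_density_reward(p_before: float, p_after: float, doc_tokens: int) -> float:
    """Toy 'relative information gain' reward: the increase in the model's
    log-probability of the correct answer after consulting a document,
    normalized by the document's token cost (a density per token)."""
    gain = math.log(p_after) - math.log(p_before)
    return gain / doc_tokens

# Example: a 200-token passage lifts P(correct answer) from 0.2 to 0.6,
# so the per-token utility of consulting it is log(3) / 200.
reward = contextual_density_reward(0.2, 0.6, 200)
```

Under this reading, a dense per-document signal of this kind would let an RL algorithm credit individual "Read" steps rather than only the final answer, which is the gap the abstract attributes to sparse, outcome-only rewards.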
Related papers
- Document Reconstruction Unlocks Scalable Long-Context RLVR [60.74632963522131]
Reinforcement Learning with Verifiable Rewards (RLVR) has become a prominent paradigm to enhance the capabilities (i.e., long-context) of Large Language Models (LLMs).
We investigate unsupervised approaches to enhance the long-context capabilities of LLMs, eliminating the need for heavy human annotations or teacher models' supervision.
We validate the effectiveness of our method on two widely used benchmarks, RULER and LongBench v2.
arXiv Detail & Related papers (2026-02-09T03:23:23Z)
- Incentivizing In-depth Reasoning over Long Contexts with Process Advantage Shaping [38.280470586624496]
Long-context reasoning requires both precise grounding and robust long-range reasoning.
We propose DeepReasonQA, a KG-driven framework that constructs high-difficulty, multi-hop long-context QA pairs with inherent reasoning chains.
We show that our approach substantially outperforms RLVR baselines and matches frontier LLMs while using far fewer parameters.
arXiv Detail & Related papers (2026-01-18T16:10:04Z)
- REFRAG: Rethinking RAG based Decoding [67.4862300145604]
REFRAG is an efficient decoding framework that compresses, senses, and expands to improve latency in RAG applications.
We provide rigorous validation of REFRAG across diverse long-context tasks, including RAG, multi-turn conversations, and long document summarization.
arXiv Detail & Related papers (2025-09-01T03:31:44Z)
- Joint Enhancement of Relational Reasoning for Long-Context LLMs [39.679627202160425]
Large language models (LLMs) struggle with long contexts due to memory limitations and their inability to tackle complex, long-context tasks.
We propose JERR, a novel framework designed to enhance long-context comprehension via graph-based reasoning.
arXiv Detail & Related papers (2025-08-28T01:54:47Z)
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning [80.26953590563232]
We formalize the paradigm of long-context reasoning RL, and identify key challenges in suboptimal training efficiency and unstable optimization process.
We propose QwenLong-L1, a framework that adapts short-context LRMs to long-context scenarios via progressive context scaling.
Experiments on seven long-context document question-answering benchmarks demonstrate that QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B.
arXiv Detail & Related papers (2025-05-23T09:31:55Z)
- Hierarchical Document Refinement for Long-context Retrieval-augmented Generation [28.421675216147374]
LongRefiner is an efficient plug-and-play refiner that leverages the inherent structural characteristics of long documents.
LongRefiner achieves competitive performance in various scenarios with 10x lower computational cost and latency compared to the best baseline.
arXiv Detail & Related papers (2025-05-15T15:34:15Z)
- PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention [73.26995918610669]
Large Language Models (LLMs) face efficiency bottlenecks due to the quadratic complexity of the attention mechanism when processing long contexts.
We introduce PowerAttention, a novel sparse attention design that facilitates effective and complete context extension.
Experiments demonstrate that PowerAttention outperforms existing static sparse attention methods by 5% to 40%.
arXiv Detail & Related papers (2025-03-05T15:24:11Z)
- RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding [7.785459677641105]
Long-context large language models (LLMs) offer a promising alternative to traditional retrieval-augmented generation (RAG).
We introduce Retrieval-Augmented Speculative Decoding (RAPID), which leverages RAG for both accelerating and enhancing generation quality in long-context inference.
Our approach enables a new paradigm where same-scale or even larger LLMs can serve as RAG drafters while maintaining computational efficiency.
arXiv Detail & Related papers (2025-02-27T17:59:36Z)
- LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data [19.79929012055293]
LongFaith is a novel pipeline for synthesizing faithful long-context reasoning instruction datasets.
By integrating ground truth and citation-based reasoning prompts, we eliminate distractions and improve the accuracy of reasoning chains.
arXiv Detail & Related papers (2025-02-18T06:40:23Z)
- SEAL: Scaling to Emphasize Attention for Long-Context Retrieval [8.805524738976075]
We introduce a novel approach called Scaling to Emphasize Attention for Long-context retrieval (SEAL).
We observe that specific attention heads are closely tied to long-context retrieval, showing positive or negative correlation with retrieval scores.
We propose a learning-based mechanism that leverages generated data to emphasize these heads.
arXiv Detail & Related papers (2025-01-25T14:09:39Z)
- LongReward: Improving Long-context Large Language Models with AI Feedback [54.3321542678909]
LongReward is a novel method that provides rewards for long-context model responses from four human-valued dimensions.
Our experiments indicate that LongReward not only significantly improves models' long-context performance but also enhances their ability to follow short instructions.
arXiv Detail & Related papers (2024-10-28T17:50:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.