KLong: Training LLM Agent for Extremely Long-horizon Tasks
- URL: http://arxiv.org/abs/2602.17547v1
- Date: Thu, 19 Feb 2026 17:01:08 GMT
- Title: KLong: Training LLM Agent for Extremely Long-horizon Tasks
- Authors: Yue Liu, Zhiyuan Hu, Flood Sung, Jiaheng Zhang, Bryan Hooi,
- Abstract summary: This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. We first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Notably, our proposed KLong (106B) surpasses Kimi K2 Thinking (1T) by 11.28% on PaperBench.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a comprehensive SFT recipe. Then, we introduce Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics. Using this pipeline, we build thousands of long-horizon trajectories distilled from Claude 4.5 Sonnet (Thinking). To train with these extremely long trajectories, we propose a new trajectory-splitting SFT, which preserves early context, progressively truncates later context, and maintains overlap between sub-trajectories. In addition, to further improve long-horizon task-solving capability, we propose a novel progressive RL, which schedules training into multiple stages with progressively extended timeouts. Experiments demonstrate the superiority and generalization of KLong, as shown in Figure 1. Notably, our proposed KLong (106B) surpasses Kimi K2 Thinking (1T) by 11.28% on PaperBench, and the performance improvement generalizes to other coding benchmarks like SWE-bench Verified and MLE-bench.
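The abstract describes trajectory-splitting SFT as preserving early context, progressively truncating later context, and keeping overlap between consecutive sub-trajectories. As a rough illustration of that splitting idea, here is a minimal sketch; the function name, parameters, and step representation are all illustrative assumptions, not the paper's actual implementation.

```python
def split_trajectory(steps, prefix_len=4, window=16, overlap=4):
    """Split a long trajectory (a list of steps) into overlapping
    sub-trajectories that all retain the first `prefix_len` steps,
    so early context is preserved while later context is truncated
    into sliding windows."""
    prefix = steps[:prefix_len]          # early context kept in every split
    rest = steps[prefix_len:]            # later context to be windowed
    stride = window - overlap            # consecutive windows share `overlap` steps
    subs = []
    start = 0
    while start < len(rest):
        chunk = rest[start:start + window]
        subs.append(prefix + chunk)      # prepend preserved early context
        if start + window >= len(rest):  # last window reached the end
            break
        start += stride
    return subs

# A 40-step trajectory becomes several shorter, overlapping sub-trajectories,
# each beginning with the same preserved early steps.
subs = split_trajectory(list(range(40)), prefix_len=4, window=16, overlap=4)
```

Each sub-trajectory can then be used as a separate (shorter) SFT sample, which is one plausible way to fit extremely long trajectories into a bounded training context.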
Related papers
- Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better [24.03797089794804]
We propose a Late-to-Early Training (LET) paradigm that enables Large Language Models to learn later knowledge in earlier steps and earlier layers. We identify two key mechanisms that drive LET's effectiveness: late-to-early-step learning and late-to-early-layer learning. Our method achieves up to 1.6× speedup with nearly 5% improvement in downstream task accuracy compared to standard training.
arXiv Detail & Related papers (2026-02-05T07:19:34Z)
- QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management [81.24674400474989]
QwenLong-L1.5 is a model that provides superior long-context reasoning capabilities through systematic post-training innovations. We develop a systematic framework that generates challenging reasoning tasks requiring multi-hop sequences over globally distributed evidence. We also develop a memory management framework with multi-exploit fusion RL training that seamlessly integrates single-pass reasoning with iterative memory-based processing for tasks exceeding 4M tokens.
arXiv Detail & Related papers (2025-12-15T04:11:11Z)
- Beat the long tail: Distribution-Aware Speculative Decoding for RL Training [75.75462952580796]
We propose a Distribution-Aware Speculative decoding (DAS) framework that accelerates RL rollouts without altering model outputs. Experiments on math and code reasoning tasks show that DAS reduces rollout time by up to 50% while preserving identical training curves.
arXiv Detail & Related papers (2025-11-17T19:02:12Z)
- h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning [22.930073904843212]
Large language models excel at short-horizon reasoning tasks, but performance drops as the reasoning horizon grows. Existing approaches to combat this rely on inference-time scaffolding or costly step-level supervision. We introduce a scalable method to bootstrap long-horizon reasoning capabilities using only existing, abundant short-horizon data.
arXiv Detail & Related papers (2025-10-08T17:58:41Z)
- LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning [34.723917246316205]
We propose an incentivization-based approach that leverages reinforcement learning (RL) to foster the emergence of ultra-long, high-quality text generation capabilities. Our LongWriter-Zero model, trained from Qwen2.5-32B, consistently outperforms traditional SFT methods on long-form writing tasks.
arXiv Detail & Related papers (2025-06-23T16:59:02Z)
- How to Train Long-Context Language Models (Effectively) [75.5418485597276]
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information. We find that code repositories and books are excellent sources of long data, but it is crucial to combine them with high-quality short-context data. Our final model, ProLong-8B, demonstrates state-of-the-art long-context performance among similarly sized models at a length of 128K.
arXiv Detail & Related papers (2024-10-03T16:46:52Z)
- LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models [61.12177317970258]
LongSkywork is a long-context Large Language Model capable of processing up to 200,000 tokens.
We develop two novel methods for creating synthetic data.
LongSkywork achieves outstanding performance on a variety of long-context benchmarks.
arXiv Detail & Related papers (2024-06-02T03:34:41Z)
- Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.