Related papers: InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling

URL: http://arxiv.org/abs/2508.08636v1
Date: Tue, 12 Aug 2025 05:00:00 GMT
Title: InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling
Authors: Peiji Li, Jiasheng Ye, Yongkang Chen, Yichuan Ma, Zijie Yu, Kedi Chen, Ganqu Cui, Haozhan Li, Jiacheng Chen, Chengqi Lyu, Wenwei Zhang, Linyang Li, Qipeng Guo, Dahua Lin, Bowen Zhou, Kai Chen
Abstract summary: Large language models (LLMs) have revolutionized artificial intelligence by enabling complex reasoning capabilities. To address this gap, we present InternBootcamp, an open-source framework comprising 1000+ domain-diverse task environments.
Score: 71.37579508777843
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have revolutionized artificial intelligence by enabling complex reasoning capabilities. While recent advancements in reinforcement learning (RL) have primarily focused on domain-specific reasoning tasks (e.g., mathematics or code generation), real-world reasoning scenarios often require models to handle diverse and complex environments that narrow-domain benchmarks cannot fully capture. To address this gap, we present InternBootcamp, an open-source framework comprising 1000+ domain-diverse task environments specifically designed for LLM reasoning research. Our codebase offers two key functionalities: (1) automated generation of unlimited training/testing cases with configurable difficulty levels, and (2) integrated verification modules for objective response evaluation. These features make InternBootcamp fundamental infrastructure for RL-based model optimization, synthetic data generation, and model evaluation. Although manually developing such a framework with enormous task coverage is extremely cumbersome, we accelerate the development procedure through an automated agent workflow supplemented by manual validation protocols, which enables the task scope to expand rapidly. With these bootcamps, we further establish Bootcamp-EVAL, an automatically generated benchmark for comprehensive performance assessment. Evaluation reveals that frontier models still underperform in many reasoning tasks, while training with InternBootcamp provides an effective way to significantly improve performance, leading to our 32B model that achieves state-of-the-art results on Bootcamp-EVAL and excels on other established benchmarks. In particular, we validate that consistent performance gains come from including more training tasks, namely task scaling, over two orders of magnitude, offering a promising route towards a capable reasoning generalist.
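The two functionalities named in the abstract (case generation with configurable difficulty, and a verification module) can be illustrated with a toy task. This is a minimal sketch only: `SumTask`, `Case`, `generate_case`, `verify`, and the "Answer: <number>" response convention are hypothetical names and assumptions made for illustration, not the InternBootcamp API.

```python
import random
import re
from dataclasses import dataclass


@dataclass
class Case:
    prompt: str   # text shown to the model
    answer: int   # ground truth kept for verification


class SumTask:
    """Hypothetical toy task environment; NOT the InternBootcamp API."""

    def __init__(self, difficulty: int, seed: int = 0):
        # Assumption: difficulty is simply the number of addends.
        self.difficulty = difficulty
        self.rng = random.Random(seed)

    def generate_case(self) -> Case:
        # Each call draws a fresh case, so the supply is effectively unlimited.
        terms = [self.rng.randint(1, 99) for _ in range(self.difficulty)]
        prompt = (
            "Compute " + " + ".join(map(str, terms))
            + ". End with 'Answer: <number>'."
        )
        return Case(prompt, sum(terms))

    @staticmethod
    def verify(case: Case, response: str) -> float:
        # Rule-based check: take the last 'Answer: N' in the response
        # and compare it with the stored ground truth.
        matches = re.findall(r"Answer:\s*(-?\d+)", response)
        if not matches:
            return 0.0
        return 1.0 if int(matches[-1]) == case.answer else 0.0


task = SumTask(difficulty=3, seed=1)
case = task.generate_case()
reward = SumTask.verify(case, f"Adding the terms. Answer: {case.answer}")
```

A score of this kind could serve as a reward signal for RL or as an evaluation metric; how the actual framework structures its tasks and scores responses is not specified in this listing.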

Related papers

LongCat-Flash-Thinking-2601 Technical Report [134.89732115690705]
LongCat-Flash-Thinking-2601 is an open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on a wide range of agentic benchmarks.
arXiv Detail & Related papers (2026-01-23T13:20:09Z)
Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs [26.165537937650413]
Training agents to operate under strict constraints during deployment presents significant challenges. We propose a curriculum learning strategy that gradually tightens constraints during training, enabling the agent to incrementally master the deployment requirements.
arXiv Detail & Related papers (2025-11-04T16:14:56Z)
APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training [48.20667772172573]
APTBench is a framework that converts real-world agent tasks and successful trajectories into multiple-choice or text completion questions. It focuses on core agentic abilities, e.g., planning and action, and covers key agent scenarios, software engineering and deep research. Compared to existing general-purpose benchmarks, APTBench offers a more predictive signal of a model's downstream performance as an agent.
arXiv Detail & Related papers (2025-10-28T13:11:22Z)
EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence [17.644658293987955]
Embodied AI agents are capable of robust spatial perception, effective task planning, and adaptive execution in physical environments. Current large language models (LLMs) and multimodal LLMs (MLLMs) for embodied tasks suffer from key limitations. We propose EmbodiedBrain, a novel vision-language foundation model available in both 7B and 32B parameter sizes.
arXiv Detail & Related papers (2025-10-23T14:05:55Z)
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training [23.092267430951484]
ORBIT is an open-ended training framework specifically designed for high-stakes medical dialogue. Our analysis confirms that rubric-driven RL fosters consistent performance gains across diverse scenarios.
arXiv Detail & Related papers (2025-10-17T17:51:28Z)
Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards [50.21528417884747]
We introduce Omni-Thinker, a unified reinforcement learning framework that enhances the performance of large language models (LLMs) across diverse tasks. Our approach enables consistent optimization across task types and scales RL-based training to subjective domains. Experimental results across four domains reveal that curriculum learning improves performance by 5.2% over joint training and 9.1% over model merging.
arXiv Detail & Related papers (2025-07-20T01:50:16Z)
KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. KAT dynamically switches between reasoning and non-reasoning modes based on task complexity. We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z)
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z)
Model-Based Transfer Learning for Contextual Reinforcement Learning [5.5597941107270215]
We introduce Model-Based Transfer Learning to solve contextual RL problems. We show theoretically that the method exhibits sublinear regret in the number of training tasks. We experimentally validate our methods using urban traffic and standard continuous control benchmarks.
arXiv Detail & Related papers (2024-08-08T14:46:01Z)
Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents [9.529492371336286]
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. We propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS). LSTS learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification.
arXiv Detail & Related papers (2024-02-06T04:00:21Z)
Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage. The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
Meta-Reinforcement Learning in Broad and Non-Parametric Environments [8.091658684517103]
We introduce TIGR, a Task-Inference-based meta-RL algorithm for tasks in non-parametric environments. We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on the basis of an unsupervised reconstruction objective. We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared to state-of-the-art meta-RL approaches.
arXiv Detail & Related papers (2021-08-08T19:32:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.