CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
- URL: http://arxiv.org/abs/2510.12560v1
- Date: Tue, 14 Oct 2025 14:21:52 GMT
- Title: CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
- Authors: Xiaoji Zheng, Ziyuan Yang, Yanhao Chen, Yuhang Peng, Yuanrong Tang, Gengyuan Liu, Bokui Chen, Jiangtao Gong,
- Abstract summary: CoIRL-AD introduces a competition-based mechanism that facilitates knowledge exchange while preventing conflicts.<n> Experiments on the nuScenes dataset show an 18% reduction in collision rate to baselines, along with stronger generalization and improved performance on long-tail scenarios.
- Score: 8.578450183245037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end autonomous driving models trained solely with imitation learning (IL) often suffer from poor generalization. In contrast, reinforcement learning (RL) promotes exploration through reward maximization but faces challenges such as sample inefficiency and unstable convergence. A natural solution is to combine IL and RL. Moving beyond the conventional two-stage paradigm (IL pretraining followed by RL fine-tuning), we propose CoIRL-AD, a competitive dual-policy framework that enables IL and RL agents to interact during training. CoIRL-AD introduces a competition-based mechanism that facilitates knowledge exchange while preventing gradient conflicts. Experiments on the nuScenes dataset show an 18% reduction in collision rate compared to baselines, along with stronger generalization and improved performance on long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.
Related papers
- Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models [71.9060068259379]
We propose cascaded domain-wise reinforcement learning to build general-purpose reasoning models.<n>Our 14B model, after RL, outperforms its SFT teacher, DeepSeek-R1-0528, on LiveCodeBench v5/v6 Pro and silver-medal performance in the 2025 International Olympiad in Informatics (IOI)
arXiv Detail & Related papers (2025-12-15T18:02:35Z) - On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models [73.10315509190623]
Recent reinforcement learning techniques have yielded impressive reasoning improvements in language models.<n>It remains unclear whether post-training truly extends a model's reasoning ability beyond what it acquires during pre-training.<n>We develop a fully controlled experimental framework that isolates the causal contributions of pre-training, mid-training, and RL-based post-training.
arXiv Detail & Related papers (2025-12-08T18:12:10Z) - Scaling Agent Learning via Experience Synthesis [100.42712232390532]
Reinforcement learning can empower autonomous agents by enabling self-improvement through interaction.<n>But its practical adoption remains challenging due to costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity.<n>We introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind.
arXiv Detail & Related papers (2025-11-05T18:58:48Z) - Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning [29.778703252962092]
Reinforcement Learning (RL) has emerged as a powerful paradigm for advancing Large Language Models (LLMs)<n>We develop a novel test-time reward mechanism that operates without external supervision.
arXiv Detail & Related papers (2025-10-20T07:53:51Z) - On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting [91.38734024438357]
Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are two prominent post-training paradigms for refining the capabilities and aligning the behavior of Large Language Models (LLMs)<n>Existing approaches that integrate SFT and RL often face the risk of disrupting established response patterns and inducing overfitting to expert data.<n>We propose CHORD, a framework for Controllable Harmonization of On- and Off-Policy Reinforcement Learning via Dynamic Weighting.
arXiv Detail & Related papers (2025-08-15T11:20:03Z) - Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning [87.7836502955847]
We propose a novel self-rewarding reinforcement learning framework to enhance Large Language Model (LLM) reasoning.<n>Our key insight is that correct responses often exhibit consistent trajectory patterns in terms of model likelihood.<n>We introduce CoVo, an intrinsic reward mechanism that integrates Consistency and Volatility via a robust vector-space aggregation strategy.
arXiv Detail & Related papers (2025-06-10T12:40:39Z) - Vintix: Action Model via In-Context Reinforcement Learning [78.11130335098936]
We present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning.<n>Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation to construct versatile action models.
arXiv Detail & Related papers (2025-01-31T18:57:08Z) - Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning [93.99377042564919]
This paper tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies.
We introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces.
arXiv Detail & Related papers (2023-05-24T15:45:35Z) - Demonstration-free Autonomous Reinforcement Learning via Implicit and
Bidirectional Curriculum [22.32327908453603]
We propose a demonstration-free reinforcement learning algorithm via Implicit and Bi-directional Curriculum (IBC)
With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods.
arXiv Detail & Related papers (2023-05-17T04:31:36Z) - Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement
Learning [13.699336307578488]
Deep imitative reinforcement learning approach (DIRL) achieves agile autonomous racing using visual inputs.
We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.