Found-RL: foundation model-enhanced reinforcement learning for autonomous driving
- URL: http://arxiv.org/abs/2602.10458v1
- Date: Wed, 11 Feb 2026 02:56:04 GMT
- Title: Found-RL: foundation model-enhanced reinforcement learning for autonomous driving
- Authors: Yansong Qu, Zihao Sheng, Zilin Huang, Jiancong Chen, Yuhao Luo, Tianyi Wang, Yiheng Feng, Samuel Labi, Sikai Chen,
- Abstract summary: Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD)<n>Found-RL is a platform tailored to efficiently enhance RL for AD using foundation models.<n>A core innovation is the asynchronous batch inference framework, which decouples heavy VLM reasoning from the simulation loop.
- Score: 15.275134927543611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD). However, RL suffers from sample inefficiency and a lack of semantic interpretability in complex scenarios. Foundation Models, particularly Vision-Language Models (VLMs), can mitigate this by offering rich, context-aware knowledge, yet their high inference latency hinders deployment in high-frequency RL training loops. To bridge this gap, we present Found-RL, a platform tailored to efficiently enhance RL for AD using foundation models. A core innovation is the asynchronous batch inference framework, which decouples heavy VLM reasoning from the simulation loop, effectively resolving latency bottlenecks to support real-time learning. We introduce diverse supervision mechanisms: Value-Margin Regularization (VMR) and Advantage-Weighted Action Guidance (AWAG) to effectively distill expert-like VLM action suggestions into the RL policy. Additionally, we adopt high-throughput CLIP for dense reward shaping. We address CLIP's dynamic blindness via Conditional Contrastive Action Alignment, which conditions prompts on discretized speed/command and yields a normalized, margin-based bonus from context-specific action-anchor scoring. Found-RL provides an end-to-end pipeline for fine-tuned VLM integration and shows that a lightweight RL model can achieve near-VLM performance compared with billion-parameter VLMs while sustaining real-time inference (approx. 500 FPS). Code, data, and models will be publicly available at https://github.com/ys-qu/found-rl.
Related papers
- Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving [22.3805998088591]
DACER-F is a flow matching algorithm for generative policies in autonomous driving systems.<n>It achieves a score of 775.8 on the humanoid-stand task and surpassing prior methods.
arXiv Detail & Related papers (2026-03-03T05:35:53Z) - Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning [88.42566960813438]
CalibRL is a hybrid-policy RLVR framework that supports controllable exploration with expert guidance.<n>CalibRL increases policy entropy in a guided manner and clarifies the target distribution.<n>Experiments across eight benchmarks, including both in-domain and out-of-domain settings, demonstrate consistent improvements.
arXiv Detail & Related papers (2026-02-22T07:23:36Z) - Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates [53.3717573880076]
We introduce Just-In-Time Reinforcement Learning (JitRL), a training-free framework that enables test-time policy optimization without any gradient updates.<n>JitRL maintains a dynamic, non-parametric memory of experiences and retrieves relevant trajectories to estimate action advantages on-the-fly.<n>Experiments on WebArena and Jericho demonstrate that JitRL establishes a new state-of-the-art among training-free methods.
arXiv Detail & Related papers (2026-01-26T14:16:51Z) - Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective [85.06838178922791]
Reinforcement Learning (RL) has proven highly effective for autoregressive language models.<n>But adapting these methods to diffusion large language models (dLLMs) presents fundamental challenges.<n>We propose a principled RL framework that treats entire sequence generation as a single action and uses the ELBO as a tractable sequence-level likelihood proxy.
arXiv Detail & Related papers (2025-12-03T13:05:32Z) - Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch [63.40752011615843]
Training tool-augmented language models has emerged as a promising approach to enhancing their capabilities for complex tasks.<n>We propose a dynamic generalization-guided reward design for rule-based reinforcement learning.<n>We show that our models achieve over 7% performance improvement compared to both SFT and RL-with-SFT models.
arXiv Detail & Related papers (2025-11-02T16:33:45Z) - Expressive Value Learning for Scalable Offline Reinforcement Learning [9.946269411850064]
Reinforcement learning (RL) is a powerful paradigm for learning to make sequences of decisions.<n> offline RL offers a promising avenue by training agents on large, diverse datasets.<n>We introduce Expressive Value Learning for Offline Reinforcement Learning (EVOR), a scalable offline RL approach that integrates both expressive policies and expressive value functions.
arXiv Detail & Related papers (2025-10-09T13:42:20Z) - SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning [81.7764584515496]
Vision-Language-Action (VLA) models have emerged as a powerful paradigm for robotic manipulation.<n>These models face two fundamental challenges: scarcity and high cost of large-scale human-operated robotic trajectories.<n>We introduce SimpleVLA-RL, an efficient reinforcement learning framework tailored for VLA models.
arXiv Detail & Related papers (2025-09-11T17:59:17Z) - Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models [29.090093552573766]
We propose an offline RL post-training objective for Vision-Language-Action (VLA) flow models.<n>We then induce an efficient and feasible offline RL fine-tuning algorithm -- Adaptive Reinforced Flow Matching (ARFM)<n>ARFM exhibits excellent generalization, robustness, few-shot learning, and continuous learning performance.
arXiv Detail & Related papers (2025-09-04T09:48:43Z) - Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance [46.06527859746679]
We introduce Reinforcement Learning Guidance (RLG), an inference-time method that adapts Dejin-Free Guidance (CFG)<n>RLG consistently improves the performance of RL fine-tuned models across various, RL algorithms, and downstream tasks, including human preferences, compositional control, compress, and text rendering.<n>Our approach provides a practical and theoretically sound solution for enhancing and controlling diffusion model alignment inference.
arXiv Detail & Related papers (2025-08-28T17:18:31Z) - Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL)<n>Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z) - Prompt-Tuned LLM-Augmented DRL for Dynamic O-RAN Network Slicing [5.62872273155603]
Large Language Models (LLMs) structure unorganized network feedback into meaningful latent representations.<n>In O-RAN slicing, concepts like SNR, power levels and throughput are semantically related.<n>We introduce a contextualization-based adaptation method that integrates learnable prompts into an LLM-augmented DRL framework.
arXiv Detail & Related papers (2025-05-31T14:12:56Z) - VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving [1.3107174618549584]
Reinforcement learning (RL)-based methods for learning driving policies have gained increasing attention in the autonomous driving community.<n>Traditional RL approaches rely on manually engineered rewards, which require extensive human effort and often lack generalizability.<n>We propose textbfVLM-RL, a unified framework that integrates pre-trained Vision-Language Models (VLMs) with RL to generate reward signals.
arXiv Detail & Related papers (2024-12-20T04:08:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.