VLM-C4L: Continual Core Dataset Learning with Corner Case Optimization via Vision-Language Models for Autonomous Driving
        - URL: http://arxiv.org/abs/2503.23046v1
 - Date: Sat, 29 Mar 2025 11:40:34 GMT
 - Title: VLM-C4L: Continual Core Dataset Learning with Corner Case Optimization via Vision-Language Models for Autonomous Driving
 - Authors: Haibo Hu, Jiacheng Zuo, Yang Lou, Yufei Cui, Jianping Wang, Nan Guan, Jin Wang, Yung-Hui Li, Chun Jason Xue
 - Abstract summary: We propose VLM-C4L, a continual learning framework that introduces Vision-Language Models (VLMs) to dynamically optimize and enhance corner case datasets. VLM-C4L combines VLM-guided high-quality data extraction with a core data replay strategy, enabling the model to incrementally learn from diverse corner cases.
 - Score: 20.136096264189156
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract:   With the widespread adoption and deployment of autonomous driving, handling complex environments has become an unavoidable challenge. Due to the scarcity and diversity of extreme scenario datasets, current autonomous driving models struggle to effectively manage corner cases. This limitation poses a significant safety risk: according to the National Highway Traffic Safety Administration (NHTSA), autonomous vehicle systems have been involved in hundreds of reported crashes annually in the United States, some of which occurred in corner cases such as sun glare and fog and led to fatal accidents. Furthermore, to consistently maintain a robust and reliable autonomous driving system, models must not only perform well on routine scenarios but also adapt to newly emerging scenarios, especially corner cases that deviate from the norm. This requires a learning mechanism that incrementally integrates new knowledge without degrading previously acquired capabilities. However, to the best of our knowledge, no existing continual learning method has been proposed to ensure consistent and scalable corner case learning in autonomous driving. To address these limitations, we propose VLM-C4L, a continual learning framework that introduces Vision-Language Models (VLMs) to dynamically optimize and enhance corner case datasets. VLM-C4L combines VLM-guided high-quality data extraction with a core data replay strategy, enabling the model to incrementally learn from diverse corner cases while preserving performance on previously encountered routine scenarios, thus ensuring long-term stability and adaptability in real-world autonomous driving. We evaluate VLM-C4L on large-scale real-world autonomous driving datasets, including Waymo and the corner case dataset CODA.
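The abstract names two concrete mechanisms: VLM-guided extraction of high-quality corner-case data and a core data replay strategy. Below is a minimal Python sketch of how the two could fit together; the `vlm_score` stub, the keep ratio, and the buffer size are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import deque

def vlm_score(sample, prompt):
    """Hypothetical stand-in for a VLM relevance score in [0, 1]; a real
    system would query a vision-language model with the image and prompt."""
    return random.random()

def extract_core_set(candidates, prompt, keep_ratio=0.2):
    """Keep the top-scoring fraction of corner-case samples (VLM-guided
    extraction; the keep ratio is an illustrative choice)."""
    ranked = sorted(candidates, key=lambda s: vlm_score(s, prompt), reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]

def continual_update(train_fn, tasks, prompt, buffer_size=512):
    """Train on a sequence of corner-case tasks, replaying a core set of
    past data so earlier capabilities are not overwritten."""
    replay = deque(maxlen=buffer_size)          # core data replay buffer
    for task_data in tasks:
        core = extract_core_set(task_data, prompt)
        batch = list(core) + list(replay)       # mix new core data with replayed data
        random.shuffle(batch)
        train_fn(batch)                         # one incremental model update
        replay.extend(core)                     # retain this task's core set

# Example: three synthetic "tasks" of 100 frames each, with a no-op trainer.
tasks = [[f"frame_{t}_{i}" for i in range(100)] for t in range(3)]
continual_update(lambda batch: None, tasks, prompt="sun glare or dense fog")
```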
 
       
      
        Related papers
- SEAL: Vision-Language Model-Based Safe End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling [13.81210267833274]
SEAL is a vision-language-model-based framework with adaptive multimodal learning for robust cooperative autonomous driving under long-tail scenarios. SEAL introduces three core innovations: (i) a prompt-driven long-tail scenario generation and evaluation pipeline that leverages foundation models to synthesize realistic long-tail conditions; (ii) a multi-scenario adaptive attention module that modulates the visual stream using scenario priors to recalibrate ambiguous or corrupted features; and (iii) a multi-task scenario-aware contrastive learning objective that improves multimodal alignment and promotes cross-scenario feature separability.
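Innovation (iii) is a scenario-aware contrastive objective. As a rough illustration, a supervised-contrastive loss that treats samples sharing a scenario label as positives might look like the following; SEAL's actual multi-task, multimodal objective is not reproduced here.

```python
import torch
import torch.nn.functional as F

def scenario_contrastive_loss(features, scenario_ids, temperature=0.1):
    """Pull embeddings of the same scenario type together, push others apart.
    A generic supervised-contrastive sketch, not SEAL's exact objective."""
    z = F.normalize(features, dim=1)                     # (N, D) unit embeddings
    logits = z @ z.t() / temperature                     # pairwise similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (scenario_ids.unsqueeze(0) == scenario_ids.unsqueeze(1)) & ~eye
    logits = logits.masked_fill(eye, float("-inf"))      # exclude self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    per_anchor = -(log_prob.masked_fill(~pos, 0.0)).sum(1) / pos.sum(1).clamp(min=1)
    return per_anchor[pos.any(dim=1)].mean()             # skip anchors w/o positives

# Example: 8 embeddings drawn from 3 scenario types.
feats = torch.randn(8, 32)
ids = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(scenario_contrastive_loss(feats, ids))
```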
arXiv  Detail & Related papers  (2025-06-26T06:42:03Z)
- AD-EE: Early Exiting for Fast and Reliable Vision-Language Models in Autonomous Driving [14.250084730478797]
Real-time application of Vision-Language Models (VLMs) is hindered by high latency and computational overhead. We propose AD-EE, an Early Exit framework that incorporates domain characteristics of autonomous driving. We show that our method significantly reduces latency, with maximum improvements reaching up to 57.58%, and enhances object detection accuracy, with maximum gains of up to 44%.
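The core early-exit idea (attach intermediate prediction heads and stop as soon as one is confident) can be sketched generically; the layer sizes, confidence rule, and 0.9 threshold below are assumptions for illustration, not AD-EE's actual exit policy.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Backbone blocks with one prediction head per block; inference stops
    at the first head whose confidence clears the threshold."""

    def __init__(self, dim=64, num_classes=10, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth))
        self.heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(depth))

    @torch.no_grad()
    def forward(self, x, threshold=0.9):           # x: (1, dim), batch size 1
        for i, (block, head) in enumerate(zip(self.blocks, self.heads)):
            x = block(x)
            probs = head(x).softmax(dim=-1)
            conf, pred = probs.max(dim=-1)
            if conf.item() >= threshold:           # confident enough: exit early
                return pred, i                     # prediction and exit depth
        return pred, len(self.blocks) - 1          # fell through to the last head

net = EarlyExitNet()
pred, depth = net(torch.randn(1, 64))
print(f"exited after block {depth} with class {pred.item()}")
```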
arXiv  Detail & Related papers  (2025-06-04T08:25:40Z)
- SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
Multimodal Large Language Models (MLLMs) can process both visual and textual data. We propose SafeAuto, a novel framework that enhances MLLM-based autonomous driving systems by incorporating both unstructured and structured knowledge.
arXiv  Detail & Related papers  (2025-02-28T21:53:47Z)
- CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models [1.6612510324510592]
CurricuVLM is a novel framework that enables personalized curriculum learning for autonomous driving agents. Our approach exploits Vision-Language Models (VLMs) to analyze agent behavior, identify performance weaknesses, and dynamically generate tailored training scenarios. CurricuVLM outperforms state-of-the-art baselines across both regular and safety-critical scenarios.
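The loop this summary describes (evaluate, let a VLM name the weakness, generate a targeted scenario, train) can be sketched with stubbed components; `vlm_analyze` and the scenario sampler below are hypothetical placeholders, not CurricuVLM's interfaces.

```python
import random

def vlm_analyze(failure_cases):
    """Hypothetical stub: a real system would prompt a VLM with rollout
    footage and parse its description of the agent's weakness."""
    return random.choice(["unprotected left turn", "cut-in", "night jaywalking"])

def personalized_curriculum(train_step, make_scenario, evaluate, rounds=10):
    """Generic VLM-in-the-loop curriculum: repeatedly target the current
    weakest behavior with tailored training scenarios."""
    for _ in range(rounds):
        failures = evaluate()                    # collect failure cases
        if not failures:
            break                                # no weaknesses left to target
        weakness = vlm_analyze(failures)         # VLM names the weak spot
        scenario = make_scenario(weakness)       # generate a tailored scenario
        train_step(scenario)                     # fine-tune on it

# Example wiring with trivial stand-ins for the agent-facing callbacks.
personalized_curriculum(
    train_step=lambda s: None,
    make_scenario=lambda w: {"type": w},
    evaluate=lambda: [])
```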
arXiv  Detail & Related papers  (2025-02-21T00:42:40Z)
- VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision [20.43366384946928]
VLM-AD leverages vision-language models (VLMs) as teachers to enhance training. It achieves significant improvements in planning accuracy and reduced collision rates on the nuScenes dataset.
arXiv  Detail & Related papers  (2024-12-19T01:53:36Z)
- SafeDrive: Knowledge- and Data-Driven Risk-Sensitive Decision-Making for Autonomous Vehicles with Large Language Models [14.790308656087316]
SafeDrive is a knowledge- and data-driven risk-sensitive decision-making framework to enhance autonomous driving safety and adaptability. By integrating knowledge-driven insights with adaptive learning mechanisms, the framework ensures robust decision-making under uncertain conditions.
arXiv  Detail & Related papers  (2024-12-17T16:45:27Z)
- Realistic Corner Case Generation for Autonomous Vehicles with Multimodal Large Language Model [10.741225574706]
AutoScenario is a framework for realistic corner case generation. It converts safety-critical real-world data from multiple sources into textual representations. It integrates tools from the Simulation of Urban Mobility (SUMO) and CARLA simulators.
arXiv  Detail & Related papers  (2024-11-29T20:23:28Z)
- Generating Out-Of-Distribution Scenarios Using Language Models [58.47597351184034]
Large Language Models (LLMs) have shown promise in autonomous driving.
This paper introduces a framework for generating diverse Out-Of-Distribution (OOD) driving scenarios.
We evaluate our framework through extensive simulations and introduce a new "OOD-ness" metric.
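The paper's "OOD-ness" metric is not defined in this summary. As a purely illustrative proxy, one common way to score how far a scenario sits from an in-distribution set is a Mahalanobis distance over scenario embeddings:

```python
import numpy as np

def ood_score(embedding, id_embeddings, eps=1e-6):
    """Mahalanobis distance of one scenario embedding from the
    in-distribution set (an illustrative OOD proxy, not the paper's metric)."""
    mu = id_embeddings.mean(axis=0)
    cov = np.cov(id_embeddings, rowvar=False) + eps * np.eye(id_embeddings.shape[1])
    diff = embedding - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Example: in-distribution embeddings vs. a far-away query.
rng = np.random.default_rng(0)
in_dist = rng.normal(size=(500, 8))
print(ood_score(rng.normal(size=8), in_dist))         # small: in-distribution
print(ood_score(rng.normal(size=8) + 10.0, in_dist))  # large: out-of-distribution
```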
arXiv  Detail & Related papers  (2024-11-25T16:38:17Z)
- From Imitation to Exploration: End-to-end Autonomous Driving based on World Model [24.578178308010912]
RAMBLE is an end-to-end world model-based RL method for driving decision-making.
It can handle complex and dynamic traffic scenarios.
It achieves state-of-the-art performance in route completion rate on the CARLA Leaderboard 1.0 and completes all 38 scenarios on the CARLA Leaderboard 2.0.
arXiv  Detail & Related papers  (2024-10-03T06:45:59Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv  Detail & Related papers  (2024-03-26T04:27:56Z)
- Model-Based Reinforcement Learning with Isolated Imaginations [61.67183143982074]
We propose Iso-Dream++, a model-based reinforcement learning approach.
We perform policy optimization based on the decoupled latent imaginations.
This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild.
arXiv  Detail & Related papers  (2023-03-27T02:55:56Z)
- COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles [54.61668577827041]
We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving.
Our experiments on AutoCastSim suggest that our cooperative perception driving models lead to a 40% improvement in average success rate.
arXiv  Detail & Related papers  (2022-05-04T17:55:12Z)
- Differentiable Control Barrier Functions for Vision-based End-to-End Autonomous Driving [100.57791628642624]
We introduce a safety guaranteed learning framework for vision-based end-to-end autonomous driving.
We design a learning system equipped with differentiable control barrier functions (dCBFs) that is trained end-to-end by gradient descent.
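A control barrier function B certifies a safe set {x : B(x) >= 0}, and a differentiable training signal can penalize violations of the discrete-time barrier condition B(x') >= (1 - gamma) * B(x). The sketch below is a generic soft penalty assuming a known barrier function; it is not the paper's end-to-end dCBF layer.

```python
import torch

def cbf_penalty(barrier_fn, x, x_next, gamma=0.1):
    """Soft, differentiable CBF loss: zero when the discrete-time barrier
    condition B(x_next) >= (1 - gamma) * B(x) holds, positive otherwise."""
    b, b_next = barrier_fn(x), barrier_fn(x_next)
    residual = b_next - (1.0 - gamma) * b     # >= 0 when the condition holds
    return torch.relu(-residual).mean()

# Example barrier: safe inside the unit ball, B(x) = 1 - ||x||^2.
barrier = lambda s: 1.0 - s.pow(2).sum(dim=-1)
x = torch.zeros(4, 2, requires_grad=True)     # current states (batch of 4)
x_next = 0.3 * torch.randn(4, 2)              # candidate next states
loss = cbf_penalty(barrier, x, x_next)        # backprop-able safety penalty
loss.backward()
```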
arXiv  Detail & Related papers  (2022-03-04T16:14:33Z) 
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     