Unlock the Power of Unlabeled Data in Language Driving Model
- URL: http://arxiv.org/abs/2503.10586v2
- Date: Sat, 15 Mar 2025 06:25:33 GMT
- Title: Unlock the Power of Unlabeled Data in Language Driving Model
- Authors: Chaoqun Wang, Jie Yang, Xiaobin Hong, Ruimao Zhang
- Abstract summary: We build a strong Language Driving Model (LDM) for driving scene question-answering, outperforming previous state-of-the-art methods. Our LDM achieves 44.85% performance with limited labeled data, increasing to 54.27% when using unlabeled data, while models trained with full datasets reach 60.68% on the DriveLM benchmark.
- Score: 23.648749606793118
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent Vision-based Large Language Models (VisionLLMs) for autonomous driving have seen rapid advancements. However, such progress is heavily dependent on large-scale, high-quality annotated data, which is costly and labor-intensive to obtain. To address this issue, we propose unlocking the value of abundant yet unlabeled data to improve the language driving model in a semi-supervised learning manner. Specifically, we first introduce a series of template-based prompts to extract scene information, generating questions that create pseudo-answers for the unlabeled data based on a model trained with limited labeled data. Next, we propose a Self-Consistency Refinement method to improve the quality of these pseudo-annotations, which are later used for further training. By utilizing a pre-trained VisionLLM (e.g., InternVL), we build a strong Language Driving Model (LDM) for driving scene question-answering, outperforming previous state-of-the-art methods. Extensive experiments on the DriveLM benchmark show that our approach performs well with just 5% labeled data, achieving competitive performance against models trained with full datasets. In particular, our LDM achieves 44.85% performance with limited labeled data, increasing to 54.27% when using unlabeled data, while models trained with full datasets reach 60.68% on the DriveLM benchmark.
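The abstract names the template-prompt and Self-Consistency Refinement steps but does not describe how the refinement works. The sketch below assumes, purely for illustration, that it behaves like self-consistency voting: the teacher model trained on the small labeled split is queried several times per template question, and a pseudo-answer is kept only when a clear majority of the samples agree. `TEMPLATES`, `self_consistent_answer`, `build_pseudo_labels`, `num_samples`, `min_agreement`, and the toy teacher are all hypothetical stand-ins, not the authors' code, and the follow-up training step on the kept pseudo-labels is not shown.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List, Optional, Tuple

# Hypothetical question templates; the paper's real templates are not given in the abstract.
TEMPLATES = [
    "Is there a {obj} in front of the ego vehicle?",
    "Should the ego vehicle yield to the {obj}?",
]

# A "teacher" maps (frame_id, question) -> answer string. In practice this would be
# a VisionLLM fine-tuned on the small labeled split (assumption).
Teacher = Callable[[str, str], str]


def self_consistent_answer(teacher: Teacher, frame: str, question: str,
                           num_samples: int = 5,
                           min_agreement: float = 0.6) -> Optional[str]:
    """Sample the teacher several times; keep the majority answer only if enough samples agree."""
    answers = [teacher(frame, question) for _ in range(num_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / num_samples >= min_agreement else None


def build_pseudo_labels(teacher: Teacher, frames: List[str], objects: List[str],
                        **vote_kwargs) -> List[Tuple[str, str, str]]:
    """Return (frame, question, pseudo_answer) triples, dropping inconsistent answers."""
    triples = []
    for frame in frames:
        for template in TEMPLATES:
            for obj in objects:
                question = template.format(obj=obj)
                answer = self_consistent_answer(teacher, frame, question, **vote_kwargs)
                if answer is not None:
                    triples.append((frame, question, answer))
    return triples


# Toy deterministic teacher: always answers "yes" about pedestrians, but cycles
# through conflicting answers for anything else, so those questions fail the vote.
_noisy = cycle(["yes", "no", "unsure"])


def toy_teacher(frame: str, question: str) -> str:
    return "yes" if "pedestrian" in question else next(_noisy)


pseudo = build_pseudo_labels(toy_teacher, ["frame_000", "frame_001"], ["pedestrian", "truck"])
for row in pseudo:
    print(row)
```

With the toy teacher, the script prints four triples, all about the pedestrian: every "truck" question gets at most 2 of 5 matching answers (agreement 0.4), so it falls below the assumed 0.6 threshold and is discarded rather than turned into a noisy pseudo-label.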
Related papers
- DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training [16.441081996257576]
Large language models (LLMs) have recently achieved remarkable performance on various complex reasoning benchmarks.
We construct a large-scale, difficulty-graded reasoning dataset containing about 3.34 million unique queries of varying difficulty levels.
We significantly improve the reasoning capabilities of the base model, achieving a pass rate of 79.2% on the AIME2024 mathematical reasoning benchmark.
(2025-04-24T13:57:53Z)
- Scenario Understanding of Traffic Scenes Through Large Visual Language Models [0.8437187555622164]
Large Visual Language Models (LVLMs) present a compelling solution by automating image analysis and categorization through contextual queries.
In this study, we evaluate the capabilities of LVLMs to understand and classify urban traffic scenes on both an in-house dataset and BDD100K.
We propose a scalable captioning pipeline that integrates state-of-the-art models, enabling flexible deployment on new datasets.
(2025-01-28T18:23:12Z)
- Language Models as Continuous Self-Evolving Data Engineers [32.67875951851165]
Large Language Models (LLMs) have demonstrated remarkable capabilities on various tasks.
Traditional training approaches rely too much on expert-labeled data.
We propose a novel paradigm named LANCE that enables LLMs to train themselves by autonomously generating, cleaning, reviewing, and annotating data.
(2024-12-19T18:28:41Z)
- Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.
LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset.
Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
(2024-09-10T00:59:18Z)
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
(2024-05-30T05:53:49Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
(2024-03-26T04:27:56Z)
- How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs).
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density are the best methods in their respective categories.
(2024-02-15T02:27:57Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
(2022-09-30T02:25:12Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
(2021-10-04T08:51:36Z)
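The Learn-Focus-Review entry above says only that LFR tracks learning performance per data block and prioritizes revisiting challenging regions. A minimal sketch of that idea follows, assuming "performance" means a per-block loss recorded during the learn phase and that "focus" keeps a fixed fraction of the highest-loss blocks before a full review pass. The real LFR schedule and selection rule may differ; `focus_blocks`, `learn_focus_review`, and `keep_fraction` are hypothetical names, and the loss values are made-up toy numbers.

```python
from typing import Dict, List


def focus_blocks(block_losses: Dict[int, float], keep_fraction: float = 0.5) -> List[int]:
    """Return ids of the hardest blocks (highest recorded loss) to revisit in the focus phase."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(block_losses, key=lambda b: block_losses[b], reverse=True)
    keep = max(1, round(len(ranked) * keep_fraction))  # always revisit at least one block
    return sorted(ranked[:keep])


def learn_focus_review(block_losses: Dict[int, float],
                       keep_fraction: float = 0.5) -> List[List[int]]:
    """Hypothetical three-phase schedule: learn on all blocks, focus on hard ones, review all."""
    all_blocks = sorted(block_losses)
    return [all_blocks, focus_blocks(block_losses, keep_fraction), all_blocks]


# Toy per-block losses, as they might be recorded at the end of the learn phase.
losses = {0: 2.1, 1: 3.7, 2: 1.2, 3: 4.0}
print(learn_focus_review(losses))  # [[0, 1, 2, 3], [1, 3], [0, 1, 2, 3]]
```

Keeping a hard top-k fraction makes the schedule deterministic and easy to inspect; sampling blocks with probability proportional to loss would be a softer alternative that still occasionally revisits easy blocks.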
This list is automatically generated from the titles and abstracts of the papers on this site.