Related papers: SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

URL: http://arxiv.org/abs/2403.19438v1
Date: Thu, 28 Mar 2024 14:07:13 GMT
Title: SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control
Authors: Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang,
Abstract summary: We present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We develop a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources for producing varied and useful data.
Score: 59.20038082523832
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Autonomous driving progress relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely-labeled data for autonomous driving applications and present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We investigate the impact of scaling up the quantity of generative data on the performance of downstream perception models and find that enhancing data diversity plays a crucial role in effectively scaling generative data production. Therefore, we have developed a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources for producing varied and useful data. Extensive evaluations confirm SubjectDrive's efficacy in generating scalable autonomous driving training data, marking a significant step toward revolutionizing data production methods in this field.

Related papers

DriveGen: Towards Infinite Diverse Traffic Scenarios with Large Models [22.21497010925769]
DriveGen is a novel traffic simulation framework with large models for more diverse traffic generation. DriveGen fully utilizes large models' high-level cognition and reasoning of driving behavior. Our generated scenarios and corner cases have a superior performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-03-04T06:14:21Z)
DriveGPT: Scaling Autoregressive Behavior Models for Driving [11.733428769776204]
We present DriveGPT, a scalable behavior model for autonomous driving. We learn a transformer model to predict future agent states as tokens in an autoregressive fashion. We scale up our model parameters and training data by multiple orders of magnitude, enabling us to explore the scaling properties.
arXiv Detail & Related papers (2024-12-19T00:06:09Z)
Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples. An autoencoder with self-attention layers and optimal transport refines distributional consistency. Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z)
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data [48.31817189858086]
We argue that generative data can expand the data distribution that the model can learn, thus mitigating overfitting. We find that DiverGen significantly outperforms the strong model X-Paste, achieving +1.1 box AP and +1.1 mask AP across all categories, and +1.9 box AP and +2.5 mask AP for rare categories.
arXiv Detail & Related papers (2024-05-16T15:30:18Z)
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future [130.87142103774752]
This review systematically assesses over seventy open-source autonomous driving datasets. It offers insights into various aspects, such as the principles underlying the creation of high-quality datasets. It also delves into the scientific and technical challenges that warrant resolution.
arXiv Detail & Related papers (2023-12-06T10:46:53Z)
Scalable Modular Synthetic Data Generation for Advancing Aerial Autonomy [2.9005223064604078]
We introduce a scalable Aerial Synthetic Data Augmentation (ASDA) framework tailored to aerial autonomy applications. ASDA extends a central data collection engine with two scriptable pipelines that automatically perform scene and data augmentations. We demonstrate the effectiveness of our method in automatically generating diverse datasets.
arXiv Detail & Related papers (2022-11-10T04:37:41Z)
One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario. The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available. We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes. Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
Space-Filling Subset Selection for an Electric Battery Model [0.0]
Real driving data on the battery's behavior represent a strongly non-uniform excitation of the system. Algorithm selects those dynamic data points that fill the input space of the nonlinear model more homogeneously. It is shown, that this reduction of the training data leads to a higher model quality in comparison to a random subset and a faster training compared to modeling using all data points.
arXiv Detail & Related papers (2020-12-07T09:12:56Z)
From Data to Actions in Intelligent Transportation Systems: a Prescription of Functional Requirements for Model Actionability [10.27718355111707]
This work aims to describe how data, coming from diverse ITS sources, can be used to learn and adapt data-driven models for efficiently operating ITS assets, systems and processes. Grounded in this described data modeling pipeline for ITS, wedefine the characteristics, engineering requisites and intrinsic challenges to its three compounding stages, namely, data fusion, adaptive learning and model evaluation.
arXiv Detail & Related papers (2020-02-06T12:02:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.