DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving
- URL: http://arxiv.org/abs/2407.15661v1
- Date: Mon, 22 Jul 2024 14:18:52 GMT
- Title: DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving
- Authors: Jiahang Tu, Wei Ji, Hanbin Zhao, Chao Zhang, Roger Zimmermann, Hui Qian
- Abstract summary: In autonomous driving, datasets are expected to cover various driving scenarios with adverse weather, lighting conditions and diverse moving objects.
We propose DriveDiTFit, a novel method for efficiently generating autonomous Driving data by Fine-tuning pre-trained Diffusion Transformers (DiTs).
Specifically, DriveDiTFit utilizes a gap-driven modulation technique to carefully select and efficiently fine-tune a few parameters in DiTs according to the discrepancy between the pre-trained source data and the target driving data.
- Score: 27.92501884414881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous driving, deep models have shown remarkable performance across various visual perception tasks, but they demand high-quality, highly diverse training datasets. Such datasets are expected to cover various driving scenarios with adverse weather, lighting conditions and diverse moving objects. However, manually collecting these data presents huge challenges and high costs. With the rapid development of large generative models, we propose DriveDiTFit, a novel method for efficiently generating autonomous Driving data by Fine-tuning pre-trained Diffusion Transformers (DiTs). Specifically, DriveDiTFit utilizes a gap-driven modulation technique to carefully select and efficiently fine-tune a few parameters in DiTs according to the discrepancy between the pre-trained source data and the target driving data. Additionally, DriveDiTFit develops an effective weather and lighting condition embedding module, initialized by a nearest-semantic-similarity approach, to ensure diversity in the generated data. Through a progressive tuning scheme that refines detail generation in the early diffusion process, and by enlarging the training-loss weights corresponding to small objects, DriveDiTFit ensures high-quality generation of small moving objects in the generated data. Extensive experiments conducted on driving datasets confirm that our method can efficiently produce diverse, realistic driving data. The source code will be available at https://github.com/TtuHamg/DriveDiTFit.
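The abstract names three concrete mechanisms but does not spell out their implementations here, so the sketches below are plausible readings rather than the authors' code; every function, threshold, and helper name in them is an assumption.

A minimal sketch of gap-driven parameter selection, assuming the source/target gap is proxied by how much each parameter drifts during a brief probe adaptation on target driving data, after which only the most-drifting few percent are left trainable:

```python
import copy
import torch

def probe_adaptation(model, target_loader, loss_fn, steps=50, lr=1e-4):
    """Briefly adapt a throwaway copy on target data to expose which
    parameters want to move (a hypothetical proxy for the source/target gap)."""
    probe = copy.deepcopy(model)
    opt = torch.optim.AdamW(probe.parameters(), lr=lr)
    probe.train()
    for step, batch in enumerate(target_loader):
        opt.zero_grad()
        loss_fn(probe, batch).backward()
        opt.step()
        if step + 1 >= steps:
            break
    return probe

def select_by_gap(model, probe, ratio=0.05):
    """Unfreeze only the parameters with the largest relative drift."""
    probe_params = dict(probe.named_parameters())
    gaps = {
        name: ((probe_params[name] - p).norm() / (p.norm() + 1e-8)).item()
        for name, p in model.named_parameters()
    }
    k = max(1, int(len(gaps) * ratio))
    selected = set(sorted(gaps, key=gaps.get, reverse=True)[:k])
    for name, p in model.named_parameters():
        p.requires_grad = name in selected
    return selected
```

For the condition embedding module, a nearest-semantic-similarity initialization could warm-start each new weather/lighting embedding from the pre-trained class embedding whose text label is closest under some text encoder (the encoder is left abstract here):

```python
import torch
import torch.nn.functional as F

def init_condition_embeddings(new_labels, src_labels, src_table, text_encode):
    """src_table: (num_src, dim) pre-trained class-embedding table.
    text_encode: any callable mapping a list of strings to (n, d) features."""
    new_feat = F.normalize(text_encode(new_labels), dim=-1)
    src_feat = F.normalize(text_encode(src_labels), dim=-1)
    nearest = (new_feat @ src_feat.T).argmax(dim=-1)  # closest source class
    return src_table[nearest].clone()                 # warm start, then fine-tune
```

And the small-object emphasis could be as simple as upweighting small-object pixels in the diffusion loss, assuming a binary mask derived from the labels:

```python
def small_object_weighted_loss(pred, target, small_mask, boost=4.0):
    """Per-pixel MSE with a hypothetical boost on small-object regions."""
    per_pixel = (pred - target) ** 2
    weight = 1.0 + (boost - 1.0) * small_mask.float()
    return (weight * per_pixel).mean()
```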
Related papers
- GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model [6.144680854063938]
GenDDS is a novel approach for generating driving scenarios for autonomous driving systems.
We employ the KITTI dataset, which includes real-world driving videos, to train the model.
We demonstrate that our model can generate high-quality driving videos that closely replicate the complexity and variability of real-world driving scenarios.
arXiv Detail & Related papers (2024-08-28T15:37:44Z) - PLT-D3: A High-fidelity Dynamic Driving Simulation Dataset for Stereo Depth and Scene Flow [0.0]
This paper introduces the Dynamic-weather Driving dataset (PLT-D3): high-fidelity stereo depth and scene flow ground-truth data generated using Unreal Engine 5.
In particular, this dataset includes synchronized high-resolution stereo image sequences that replicate a wide array of dynamic weather scenarios.
Benchmarks have been established for several critical autonomous driving tasks using PLT-D3 to measure and enhance the performance of state-of-the-art models.
arXiv Detail & Related papers (2024-06-11T19:21:46Z) - SCaRL- A Synthetic Multi-Modal Dataset for Autonomous Driving [0.0]
We present a novel synthetically generated multi-modal dataset, SCaRL, to enable the training and validation of autonomous driving solutions.
SCaRL is a large dataset based on the CARLA Simulator, which provides data for diverse, dynamic scenarios and traffic conditions.
arXiv Detail & Related papers (2024-05-27T10:31:26Z) - SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control [59.20038082523832]
We present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications.
We develop a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources for producing varied and useful data.
arXiv Detail & Related papers (2024-03-28T14:07:13Z) - DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control [68.14798033899955]
Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content.
However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation?
We investigate this question in the context of autonomous driving, and answer it with a resounding "yes".
arXiv Detail & Related papers (2023-12-05T18:34:12Z) - Pre-training on Synthetic Driving Data for Trajectory Prediction [61.520225216107306]
We propose a pipeline-level solution to mitigate the issue of data scarcity in trajectory forecasting.
We adopt HD map augmentation and trajectory synthesis for generating driving data, and then we learn representations by pre-training on them.
We conduct extensive experiments to demonstrate the effectiveness of our data expansion and pre-training strategies.
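The entry above does not specify its augmentations, so as an illustrative guess at what trajectory synthesis for pre-training can look like, the sketch below mints new samples by randomly rotating and shifting recorded waypoint sequences:

```python
import numpy as np

def augment_trajectory(xy, max_rot_deg=30.0, max_shift_m=2.0):
    """xy: (T, 2) array of waypoints; returns a rotated-and-shifted copy."""
    theta = np.deg2rad(np.random.uniform(-max_rot_deg, max_rot_deg))
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    shift = np.random.uniform(-max_shift_m, max_shift_m, size=2)
    return xy @ rot.T + shift
```

A real pipeline would apply the same rigid motion to the HD map so that trajectories and lanes stay consistent.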
arXiv Detail & Related papers (2023-09-18T19:49:22Z) - Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z) - A Hybrid Rule-Based and Data-Driven Approach to Driver Modeling through Particle Filtering [6.9485501711137525]
We propose a methodology that combines rule-based modeling with data-driven learning.
Our results show that driver models based on our hybrid rule-based and data-driven approach can accurately capture real-world driving behavior.
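As a rough illustration of the hybrid idea above (not the paper's actual model), the sketch below calibrates a toy car-following rule's desired time headway with a bootstrap particle filter; the rule, noise levels, and single tuned parameter are all invented for the example:

```python
import numpy as np

def rule_based_accel(gap, speed, headway):
    """Toy car-following rule: close the gap toward a desired time headway."""
    desired_gap = headway * speed
    return np.clip(0.5 * (gap - desired_gap), -3.0, 2.0)

def fit_headway(observations, n_particles=500, noise=0.05, obs_std=0.5):
    """observations: iterable of (gap_m, speed_mps, accel_mps2) samples."""
    particles = np.random.uniform(0.5, 3.0, n_particles)  # headway hypotheses (s)
    weights = np.full(n_particles, 1.0 / n_particles)
    for gap, speed, accel in observations:
        particles += np.random.normal(0.0, noise, n_particles)  # diffuse particles
        pred = rule_based_accel(gap, speed, particles)
        weights *= np.exp(-0.5 * ((accel - pred) / obs_std) ** 2)  # likelihood
        weights += 1e-300                      # guard against total degeneracy
        weights /= weights.sum()
        if 1.0 / np.sum(weights ** 2) < n_particles / 2:  # resample on low ESS
            idx = np.random.choice(n_particles, n_particles, p=weights)
            particles = particles[idx]
            weights = np.full(n_particles, 1.0 / n_particles)
    return float(np.sum(particles * weights))  # posterior-mean headway
```

The rule stays interpretable while the data picks its parameter, which is the appeal of the hybrid approach.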
arXiv Detail & Related papers (2021-08-29T11:07:14Z) - One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z) - Deep traffic light detection by overlaying synthetic context on arbitrary natural images [49.592798832978296]
We propose a method to generate artificial traffic-related training data for deep traffic light detectors.
This data is generated using basic non-realistic computer graphics to blend fake traffic scenes on top of arbitrary image backgrounds.
It also tackles the intrinsic data imbalance problem in traffic light datasets, caused mainly by the scarcity of yellow-state samples.
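The blending idea is easy to picture with a toy compositor (the sprite source, size, and sampling weights below are illustrative assumptions, not the paper's pipeline): paste a non-realistic traffic-light sprite onto an arbitrary background, record its box as a label, and bias state sampling toward the rare yellow class:

```python
import random
from PIL import Image

STATES = ["red", "yellow", "green"]
STATE_WEIGHTS = [0.3, 0.4, 0.3]  # assumed oversampling of the rare yellow state

def sample_state():
    return random.choices(STATES, weights=STATE_WEIGHTS, k=1)[0]

def composite(background: Image.Image, sprite: Image.Image, size=48):
    """Paste an RGBA traffic-light sprite at a random spot; return image + bbox."""
    bg = background.convert("RGB").copy()
    fg = sprite.convert("RGBA").resize((size, size))
    x = random.randint(0, bg.width - size)
    y = random.randint(0, bg.height - size)
    bg.paste(fg, (x, y), fg)  # the alpha channel acts as the blend mask
    return bg, (x, y, x + size, y + size)
```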
arXiv Detail & Related papers (2020-11-07T19:57:22Z)