DriveFlow: Rectified Flow Adaptation for Robust 3D Object Detection in Autonomous Driving
- URL: http://arxiv.org/abs/2511.18713v1
- Date: Mon, 24 Nov 2025 03:12:43 GMT
- Title: DriveFlow: Rectified Flow Adaptation for Robust 3D Object Detection in Autonomous Driving
- Authors: Hongbin Lin, Yiming Yang, Chaoda Zheng, Yifan Zhang, Shuaicheng Niu, Zilu Guo, Yafeng Li, Gui Gui, Shuguang Cui, Zhen Li
- Abstract summary: DriveFlow is a Rectified Flow Adaptation method for training data enhancement in autonomous driving. It incorporates a high-frequency alignment loss for the foreground to maintain precise 3D object geometry, and conducts dual-frequency optimization for the background, balancing editing flexibility and semantic consistency.
- Score: 85.14946767994932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In autonomous driving, vision-centric 3D object detection recognizes and localizes 3D objects from RGB images. However, due to high annotation costs and diverse outdoor scenes, training data often fails to cover all possible test scenarios, known as the out-of-distribution (OOD) issue. Training-free image editing offers a promising solution for improving model robustness by training data enhancement without any modifications to pre-trained diffusion models. Nevertheless, inversion-based methods often suffer from limited effectiveness and inherent inaccuracies, while recent rectified-flow-based approaches struggle to preserve objects with accurate 3D geometry. In this paper, we propose DriveFlow, a Rectified Flow Adaptation method for training data enhancement in autonomous driving based on pre-trained Text-to-Image flow models. Based on frequency decomposition, DriveFlow introduces two strategies to adapt noise-free editing paths derived from text-conditioned velocities. 1) High-Frequency Foreground Preservation: DriveFlow incorporates a high-frequency alignment loss for foreground to maintain precise 3D object geometry. 2) Dual-Frequency Background Optimization: DriveFlow also conducts dual-frequency optimization for background, balancing editing flexibility and semantic consistency. Comprehensive experiments validate the effectiveness and efficiency of DriveFlow, demonstrating comprehensive performance improvements on all categories across OOD scenarios. Code is available at https://github.com/Hongbin98/DriveFlow.
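The frequency-decomposition idea underlying the two strategies can be sketched in a few lines. The sketch below is a hypothetical illustration only: the function names, the radial FFT low-pass mask, and the L2 form of the alignment loss are assumptions for exposition, not details taken from the paper.

```python
import numpy as np

def frequency_split(image, cutoff=0.25):
    """Split a 2-D image into low- and high-frequency parts via a radial FFT mask."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    # Coordinate grid centered on the DC component after fftshift.
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    low_mask = radius <= cutoff
    low = np.real(np.fft.ifft2(np.fft.ifftshift(f * low_mask)))
    high = image - low  # residual holds edges / fine geometry
    return low, high

def high_freq_alignment_loss(edited, source, fg_mask, cutoff=0.25):
    """Mean-squared distance between the high-frequency foreground content
    of the edited image and the source image (foreground selected by fg_mask)."""
    _, h_edit = frequency_split(edited, cutoff)
    _, h_src = frequency_split(source, cutoff)
    diff = (h_edit - h_src) * fg_mask
    return float(np.mean(diff ** 2))
```

Under this reading, constraining only the high-frequency band of the foreground keeps object contours (and hence 3D geometry cues) fixed while leaving low-frequency appearance free for editing; the background would instead be optimized on both bands with different weights.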
Related papers
- SF3D-RGB: Scene Flow Estimation from Monocular Camera and Sparse LiDAR [17.224692757126153]
We present a deep learning architecture for sparse scene flow estimation using 2D monocular images and 3D point clouds. Our architecture is an end-to-end model that first encodes information from each modality into features and fuses them together. Experiments show that our proposed method outperforms single-modality methods and achieves better scene flow accuracy on real-world datasets.
arXiv Detail & Related papers (2026-02-25T09:03:42Z) - Zero-shot 3D-Aware Trajectory-Guided image-to-video generation via Test-Time Training [27.251232052868033]
Trajectory-Guided image-to-video (I2V) generation aims to synthesize videos that adhere to user-specified motion instructions. Zo3T significantly enhances 3D realism and motion accuracy in trajectory-controlled I2V generation.
arXiv Detail & Related papers (2025-09-08T14:21:45Z) - SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs. We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation [49.32104127246474]
DriveGEN is a training-free, controllable Text-to-Image Diffusion Generation method. It consistently preserves objects with precise 3D geometry across diverse Out-of-Distribution generations.
arXiv Detail & Related papers (2025-03-14T06:35:38Z) - ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions [91.55655961014027]
3D semantic occupancy and flow prediction are fundamental to scene understanding. This paper proposes a vision-based framework with three targeted improvements. Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint semantic-flow prediction.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - DreamFlow: High-Quality Text-to-3D Generation by Approximating Probability Flow [72.9209434105892]
We propose to enhance the text-to-3D optimization by leveraging the T2I diffusion prior in the generative sampling process with a predetermined timestep schedule.
By leveraging the proposed novel optimization algorithm, we design DreamFlow, a practical three-stage coarse-to-fine text-to-3D optimization framework.
arXiv Detail & Related papers (2024-03-22T05:38:15Z) - StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception.
We consider multiple driving-scene velocities and propose Velocity-aware streaming AP (VsAP) to jointly evaluate accuracy.
Our simple method achieves the state-of-the-art performance on Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z) - AutoFlow: Learning a Better Training Set for Optical Flow [62.40293188964933]
AutoFlow is a method to render training data for optical flow.
AutoFlow achieves state-of-the-art accuracy in pre-training both PWC-Net and RAFT.
arXiv Detail & Related papers (2021-04-29T17:55:23Z) - PillarFlow: End-to-end Birds-eye-view Flow Estimation for Autonomous Driving [42.8479177012748]
We propose an end-to-end deep learning framework for LIDAR-based flow estimation in bird's eye view (BeV).
Our method takes consecutive point cloud pairs as input and produces a 2-D BeV flow grid describing the dynamic state of each cell.
The experimental results show that the proposed method not only estimates 2-D BeV flow accurately but also improves tracking performance of both dynamic and static objects.
arXiv Detail & Related papers (2020-08-03T20:36:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.