RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation
- URL: http://arxiv.org/abs/2503.10410v1
- Date: Thu, 13 Mar 2025 14:33:42 GMT
- Title: RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation
- Authors: Yuwen Du, Anning Hu, Zichen Chao, Yifan Lu, Junhao Ge, Genjia Liu, Weitao Wu, Lanjun Wang, Siheng Chen
- Abstract summary: We present the first simulation framework RoCo-Sim for roadside collaborative perception. RoCo-Sim is capable of generating diverse, multi-view consistent simulated roadside data. Code and pre-trained models will be released soon.
- Score: 30.744548212616007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Roadside Collaborative Perception refers to a system in which multiple roadside units pool their perceptual data to help vehicles enhance their environmental awareness. Existing roadside perception methods concentrate on model design but overlook data issues such as calibration errors, sparse information, and multi-view inconsistency, leading to poor performance on recently published datasets. To significantly enhance roadside collaborative perception and address these critical data issues, we present RoCo-Sim, the first simulation framework for roadside collaborative perception. RoCo-Sim generates diverse, multi-view consistent simulated roadside data through dynamic foreground editing and full-scene style transfer of a single image. RoCo-Sim consists of four components: (1) Camera Extrinsic Optimization ensures accurate 3D-to-2D projection for roadside cameras; (2) a novel Multi-View Occlusion-Aware Sampler (MOAS) determines the placement of diverse digital assets within 3D space; (3) DepthSAM models foreground-background relationships from single-frame fixed-view images, ensuring multi-view consistency of the foreground; and (4) a Scalable Post-Processing Toolkit generates more realistic and enriched scenes through style transfer and other enhancements. RoCo-Sim significantly improves roadside 3D object detection, outperforming SOTA methods by 83.74 on Rcooper-Intersection and 83.12 on TUMTraf-V2X for AP70. RoCo-Sim fills a critical gap in roadside perception simulation. Code and pre-trained models will be released soon: https://github.com/duyuwen-duen/RoCo-Sim
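To make the described data flow concrete, here is a minimal Python sketch of how the four components might compose. Everything except the standard pinhole projection is an assumption: the stage callables (optimize_extrinsics, sample_placements, estimate_depth_ordering, composite, stylize) are hypothetical stand-ins for RoCo-Sim's components, not its released API.

```python
# Minimal sketch of the data-generation flow described in the abstract.
# The four stage functions are passed in as callables because their real
# implementations belong to RoCo-Sim and are not reproduced here.
import numpy as np


def project_to_image(points_3d, K, R, t):
    """Project Nx3 world points to pixels with a standard pinhole model."""
    t = np.asarray(t).reshape(3, 1)                       # 3x1 translation column
    cam = (np.asarray(R) @ np.asarray(points_3d).T + t).T  # world frame -> camera frame
    uv = (np.asarray(K) @ cam.T).T                        # camera frame -> homogeneous pixels
    return uv[:, :2] / uv[:, 2:3]                         # perspective divide


def simulate_frame(image, K, R, t, asset_bank,
                   optimize_extrinsics, sample_placements,
                   estimate_depth_ordering, composite, stylize):
    # (1) Camera Extrinsic Optimization: refine R, t for accurate 3D-to-2D projection.
    R, t = optimize_extrinsics(image, K, R, t)
    # (2) MOAS: occlusion-aware placement of digital assets in 3D space.
    placements = sample_placements(asset_bank, K, R, t)
    # (3) DepthSAM: foreground/background ordering from the single fixed-view frame,
    #     so inserted assets occlude and are occluded consistently.
    depth_order = estimate_depth_ordering(image)
    # Render the chosen assets into the image, respecting the estimated ordering.
    augmented = composite(image, placements, depth_order, project_to_image, K, R, t)
    # (4) Scalable post-processing: style transfer and other enhancements.
    return stylize(augmented)
```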
Related papers
- RealEngine: Simulating Autonomous Driving in Realistic Context [45.42418090733243]
RealEngine is a novel driving simulation framework that holistically integrates 3D scene reconstruction and novel view synthesis techniques.
By leveraging real-world multi-modal sensor data, RealEngine reconstructs background scenes and foreground traffic participants separately, allowing for highly diverse and realistic traffic scenarios.
RealEngine supports three essential driving simulation categories: non-reactive simulation, safety testing, and multi-agent interaction.
arXiv Detail & Related papers (2025-05-22T17:01:00Z)
- MC-BEVRO: Multi-Camera Bird Eye View Road Occupancy Detection for Traffic Monitoring [23.396192711865147]
Single-camera 3D perception for traffic monitoring faces significant challenges due to occlusion and limited field of view.
This paper introduces a novel Bird's-Eye-View road occupancy detection framework that leverages multiple roadside cameras.
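A rough illustration of the multi-camera fusion idea, not code from MC-BEVRO: per-camera detections, assumed to already be calibrated into a shared ground-plane frame, are rasterized into a single BEV occupancy grid.

```python
# Illustrative sketch only (not MC-BEVRO code): mark BEV grid cells occupied
# wherever any camera reports a detection, assuming detections are already
# expressed as ground-plane (x, y) centers in a common world frame.
import numpy as np


def bev_occupancy(detections_per_camera, x_range=(-50.0, 50.0),
                  y_range=(-50.0, 50.0), cell=0.5):
    """detections_per_camera: list of (N_i, 2) arrays of ground-plane centers."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((nx, ny), dtype=bool)
    for dets in detections_per_camera:
        for x, y in np.asarray(dets):
            ix = int((x - x_range[0]) / cell)
            iy = int((y - y_range[0]) / cell)
            if 0 <= ix < nx and 0 <= iy < ny:
                grid[ix, iy] = True  # occupied if any camera sees an object here
    return grid
```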
arXiv Detail & Related papers (2025-02-16T22:03:03Z)
- Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation [62.5805866419814]
Vid2Sim is a novel framework that bridges the sim2real gap through a scalable and cost-efficient real2sim pipeline for neural 3D scene reconstruction and simulation.
Experiments demonstrate that Vid2Sim significantly improves the performance of urban navigation in digital twins and the real world by 31.2% and 68.3% in success rate, respectively.
arXiv Detail & Related papers (2025-01-12T03:01:15Z)
- Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles [81.29018359825872]
This paper consolidates a set of good practices to finetune large pretrained models for a real-world task.
Specifically, we develop several strategies to account for discrepancies between the synthetic data and real driving data.
Our insights lead to effective finetuning that results in a 68.8% reduction in FID for novel view synthesis over prior art.
arXiv Detail & Related papers (2024-12-19T03:39:13Z)
- DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation [54.02069690134526]
We propose DrivingSphere, a realistic and closed-loop simulation framework.
Its core idea is to build a 4D world representation and generate real-life and controllable driving scenarios.
By providing a dynamic and realistic simulation environment, DrivingSphere enables comprehensive testing and validation of autonomous driving algorithms.
arXiv Detail & Related papers (2024-11-18T03:00:33Z)
- Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection [9.708971995966476]
This paper introduces a two-stage training strategy to address these challenges.
Our approach initially trains a model on the large-scale synthetic dataset, RoadSense3D.
We fine-tune the model on a combination of real-world datasets to enhance its adaptability to practical conditions.
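A minimal sketch of that two-stage recipe, assuming a generic PyTorch detector whose forward pass returns its training loss; the learning rates and epoch counts are illustrative, not the paper's settings.

```python
# Hedged sketch of the two-stage strategy: pretrain on synthetic data, then
# fine-tune on real data. The detector and dataloaders are assumed to be given,
# and the model is assumed to return its training loss from the forward pass.
import torch


def train(model, loader, lr, epochs):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            loss = model(images, targets)  # assumed: forward pass returns the loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()


def two_stage_training(model, synthetic_loader, real_loader):
    # Stage 1: pretrain on the large-scale synthetic dataset (e.g. RoadSense3D).
    train(model, synthetic_loader, lr=1e-2, epochs=24)
    # Stage 2: fine-tune on real-world data at a lower learning rate.
    train(model, real_loader, lr=1e-3, epochs=12)
```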
arXiv Detail & Related papers (2024-08-28T08:44:58Z)
- Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks [47.07188762367792]
We present ARSim, a framework designed to enhance real multi-view image data with 3D synthetic objects of interest.
We construct a simplified virtual scene using real data and strategically place 3D synthetic assets within it.
The resulting augmented multi-view consistent dataset is used to train a multi-camera perception network for autonomous vehicles.
arXiv Detail & Related papers (2024-03-22T17:49:11Z)
- CADSim: Robust and Scalable in-the-wild 3D Reconstruction for Controllable Sensor Simulation [44.83732884335725]
Sensor simulation involves modeling traffic participants, such as vehicles, with high quality appearance and articulated geometry.
Current reconstruction approaches struggle on in-the-wild sensor data, due to its sparsity and noise.
We present CADSim, which combines part-aware object-class priors via a small set of CAD models with differentiable rendering to automatically reconstruct vehicle geometry.
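A generic sketch of the underlying analysis-by-synthesis idea, not CADSim itself: fit a small set of CAD shape parameters to an observed image by gradient descent through an assumed differentiable renderer passed in as a callable.

```python
# Generic sketch of analysis-by-synthesis with an assumed differentiable
# renderer; this is not CADSim's implementation.
import torch


def fit_shape_params(render, observed_image, init_params, steps=200, lr=0.05):
    """render(params) must return an image tensor differentiable w.r.t. params."""
    params = init_params.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        predicted = render(params)  # differentiable rendering of the CAD model
        loss = torch.nn.functional.l1_loss(predicted, observed_image)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return params.detach()
```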
arXiv Detail & Related papers (2023-11-02T17:56:59Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- GeoSim: Photorealistic Image Simulation with Geometry-Aware Composition [81.24107630746508]
We present GeoSim, a geometry-aware image composition process that synthesizes novel urban driving scenes.
We first build a diverse bank of 3D objects with both realistic geometry and appearance from sensor data.
The resulting synthetic images are photorealistic, traffic-aware, and geometrically consistent, allowing image simulation to scale to complex use cases.
arXiv Detail & Related papers (2021-01-16T23:00:33Z)
- Transferable Active Grasping and Real Embodied Dataset [48.887567134129306]
We show how to search for feasible viewpoints for grasping using hand-mounted RGB-D cameras.
A practical three-stage transferable active grasping pipeline is developed that adapts to unseen cluttered scenes.
In our pipeline, we propose a novel mask-guided reward to overcome the sparse reward issue in grasping and ensure category-irrelevant behavior.
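Purely illustrative, since the summary does not specify the reward's form: one simple way to make a mask-guided reward dense is to reward the fraction of the current view covered by the target object's mask, plus a sparse bonus on grasp success.

```python
# Hypothetical reward shaping, illustrative only; the paper's actual reward
# is not described in this summary.
import numpy as np


def mask_guided_reward(target_mask, grasp_success,
                       coverage_weight=1.0, success_bonus=10.0):
    """target_mask: HxW boolean mask of the target object in the current view."""
    coverage = float(np.asarray(target_mask).mean())  # dense term: visible fraction of the target
    reward = coverage_weight * coverage
    if grasp_success:
        reward += success_bonus                       # sparse term: only on a successful grasp
    return reward
```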
arXiv Detail & Related papers (2020-04-28T08:15:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.