RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View
- URL: http://arxiv.org/abs/2409.11706v1
- Date: Wed, 18 Sep 2024 05:16:34 GMT
- Title: RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View
- Authors: Jinrang Jia, Guangqi Yi, Yifeng Shi
- Abstract summary: This paper systematically analyzes the key challenges in multi-camera BEV perception for roadside scenarios compared to vehicle-side.
RopeBEV introduces BEV augmentation to address the training balance issues caused by diverse camera poses.
Our method ranks 1st on the real-world highway dataset RoScenes.
- Score: 3.165441652093544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-camera perception methods in Bird's-Eye-View (BEV) have gained wide application in autonomous driving. However, due to the differences between roadside and vehicle-side scenarios, a multi-camera BEV solution for the roadside is still lacking. This paper systematically analyzes the key challenges in multi-camera BEV perception for roadside scenarios compared to vehicle-side. These challenges include the diversity in camera poses, the uncertainty in camera numbers, the sparsity in perception regions, and the ambiguity in orientation angles. In response, we introduce RopeBEV, the first dense multi-camera BEV approach for roadside scenarios. RopeBEV introduces BEV augmentation to address the training balance issues caused by diverse camera poses. By incorporating CamMask and ROIMask (Region of Interest Mask), it supports variable camera numbers and sparse perception, respectively. Finally, camera rotation embedding is utilized to resolve orientation ambiguity. Our method ranks 1st on the real-world highway dataset RoScenes and demonstrates its practical value on a private urban dataset that covers more than 50 intersections and 600 cameras.
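The CamMask idea for handling variable camera numbers can be pictured as masking out absent cameras before per-camera features are fused into the BEV grid. Below is a minimal sketch of that mechanism in PyTorch; the names, shapes, and the simple masked-mean fusion are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of CamMask-style handling of variable camera counts.
# All names and shapes here are illustrative assumptions, not the paper's code.
import torch

def fuse_camera_features(cam_feats: torch.Tensor, cam_mask: torch.Tensor) -> torch.Tensor:
    """Average per-camera BEV features, ignoring padded (absent) cameras.

    cam_feats: (B, N_max, C, H, W) features from up to N_max camera slots.
    cam_mask:  (B, N_max) boolean, True where a camera is actually present.
    """
    mask = cam_mask[:, :, None, None, None].float()  # broadcast to feature dims
    summed = (cam_feats * mask).sum(dim=1)           # zero out padded cameras
    count = mask.sum(dim=1).clamp(min=1.0)           # avoid division by zero
    return summed / count                            # mean over present cameras

# Usage: a batch with 4 camera slots, of which only 3 are populated.
feats = torch.randn(1, 4, 64, 100, 100)
mask = torch.tensor([[True, True, True, False]])
bev = fuse_camera_features(feats, mask)  # (1, 64, 100, 100)
```

ROIMask can be pictured analogously: a binary mask over BEV grid cells that restricts prediction and supervision to the sparse regions a roadside camera actually covers.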
Related papers
- Improved Single Camera BEV Perception Using Multi-Camera Training [4.003066044908734]
In large-scale production, cost efficiency is an optimization goal, so using fewer cameras becomes more relevant.
This raises the problem of developing a BEV perception model that provides sufficient performance on a low-cost sensor setup.
The objective of our approach is to reduce the aforementioned performance drop as much as possible using a modern multi-camera surround view model reduced for single-camera inference.
arXiv Detail & Related papers (2024-09-04T13:06:40Z) - RoadBEV: Road Surface Reconstruction in Bird's Eye View [55.0558717607946]
Road surface conditions, especially geometry profiles, strongly affect the driving performance of autonomous vehicles. Vision-based online road reconstruction promises to capture road information in advance.
Bird's-Eye-View (BEV) perception offers great potential for more reliable and accurate reconstruction.
This paper proposes two simple yet effective models for road elevation reconstruction in BEV, named RoadBEV-mono and RoadBEV-stereo.
arXiv Detail & Related papers (2024-04-09T20:24:29Z) - Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
Producing perception outputs in 3D is essential for spatial reasoning about other agents and structures.
The most basic approach to obtaining the desired BEV representation from a camera image is inverse perspective mapping (IPM), which assumes a flat ground surface.
More recent approaches use deep neural networks to output directly in BEV space.
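As a concrete illustration of the IPM baseline mentioned above, the sketch below maps pixels to ground-plane coordinates through the plane-induced homography. It assumes known world-to-camera calibration and a perfectly flat road; `K`, `R`, and `t` are placeholders for real calibration values.

```python
# Minimal sketch of inverse perspective mapping (IPM) under a flat-ground
# assumption. K (intrinsics), R, t (world-to-camera extrinsics) are
# illustrative placeholders for real calibration.
import numpy as np

def ground_to_image_homography(K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Homography mapping ground-plane points (X, Y, 1), with Z=0, to pixels."""
    # For a world point (X, Y, 0): x ~ K (R @ [X, Y, 0]^T + t) = K [r1 r2 t] [X, Y, 1]^T
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def image_to_ground(uv: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Back-project (N, 2) pixel coordinates to ground-plane (X, Y) via H^-1."""
    pts = np.column_stack((uv, np.ones(len(uv))))  # homogeneous pixel coords
    g = (np.linalg.inv(H) @ pts.T).T
    return g[:, :2] / g[:, 2:3]                    # dehomogenize to (X, Y)
```

Anything above the ground plane violates the flat-ground assumption and appears smeared in the resulting BEV image, which is one motivation for the learned approaches mentioned above.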
arXiv Detail & Related papers (2023-09-16T19:12:05Z) - An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection [55.281369497158515]
We propose an efficient transformer for 3D lane detection.
Unlike the vanilla transformer, our model contains a cross-attention mechanism to simultaneously learn lane and BEV representations.
Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively.
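Such a cross-attention step can be sketched as a set of learned lane queries attending to flattened BEV features. The PyTorch module below is a minimal illustration under assumed shapes and layer sizes, not the paper's actual architecture.

```python
# Minimal sketch of cross-attention between learned lane queries and BEV
# features. Dimensions and query count are illustrative assumptions.
import torch
import torch.nn as nn

class LaneBEVCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8, n_queries: int = 20):
        super().__init__()
        self.lane_queries = nn.Parameter(torch.randn(n_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, bev_feats: torch.Tensor) -> torch.Tensor:
        """bev_feats: (B, H*W, C) flattened BEV features -> (B, n_queries, C)."""
        q = self.lane_queries.unsqueeze(0).expand(bev_feats.size(0), -1, -1)
        lane_feats, _ = self.attn(q, bev_feats, bev_feats)  # queries attend to BEV
        return lane_feats
```

Applying the same queries to image-view features instead of BEV features would, presumably, yield the 2D lane branch described in the summary.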
arXiv Detail & Related papers (2023-06-08T04:18:31Z) - From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration [20.733451121484993]
We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration.
This is a very challenging problem since the only input is several RGB images from different first-person views (FPVs) of a multi-person scene.
We propose an end-to-end framework that solves this problem in several stages.
arXiv Detail & Related papers (2022-12-19T08:31:08Z) - Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention from both industry and academia.
As sensor configurations grow more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground-truth annotations in the BEV grid; and (c) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
arXiv Detail & Related papers (2022-09-12T15:29:13Z) - SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
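The cross-view fusion idea can be approximated by letting tokens from all surrounding views attend to one another. The sketch below is a simplified stand-in for SurroundDepth's cross-view transformer, not its actual architecture; shapes and the plain self-attention layer are assumptions.

```python
# Minimal sketch of cross-view feature fusion: tokens from all surrounding
# views attend to each other via a single self-attention layer. Shapes are
# illustrative assumptions.
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        """view_feats: (B, V, T, C) tokens from V views -> same shape, fused."""
        b, v, t, c = view_feats.shape
        tokens = view_feats.reshape(b, v * t, c)     # pool tokens across views
        fused, _ = self.attn(tokens, tokens, tokens) # every token sees all views
        return fused.reshape(b, v, t, c)
```

Attention over the concatenated tokens is quadratic in the number of views times tokens per view, which is why such fusion is typically applied at coarse feature resolutions.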
arXiv Detail & Related papers (2022-04-07T17:58:47Z) - Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task [48.555440807415664]
We present Rope3D, the first high-diversity, challenging roadside perception 3D dataset, collected from a novel view.
The dataset consists of 50k images and over 1.5M 3D objects in various scenes.
We propose to leverage geometry constraints to solve the inherent ambiguities caused by various sensors and viewpoints.
arXiv Detail & Related papers (2022-03-25T12:13:23Z) - Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images [128.881857704338]
We study the problem of extracting a directed graph representing the local road network in BEV coordinates from a single onboard camera image.
We show that the method can be extended to detect dynamic objects on the BEV plane.
We validate our approach against powerful baselines and show that our network achieves superior performance.
arXiv Detail & Related papers (2021-10-05T12:40:33Z) - SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras [30.480562747903186]
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios.
We present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input.
We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches.
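Conditioning convolutions on camera parameters can be sketched by broadcasting the calibration vector to per-pixel channels and concatenating it with the feature map. The module below is an illustrative approximation, not SVDistNet's actual camera-geometry adaptive convolution; the parameter choice (fx, fy, cx, cy) is an assumption.

```python
# Minimal sketch of a camera-conditioned convolution: camera parameters are
# tiled to per-pixel channels and concatenated with the feature map, so one
# network can adapt to different camera calibrations. Illustrative only.
import torch
import torch.nn as nn

class CameraConditionedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, n_cam_params: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + n_cam_params, out_ch, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor, cam_params: torch.Tensor) -> torch.Tensor:
        """feats: (B, C, H, W); cam_params: (B, n_cam_params), e.g. fx, fy, cx, cy."""
        b, _, h, w = feats.shape
        cond = cam_params[:, :, None, None].expand(b, -1, h, w)  # tile over pixels
        return self.conv(torch.cat([feats, cond], dim=1))
```

Normalizing the intrinsics by the image size before conditioning is a common choice, so the same network generalizes across input resolutions.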
arXiv Detail & Related papers (2021-04-09T15:20:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.