Calibration-free BEV Representation for Infrastructure Perception
- URL: http://arxiv.org/abs/2303.03583v2
- Date: Fri, 14 Apr 2023 02:45:05 GMT
- Title: Calibration-free BEV Representation for Infrastructure Perception
- Authors: Siqi Fan, Zhe Wang, Xiaoliang Huo, Yan Wang, Jingjing Liu
- Abstract summary: We propose a Calibration-free BEV Representation (CBR) network, which achieves 3D detection based on BEV representation without calibration parameters or additional depth supervision.
Experimental results on DAIR-V2X demonstrate that CBR achieves acceptable performance without any camera parameters and is naturally unaffected by calibration noise.
- Score: 13.932616053644038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective BEV object detection on infrastructure can greatly improve traffic
scene understanding and vehicle-to-infrastructure (V2I) cooperative perception.
However, cameras installed on infrastructure have various postures, and
previous BEV detection methods rely on accurate calibration, which is difficult
for practical applications due to inevitable natural factors (e.g., wind and
snow). In this paper, we propose a Calibration-free BEV Representation (CBR)
network, which achieves 3D detection based on BEV representation without
calibration parameters or additional depth supervision. Specifically, we
use two multi-layer perceptrons to decouple features from the perspective
view into the front view and bird's-eye view under boxes-induced foreground
supervision. Then, a cross-view feature fusion module matches features from
orthogonal views according to similarity and conducts BEV feature enhancement
with front view features. Experimental results on DAIR-V2X demonstrate that CBR
achieves acceptable performance without any camera parameters and is naturally
unaffected by calibration noise. We hope CBR can serve as a baseline for
future research addressing practical challenges of infrastructure perception.
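To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the two ideas in the abstract: MLP-based view decoupling (perspective view to front view and bird's-eye view without camera parameters) and similarity-based cross-view feature fusion. All module names, feature-map shapes, and the column-wise matching scheme are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of calibration-free view decoupling and cross-view fusion.
# Shapes and module names are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewDecoupling(nn.Module):
    """Decouple perspective-view (PV) features into front-view (FV) and
    bird's-eye-view (BEV) features with two MLPs (no camera parameters)."""
    def __init__(self, pv_hw=(32, 88), fv_hw=(16, 88), bev_hw=(64, 88)):
        super().__init__()
        self.fv_hw, self.bev_hw = fv_hw, bev_hw
        # Each MLP remaps the flattened PV spatial dimension to the target
        # view's spatial dimension, applied channel-wise.
        self.to_fv = nn.Sequential(
            nn.Linear(pv_hw[0] * pv_hw[1], 512), nn.ReLU(),
            nn.Linear(512, fv_hw[0] * fv_hw[1]))
        self.to_bev = nn.Sequential(
            nn.Linear(pv_hw[0] * pv_hw[1], 512), nn.ReLU(),
            nn.Linear(512, bev_hw[0] * bev_hw[1]))

    def forward(self, pv):                       # pv: (B, C, Hp, Wp)
        b, c = pv.shape[:2]
        flat = pv.flatten(2)                     # (B, C, Hp*Wp)
        fv = self.to_fv(flat).view(b, c, *self.fv_hw)
        bev = self.to_bev(flat).view(b, c, *self.bev_hw)
        return fv, bev

def cross_view_fusion(fv, bev):
    """Match FV and BEV features column-wise by similarity and enhance the
    BEV features with the matched FV features."""
    fv_col = F.normalize(fv.mean(dim=2), dim=1)            # (B, C, Wf)
    bev_col = F.normalize(bev.mean(dim=2), dim=1)           # (B, C, Wb)
    sim = torch.einsum('bcw,bcv->bwv', bev_col, fv_col)     # (B, Wb, Wf)
    weights = sim.softmax(dim=-1)
    matched = torch.einsum('bwv,bcv->bcw', weights, fv_col)  # (B, C, Wb)
    return bev + matched.unsqueeze(2)            # broadcast over BEV rows
```

In training, both decoupled views would be supervised with boxes-induced foreground targets and the fused BEV features fed to a detection head; the sketch omits losses and the head.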
Related papers
- Improving Bird's Eye View Semantic Segmentation by Task Decomposition [42.57351039508863]
We decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment.
By separating perception and generation into distinct steps, our approach reduces task complexity and equips the model to handle intricate and challenging scenes effectively.
arXiv Detail & Related papers (2024-04-02T13:19:45Z) - BEV$^2$PR: BEV-Enhanced Visual Place Recognition with Structural Cues [44.96177875644304]
We propose a new image-based visual place recognition (VPR) framework by exploiting the structural cues in bird's-eye view (BEV) from a single camera.
The BEV$^2$PR framework generates a composite descriptor with both visual cues and spatial awareness based on a single camera.
arXiv Detail & Related papers (2024-03-11T10:46:43Z) - DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [111.13119809216313]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from unlabelled target data, remains largely under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z) - FB-BEV: BEV Representation from Forward-Backward View Transformations [131.11787050205697]
We propose a novel View Transformation Module (VTM) for Bird-Eye-View (BEV) representation.
We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4% NDS on the nuScenes test set.
arXiv Detail & Related papers (2023-08-04T10:26:55Z) - An Efficient Transformer for Simultaneous Learning of BEV and Lane
Representations in 3D Lane Detection [55.281369497158515]
We propose an efficient transformer for 3D lane detection.
Different from the vanilla transformer, our model contains a cross-attention mechanism to simultaneously learn lane and BEV representations.
Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively.
arXiv Detail & Related papers (2023-06-08T04:18:31Z) - Leveraging BEV Representation for 360-degree Visual Place Recognition [14.497501941931759]
This paper investigates the advantages of using the Bird's Eye View representation in 360-degree visual place recognition (VPR).
We propose a novel network architecture that utilizes the BEV representation in feature extraction, feature aggregation, and vision-LiDAR fusion.
The proposed BEV-based method is evaluated in ablation and comparative studies on two datasets.
arXiv Detail & Related papers (2023-05-23T08:29:42Z) - BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View
Recognition via Perspective Supervision [101.36648828734646]
We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.
The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.
arXiv Detail & Related papers (2022-11-18T18:59:48Z) - PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View [26.264139933212892]
Detectors based on the Bird's-Eye-View (BEV) representation are superior to other 3D detectors for autonomous driving and robotics.
However, transforming image features into BEV requires special operators to conduct feature sampling.
We propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.
arXiv Detail & Related papers (2022-08-19T15:19:20Z) - Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z) - GitNet: Geometric Prior-based Transformation for Birds-Eye-View
Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z)