From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
- URL: http://arxiv.org/abs/2212.09298v3
- Date: Sun, 28 Apr 2024 05:23:35 GMT
- Title: From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
- Authors: Zekun Qian, Ruize Han, Wei Feng, Feifan Wang, Song Wang
- Abstract summary: We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration.
This is a very challenging problem since its only input is several RGB images from different first-person views (FPVs) for a multi-person scene.
We propose an end-to-end framework for solving this problem, whose main idea can be divided into the following parts.
- Score: 20.733451121484993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration. This is a very challenging problem since its only input is several RGB images from different first-person views (FPVs) for a multi-person scene, without the BEV image and the calibration of the FPVs, while the output is a unified plane with the localization and orientation of both the subjects and cameras in a BEV. We propose an end-to-end framework for solving this problem, whose main idea can be divided into the following parts: i) creating a view-transform subject detection module to transform the FPV to a virtual BEV including localization and orientation of each pedestrian, ii) deriving a geometric-transformation-based method to estimate camera localization and view direction, i.e., the camera registration in a unified BEV, iii) making use of spatial and appearance information to aggregate the subjects into the unified BEV. We collect a new large-scale synthetic dataset with rich annotations for evaluation. The experimental results show the remarkable effectiveness of our proposed method.
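As a rough illustration of part ii), the sketch below registers one camera's virtual BEV into a reference BEV by fitting a 2D rotation and translation to matched subject positions with a standard least-squares (Kabsch/Procrustes) fit. This is not the paper's method, which the abstract does not spell out. The function name `estimate_rigid_2d`, the assumption that subject correspondences are already known, the convention that each camera sits at the origin of its own BEV looking along +y, and the sample coordinates are all hypothetical.

```python
import numpy as np

def estimate_rigid_2d(src, dst):
    """Least-squares rotation R (2x2) and translation t (2,) with dst ~= src @ R.T + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # 2x2 cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Hypothetical data: three subjects seen in camera B's own BEV (assumed frame:
# camera at the origin, looking along +y).
subjects_in_b = np.array([[0.0, 2.0], [1.0, 3.0], [-1.0, 4.0]])

# Ground-truth pose of camera B in the reference BEV, used only to synthesize data.
theta_true = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta_true), -np.sin(theta_true)],
                   [np.sin(theta_true),  np.cos(theta_true)]])
t_true = np.array([2.0, -1.0])
subjects_in_ref = subjects_in_b @ R_true.T + t_true

R, t = estimate_rigid_2d(subjects_in_b, subjects_in_ref)

# Camera B's location and view direction in the reference BEV
cam_pos = t                          # image of B's origin
cam_dir = R @ np.array([0.0, 1.0])   # image of B's +y axis
heading_deg = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
```

In a real pipeline the correspondences would themselves be uncertain, so a robust variant (for example, fitting inside a RANSAC loop) would likely be needed; that is outside what the abstract describes.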
Related papers
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which enables effective learning from various unlabelled target data, is far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z)
- Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
It is essential to produce perception outputs in 3D to enable the spatial reasoning of other agents and structures.
The most basic approach to achieving the desired BEV representation from a camera image is IPM, assuming a flat ground surface.
More recent approaches use deep neural networks to output directly in BEV space.
arXiv Detail & Related papers (2023-09-16T19:12:05Z)
- FB-BEV: BEV Representation from Forward-Backward View Transformations [131.11787050205697]
We propose a novel View Transformation Module (VTM) for Bird-Eye-View (BEV) representation.
We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4% NDS on the nuScenes test set.
arXiv Detail & Related papers (2023-08-04T10:26:55Z)
- BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout [17.389444754562252]
We propose a two-stage generative method, dubbed BEVControl, that can generate accurate foreground and background contents.
Our experiments show that our BEVControl surpasses the state-of-the-art method, BEVGen, by a significant margin.
arXiv Detail & Related papers (2023-08-03T09:56:31Z)
- Enhanced Stable View Synthesis [86.69338893753886]
We introduce an approach to enhance the novel view synthesis from images taken from a freely moving camera.
The introduced approach focuses on outdoor scenes where recovering accurate geometric scaffold and camera pose is challenging.
arXiv Detail & Related papers (2023-03-30T01:53:14Z)
- Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction [84.94140661523956]
We propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes.
We model each point in the 3D space by summing its projected features on the three planes.
Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy for all voxels.
arXiv Detail & Related papers (2023-02-15T17:58:10Z)
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
arXiv Detail & Related papers (2022-09-12T15:29:13Z)
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z)
- BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs [3.5728676902207988]
We present an effective transformer-based method for BEV semantic segmentation from arbitrary camera rigs.
Specifically, our method first encodes image features from arbitrary cameras with a shared backbone.
An efficient multi-camera deformable attention unit is designed to carry out the BEV-to-image view transformation.
arXiv Detail & Related papers (2022-03-08T12:39:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
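One entry in the list mentions IPM (inverse perspective mapping) under a flat-ground assumption as the most basic route from a camera image to BEV. A minimal sketch of that idea follows: it back-projects one pixel through a pinhole model and intersects the ray with the ground plane z = 0. It is not taken from any of the listed papers. The function name `pixel_to_ground`, the intrinsics, the camera height and pitch, and the world frame (x right, y forward, z up, camera directly above the origin) are all assumptions chosen for illustration.

```python
import numpy as np

def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, pitch_rad):
    """Map pixel (u, v) to ground-plane (x, y); return None if the ray misses the ground."""
    # Ray in the camera frame (x right, y down, z along the optical axis)
    ray_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    # Camera axes to world axes for a level camera: z_cam -> +y_world, y_cam -> -z_world
    r0 = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.0, -1.0, 0.0]])
    # Tilt the camera downward by pitch_rad (rotation about the world x axis)
    c, s = np.cos(pitch_rad), np.sin(pitch_rad)
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, c, s],
                   [0.0, -s, c]])
    ray_world = rx @ r0 @ ray_cam
    if ray_world[2] >= 0.0:
        return None  # ray points at or above the horizon
    # Camera centre is (0, 0, cam_height); solve cam_height + scale * ray_z = 0
    scale = -cam_height / ray_world[2]
    return (scale * ray_world[0], scale * ray_world[1])

# Hypothetical camera: 640x480 image, 500 px focal length, 1.5 m high, pitched 45 degrees down
center_hit = pixel_to_ground(320, 240, 500, 500, 320, 240, 1.5, np.deg2rad(45.0))
sky_hit = pixel_to_ground(320, -760, 500, 500, 320, 240, 1.5, np.deg2rad(45.0))
```

With these numbers the principal-point ray lands 1.5 m straight ahead of the camera, and the second pixel lies above the horizon, so no ground point is returned. Anything that rises off the ground plane, such as a standing pedestrian, violates the flat-ground assumption and is smeared away from the camera by this mapping.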