Improved Single Camera BEV Perception Using Multi-Camera Training
- URL: http://arxiv.org/abs/2409.02676v1
- Date: Wed, 4 Sep 2024 13:06:40 GMT
- Title: Improved Single Camera BEV Perception Using Multi-Camera Training
- Authors: Daniel Busch, Ido Freeman, Richard Meyes, Tobias Meisen
- Abstract summary: In large-scale production, cost efficiency is an optimization goal, making the use of fewer cameras more relevant.
This raises the problem of developing a BEV perception model that provides sufficient performance on a low-cost sensor setup.
The objective of our approach is to reduce the performance drop from using fewer cameras as much as possible, using a modern multi-camera surround-view model reduced to single-camera inference.
- Score: 4.003066044908734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bird's Eye View (BEV) map prediction is essential for downstream autonomous driving tasks like trajectory prediction. In the past, this was accomplished with a sophisticated sensor configuration that captured a surround view from multiple cameras. In large-scale production, however, cost efficiency is an optimization goal, making the use of fewer cameras more relevant. Fewer input images, in turn, lead to a performance drop. This raises the problem of developing a BEV perception model that provides sufficient performance on a low-cost sensor setup. Although this cost restriction is primarily relevant at inference time on production cars, it is less problematic on a test vehicle during training. The objective of our approach is therefore to reduce the aforementioned performance drop as much as possible by using a modern multi-camera surround-view model reduced to single-camera inference. The approach includes three features: a modern masking technique, a cyclic Learning Rate (LR) schedule, and a feature reconstruction loss that supervises the transition from six-camera input to one-camera input during training. Our method outperforms versions trained strictly with one camera or strictly with a six-camera surround view for single-camera inference, resulting in reduced hallucination and a better-quality BEV map.
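The core training idea in the abstract (masking the surround-view cameras while a feature reconstruction loss ties the single-camera BEV features back to the full six-camera features) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the model interface, the choice of which camera is kept, and the plain MSE loss are all assumptions.

```python
import torch
import torch.nn.functional as F


def feature_reconstruction_step(model, images, keep_cam=0):
    """One illustrative training step for the six-to-one camera transition.

    model:  maps a (B, 6, C, H, W) surround-view batch to BEV features
            (interface assumed for this sketch).
    images: (B, 6, C, H, W) batch of six-camera surround-view images.
    """
    # Teacher pass: the full six-camera surround view, without gradients.
    with torch.no_grad():
        teacher_feats = model(images)

    # Student pass: zero out all cameras except `keep_cam` to mimic
    # single-camera inference (a simple form of input masking).
    masked = images.clone()
    cams = torch.arange(images.shape[1])
    masked[:, cams != keep_cam] = 0.0
    student_feats = model(masked)

    # Feature reconstruction loss: pull the masked single-camera
    # features toward the six-camera features.
    return F.mse_loss(student_feats, teacher_feats)
```

In a real setup this loss would be combined with the BEV task loss and, per the abstract, a cyclic LR schedule; the weighting between the terms is not specified here.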
Related papers
- UniDrive: Towards Universal Driving Perception Across Camera Configurations [38.40168936403638]
3D perception aims to infer 3D information from 2D images based on 3D-2D projection.
Generalizing across camera configurations is important for deploying autonomous driving models on different car models.
We present UniDrive, a novel framework for vision-centric autonomous driving to achieve universal perception across camera configurations.
arXiv Detail & Related papers (2024-10-17T17:59:59Z) - RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View [3.165441652093544]
This paper systematically analyzes the key challenges in multi-camera BEV perception for roadside scenarios compared to vehicle-side.
RopeBEV introduces BEV augmentation to address the training balance issues caused by diverse camera poses.
Our method ranks 1st on the real-world highway dataset RoScenes.
arXiv Detail & Related papers (2024-09-18T05:16:34Z) - Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z) - Multi-Camera Calibration Free BEV Representation for 3D Object Detection [8.085831393926561]
We present a completely multi-camera calibration-free Transformer (CFT) for robust Bird's Eye View (BEV) representation.
CFT mines potential 3D information in BEV via our designed position-aware enhancement (PA).
CFT achieves 49.7% NDS on the nuScenes detection task leaderboard and is the first work to remove camera parameters.
arXiv Detail & Related papers (2022-10-31T12:18:08Z) - Cross-View Cross-Scene Multi-View Crowd Counting [56.83882084112913]
Multi-view crowd counting has been previously proposed to utilize multiple cameras to extend the field-of-view of a single camera.
We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts.
arXiv Detail & Related papers (2022-05-03T15:03:44Z) - AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation [51.17610485589701]
We present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments.
AirPose estimates human pose and shape using images captured by multiple uncalibrated flying cameras.
AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration.
arXiv Detail & Related papers (2022-01-20T09:46:20Z) - Balancing the Budget: Feature Selection and Tracking for Multi-Camera Visual-Inertial Odometry [3.441021278275805]
We present a multi-camera visual-inertial odometry system based on factor graph optimization.
We focus on motion tracking in challenging environments such as in narrow corridors and dark spaces with aggressive motions and abrupt lighting changes.
arXiv Detail & Related papers (2021-09-13T13:53:09Z) - Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification across Distant Scenes [70.30052164401178]
Person re-identification (Re-ID) aims to match person images across non-overlapping camera views.
ICS-DS Re-ID uses cross-camera unpaired data with intra-camera identity labels for training.
A cross-camera feature prediction method is used to mine cross-camera self-supervision information.
Joint learning of global-level and local-level features forms a global-local cross-camera feature prediction scheme.
arXiv Detail & Related papers (2021-07-29T11:27:50Z) - SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras [30.480562747903186]
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios.
We present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input.
We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches.
arXiv Detail & Related papers (2021-04-09T15:20:20Z) - Infrastructure-based Multi-Camera Calibration using Radial Projections [117.22654577367246]
Pattern-based calibration techniques can be used to calibrate the intrinsics of the cameras individually.
Infrastructure-based calibration techniques are able to estimate the extrinsics using 3D maps pre-built via SLAM or Structure-from-Motion.
We propose to fully calibrate a multi-camera system from scratch using an infrastructure-based approach.
arXiv Detail & Related papers (2020-07-30T09:21:04Z) - Rethinking the Distribution Gap of Person Re-identification with Camera-based Batch Normalization [90.9485099181197]
This paper rethinks the working mechanism of conventional ReID approaches.
We force the image data of all cameras to fall onto the same subspace, so that the distribution gap between any camera pair is largely shrunk.
Experiments on a wide range of ReID tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-01-23T17:22:34Z)
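The per-camera normalization idea in the last entry (forcing every camera's features onto the same subspace) is commonly realized by keeping separate batch-norm statistics per camera. A minimal sketch, assuming a fixed camera count and integer camera ids; the class name and interface are hypothetical, not the paper's implementation:

```python
import torch
import torch.nn as nn


class CameraBatchNorm(nn.Module):
    """Illustrative camera-based batch normalization: each camera gets
    its own BatchNorm layer, so features from every camera are mapped
    to a comparable distribution regardless of the source camera."""

    def __init__(self, num_features, num_cameras):
        super().__init__()
        # One independent set of normalization statistics per camera.
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(num_features) for _ in range(num_cameras)
        )

    def forward(self, x, cam_ids):
        # x: (B, C, H, W) feature maps; cam_ids: (B,) source-camera index.
        out = torch.empty_like(x)
        for cam in cam_ids.unique():
            idx = cam_ids == cam
            # Normalize each camera's samples with that camera's stats.
            out[idx] = self.bns[int(cam)](x[idx])
        return out
```

At test time each camera's running statistics are used, so the per-camera distribution gap stays shrunk without retraining.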
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.