Towards Viewpoint Robustness in Bird's Eye View Segmentation
- URL: http://arxiv.org/abs/2309.05192v1
- Date: Mon, 11 Sep 2023 02:10:07 GMT
- Title: Towards Viewpoint Robustness in Bird's Eye View Segmentation
- Authors: Tzofi Klinghoffer, Jonah Philion, Wenzheng Chen, Or Litany, Zan
Gojcic, Jungseock Joo, Ramesh Raskar, Sanja Fidler, Jose M. Alvarez
- Abstract summary: We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
- Score: 85.99907496019972
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Autonomous vehicles (AV) require that neural networks used for perception be
robust to different viewpoints if they are to be deployed across many types of
vehicles without the repeated cost of data collection and labeling for each. AV
companies typically focus on collecting data from diverse scenarios and
locations, but not camera rig configurations, due to cost. As a result, only a
small number of rig variations exist across most fleets. In this paper, we
study how AV perception models are affected by changes in camera viewpoint and
propose a way to scale them across vehicle types without repeated data
collection and labeling. Using bird's eye view (BEV) segmentation as a
motivating task, we find through extensive experiments that existing perception
models are surprisingly sensitive to changes in camera viewpoint. When trained
with data from one camera rig, small changes to pitch, yaw, depth, or height of
the camera at inference time lead to large drops in performance. We introduce a
technique for novel view synthesis and use it to transform collected data to
the viewpoint of target rigs, allowing us to train BEV segmentation models for
diverse target rigs without any additional data collection or labeling cost. To
analyze the impact of viewpoint changes, we leverage synthetic data to mitigate
other gaps (content, ISP, etc.). Our approach is then trained on real data and
evaluated on synthetic data, enabling evaluation on diverse target rigs. We
release all data for use in future work. Our method is able to recover an
average of 14.7% of the IoU that is otherwise lost when deploying to new rigs.
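As a rough, hedged illustration (not the paper's code), the sketch below parameterizes a camera-to-ego extrinsic by the four rig quantities the abstract highlights: pitch, yaw, height, and depth (forward mounting offset). The gap between a source rig's transform and a slightly shifted target rig's transform is the kind of viewpoint change the paper studies, and the target transform is what a novel-view-synthesis step would need to render training data for. The axis convention, function names, and parameter values are assumptions.

```python
# Hypothetical sketch: camera-to-ego extrinsics for a source rig and a slightly
# perturbed target rig. Assumed ego frame: x forward, y left, z up.
import numpy as np

def rotation_from_pitch_yaw(pitch_rad: float, yaw_rad: float) -> np.ndarray:
    """Rotation composed from pitch (about the y-axis) and yaw (about the z-axis)."""
    cp, sp = np.cos(pitch_rad), np.sin(pitch_rad)
    cy, sy = np.cos(yaw_rad), np.sin(yaw_rad)
    rot_pitch = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rot_yaw = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    return rot_yaw @ rot_pitch

def camera_to_ego(pitch_deg: float, yaw_deg: float,
                  height_m: float, depth_m: float) -> np.ndarray:
    """4x4 camera-to-ego transform for a rig described by pitch, yaw, height, and depth."""
    T = np.eye(4)
    T[:3, :3] = rotation_from_pitch_yaw(np.deg2rad(pitch_deg), np.deg2rad(yaw_deg))
    T[:3, 3] = [depth_m, 0.0, height_m]  # forward mounting offset and mounting height
    return T

# Example: a target rig with 2 degrees of extra pitch and a 0.3 m higher mount.
source_rig = camera_to_ego(pitch_deg=0.0, yaw_deg=0.0, height_m=1.6, depth_m=1.5)
target_rig = camera_to_ego(pitch_deg=2.0, yaw_deg=0.0, height_m=1.9, depth_m=1.5)
viewpoint_gap = np.linalg.inv(source_rig) @ target_rig  # relative transform between rigs
```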
Related papers
- DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning [7.012508171229966]
There is limited work on BEV segmentation for surround-view fisheye cameras, commonly used in commercial vehicles.
We create a synthetic dataset using the Cognata simulator comprising diverse road types, weather, and lighting conditions.
We generalize BEV segmentation to work with any camera model, which is useful for mixing diverse cameras.
arXiv Detail & Related papers (2024-04-09T14:43:19Z)
- Estimation of Appearance and Occupancy Information in Birds Eye View from Surround Monocular Images [2.69840007334476]
Birds-eye View (BEV) expresses the location of different traffic participants in the ego vehicle frame from a top-down view.
We propose a novel representation that captures the appearance and occupancy information of various traffic participants from an array of monocular cameras covering a 360° field of view (FOV).
We use a learned embedding of all camera images to generate a BEV of the scene at any instant that captures both its appearance and occupancy.
arXiv Detail & Related papers (2022-11-08T20:57:56Z)
- Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe [115.31507979199564]
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia.
As sensor configurations become more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground-truth annotations in the BEV grid; and (c) how to adapt and generalize algorithms as sensor configurations vary across different scenarios.
arXiv Detail & Related papers (2022-09-12T15:29:13Z)
- Cross-View Cross-Scene Multi-View Crowd Counting [56.83882084112913]
Multi-view crowd counting has previously been proposed to utilize multiple cameras to extend the field of view of a single camera.
We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts.
arXiv Detail & Related papers (2022-05-03T15:03:44Z)
- SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras [30.480562747903186]
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios.
We present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input.
We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches.
arXiv Detail & Related papers (2021-04-09T15:20:20Z)
- OmniDet: Surround View Cameras based Multi-task Visual Perception Network for Autonomous Driving [10.3540046389057]
This work presents a multi-task visual perception network on unrectified fisheye images.
It consists of six primary tasks necessary for an autonomous driving system.
We demonstrate that the jointly trained model performs better than the respective single task versions.
arXiv Detail & Related papers (2021-02-15T10:46:24Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem in extremely dark environments, we introduce synthetic data generated with the game Grand Theft Auto V (GTAV).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid (a rough sketch of this lift-splat pooling appears after this list).
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
- A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View [0.0]
Distances can be more easily estimated when the camera perspective is transformed to a bird's eye view (BEV).
This paper describes a methodology to obtain a corrected 360° BEV image given images from multiple vehicle-mounted cameras.
The neural network approach does not rely on manually labeled data, but is trained on a synthetic dataset in such a way that it generalizes well to real-world data.
arXiv Detail & Related papers (2020-05-08T14:54:13Z)
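For context on the lift-splat idea referenced in the Lift, Splat, Shoot entry above, here is a minimal NumPy sketch written from the abstract's description rather than from the authors' code: each pixel feature is weighted by a per-pixel depth distribution ("lift"), placed along its viewing ray, and sum-pooled into a ground-plane grid ("splat"). The depth range, grid parameters, and function signature are assumptions for illustration.

```python
# Hypothetical sketch of lift-splat pooling for a single camera (not the LSS authors' code).
import numpy as np

def lift_splat(features, depth_probs, pixel_rays, cam_origin,
               grid_res=0.5, grid_size=200, depth_range=(4.0, 45.0)):
    """features: (H, W, C) image features; depth_probs: (H, W, D) per-pixel depth weights;
    pixel_rays: (H, W, 3) unit viewing rays in the ego frame; cam_origin: (3,) camera position."""
    H, W, C = features.shape
    D = depth_probs.shape[-1]
    depths = np.linspace(depth_range[0], depth_range[1], D)  # candidate depths in metres
    bev = np.zeros((grid_size, grid_size, C))
    for v in range(H):
        for u in range(W):
            # "Lift": outer product of the pixel feature with its depth distribution -> (D, C).
            lifted = depth_probs[v, u][:, None] * features[v, u][None, :]
            # Place each depth hypothesis along the pixel's viewing ray -> (D, 3).
            points = cam_origin + depths[:, None] * pixel_rays[v, u]
            # "Splat": sum-pool lifted features into their ground-plane grid cells.
            xs = np.floor(points[:, 0] / grid_res).astype(int) + grid_size // 2
            ys = np.floor(points[:, 1] / grid_res).astype(int) + grid_size // 2
            keep = (xs >= 0) & (xs < grid_size) & (ys >= 0) & (ys < grid_size)
            np.add.at(bev, (xs[keep], ys[keep]), lifted[keep])
    return bev  # (grid_size, grid_size, C) bird's-eye-view feature grid
```

In a multi-camera setting, the same pooling would simply be repeated per camera into a shared grid before a BEV decoder predicts the segmentation.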
This list is automatically generated from the titles and abstracts of the papers on this site.