"The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better
Instantaneous Mapping
- URL: http://arxiv.org/abs/2204.02944v1
- Date: Wed, 6 Apr 2022 17:23:13 GMT
- Title: "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better
Instantaneous Mapping
- Authors: Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden
- Abstract summary: Estimating a semantically segmented bird's-eye-view map from a single image has become a popular technique for autonomous control and navigation.
We show that existing methods exhibit an increase in localization error with distance from the camera.
We propose a graph neural network which predicts BEV objects from a monocular image by spatially reasoning about an object within the context of other objects.
- Score: 45.94778766867247
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Estimating a semantically segmented bird's-eye-view (BEV) map from a single
image has become a popular technique for autonomous control and navigation.
However, these methods show an increase in localization error with distance from the
camera. While such an increase in error is entirely expected - localization is
harder at distance - much of the drop in performance can be attributed to the
cues used by current texture-based models, in particular, they make heavy use
of object-ground intersections (such as shadows), which become increasingly
sparse and uncertain for distant objects. In this work, we address these
shortcomings in BEV-mapping by learning the spatial relationship between
objects in a scene. We propose a graph neural network which predicts BEV
objects from a monocular image by spatially reasoning about an object within
the context of other objects. Our approach sets a new state-of-the-art in BEV
estimation from monocular images across three large-scale datasets, including a
50% relative improvement for objects on nuScenes.
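To make the idea concrete, here is a minimal sketch of an adaptive object graph in PyTorch. It illustrates the abstract's description, not the authors' architecture: detected objects become graph nodes, messages between nodes are conditioned on their relative image-plane offsets ("the pedestrian next to the lamppost"), and a regressor reads out BEV coordinates. The layer sizes, fully-connected edge structure, and GRU update are all assumptions.

```python
import torch
import torch.nn as nn

class ObjectGraphLayer(nn.Module):
    """One round of message passing between detected objects."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Messages depend on both endpoint features and their relative
        # image-plane offset, so context objects can anchor distant ones.
        self.message = nn.Sequential(
            nn.Linear(2 * feat_dim + 2, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        self.update = nn.GRUCell(feat_dim, feat_dim)

    def forward(self, feats: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # feats: (N, D) per-object appearance features
        # pos:   (N, 2) normalized image-plane positions
        n = feats.size(0)
        src, dst = torch.meshgrid(torch.arange(n), torch.arange(n), indexing="ij")
        rel = pos[src] - pos[dst]                        # (N, N, 2) offsets
        pair = torch.cat([feats[src], feats[dst], rel], dim=-1)
        msgs = self.message(pair).mean(dim=1)            # aggregate neighbours
        return self.update(msgs, feats)                  # updated node states

class BEVObjectHead(nn.Module):
    """Stacked graph layers followed by a BEV-coordinate regressor."""
    def __init__(self, feat_dim: int = 64, rounds: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(ObjectGraphLayer(feat_dim) for _ in range(rounds))
        self.to_bev = nn.Linear(feat_dim, 2)             # (x, z) on the ground

    def forward(self, feats, pos):
        for layer in self.layers:
            feats = layer(feats, pos)
        return self.to_bev(feats)

# Usage: 5 detected objects with 64-d features and 2-d image positions.
model = BEVObjectHead()
print(model(torch.randn(5, 64), torch.rand(5, 2)).shape)  # torch.Size([5, 2])
```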
Related papers
- VirtualPainting: Addressing Sparsity with Virtual Points and
Distance-Aware Data Augmentation for 3D Object Detection [3.5259183508202976]
We present an approach that generates virtual LiDAR points from camera images.
We also enhance these virtual points with semantic labels obtained from image-based segmentation networks.
Our approach offers a versatile solution that can be seamlessly integrated into various 3D frameworks and 2D semantic segmentation methods; a minimal back-projection sketch follows this list.
arXiv Detail & Related papers (2023-12-26T18:03:05Z)
- Optimizing Ego Vehicle Trajectory Prediction: The Graph Enhancement
Approach [1.3931837019950217]
We advocate for the use of Bird's Eye View perspectives, which offer unique advantages in capturing spatial relationships and object homogeneity.
In our work, we leverage Graph Neural Networks (GNNs) and positional encoding to represent objects in a BEV, achieving competitive performance compared to traditional methods; a small sketch of such a coordinate encoding follows this list.
arXiv Detail & Related papers (2023-12-20T15:22:34Z)
- DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object
Detection and Tracking [67.34803048690428]
We propose to model Dynamic Objects in RecurrenT (DORT) to tackle this problem.
DORT extracts object-wise local volumes for motion estimation that also alleviates the heavy computational burden.
It is flexible and practical, and can be plugged into most camera-based 3D object detectors.
arXiv Detail & Related papers (2023-03-29T12:33:55Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render & compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner; a loose sketch of this coarse scoring loop follows this list.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- BEV-Locator: An End-to-end Visual Semantic Localization Network Using
Multi-View Images [13.258689143949912]
We propose an end-to-end visual semantic localization neural network using multi-view camera images.
BEV-Locator is capable of estimating vehicle poses under versatile scenarios.
Experiments report satisfactory accuracy, with mean absolute errors of 0.052 m, 0.135 m and 0.251° in lateral translation, longitudinal translation and heading angle.
arXiv Detail & Related papers (2022-11-27T20:24:56Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified
Birds-Eye View Representation [145.6041893646006]
M^2BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M^2BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View
Images [4.449481309681663]
We present the first end-to-end learning approach for directly predicting dense panoptic segmentation maps in the Bird's-Eye-View (BEV).
Our architecture follows the top-down paradigm and incorporates a novel dense transformer module.
We derive a mathematical formulation for the sensitivity of the FV-BEV transformation, which allows us to intelligently weight pixels in the BEV space; a simplified flat-ground version of such a sensitivity is worked out after this list.
arXiv Detail & Related papers (2021-08-06T17:59:11Z)
- BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object
Detection for Autonomous Driving [2.9769485817170387]
CNNs can leverage global scene context to produce a better projection into BEV space.
We create an extended KITTI-raw dataset consisting of 12.9k images with annotations of moving object masks in BEV space for five classes.
We observe a significant improvement of 13% in mIoU using the simple baseline implementation.
arXiv Detail & Related papers (2021-07-11T01:11:58Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion; the classical horizon-ratio relation underlying this task is sketched after this list.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
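For "VirtualPainting" above, the core operation it builds on is back-projecting image pixels with estimated depth into 3D and tagging them with segmentation labels. The sketch below shows that standard pinhole back-projection only; `depth` and `seg` are assumed to come from off-the-shelf depth-estimation and segmentation networks, and nothing here is the paper's specific pipeline.

```python
import numpy as np

def virtual_points(depth: np.ndarray, seg: np.ndarray, K: np.ndarray,
                   stride: int = 4) -> np.ndarray:
    """Back-project a sparse grid of pixels into 3D 'virtual LiDAR' points.

    depth: (H, W) metric depth per pixel
    seg:   (H, W) integer class label per pixel
    K:     (3, 3) camera intrinsics
    Returns (N, 4) array of [x, y, z, class_id] in the camera frame.
    """
    h, w = depth.shape
    vs, us = np.mgrid[0:h:stride, 0:w:stride]
    z = depth[vs, us]
    valid = z > 0                                    # skip pixels without depth
    us, vs, z = us[valid], vs[valid], z[valid]
    x = (us - K[0, 2]) * z / K[0, 0]                 # pinhole back-projection
    y = (vs - K[1, 2]) * z / K[1, 1]
    labels = seg[vs, us].astype(np.float64)          # attach semantic labels
    return np.stack([x, y, z, labels], axis=-1)

# Usage with dummy inputs: a 480x640 image and typical intrinsics.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pts = virtual_points(np.full((480, 640), 8.0), np.zeros((480, 640), int), K)
print(pts.shape)  # (19200, 4)
```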
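For the ego-trajectory paper above, a common way to combine BEV coordinates with a GNN is a sinusoidal positional encoding of each object's (x, y) position. The sketch below uses the Transformer-style frequency scheme; the paper does not specify its encoding, so treat this as one plausible choice rather than its method.

```python
import torch

def encode_bev_xy(xy: torch.Tensor, dim: int = 32) -> torch.Tensor:
    """xy: (N, 2) BEV coordinates -> (N, 2 * dim) sinusoidal features."""
    half = dim // 2
    freqs = 1.0 / (100.0 ** (torch.arange(half) / half))   # geometric frequencies
    angles = xy[..., None] * freqs                          # (N, 2, half)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)   # (N, 2, dim)
    return enc.flatten(start_dim=1)                         # (N, 2 * dim)

print(encode_bev_xy(torch.tensor([[1.5, -3.0], [10.0, 4.0]])).shape)
# torch.Size([2, 64])
```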
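For MegaPose's coarse stage, the summary describes classifying whether a rendered pose hypothesis is close enough for the refiner to correct. The loop below is a loose sketch of that idea; `render_fn` and `classifier` are hypothetical placeholders for a real renderer and the trained network, not MegaPose's actual interfaces.

```python
import torch

def coarse_pose_search(image_crop, candidate_poses, render_fn, classifier):
    """Return the candidate pose whose render the classifier rates most
    'correctable' against the observed crop.

    image_crop:      (3, H, W) observed object crop
    candidate_poses: list of (4, 4) pose hypotheses
    render_fn:       pose -> (3, H, W) synthetic rendering (placeholder)
    classifier:      (1, 6, H, W) -> scalar score that the pose error is
                     small enough for the refiner to fix (placeholder)
    """
    scores = []
    for pose in candidate_poses:
        rendered = render_fn(pose)
        pair = torch.cat([image_crop, rendered], dim=0)   # stack for comparison
        scores.append(classifier(pair[None]).item())
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidate_poses[best], scores[best]

# Dummy usage with stand-ins for the renderer and network.
pose, score = coarse_pose_search(
    torch.rand(3, 64, 64),
    [torch.eye(4) for _ in range(8)],
    render_fn=lambda p: torch.rand(3, 64, 64),
    classifier=lambda x: torch.rand(1),
)
```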
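For the FV-BEV sensitivity mentioned in the panoptic-segmentation entry, the flat-ground special case is simple to derive and also explains the distance-dependent localization error the main paper starts from. The derivation below assumes a level camera at height h, focal length f, and horizon row v_h; this is a simplification, not that paper's exact formulation.

```latex
% A pixel on image row v maps, under flat-ground inverse perspective
% mapping, to ground-plane depth z. Differentiating shows a one-pixel
% error in v produces a depth error growing quadratically with z,
% motivating down-weighting of far pixels in the FV-to-BEV transform.
\[
  z = \frac{fh}{v - v_h},
  \qquad
  \left|\frac{\partial z}{\partial v}\right|
    = \frac{fh}{(v - v_h)^2}
    = \frac{z^2}{fh}.
\]
```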
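Finally, for "Single View Metrology in the Wild", the classical relation the task rests on links object height to camera height through image rows. It holds exactly for an upright object on a flat ground plane seen by a level camera; the paper's contribution is learning priors when the camera and horizon are unknown, so this is background rather than its method.

```latex
% With camera height h_c, horizon row v_h, and an upright object whose
% image spans rows v_t (top) to v_b (bottom), perspective projection gives
\[
  H = h_c \,\frac{v_b - v_t}{v_b - v_h},
\]
% so the absolute object height H follows from the horizon and camera height.
```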
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated summaries and is not responsible for any consequences of their use.