BirdSLAM: Monocular Multibody SLAM in Bird's-Eye View
- URL: http://arxiv.org/abs/2011.07613v1
- Date: Sun, 15 Nov 2020 19:37:24 GMT
- Title: BirdSLAM: Monocular Multibody SLAM in Bird's-Eye View
- Authors: Swapnil Daga, Gokul B. Nair, Anirudha Ramesh, Rahul Sajnani, Junaid Ahmed Ansari and K. Madhava Krishna
- Abstract summary: We present BirdSLAM, a novel simultaneous localization and mapping (SLAM) system for autonomous driving platforms equipped with only a monocular camera.
BirdSLAM tackles challenges faced by other monocular SLAM systems by using an orthographic (bird's-eye) view as the configuration space in which localization and mapping are performed.
- Score: 10.250859125675259
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present BirdSLAM, a novel simultaneous localization and
mapping (SLAM) system for the challenging scenario of autonomous driving
platforms equipped with only a monocular camera. BirdSLAM tackles challenges
faced by other monocular SLAM systems (such as scale ambiguity in monocular
reconstruction, dynamic object localization, and uncertainty in feature
representation) by using an orthographic (bird's-eye) view as the configuration
space in which localization and mapping are performed. By assuming only the
height of the ego-camera above the ground, BirdSLAM leverages single-view
metrology cues to accurately localize the ego-vehicle and all other traffic
participants in bird's-eye view. We demonstrate that our system outperforms
prior work that uses strictly greater information, and highlight the relevance
of each design decision via an ablation analysis.
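The core geometric step the abstract alludes to can be illustrated with a short sketch: given only the camera's height above a planar ground and its intrinsics, a pixel known to touch the ground (e.g. the bottom of a vehicle's bounding box) can be back-projected to metric bird's-eye-view coordinates. This is a minimal illustration under standard pinhole assumptions (known intrinsics, zero roll and pitch); the function name and the KITTI-like intrinsics are illustrative, not taken from the paper's code.

```python
import numpy as np

def pixel_to_bev(u, v, K, cam_height):
    """Back-project a pixel lying on the ground plane into metric
    bird's-eye-view coordinates (x lateral, z forward).

    Assumes a pinhole camera with intrinsics K, zero roll/pitch, and
    the ground plane at y = +cam_height in camera coordinates
    (y axis pointing down)."""
    # Ray direction in camera coordinates for pixel (u, v).
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    if ray[1] <= 0:
        raise ValueError("Pixel at or above the horizon: no ground intersection.")
    # Scale the ray so its y component reaches the ground plane.
    scale = cam_height / ray[1]
    point = scale * ray           # 3D point on the ground plane
    return point[0], point[2]     # (lateral x, forward z) in metres

# Example with illustrative KITTI-like intrinsics, camera 1.65 m above ground.
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
x, z = pixel_to_bev(u=700, v=250, K=K, cam_height=1.65)
print(f"ground point: x={x:.2f} m, z={z:.2f} m")
```

Because every such back-projected point is metric, both the ego-vehicle and other traffic participants can be placed in a common bird's-eye-view frame without resolving monocular scale ambiguity per-feature.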
Related papers
- MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations [6.688344169640982]
We propose a Mamba-based framework called MamBEV, which learns unified Bird's Eye View representations.
MamBEV supports multiple 3D perception tasks with significantly improved computational and memory efficiency.
Experiments demonstrate MamBEV's promising performance across diverse visual perception metrics.
arXiv Detail & Related papers (2025-03-18T03:18:45Z)
- DVM-SLAM: Decentralized Visual Monocular Simultaneous Localization and Mapping for Multi-Agent Systems [7.907742876205873]
We present Decentralized Visual Monocular SLAM (DVM-SLAM), the first open-source decentralized monocular C-SLAM system.
DVM-SLAM's real-world applicability is validated on physical robots with a custom collision avoidance framework.
We also demonstrate comparable accuracy to state-of-the-art centralized monocular C-SLAM systems.
arXiv Detail & Related papers (2025-03-06T06:10:21Z)
- TopView: Vectorising road users in a bird's eye view from uncalibrated street-level imagery with deep learning [2.7195102129095003]
We introduce a simple approach for estimating a bird's eye view from images without prior knowledge of a given camera's intrinsic and extrinsic parameters.
The framework has been applied to several use cases, including generating a live map from camera feeds and analysing social-distancing violations at city scale.
arXiv Detail & Related papers (2024-12-18T21:55:58Z)
- Lightweight Vision Transformer with Bidirectional Interaction [63.65115590184169]
We propose a Fully Adaptive Self-Attention (FASA) mechanism for vision transformers to model local and global information.
Based on FASA, we develop a family of lightweight vision backbones, Fully Adaptive Transformer (FAT) family.
arXiv Detail & Related papers (2023-06-01T06:56:41Z)
- NEWTON: Neural View-Centric Mapping for On-the-Fly Large-Scale SLAM [51.21564182169607]
NEWTON is a view-centric mapping method that dynamically constructs neural fields from run-time observations.
Our method enables camera pose updates using loop closures and scene boundary updates by representing the scene with multiple neural fields.
The experimental results demonstrate the superior performance of our method over existing world-centric neural field-based SLAM systems.
arXiv Detail & Related papers (2023-03-23T20:22:01Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth estimation [5.33931801679129]
MOTSLAM is a monocular dynamic visual SLAM system that tracks both the poses and the bounding boxes of dynamic objects.
Our experiments on the KITTI dataset demonstrate that our system achieves the best performance on both camera ego-motion estimation and object tracking among monocular dynamic SLAM systems.
arXiv Detail & Related papers (2022-10-05T06:07:10Z)
- Improved Real-Time Monocular SLAM Using Semantic Segmentation on Selective Frames [15.455647477995312]
Monocular simultaneous localization and mapping (SLAM) is emerging in advanced driver assistance systems and autonomous driving.
This paper proposes an improved real-time monocular SLAM using deep learning-based semantic segmentation.
Experiments with six video sequences demonstrate that the proposed monocular SLAM system achieves significantly more accurate trajectory tracking.
arXiv Detail & Related papers (2021-04-30T22:34:45Z)
- Understanding Bird's-Eye View Semantic HD-Maps Using an Onboard Monocular Camera [110.83289076967895]
We study scene understanding in the form of online estimation of semantic bird's-eye-view HD-maps using the video input from a single onboard camera.
In our experiments, we demonstrate that the considered aspects are complementary to each other for HD-map understanding.
arXiv Detail & Related papers (2020-12-05T14:39:14Z)
- SLAM in the Field: An Evaluation of Monocular Mapping and Localization on Challenging Dynamic Agricultural Environment [12.666030953871186]
This paper demonstrates a system that combines a sparse, indirect monocular visual SLAM with both offline and real-time Multi-View Stereo (MVS) reconstruction algorithms.
The use of a monocular SLAM makes our system much easier to integrate with an existing device, as we do not rely on a LiDAR.
arXiv Detail & Related papers (2020-11-02T16:53:35Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid (a toy sketch of this lift-splat step follows this list).
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
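The lift-splat mechanism described in the last entry can be sketched compactly: each pixel's feature is spread along its viewing ray according to a predicted categorical depth distribution, and the weighted 3D points are sum-pooled into a bird's-eye-view grid. The loop-based numpy function below is a deliberately minimal sketch of that idea, not the paper's implementation (which uses learned CNN features and an efficient cumulative-sum pooling trick); the function name and interface are hypothetical.

```python
import numpy as np

def lift_splat(features, depth_probs, depth_bins, K, bev_shape, cell_size):
    """Toy version of the "lift" and "splat" steps.

    features    : (H, W, C) image feature map
    depth_probs : (H, W, D) per-pixel depth distribution (sums to 1 over D)
    depth_bins  : (D,) candidate metric depths
    bev_shape   : (rows, cols) of the BEV grid; rows index forward z
    cell_size   : metres per BEV cell
    """
    H, W, C = features.shape
    bev = np.zeros((*bev_shape, C))
    K_inv = np.linalg.inv(K)
    for v in range(H):
        for u in range(W):
            ray = K_inv @ np.array([u, v, 1.0])   # pixel ray with z = 1
            for d, depth in enumerate(depth_bins):
                # "Lift": a 3D point at each candidate depth; its feature
                # is the pixel feature weighted by that depth's probability.
                point = ray * depth
                weighted = depth_probs[v, u, d] * features[v, u]
                # "Splat": accumulate into the BEV cell under the point.
                col = int(point[0] / cell_size) + bev_shape[1] // 2
                row = int(point[2] / cell_size)
                if 0 <= row < bev_shape[0] and 0 <= col < bev_shape[1]:
                    bev[row, col] += weighted
    return bev
```

With multiple cameras, the same splatting accumulates all frustums into one shared grid, which is what makes the representation a natural space for downstream planning ("shooting" trajectories into a BEV cost map).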
This list is automatically generated from the titles and abstracts of the papers in this site.