OA-SLAM: Leveraging Objects for Camera Relocalization in Visual SLAM
- URL: http://arxiv.org/abs/2209.08338v1
- Date: Sat, 17 Sep 2022 14:20:08 GMT
- Title: OA-SLAM: Leveraging Objects for Camera Relocalization in Visual SLAM
- Authors: Matthieu Zins, Gilles Simon, Marie-Odile Berger
- Abstract summary: We show that the major benefit of objects lies in their higher-level semantic and discriminating power.
Our experiments show that the camera can be relocalized from viewpoints where classical methods fail.
Our code and test data are released at gitlab.inria.fr/tangram/oa-slam.
- Score: 2.016317500787292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we explore the use of objects in Simultaneous Localization and
Mapping in unseen worlds and propose an object-aided system (OA-SLAM). More
precisely, we show that, compared to low-level points, the major benefit of
objects lies in their higher-level semantic and discriminating power. Points,
on the contrary, have a better spatial localization accuracy than the generic
coarse models used to represent objects (cuboid or ellipsoid). We show that
combining points and objects is of great interest to address the problem of
camera pose recovery. Our main contributions are: (1) we improve the
relocalization ability of a SLAM system using high-level object landmarks; (2)
we build an automatic system, capable of identifying, tracking and
reconstructing objects with 3D ellipsoids; (3) we show that object-based
localization can be used to reinitialize or resume camera tracking. Our fully
automatic system allows on-the-fly object mapping and enhanced pose tracking
recovery, which we believe can significantly benefit the AR community. Our
experiments show that the camera can be relocalized from viewpoints where
classical methods fail. We demonstrate that this localization allows a SLAM
system to continue working despite a tracking loss, which can happen frequently
with an uninitiated user. Our code and test data are released at
gitlab.inria.fr/tangram/oa-slam.
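To make the ellipsoid representation above concrete, here is a minimal sketch, assuming a calibrated pinhole camera, of the standard dual-quadric projection used by object-based SLAM systems: an ellipsoid with dual quadric Q* projects under a 3x4 camera matrix P to the dual conic C* = P Q* P^T, whose last column encodes the ellipse center. All names and values below are illustrative, not code from OA-SLAM.

```python
import numpy as np

def ellipsoid_dual_quadric(R, t, axes):
    """Dual quadric Q* (4x4, symmetric) of an ellipsoid with rotation
    R (3x3), center t (3,), and semi-axis lengths axes (3,)."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T @ np.diag([axes[0]**2, axes[1]**2, axes[2]**2, -1.0]) @ T.T

def project_ellipsoid(P, Q_star):
    """Project a dual quadric with camera matrix P: C* = P Q* P^T.
    Returns the ellipse center in pixels and the dual conic."""
    C_star = P @ Q_star @ P.T
    center = C_star[:2, 2] / C_star[2, 2]  # center from the last column of C*
    return center, C_star

# Toy check: an ellipsoid 2 m straight ahead projects onto the principal point.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])            # camera at the origin
Q_star = ellipsoid_dual_quadric(np.eye(3),
                                np.array([0.0, 0.0, 2.0]),  # center (m)
                                np.array([0.3, 0.2, 0.1]))  # semi-axes (m)
print(project_ellipsoid(P, Q_star)[0])  # ~ [320. 240.]
```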
Related papers
- VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks [19.789761641342043]
We propose a Visual Object Odometry and Mapping framework VOOM.
We use high-level objects and low-level points as the hierarchical landmarks in a coarse-to-fine manner.
VOOM outperforms both object-oriented and feature-point SLAM systems in terms of localization (the coarse-to-fine idea is sketched below).
arXiv Detail & Related papers (2024-02-21T08:22:46Z)
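The hierarchical, coarse-to-fine use of landmarks in VOOM can be pictured as a two-stage pose estimate: a coarse pose from a few object-level correspondences, then refinement against many point landmarks. The sketch below is a hypothetical illustration of that idea built on OpenCV's PnP solvers, not VOOM's actual pipeline; the 3D-2D correspondences are assumed to be given.

```python
import cv2
import numpy as np

def coarse_to_fine_pose(obj_3d, obj_2d, pts_3d, pts_2d, K):
    """Coarse pose from object centers (few, stable correspondences),
    refined with point landmarks (many, accurate correspondences)."""
    dist = np.zeros(4)  # assume undistorted images

    # Stage 1 (coarse): PnP on object centers; EPnP needs >= 4 of them.
    ok, rvec, tvec = cv2.solvePnP(obj_3d, obj_2d, K, dist,
                                  flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        return None

    # Stage 2 (fine): robust refinement on points, seeded by the coarse pose.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, dist, rvec=rvec, tvec=tvec,
        useExtrinsicGuess=True, reprojectionError=3.0,
        flags=cv2.SOLVEPNP_ITERATIVE)
    return (rvec, tvec) if ok else None
```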
- Semantic Object-level Modeling for Robust Visual Camera Relocalization [14.998133272060695]
We propose a novel method of automatic object-level voxel modeling for accurate ellipsoidal representations of objects.
All of these modules are fully integrated into a visual SLAM system (a minimal ellipsoid-fitting sketch follows this entry).
arXiv Detail & Related papers (2024-02-10T13:39:44Z)
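One simple way to turn an object-level voxel model into an ellipsoidal representation, sketched below under the assumption that occupied voxel centers are available, is a moment-based fit: the ellipsoid's center, orientation, and axes follow from the mean and covariance of those centers. This illustrates the general idea only; the paper's modeling pipeline is more elaborate.

```python
import numpy as np

def ellipsoid_from_voxels(voxel_centers, k=2.0):
    """Moment-based ellipsoid fit to occupied voxel centers (N x 3).
    Returns center (3,), rotation R (3x3, columns = axes), semi-axes (3,).
    k scales the covariance ellipsoid to cover the voxel extent."""
    center = voxel_centers.mean(axis=0)
    cov = np.cov(voxel_centers.T)
    eigvals, R = np.linalg.eigh(cov)  # principal directions of the occupancy
    axes = k * np.sqrt(np.maximum(eigvals, 1e-12))
    return center, R, axes
```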
- Lazy Visual Localization via Motion Averaging [89.8709956317671]
We show that it is possible to achieve high localization accuracy without reconstructing the scene from the database.
Experiments show that our visual localization proposal, LazyLoc, achieves comparable performance against state-of-the-art structure-based methods.
arXiv Detail & Related papers (2023-07-19T13:40:45Z)
- DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking [67.34803048690428]
We propose to model Dynamic Objects in RecurrenT (DORT) to tackle this problem.
DORT extracts object-wise local volumes for motion estimation that also alleviates the heavy computational burden.
It is flexible and practical, and can be plugged into most camera-based 3D object detectors.
arXiv Detail & Related papers (2023-03-29T12:33:55Z)
- LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation [69.70498875887611]
LocPoseNet is able to robustly learn a location prior for unseen objects.
Our method outperforms existing works by a large margin on LINEMOD and GenMOP.
arXiv Detail & Related papers (2022-11-29T15:21:34Z)
- TwistSLAM++: Fusing multiple modalities for accurate dynamic semantic SLAM [0.0]
TwistSLAM++ is a semantic, dynamic SLAM system that fuses stereo images and LiDAR information.
We show on classical benchmarks that this fusion approach based on multimodal information improves the accuracy of object tracking.
arXiv Detail & Related papers (2022-09-16T12:28:21Z)
- Visual-Inertial Multi-Instance Dynamic SLAM with Object-level Relocalisation [14.302118093865849]
We present a tightly-coupled visual-inertial object-level multi-instance dynamic SLAM system.
It can robustly optimise the camera pose, velocity, and IMU biases while building a dense, object-level 3D reconstruction of the environment.
arXiv Detail & Related papers (2022-08-08T17:13:24Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- DSP-SLAM: Object Oriented SLAM with Deep Shape Priors [16.867669408751507]
We propose an object-oriented SLAM system that builds a rich and accurate joint map of dense 3D models for foreground objects.
DSP-SLAM takes as input the 3D point cloud reconstructed by a feature-based SLAM system.
Our evaluation shows improvements in object pose and shape reconstruction with respect to recent deep prior-based reconstruction methods.
arXiv Detail & Related papers (2021-08-21T10:00:12Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion (a simplified height-from-horizon relation is sketched below).
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
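For intuition about the height recovery above: under strong simplifying assumptions (a level pinhole camera, a vertical object standing on the ground plane, and a known horizon row), the camera height above the ground follows from one object of known height via h_cam = H * (v_base - v_horizon) / (v_base - v_top). This back-of-the-envelope special case is only a sketch; the paper's data-driven method handles far more general scenes.

```python
def camera_height(H, v_top, v_base, v_horizon):
    """Camera height above the ground from one vertical object of known
    height H, assuming a level pinhole camera. v_* are image rows
    (v grows downward, so v_base > v_top for an upright object)."""
    return H * (v_base - v_horizon) / (v_base - v_top)

# A 1.8 m person imaged from row 200 (head) to row 400 (feet) with the
# horizon at row 300: the camera sits at half the person's height.
print(camera_height(1.8, v_top=200, v_base=400, v_horizon=300))  # 0.9
```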
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.