Semantic Object-level Modeling for Robust Visual Camera Relocalization
- URL: http://arxiv.org/abs/2402.06951v1
- Date: Sat, 10 Feb 2024 13:39:44 GMT
- Title: Semantic Object-level Modeling for Robust Visual Camera Relocalization
- Authors: Yifan Zhu, Lingjuan Miao, Haitao Wu, Zhiqiang Zhou, Weiyi Chen,
Longwen Wu
- Abstract summary: We propose a novel method of automatic object-level voxel modeling for accurate ellipsoidal representations of objects.
All of these modules are fully integrated into a visual SLAM system.
- Score: 14.998133272060695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual relocalization is crucial for the autonomous localization and
navigation of mobile robots. Advances in CNN-based object detection algorithms
have greatly enhanced the robustness of visual relocalization, especially from
viewpoints where classical methods fail. However, ellipsoids (quadrics)
generated from axis-aligned object detections may limit the accuracy of the
object-level representation and degrade the performance of the visual
relocalization system. In this paper, we propose a novel method of automatic
object-level voxel modeling for accurate ellipsoidal representations of
objects. For visual relocalization, we design an improved pose optimization
strategy for camera pose recovery that fully exploits the projection
characteristics of the 2D fitted ellipses and the accurate 3D ellipsoids. All
of these modules are fully integrated into a visual SLAM system. Experimental
results show that our semantic object-level mapping and object-based visual
relocalization methods significantly improve the robustness of visual
relocalization to new viewpoints.
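The projection relation the abstract relies on, between 3D ellipsoids and their 2D fitted ellipses, is the standard dual-quadric projection: an ellipsoid represented by its dual quadric Q* projects under a pinhole camera P to the dual conic C* = P Q* P^T. The following is a minimal sketch of that geometry, not the authors' pipeline; all numeric values (ellipsoid size and pose, camera intrinsics) are hypothetical.

```python
import numpy as np

# Hypothetical example: an axis-aligned ellipsoid 4 m in front of a
# pinhole camera at the origin (values are illustrative only).
axes = np.array([0.3, 0.2, 0.5])       # semi-axis lengths of the ellipsoid (m)
center = np.array([0.1, -0.2, 4.0])    # ellipsoid centre in the camera frame (m)

# Dual quadric Q* = T diag(a^2, b^2, c^2, -1) T^T, where T is the
# 4x4 pose of the ellipsoid (identity rotation, translation = centre).
T = np.eye(4)
T[:3, 3] = center
Q_star = T @ np.diag(np.append(axes**2, -1.0)) @ T.T

# Pinhole projection P = K [I | 0] with assumed intrinsics K.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

# Projecting the dual quadric yields the dual conic of the image ellipse.
C_star = P @ Q_star @ P.T

# The ellipse centre follows directly from the dual conic.
ellipse_center = C_star[:2, 2] / C_star[2, 2]
print(ellipse_center)  # ellipse centre in pixels, close to the
                       # projected ellipsoid centre (332.5, 215.0)
```

Note that the conic centre only approximately coincides with the perspective projection of the ellipsoid centre; the small residual between the two is one reason pose optimization over the full ellipse-ellipsoid constraint, as the paper proposes, is preferable to matching centre points alone.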
Related papers
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z) - Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [8.059133373836913]
This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations.
We use an ensemble of partially constructed NeRF models to quantify model uncertainty to determine the next action.
Our approach determines when and how to grasp and re-orient an object given its partial NeRF model and re-estimates the object pose to rectify misalignments introduced during the interaction.
arXiv Detail & Related papers (2024-04-02T10:15:06Z) - VOOM: Robust Visual Object Odometry and Mapping using Hierarchical
Landmarks [19.789761641342043]
We propose VOOM, a Visual Object Odometry and Mapping framework.
We use high-level objects and low-level points as the hierarchical landmarks in a coarse-to-fine manner.
VOOM outperforms both object-oriented SLAM and feature points SLAM systems in terms of localization.
arXiv Detail & Related papers (2024-02-21T08:22:46Z) - LocaliseBot: Multi-view 3D object localisation with differentiable
rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z) - Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric
Voxelization [67.85434518679382]
We present DynaVol, a 3D scene generative model that unifies geometric structures and object-centric learning.
The key idea is to perform object-centric voxelization to capture the 3D nature of the scene.
Voxel features evolve over time through a canonical-space deformation function, forming the basis for global representation learning.
arXiv Detail & Related papers (2023-04-30T05:29:28Z) - OA-SLAM: Leveraging Objects for Camera Relocalization in Visual SLAM [2.016317500787292]
We show that the major benefit of objects lies in their higher-level semantic and discriminating power.
Our experiments show that the camera can be relocalized from viewpoints where classical methods fail.
Our code and test data are released at gitlab.inria.fr/tangram/oa-slam.
arXiv Detail & Related papers (2022-09-17T14:20:08Z) - Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z) - Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z) - OrcVIO: Object residual constrained Visual-Inertial Odometry [18.3130718336919]
This work presents OrcVIO, a visual-inertial odometry system tightly coupled with tracking and optimization over structured object models.
The ability of OrcVIO for accurate trajectory estimation and large-scale object-level mapping is evaluated using real data.
arXiv Detail & Related papers (2020-07-29T21:01:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.