Related papers: Semantic Object-level Modeling for Robust Visual Camera Relocalization

URL: http://arxiv.org/abs/2402.06951v1
Date: Sat, 10 Feb 2024 13:39:44 GMT
Title: Semantic Object-level Modeling for Robust Visual Camera Relocalization
Authors: Yifan Zhu, Lingjuan Miao, Haitao Wu, Zhiqiang Zhou, Weiyi Chen, Longwen Wu
Abstract summary: We propose a novel method of automatic object-level voxel modeling for accurate ellipsoidal representations of objects. All of these modules are entirely integrated into a visual SLAM system.
Score: 14.998133272060695
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual relocalization is crucial for autonomous visual localization and navigation of mobile robots. Thanks to improvements in CNN-based object detection algorithms, the robustness of visual relocalization has been greatly enhanced, especially from viewpoints where classical methods fail. However, ellipsoids (quadrics) generated from axis-aligned object detections may limit the accuracy of the object-level representation and degrade the performance of the visual relocalization system. In this paper, we propose a novel method of automatic object-level voxel modeling for accurate ellipsoidal representations of objects. For visual relocalization, we design a better pose optimization strategy for camera pose recovery that fully utilizes the projection characteristics of the fitted 2D ellipses and the accurate 3D ellipsoids. All of these modules are fully integrated into a visual SLAM system. Experimental results show that our semantic object-level mapping and object-based visual relocalization methods significantly enhance the performance of visual relocalization in terms of robustness to new viewpoints.
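
The abstract refers to the projection relationship between 3D ellipsoids and 2D ellipses without giving formulas. As an illustration only, the sketch below uses the standard dual-quadric projection from multi-view geometry, C* = P Q* P^T, to project an ellipsoid into an image ellipse. It is not the authors' implementation; all function names are hypothetical, and it assumes a pinhole camera with the ellipsoid entirely in front of it.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def ellipsoid_dual_quadric(semi_axes, rotation, center):
    """Build the 4x4 dual quadric Q* = Z diag(a^2, b^2, c^2, -1) Z^T,
    where Z is the rigid transform (rotation, center) of the ellipsoid."""
    a, b, c = semi_axes
    z = [list(rotation[i]) + [center[i]] for i in range(3)]
    z.append([0.0, 0.0, 0.0, 1.0])
    d = [[a * a, 0, 0, 0], [0, b * b, 0, 0], [0, 0, c * c, 0], [0, 0, 0, -1.0]]
    return matmul(matmul(z, d), transpose(z))

def project_dual_quadric(p, q_star):
    """Project a dual quadric with a 3x4 camera matrix: C* = P Q* P^T."""
    return matmul(matmul(p, q_star), transpose(p))

def ellipse_from_dual_conic(c_star):
    """Recover the ellipse center and semi-axes (major, minor) from a dual conic."""
    s = -c_star[2][2]
    if s <= 0:
        raise ValueError("dual conic does not describe an ellipse in this convention")
    n = [[v / s for v in row] for row in c_star]  # normalise so n[2][2] == -1
    cx, cy = -n[0][2], -n[1][2]
    # Centered 2x2 shape matrix; its eigenvalues are the squared semi-axes.
    a00 = n[0][0] + cx * cx
    a01 = n[0][1] + cx * cy
    a11 = n[1][1] + cy * cy
    mean = 0.5 * (a00 + a11)
    dev = math.sqrt((0.5 * (a00 - a11)) ** 2 + a01 * a01)
    return (cx, cy), (math.sqrt(mean + dev), math.sqrt(mean - dev))

# Hypothetical example: unit sphere 5 units in front of a camera at the origin.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
k = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]  # assumed intrinsics
p = matmul(k, [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]])  # P = K [I | 0]
q = ellipsoid_dual_quadric((1.0, 1.0, 1.0), identity, (0.0, 0.0, 5.0))
ellipse_center, ellipse_axes = ellipse_from_dual_conic(project_dual_quadric(p, q))
```

In an object-based relocalization setting, a residual between such a projected ellipse and an ellipse fitted to a detection could serve as one term of a pose cost. That use is an assumption made here for illustration, not a description of the paper's cost function.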

Related papers

HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision. We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects. Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z)
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models [79.96917782423219]
Orient Anything is the first expert and foundational model designed to estimate object orientation in a single image. By developing a pipeline to annotate the front face of 3D objects, we collect 2M images with precise orientation annotations. Our model achieves state-of-the-art orientation estimation accuracy in both rendered and real images.
arXiv Detail & Related papers (2024-12-24T18:58:43Z)
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations [55.74675012171316]
RELOCATE is a training-free baseline designed to perform the challenging task of visual query localization in long videos. To eliminate the need for task-specific training, RELOCATE leverages a region-based representation derived from pretrained vision models.
arXiv Detail & Related papers (2024-12-02T18:59:53Z)
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning. Voxelization infers per-object occupancy probabilities at individual spatial locations. Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [8.059133373836913]
This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations. We use an ensemble of partially constructed NeRF models to quantify model uncertainty to determine the next action. Our approach determines when and how to grasp and re-orient an object given its partial NeRF model and re-estimates the object pose to rectify misalignments introduced during the interaction.
arXiv Detail & Related papers (2024-04-02T10:15:06Z)
VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks [19.789761641342043]
We propose a Visual Object Odometry and Mapping framework VOOM. We use high-level objects and low-level points as the hierarchical landmarks in a coarse-to-fine manner. VOOM outperforms both object-oriented SLAM and feature points SLAM systems in terms of localization.
arXiv Detail & Related papers (2024-02-21T08:22:46Z)
LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation. Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects. We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches. We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment. Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization [67.85434518679382]
We present DynaVol, a 3D scene generative model that unifies geometric structures and object-centric learning. The key idea is to perform object-centric voxelization to capture the 3D nature of the scene. Voxel features evolve over time through a canonical-space deformation function, forming the basis for global representation learning.
arXiv Detail & Related papers (2023-04-30T05:29:28Z)
OA-SLAM: Leveraging Objects for Camera Relocalization in Visual SLAM [2.016317500787292]
We show that the major benefit of objects lies in their higher-level semantic and discriminating power. Our experiments show that the camera can be relocalized from viewpoints where classical methods fail. Our code and test data are released at gitlab.inria.fr/tangram/oa-slam.
arXiv Detail & Related papers (2022-09-17T14:20:08Z)
Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances. We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction. Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module. The image synthesis network is designed to efficiently span the pose configuration space. We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
OrcVIO: Object residual constrained Visual-Inertial Odometry [18.3130718336919]
This work presents OrcVIO, for visual-inertial odometry tightly coupled with tracking and optimization over structured object models. The ability of OrcVIO for accurate trajectory estimation and large-scale object-level mapping is evaluated using real data.
arXiv Detail & Related papers (2020-07-29T21:01:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.