Superquadric Object Representation for Optimization-based Semantic SLAM
- URL: http://arxiv.org/abs/2109.09627v1
- Date: Mon, 20 Sep 2021 15:27:56 GMT
- Title: Superquadric Object Representation for Optimization-based Semantic SLAM
- Authors: Florian Tschopp, Juan Nieto, Roland Siegwart, Cesar Cadena
- Abstract summary: We propose a pipeline to leverage semantic mask measurements to fit SQ parameters to multi-view camera observations.
We demonstrate the system's ability to retrieve randomly generated SQ parameters from multi-view mask observations.
- Score: 31.13636619458275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Introducing semantically meaningful objects to visual Simultaneous
Localization And Mapping (SLAM) has the potential to improve both the accuracy
and reliability of pose estimates, especially in challenging scenarios with
significant view-point and appearance changes. However, how semantic objects
should be represented for an efficient inclusion in optimization-based SLAM
frameworks is still an open question. Superquadrics (SQs) are an efficient and
compact object representation, able to represent most common object types with
high fidelity, and are typically retrieved from 3D point-cloud data. However,
accurate 3D point-cloud data might not be available in all applications. Recent
advancements in machine learning enabled robust object recognition and semantic
mask measurements from camera images under many different appearance
conditions. We propose a pipeline to leverage such semantic mask measurements
to fit SQ parameters to multi-view camera observations using a multi-stage
initialization and optimization procedure. We demonstrate the system's ability
to retrieve randomly generated SQ parameters from multi-view mask observations
in preliminary simulation experiments and evaluate different initialization
stages and cost functions.
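The abstract does not spell out the SQ parameterization, but superquadrics are conventionally defined by Barr's implicit inside-outside function with three semi-axis lengths (a1, a2, a3) and two shape exponents (eps1, eps2). As a minimal sketch of that standard representation (not the authors' fitting pipeline; the function name and the unit-sphere check below are illustrative):

```python
import numpy as np

def sq_inside_outside(points, a, eps):
    """Superquadric inside-outside function F (Barr's formulation).

    points: (N, 3) array of points in the object's canonical frame.
    a:      (a1, a2, a3) semi-axis lengths along x, y, z.
    eps:    (eps1, eps2) shape exponents controlling squareness.
    Returns F with F < 1 inside the surface, F == 1 on it, F > 1 outside.
    """
    x, y, z = np.abs(np.asarray(points)).T
    # Combine the x/y terms first, then blend with the z term.
    xy = (x / a[0]) ** (2.0 / eps[1]) + (y / a[1]) ** (2.0 / eps[1])
    return xy ** (eps[1] / eps[0]) + (z / a[2]) ** (2.0 / eps[0])

# Sanity check: a = (1, 1, 1), eps = (1, 1) is the unit sphere.
pts = np.array([[1.0, 0.0, 0.0],   # on the surface
                [0.5, 0.0, 0.0]])  # inside
F = sq_inside_outside(pts, a=(1.0, 1.0, 1.0), eps=(1.0, 1.0))
# F[0] == 1.0 (surface point), F[1] < 1.0 (interior point)
```

Residuals of the form (F - 1) over observed surface points are one common way such parameters are fit with a nonlinear least-squares solver; the paper's mask-based cost functions operate on camera observations instead.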
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation [3.5379836919221566]
Estimating rigid objects' poses is one of the fundamental problems in computer vision.
This paper presents a novel approach, CVAM-Pose, for multi-object monocular pose estimation.
arXiv Detail & Related papers (2024-10-11T17:26:27Z)
- KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction [58.04846444985808]
This paper introduces KRONC, a novel approach aimed at inferring view poses by leveraging prior knowledge about the object to reconstruct and its representation through semantic keypoints.
With a focus on vehicle scenes, KRONC is able to estimate the position of the views as a solution to a light optimization problem targeting the convergence of keypoints' back-projections to a singular point.
arXiv Detail & Related papers (2024-09-09T08:08:05Z)
- Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries [30.17281824826716]
Existing techniques often neglect the synergistic effects of semantic and depth cues, leading to classification and position estimation errors.
We propose an input-aware Transformer framework that leverages Semantics and Depth as priors.
Our approach involves the use of an S-D (semantic-depth) module that explicitly models semantic and depth priors, thereby disentangling the learning of object categorization from that of position estimation.
arXiv Detail & Related papers (2024-08-13T13:51:34Z)
- InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds [91.77050739918037]
Novel view synthesis (NVS) from a sparse set of images has advanced significantly in 3D computer vision.
It relies on precise initial estimation of camera parameters using Structure-from-Motion (SfM).
In this study, we introduce a novel and efficient framework to enhance robust NVS from sparse-view images.
arXiv Detail & Related papers (2024-03-29T17:29:58Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework to fuse information of RGB images and LiDAR point clouds at the points of interest (PoIs)
Our approach maintains the view of each modality and obtains multi-modal features through a computation-friendly projection scheme.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- Object-based SLAM utilizing unambiguous pose parameters considering general symmetry types [20.579218922577244]
Symmetric objects, whose observations from different viewpoints can be identical, can deteriorate the performance of simultaneous localization and mapping.
This work proposes a system for robustly optimizing the pose of cameras and objects even in the presence of symmetric objects.
arXiv Detail & Related papers (2023-03-13T03:07:59Z)
- Ambiguity-Aware Multi-Object Pose Optimization for Visually-Assisted Robot Manipulation [17.440729138126162]
We present an ambiguity-aware 6D object pose estimation network, PrimA6D++, as a generic uncertainty prediction method.
The proposed method shows a significant performance improvement in T-LESS and YCB-Video datasets.
We further demonstrate real-time scene recognition capability for visually-assisted robot manipulation.
arXiv Detail & Related papers (2022-11-02T08:57:20Z)
- Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves SOTA pose estimation performance and better generalization in the real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
- RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information it contains and is not responsible for any consequences of its use.