STAIR: Semantic-Targeted Active Implicit Reconstruction
- URL: http://arxiv.org/abs/2403.11233v1
- Date: Sun, 17 Mar 2024 14:42:05 GMT
- Title: STAIR: Semantic-Targeted Active Implicit Reconstruction
- Authors: Liren Jin, Haofei Kuang, Yue Pan, Cyrill Stachniss, Marija Popović
- Abstract summary: Actively reconstructing objects of interest, i.e. objects with specific semantic meanings, is relevant for a robot to perform downstream tasks.
We propose a novel framework for semantic-targeted active reconstruction using posed RGB-D measurements and 2D semantic labels as input.
- Score: 23.884933841874908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many autonomous robotic applications require object-level understanding when deployed. Actively reconstructing objects of interest, i.e. objects with specific semantic meanings, is therefore relevant for a robot to perform downstream tasks in an initially unknown environment. In this work, we propose a novel framework for semantic-targeted active reconstruction using posed RGB-D measurements and 2D semantic labels as input. The key components of our framework are a semantic implicit neural representation and a compatible planning utility function based on semantic rendering and uncertainty estimation, enabling adaptive view planning to target objects of interest. Our planning approach achieves better reconstruction performance in terms of mesh and novel view rendering quality compared to implicit reconstruction baselines that do not consider semantics for view planning. Our framework further outperforms a state-of-the-art semantic-targeted active reconstruction pipeline based on explicit maps, justifying our choice of utilising implicit neural representations to tackle semantic-targeted active reconstruction problems.
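As a rough illustration of the idea in the abstract (a view-planning utility that combines semantic rendering with uncertainty estimation), the sketch below scores each candidate view by its per-pixel uncertainty weighted by the probability that the pixel shows a target object. This is a toy sketch under stated assumptions, not the authors' utility function: the names `view_utility` and `select_next_view`, the weighting by simple multiplication, and the example values are all hypothetical.

```python
from typing import Dict, Sequence, Tuple


def view_utility(uncertainty: Sequence[float], target_prob: Sequence[float]) -> float:
    """Mean per-pixel uncertainty, weighted by the target-class probability.

    Both inputs are hypothetical stand-ins for quantities rendered from a
    semantic implicit representation at one candidate viewpoint.
    """
    if len(uncertainty) != len(target_prob):
        raise ValueError("uncertainty and target_prob must have the same length")
    if not uncertainty:
        return 0.0
    weighted = sum(u * p for u, p in zip(uncertainty, target_prob))
    return weighted / len(uncertainty)


def select_next_view(
    candidates: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> str:
    """Return the name of the candidate view with the highest utility."""
    return max(candidates, key=lambda name: view_utility(*candidates[name]))


# Illustrative values only: view_a is uncertain but mostly background,
# view_b is less uncertain but looks at the object of interest.
candidates = {
    "view_a": ([0.9, 0.8, 0.7], [0.0, 0.1, 0.0]),
    "view_b": ([0.5, 0.6, 0.4], [0.9, 0.8, 1.0]),
}
next_view = select_next_view(candidates)
```

With these made-up numbers, the semantic weighting makes the planner prefer the view of the target object, even though the background view has higher raw uncertainty.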
Related papers
- From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models [7.949705607963995]
Vision language models (VLMs) have shown considerable advances in robotics applications.
We take advantage of VLMs' zero-shot capabilities and translate this ability to 6D object pose estimation.
We propose a novel framework for promptable zero-shot 6D object pose estimation using language embeddings.
arXiv Detail & Related papers (2024-09-09T08:15:39Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z)
- Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization [1.2268315442962412]
Actively planning sensor views during object reconstruction is crucial for autonomous mobile robots.
We propose a seamless integration of the emerging implicit representation with the active reconstruction task.
Our approach effectively improves reconstruction accuracy and efficiency of view planning in active reconstruction tasks.
arXiv Detail & Related papers (2023-03-29T14:42:30Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- SORNet: Spatial Object-Centric Representations for Sequential Manipulation [39.88239245446054]
Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state.
We propose SORNet, which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest.
arXiv Detail & Related papers (2021-09-08T19:36:29Z)
- Predicting Stable Configurations for Semantic Placement of Novel Objects [37.18437299513799]
Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments.
We build our models and training from the ground up to be tightly integrated with our proposed planning algorithm for semantic placement of unknown objects.
Our approach enables motion planning for semantic rearrangement of unknown objects in scenes with varying geometry from only RGB-D sensing.
arXiv Detail & Related papers (2021-08-26T23:05:05Z)
- Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation [66.85202434812942]
We reformulate few-shot segmentation as a semantic reconstruction problem.
We convert base class features into a series of basis vectors which span a class-level semantic space for novel class reconstruction.
Our proposed approach, referred to as anti-aliasing semantic reconstruction (ASR), provides a systematic yet interpretable solution for few-shot learning problems.
arXiv Detail & Related papers (2021-06-01T02:17:36Z)
- TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction [57.1209039399599]
We propose a map representation that allows maintaining a single volume for the entire scene and all the objects therein.
In a multiple dynamic object tracking and reconstruction scenario, our representation allows maintaining accurate reconstruction of surfaces even while they become temporarily occluded by other objects moving in their proximity.
We evaluate the proposed TSDF++ formulation on a public synthetic dataset and demonstrate its ability to preserve reconstructions of occluded surfaces when compared to the standard TSDF map representation.
arXiv Detail & Related papers (2021-05-16T16:15:05Z)
- Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective.
We reconstruct an interactive scene using an RGB-D data stream.
This reconstructed scene replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv Detail & Related papers (2021-03-30T05:56:58Z)
- Object-Driven Active Mapping for More Accurate Object Pose Estimation and Robotic Grasping [5.385583891213281]
The framework is built on an object SLAM system integrated with a simultaneous multi-object pose estimation process.
By combining the mapping module and the exploration strategy, an accurate object map that is compatible with robotic grasping can be generated.
arXiv Detail & Related papers (2020-12-03T09:36:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.