Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments
- URL: http://arxiv.org/abs/2103.16095v1
- Date: Tue, 30 Mar 2021 05:56:58 GMT
- Title: Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments
- Authors: Muzhi Han, Zeyu Zhang, Ziyuan Jiao, Xu Xie, Yixin Zhu, Song-Chun Zhu, Hangxin Liu
- Abstract summary: We rethink the problem of scene reconstruction from an embodied agent's perspective.
We reconstruct an interactive scene using RGB-D data stream.
This reconstructed scene replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
- Score: 81.38641691636847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we rethink the problem of scene reconstruction from an
embodied agent's perspective: While the classic view focuses on the
reconstruction accuracy, our new perspective emphasizes the underlying
functions and constraints such that the reconstructed scenes provide
actionable information for simulating interactions with agents. Here,
we address this challenging problem by reconstructing an interactive scene
using RGB-D data stream, which captures (i) the semantics and geometry of
objects and layouts by a 3D volumetric panoptic mapping module, and (ii) object
affordance and contextual relations by reasoning over physical common sense
among objects, organized by a graph-based scene representation. Crucially, this
reconstructed scene replaces the object meshes in the dense panoptic map with
part-based articulated CAD models for finer-grained robot interactions. In the
experiments, we demonstrate that (i) our panoptic mapping module outperforms
previous state-of-the-art methods, (ii) our high-performing physical reasoning
procedure matches, aligns, and replaces objects' meshes with best-fitted CAD
models, and (iii) the reconstructed scenes are physically plausible and
naturally afford actionable interactions; without any manual labeling, they are
seamlessly imported to ROS-based simulators and virtual environments for
complex robot task executions.
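The abstract's graph-based scene representation and CAD replacement step can be illustrated with a minimal sketch. This is a hypothetical approximation, not the paper's actual implementation: objects are graph nodes holding sampled point clouds, edges encode contextual relations (e.g. support), and the "best-fitted CAD model" selection is approximated here by a symmetric chamfer distance between point sets. All names (`SceneGraph`, `replace_with_cad`, `chamfer_distance`) and the example geometry are assumptions for illustration.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric chamfer distance between (N, 3) and (M, 3) point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

class SceneGraph:
    """Toy graph-based scene representation: nodes are objects, edges are relations."""

    def __init__(self):
        self.nodes = {}   # object name -> sampled point cloud from the panoptic map
        self.edges = []   # (child, relation, parent), e.g. ("cup", "supported_by", "table")

    def add_object(self, name, points):
        self.nodes[name] = np.asarray(points, dtype=float)

    def add_relation(self, child, relation, parent):
        self.edges.append((child, relation, parent))

    def replace_with_cad(self, name, cad_library):
        """Return the CAD model whose sampled points best fit the scanned object."""
        scan = self.nodes[name]
        return min(cad_library, key=lambda m: chamfer_distance(scan, cad_library[m]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    graph = SceneGraph()
    # A table-like scan: wide in x-y, thin in z around tabletop height.
    graph.add_object("table_scan", rng.uniform([-1, -1, 0.7], [1, 1, 0.75], (200, 3)))
    graph.add_relation("table_scan", "supported_by", "floor")

    cad_library = {
        "cad_table": rng.uniform([-1, -1, 0.7], [1, 1, 0.75], (200, 3)),
        "cad_mug": rng.uniform([-0.05, -0.05, 0.0], [0.05, 0.05, 0.1], (200, 3)),
    }
    print(graph.replace_with_cad("table_scan", cad_library))  # cad_table
```

In the paper's pipeline, a step like this would be followed by aligning the chosen part-based articulated CAD model into the map frame, which the brute-force chamfer comparison above does not attempt.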
Related papers
- CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image [44.8172828045897]
Current methods often struggle with domain-specific limitations or low-quality object generation.
We propose CAST, a novel method for 3D scene reconstruction and recovery.
arXiv Detail & Related papers (2025-02-18T14:29:52Z)
- Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling [27.577720075952225]
We present Diorama, the first zero-shot open-world system that holistically models 3D scenes from single-view RGB observations.
We show the feasibility of our approach by decomposing the problem into subtasks and introduce robust, generalizable solutions to each.
arXiv Detail & Related papers (2024-11-29T06:19:04Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References [49.71130133080821]
MaRINeR is a refinement method that leverages information of a nearby mapping image to improve the rendering of a target viewpoint.
We show improved renderings in quantitative metrics and qualitative examples from both explicit and implicit scene representations.
arXiv Detail & Related papers (2024-07-18T17:50:03Z)
- SMORE: Simultaneous Map and Object REconstruction [66.66729715211642]
We present a method for dynamic surface reconstruction of large-scale urban scenes from LiDAR.
We take a holistic perspective and optimize a compositional model of a dynamic scene that decomposes the world into rigidly-moving objects and the background.
arXiv Detail & Related papers (2024-06-19T23:53:31Z)
- Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction.
Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition.
We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv Detail & Related papers (2024-03-28T11:12:33Z)
- Interaction-Driven Active 3D Reconstruction with Object Interiors [17.48872400701787]
We introduce an active 3D reconstruction method which integrates visual perception, robot-object interaction, and 3D scanning.
Our method operates fully automatically by a Fetch robot with built-in RGBD sensors.
arXiv Detail & Related papers (2023-10-23T08:44:38Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Dynamic Object Removal and Spatio-Temporal RGB-D Inpainting via Geometry-Aware Adversarial Learning [9.150245363036165]
Dynamic objects have a significant impact on the robot's perception of the environment.
In this work, we address this problem by synthesizing plausible color, texture and geometry in regions occluded by dynamic objects.
We optimize our architecture using adversarial training to synthesize fine realistic textures which enables it to hallucinate color and depth structure in occluded regions online.
arXiv Detail & Related papers (2020-08-12T01:23:21Z)