Related papers: SCRREAM : SCan, Register, REnder And Map:A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark

SCRREAM : SCan, Register, REnder And Map:A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark

URL: http://arxiv.org/abs/2410.22715v1
Date: Wed, 30 Oct 2024 05:53:07 GMT
Title: SCRREAM : SCan, Register, REnder And Map:A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark
Authors: HyunJun Jung, Weihang Li, Shun-Cheng Wu, William Bittner, Nikolas Brasch, Jifei Song, Eduardo Pérez-Pellitero, Zhensong Zhang, Arthur Moreau, Nassir Navab, Benjamin Busam,
Abstract summary: SCRREAM allows annotation of fully dense meshes of objects in the scene and registers camera poses on the real image sequence. We show the details of the dataset annotation pipeline and showcase four possible variants of datasets. Recent pipelines for indoor reconstruction and SLAM serve as new benchmarks.
Score: 43.88114765730359
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Traditionally, 3d indoor datasets have generally prioritized scale over ground-truth accuracy in order to obtain improved generalization. However, using these datasets to evaluate dense geometry tasks, such as depth rendering, can be problematic as the meshes of the dataset are often incomplete and may produce wrong ground truth to evaluate the details. In this paper, we propose SCRREAM, a dataset annotation framework that allows annotation of fully dense meshes of objects in the scene and registers camera poses on the real image sequence, which can produce accurate ground truth for both sparse 3D as well as dense 3D tasks. We show the details of the dataset annotation pipeline and showcase four possible variants of datasets that can be obtained from our framework with example scenes, such as indoor reconstruction and SLAM, scene editing & object removal, human reconstruction and 6d pose estimation. Recent pipelines for indoor reconstruction and SLAM serve as new benchmarks. In contrast to previous indoor dataset, our design allows to evaluate dense geometry tasks on eleven sample scenes against accurately rendered ground truth depth maps.

Related papers

RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion [49.933001840775816]
RaySt3R recasts 3D shape completion as a novel view synthesis problem.<n>We train a feedforward transformer to predict depth maps, object masks, and per-pixel confidence scores for query rays.<n>RaySt3R fuses these predictions across multiple query views to reconstruct complete 3D shapes.
arXiv Detail & Related papers (2025-06-05T17:43:23Z)
BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation [2.446672595462589]
We introduce the BelHouse3D dataset, a new synthetic point cloud dataset designed for 3D indoor scene semantic segmentation. This dataset is constructed using real-world references from 32 houses in Belgium, ensuring that the synthetic data closely aligns with real-world conditions.
arXiv Detail & Related papers (2024-11-20T12:09:43Z)
SS3DM: Benchmarking Street-View Surface Reconstruction with a Synthetic 3D Mesh Dataset [25.962746964527224]
Reconstructing accurate 3D surfaces for street-view scenarios is crucial for applications such as digital entertainment and autonomous driving. We introduce the SS3DM dataset, comprising precise textbfSynthetic textbfStreet-view textbf3D textbfMesh models exported from the CARLA simulator. To simulate the input data in realistic driving scenarios for 3D reconstruction, we virtually drive a vehicle equipped with six RGB cameras and five LiDAR sensors in diverse outdoor scenes.
arXiv Detail & Related papers (2024-10-29T04:54:45Z)
Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data [0.0]
This paper presents a real-time pipeline for localizing building components, including wall and ground surfaces, by integrating geometric calculations for pure 3D plane detection. It has a parallel multi-thread architecture to precisely estimate poses and equations of all the planes detected in the environment, filters the ones forming the map structure using a panoptic segmentation validation, and keeps only the validated building components. It can also ensure (re-)association of these detected components into a unified 3D scene graph, bridging the gap between geometric accuracy and semantic understanding.
arXiv Detail & Related papers (2024-09-10T16:28:09Z)
Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations. Comprehensive experiments underscore our framework's superior generalization capabilities. Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection [73.37781484123536]
We introduce a highly performant 3D object detector for point clouds using the DETR framework. To address the limitation, we introduce a novel 3D Relative Position (3DV-RPE) method. We show exceptional results on the challenging ScanNetV2 benchmark.
arXiv Detail & Related papers (2023-08-08T17:14:14Z)
The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes [79.00228778543553]
This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes. Simulations in realistic 3D buildings lets us obtain a vast amount of data and ground truth labels. We present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion.
arXiv Detail & Related papers (2023-06-29T13:09:31Z)
HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D Reconstruction [37.29140654256627]
We present a photo-realistic object-centric dataset HM3D-ABO. It is constructed by composing realistic indoor scene and realistic object. The dataset could also be useful for tasks such as camera pose estimation and novel-view synthesis.
arXiv Detail & Related papers (2022-06-24T16:02:01Z)
RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets. Recent work on 3D pre-training exhibits failure when transfer features learned on synthetic objects to other real-world applications. In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner. Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
Towards General Purpose Geometry-Preserving Single-View Depth Estimation [1.9573380763700712]
Single-view depth estimation (SVDE) plays a crucial role in scene understanding for AR applications, 3D modeling, and robotics. Recent works have shown that a successful solution strongly relies on the diversity and volume of training data. Our work shows that a model trained on this data along with conventional datasets can gain accuracy while predicting correct scene geometry.
arXiv Detail & Related papers (2020-09-25T20:06:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.