3DFusion, A real-time 3D object reconstruction pipeline based on streamed instance segmented data
- URL: http://arxiv.org/abs/2311.06659v1
- Date: Sat, 11 Nov 2023 20:11:58 GMT
- Title: 3DFusion, A real-time 3D object reconstruction pipeline based on streamed instance segmented data
- Authors: Xi Sun, Derek Jacoby, Yvonne Coady
- Abstract summary: This paper presents a real-time segmentation and reconstruction system that utilizes RGB-D images.
The system performs pixel-level segmentation on RGB-D data, effectively separating foreground objects from the background.
The real-time 3D modelling can be applied across various domains, including augmented/virtual reality, interior design, urban planning, road assistance, security systems, and more.
- Score: 0.552480439325792
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a real-time segmentation and reconstruction system that
utilizes RGB-D images to generate accurate and detailed individual 3D models of
objects within a captured scene. Leveraging state-of-the-art instance
segmentation techniques, the system performs pixel-level segmentation on RGB-D
data, effectively separating foreground objects from the background. The
segmented objects are then reconstructed into distinct 3D models in a
high-performance computation platform. The real-time 3D modelling can be
applied across various domains, including augmented/virtual reality, interior
design, urban planning, road assistance, security systems, and more. To achieve
real-time performance, the paper proposes a method that effectively samples
consecutive frames to reduce network load while ensuring reconstruction
quality. Additionally, a multi-process SLAM pipeline is adopted for parallel 3D
reconstruction, enabling efficient separation of clustered objects into
individual models. This system employs the industry-leading framework YOLO for
instance segmentation. To improve YOLO's performance and accuracy,
modifications were made to resolve duplicated or false detection of similar
objects, ensuring the reconstructed models align with the targets. Overall,
this work establishes a robust real-time system with a significant enhancement
for object segmentation and reconstruction in the indoor environment. It can
potentially be extended to the outdoor scenario, opening up numerous
opportunities for real-world applications.
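
The frame-sampling step mentioned in the abstract (sampling consecutive frames to reduce network load while preserving reconstruction quality) is not detailed on this page. A minimal sketch of one plausible rule follows, assuming a fixed bound on consecutive skips and a simple mean-difference redundancy test; both are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch: drop RGB-D frames that look redundant compared with the
# last kept frame, but never skip more than `stride` frames in a row. The
# threshold and the mean-absolute-difference test are assumptions.
import numpy as np

def sample_frames(frames, stride=5, diff_threshold=12.0):
    """Yield a reduced stream of (rgb, depth) frames.

    frames: iterable of (rgb, depth) numpy arrays.
    stride: upper bound on how many frames may be skipped in a row.
    diff_threshold: mean absolute RGB difference below which a frame is
        considered redundant and dropped.
    """
    last_kept = None
    skipped = 0
    for rgb, depth in frames:
        if last_kept is None or skipped >= stride:
            keep = True
        else:
            # Cheap change test against the last kept frame.
            keep = np.abs(rgb.astype(np.float32) - last_kept).mean() > diff_threshold
        if keep:
            last_kept = rgb.astype(np.float32)
            skipped = 0
            yield rgb, depth
        else:
            skipped += 1
```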
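The abstract also describes a multi-process pipeline that reconstructs each segmented object in parallel. The sketch below illustrates only the per-object parallelism idea, using a plain pinhole back-projection of masked depth pixels; the camera intrinsics and the worker layout are assumptions and stand in for the paper's actual multi-process SLAM pipeline.

```python
# Minimal sketch: each segmented instance is handed to its own worker, which
# back-projects the instance's masked depth pixels into a small point cloud.
import numpy as np
from multiprocessing import Pool

FX, FY, CX, CY = 525.0, 525.0, 319.5, 239.5  # assumed pinhole intrinsics

def reconstruct_object(args):
    """Back-project one instance's masked depth pixels into 3D points."""
    depth, mask = args                        # depth in metres, boolean mask
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.stack([x, y, z], axis=1)        # (N, 3) point cloud for this object

def reconstruct_scene(depth, instance_masks, workers=4):
    """Reconstruct every segmented instance of one frame in parallel."""
    with Pool(processes=workers) as pool:
        return pool.map(reconstruct_object, [(depth, m) for m in instance_masks])
```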
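Finally, the abstract notes that YOLO was modified to suppress duplicated or false detections of similar objects, but the exact modification is not given here. A common stand-in is mask-level non-maximum suppression, sketched below under that assumption: two instance masks that overlap heavily are treated as duplicates and only the higher-confidence one is kept.

```python
# Hypothetical duplicate filter: mask-level non-maximum suppression over
# instance detections, keeping the highest-scoring mask among heavy overlaps.
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boolean instance masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0

def suppress_duplicates(detections, iou_threshold=0.7):
    """detections: list of dicts with 'mask' (bool array), 'score', 'class_id'.
    Returns the detections with heavily overlapping duplicates removed."""
    detections = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for det in detections:
        duplicate = any(mask_iou(det["mask"], k["mask"]) > iou_threshold for k in kept)
        if not duplicate:
            kept.append(det)
    return kept
```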
Related papers
- ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance [76.7746870349809]
We present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models.
Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling.
arXiv Detail & Related papers (2024-03-19T03:39:43Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision enforces consistency between consecutive 3D reconstructions through a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z)
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
- NodeSLAM: Neural Object Descriptors for Multi-View Shape Reconstruction [4.989480853499916]
We present efficient and optimisable multi-class learned object descriptors together with a novel probabilistic and differentiable rendering engine.
Our framework allows for accurate and robust 3D object reconstruction which enables multiple applications including robot grasping and placing, augmented reality, and the first object-level SLAM system.
arXiv Detail & Related papers (2020-04-09T11:09:56Z)
- OccuSeg: Occupancy-aware 3D Instance Segmentation [39.71517989569514]
"3D occupancy size" is the number of voxels occupied by each instance.
"OccuSeg" is an occupancy-aware 3D instance segmentation scheme.
"State-of-the-art performance" on 3 real-world datasets.
arXiv Detail & Related papers (2020-03-14T02:48:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.