Flash Sculptor: Modular 3D Worlds from Objects
- URL: http://arxiv.org/abs/2504.06178v1
- Date: Tue, 08 Apr 2025 16:20:51 GMT
- Title: Flash Sculptor: Modular 3D Worlds from Objects
- Authors: Yujia Hu, Songhua Liu, Xingyi Yang, Xinchao Wang
- Abstract summary: Flash Sculptor is a simple yet effective framework for compositional 3D scene/object reconstruction from a single image. For rotation, we introduce a coarse-to-fine scheme that brings the best of both worlds--efficiency and accuracy--while for translation, we develop an outlier-removal-based algorithm. Experiments demonstrate that Flash Sculptor achieves at least a 3 times speedup over existing compositional 3D methods.
- Score: 73.63179709035595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing text-to-3D and image-to-3D models often struggle with complex scenes involving multiple objects and intricate interactions. Although some recent attempts have explored such compositional scenarios, they still require an extensive process of optimizing the entire layout, which is highly cumbersome, if not outright infeasible. To overcome these challenges, in this paper we propose Flash Sculptor, a simple yet effective framework for compositional 3D scene/object reconstruction from a single image. At the heart of Flash Sculptor lies a divide-and-conquer strategy, which decouples compositional scene reconstruction into a sequence of sub-tasks, including handling the appearance, rotation, scale, and translation of each individual instance. Specifically, for rotation, we introduce a coarse-to-fine scheme that brings the best of both worlds--efficiency and accuracy--while for translation, we develop an outlier-removal-based algorithm that yields robust and precise parameters in a single step, without any iterative optimization. Extensive experiments demonstrate that Flash Sculptor achieves at least a 3 times speedup over existing compositional 3D methods, while setting new benchmarks in compositional 3D reconstruction performance. Code is available at https://github.com/YujiaHu1109/Flash-Sculptor.
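The abstract names two concrete algorithmic ingredients: a coarse-to-fine rotation scheme and a single-step, outlier-removal-based translation estimate. The following is a minimal sketch of those two ideas in plain NumPy; the function names, the yaw-only rotation search, and the IQR-based outlier rule are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import numpy as np

def coarse_to_fine_rotation(score_fn, coarse_step=30.0, fine_step=2.0, fine_span=15.0):
    """Coarse-to-fine 1-D rotation search (illustrative only).

    score_fn maps a yaw angle in degrees to an alignment score (e.g. how well
    the posed instance matches its image crop); higher is better.
    """
    # Coarse pass: sparse grid over the full circle.
    coarse_angles = np.arange(0.0, 360.0, coarse_step)
    best = max(coarse_angles, key=score_fn)
    # Fine pass: dense grid in a small window around the coarse winner.
    fine_angles = np.arange(best - fine_span, best + fine_span + fine_step, fine_step)
    return max(fine_angles, key=score_fn)

def robust_translation(points, k=1.5):
    """One-shot translation estimate from back-projected instance points.

    points: (N, 3) array of 3D points for one instance (e.g. monocular depth
    lifted through the camera intrinsics). Outliers are removed per axis with
    an IQR rule, then the median of the inliers is returned; no iterative
    optimization is involved.
    """
    points = np.asarray(points, dtype=float)
    q1, q3 = np.percentile(points, 25, axis=0), np.percentile(points, 75, axis=0)
    iqr = q3 - q1
    inliers = np.all((points >= q1 - k * iqr) & (points <= q3 + k * iqr), axis=1)
    return np.median(points[inliers], axis=0)

if __name__ == "__main__":
    # Toy usage: the "true" yaw is 137 degrees; the score peaks there.
    true_yaw = 137.0
    score = lambda a: -min(abs(a - true_yaw), 360 - abs(a - true_yaw))
    print("estimated yaw:", coarse_to_fine_rotation(score))

    # Toy translation: noisy points around (1, 2, 3) plus a few gross outliers.
    rng = np.random.default_rng(0)
    pts = rng.normal([1.0, 2.0, 3.0], 0.01, size=(500, 3))
    pts[:5] += 10.0  # simulated depth outliers at the instance boundary
    print("estimated translation:", robust_translation(pts))
```

The point is the control flow: a cheap global sweep narrows the rotation search before a dense local one, and the translation comes from a single robust statistic rather than an optimization loop.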
Related papers
- HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision. We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects. Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z)
- Enhancing Monocular 3D Scene Completion with Diffusion Model [20.81599069390756]
3D scene reconstruction is essential for applications in virtual reality, robotics, and autonomous driving. Traditional 3D Gaussian Splatting techniques rely on images captured from multiple viewpoints to achieve optimal performance. We introduce FlashDreamer, a novel approach for reconstructing a complete 3D scene from a single image.
arXiv Detail & Related papers (2025-03-02T04:36:57Z)
- MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion [12.602510002753815]
We build upon a recently released foundation model for 3D vision that can robustly produce local 3D reconstructions and accurate matches.
We introduce a low-memory approach to accurately align these local reconstructions in a global coordinate system.
Our novel SfM pipeline is simple, scalable, fast and truly unconstrained, i.e. it can handle any collection of images, ordered or not.
arXiv Detail & Related papers (2024-09-27T21:29:58Z)
- REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment [23.733856513456]
We present REPARO, a novel approach for compositional 3D asset generation from single images.
REPARO employs a two-step process: first, it extracts individual objects from the scene and reconstructs their 3D meshes using off-the-shelf image-to-3D models.
It then optimizes the layout of these meshes through differentiable rendering techniques, ensuring coherent scene composition (a toy sketch of this layout-optimization step follows this entry).
arXiv Detail & Related papers (2024-05-28T18:45:10Z)
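REPARO's second step, aligning per-object meshes through differentiable rendering, can be illustrated with a deliberately tiny stand-in: a soft Gaussian-splat "renderer" in PyTorch whose output is differentiable with respect to object positions. The renderer, loss, and layout parameterization below are toy assumptions chosen so the example runs as-is; they are not REPARO's actual pipeline.

```python
import torch

def soft_silhouette(centers, H=64, W=64, sigma=10.0):
    """Toy differentiable 'renderer': each object becomes an isotropic Gaussian
    blob at its (x, y) image-plane position. Output shape: (N, H, W)."""
    ys = torch.arange(H, dtype=torch.float32).view(1, H, 1)
    xs = torch.arange(W, dtype=torch.float32).view(1, 1, W)
    cx = centers[:, 0].view(-1, 1, 1)
    cy = centers[:, 1].view(-1, 1, 1)
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# Target layout (where each object should sit) and a poor initial guess.
target = soft_silhouette(torch.tensor([[16.0, 20.0], [48.0, 40.0]]))
centers = torch.tensor([[32.0, 32.0], [32.0, 32.0]], requires_grad=True)
opt = torch.optim.Adam([centers], lr=1.0)

for _ in range(300):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(soft_silhouette(centers), target)
    loss.backward()   # gradients flow through the rendering into the layout
    opt.step()

print(centers.detach())  # positions move toward the target layout
```

Swapping the blob renderer for a real differentiable mesh renderer, and the 2D centers for full object poses, gives the general shape of layout optimization by rendering comparison.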
- Gen3DSR: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View [5.222115919729418]
Single-view 3D reconstruction is currently approached from two dominant perspectives. We propose a hybrid method following a divide-and-conquer strategy. We demonstrate the reconstruction performance of our approach on both synthetic and real-world scenes.
arXiv Detail & Related papers (2024-04-04T12:58:46Z)
- Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- Iterative Superquadric Recomposition of 3D Objects from Multiple Views [77.53142165205283]
We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views.
Our framework iteratively adds new superquadrics wherever the reconstruction error is high (a toy version of this loop follows this entry).
It provides consistently more accurate 3D reconstructions, even from images in the wild.
arXiv Detail & Related papers (2023-09-05T10:21:37Z)
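The ISCO summary above describes a greedy loop: add a primitive wherever the current reconstruction error is largest. The sketch below is a toy analogue in which fixed-radius spheres stand in for superquadrics and point-to-surface distance stands in for the method's rendering-based error; it only illustrates the control flow, not ISCO itself.

```python
import numpy as np

def greedy_primitive_recomposition(points, n_primitives=8, radius=0.15):
    """Repeatedly place a primitive where the reconstruction error is highest."""
    centers = []
    residual = np.full(len(points), np.inf)  # distance to nearest primitive surface
    for _ in range(n_primitives):
        worst = int(np.argmax(residual))      # least-explained region of the shape
        centers.append(points[worst])         # seed a new primitive there
        d = np.abs(np.linalg.norm(points - points[worst], axis=1) - radius)
        residual = np.minimum(residual, d)    # update the reconstruction error
    return np.array(centers), float(residual.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Points on a unit sphere as a stand-in "object" to recompose.
    pts = rng.normal(size=(2000, 3))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    centers, err = greedy_primitive_recomposition(pts)
    print(centers.shape, round(err, 3))
```

Each new primitive is seeded at the currently least-explained point, which is the essence of error-driven recomposition.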
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- CoReNet: Coherent 3D scene reconstruction from a single RGB image [43.74240268086773]
We build on advances in deep learning to reconstruct the shape of a single object given only one RGB image as input.
We propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that enables building translation equivariant models; and (3) a reconstruction loss tailored to capture overall object geometry.
We reconstruct all objects jointly in one pass, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space (a simplified sketch of the ray-traced skip-connection idea follows this entry).
arXiv Detail & Related papers (2020-04-27T17:53:07Z)
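The CoReNet entry above mentions ray-traced skip connections that carry local 2D information into the output 3D volume. A simplified stand-in, shown below in PyTorch, projects voxel centers through the camera intrinsics and bilinearly samples 2D features at the projected pixels; the function name, intrinsics, and grid sizes are assumptions for illustration, and true ray tracing through the volume is omitted.

```python
import torch
import torch.nn.functional as F

def lift_2d_features_to_grid(feat2d, voxel_xyz, K):
    """Project voxel centers through a pinhole camera and sample 2D features
    at the projected pixels, so each voxel receives image-aligned features.

    feat2d:    (1, C, H, W) feature map from a 2D encoder
    voxel_xyz: (D, Dh, Dw, 3) voxel-center coordinates in camera space (z > 0)
    K:         (3, 3) pinhole intrinsics
    Returns:   (1, C, D, Dh, Dw) per-voxel features
    """
    _, C, H, W = feat2d.shape
    D, Dh, Dw, _ = voxel_xyz.shape
    pts = voxel_xyz.reshape(-1, 3)
    uvw = pts @ K.T                          # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]            # (N, 2) pixel coordinates
    # Normalize to [-1, 1] for grid_sample (x = width, y = height).
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1).view(1, 1, -1, 2)
    sampled = F.grid_sample(feat2d, grid, align_corners=True)  # (1, C, 1, N)
    return sampled.view(1, C, D, Dh, Dw)

# Toy usage with random features and a small camera-space voxel grid.
feat = torch.randn(1, 8, 32, 32)
zs, ys, xs = torch.meshgrid(torch.linspace(1.0, 2.0, 4),
                            torch.linspace(-0.5, 0.5, 4),
                            torch.linspace(-0.5, 0.5, 4), indexing="ij")
voxels = torch.stack([xs, ys, zs], dim=-1)   # (4, 4, 4, 3) camera-space centers
K = torch.tensor([[30.0, 0.0, 16.0], [0.0, 30.0, 16.0], [0.0, 0.0, 1.0]])
print(lift_2d_features_to_grid(feat, voxels, K).shape)  # torch.Size([1, 8, 4, 4, 4])
```

The printed shape confirms that each voxel ends up with an image-aligned feature vector, which is the property the skip connection is after.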
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.