SC-Diff: 3D Shape Completion with Latent Diffusion Models
- URL: http://arxiv.org/abs/2403.12470v2
- Date: Mon, 01 Sep 2025 10:37:07 GMT
- Title: SC-Diff: 3D Shape Completion with Latent Diffusion Models
- Authors: Simon Schaefer, Juan D. Galvis, Xingxing Zuo, Stefan Leutenegger
- Abstract summary: We present a novel 3D shape completion framework that unifies multimodal conditioning. Shapes are represented as Truncated Signed Distance Functions (TSDFs) and encoded into a discrete latent space jointly supervised by 2D and 3D cues. Our approach guides the generation process with flexible multimodal conditioning, ensuring consistent integration of 2D and 3D information.
- Score: 4.261508855254493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel 3D shape completion framework that unifies multimodal conditioning, leveraging both 2D images and 3D partial scans through a latent diffusion model. Shapes are represented as Truncated Signed Distance Functions (TSDFs) and encoded into a discrete latent space jointly supervised by 2D and 3D cues, enabling efficient high-resolution processing while reducing GPU memory usage by 30% compared to state-of-the-art methods. Our approach guides the generation process with flexible multimodal conditioning, ensuring consistent integration of 2D and 3D information from encoding to reconstruction. Our training strategy simulates realistic partial observations, avoiding assumptions about input structure and improving robustness in real-world scenarios. Leveraging our efficient latent space and multimodal conditioning, our model generalizes across object categories, outperforming class-specific models by 12% and class-agnostic models by 47% in $l_1$ reconstruction error, while producing more diverse, realistic, and high-fidelity completions than prior approaches.
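As a rough illustration of two terms used in the abstract, the sketch below builds a Truncated Signed Distance Function on a small voxel grid and compares two grids with a mean absolute ($l_1$) error. It is not taken from the paper: the sphere shape, the grid resolution, the truncation value, and the helper names `sphere_tsdf` and `l1_error` are all assumptions made for this example, and the paper's exact metric definition may differ.

```python
import math


def sphere_tsdf(resolution, radius, truncation):
    """Flat list of truncated signed distances to a sphere centred in [-1, 1]^3.

    Hypothetical helper for illustration only; negative values are inside the surface.
    """
    values = []
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                # Voxel-centre coordinates mapped into the cube [-1, 1]^3.
                x, y, z = (2.0 * (idx + 0.5) / resolution - 1.0 for idx in (i, j, k))
                sdf = math.sqrt(x * x + y * y + z * z) - radius
                # Truncation: clamp the distance to [-truncation, +truncation].
                values.append(max(-truncation, min(truncation, sdf)))
    return values


def l1_error(pred, target):
    """Mean absolute difference between two TSDF grids of equal size (assumed metric)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


complete = sphere_tsdf(16, 0.5, 0.2)
slightly_larger = sphere_tsdf(16, 0.6, 0.2)
print(l1_error(complete, complete))         # identical grids give zero error
print(l1_error(slightly_larger, complete))  # small positive error
```

Because clamping never increases the gap between two distances, the error between the two spheres cannot exceed the 0.1 difference in their radii.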
Related papers
- Repurposing 2D Diffusion Models for 3D Shape Completion [14.959136858291904]
We present a framework that adapts 2D diffusion models for 3D shape completion from incomplete point clouds.
We introduce the Shape Atlas, a compact 2D representation of 3D geometry.
We validate the effectiveness of our results on the PCN and ShapeNet-55 datasets.
arXiv Detail & Related papers (2025-12-16T00:59:05Z)
- PointDico: Contrastive 3D Representation Learning Guided by Diffusion Models [5.077352707415241]
PointDico learns from both denoising generative modeling and cross-modal contrastive learning through knowledge distillation.
PointDico achieves a new state-of-the-art in 3D representation learning, e.g., 94.32% accuracy on ScanObjectNN, 86.5% Inst. mIoU on ShapeNetPart.
arXiv Detail & Related papers (2025-12-09T07:57:56Z)
- TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP [52.79100775328595]
3D visual grounding allows an embodied agent to understand visual information in real-world 3D environments based on human instructions.
Existing 3D visual grounding methods rely on separate encoders for different modalities.
We propose a unified 2D pre-trained multi-modal network to process all three modalities.
arXiv Detail & Related papers (2025-07-20T10:28:06Z)
- BridgeShape: Latent Diffusion Schrödinger Bridge for 3D Shape Completion [20.704173763035488]
BridgeShape is a novel framework for 3D shape completion via latent diffusion Schrödinger bridge.
We introduce a Depth-Enhanced Vector Quantized Variational Autoencoder (VQ-VAE) to encode 3D shapes into a compact latent space.
BridgeShape achieves state-of-the-art performance on large-scale 3D shape completion benchmarks.
arXiv Detail & Related papers (2025-06-29T12:21:21Z)
- LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework [40.17218893870908]
LTM3D is a Latent Token space Modeling framework for conditional 3D shape generation.
It integrates the strengths of diffusion and auto-regressive (AR) models.
LTM3D offers a generalizable framework for multi-modal, multi-representation 3D generation.
arXiv Detail & Related papers (2025-05-30T06:08:45Z)
- Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling [34.238349310770886]
We introduce Sparc3D, a unified framework that combines a sparse deformable marching cubes representation Sparcubes with a novel encoder Sparconv-VAE.
Sparc3D achieves state-of-the-art reconstruction fidelity on challenging inputs, including open surfaces, disconnected components, and intricate geometry.
arXiv Detail & Related papers (2025-05-20T15:44:54Z)
- Introducing 3D Representation for Medical Image Volume-to-Volume Translation via Score Fusion [3.3559609260669303]
We present Score-Fusion, a novel volumetric translation model that effectively learns 3D representations by ensembling perpendicularly trained 2D diffusion models in score function space.
We show that Score-Fusion achieves superior accuracy and volumetric fidelity in 3D medical image super-resolution and modality translation.
arXiv Detail & Related papers (2025-01-13T15:54:21Z)
- GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z)
- DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets.
Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
- From Diffusion to Resolution: Leveraging 2D Diffusion Models for 3D Super-Resolution Task [19.56372155146739]
We present a novel approach that leverages the 2D diffusion model and lateral continuity within the volume to enhance 3D volume electron microscopy (vEM) super-resolution.
Our results on two publicly available focused ion beam scanning electron microscopy (FIB-SEM) datasets demonstrate the robustness and practical applicability of our framework.
arXiv Detail & Related papers (2024-11-25T09:12:55Z)
- ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance [76.7746870349809]
We present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models.
Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling.
arXiv Detail & Related papers (2024-03-19T03:39:43Z)
- Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z)
- Robust 3D Tracking with Quality-Aware Shape Completion [67.9748164949519]
We propose a synthetic target representation composed of dense and complete point clouds depicting the target shape precisely by shape completion for robust 3D tracking.
Specifically, we design a voxelized 3D tracking framework with shape completion, in which we propose a quality-aware shape completion mechanism to alleviate the adverse effect of noisy historical predictions.
arXiv Detail & Related papers (2023-12-17T04:50:24Z)
- Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection [77.23918785277404]
We present Diffusion-SS3D, a new perspective of enhancing the quality of pseudo-labels via the diffusion model for semi-supervised 3D object detection.
Specifically, we add noise to produce corrupted 3D object size and class label distributions, and then utilize the diffusion model as a denoising process to obtain bounding box outputs.
We conduct experiments on the ScanNet and SUN RGB-D benchmark datasets to demonstrate that our approach achieves state-of-the-art performance against existing methods.
arXiv Detail & Related papers (2023-12-05T18:54:03Z)
- HoloFusion: Towards Photo-realistic 3D Generative Modeling [77.03830223281787]
Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation.
We present HoloFusion, a method that combines the best of these approaches to produce high-fidelity, plausible, and diverse 3D samples.
arXiv Detail & Related papers (2023-08-28T01:19:33Z)
- Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs.
Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field.
By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z)
- DiffComplete: Diffusion-based Generative 3D Shape Completion [114.43353365917015]
We introduce a new diffusion-based approach for shape completion on 3D range scans.
We strike a balance between realism, multi-modality, and high fidelity.
DiffComplete sets a new SOTA performance on two large-scale 3D shape completion benchmarks.
arXiv Detail & Related papers (2023-06-28T16:07:36Z)
- Locally Attentional SDF Diffusion for Controllable 3D Shape Generation [24.83724829092307]
We propose a diffusion-based 3D generation framework, to model plausible 3D shapes, via 2D sketch image input.
Our method is built on a two-stage diffusion model. The first stage, named occupancy-diffusion, aims to generate a low-resolution occupancy field to approximate the shape shell.
The second stage, named SDF-diffusion, synthesizes a high-resolution signed distance field within the occupied voxels determined by the first stage to extract fine geometry.
arXiv Detail & Related papers (2023-05-08T05:07:23Z)
- HoloDiffusion: Training a 3D Diffusion Model using 2D Images [71.1144397510333]
We introduce a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision.
We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.
arXiv Detail & Related papers (2023-03-29T07:35:56Z)
- Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
- 3D Neural Field Generation using Triplane Diffusion [37.46688195622667]
We present an efficient diffusion-based model for 3D-aware generation of neural fields.
Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields.
We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
arXiv Detail & Related papers (2022-11-30T01:55:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.