FuguReport

3D-ReGen: A Unified 3D Geometry Regeneration Framework

Authors Geon Yeong Park, Roman Shapovalov, Rakesh Ranjan, Jong Chul Ye, Andrea Vedaldi, Thu Nguyen-Phuoc
Affiliations Meta / Korea Advanced Institute of Science and Technology
Categories Method / 3D Reconstruction / Unified 3D geometry regeneration framework; Evaluation / Geometry Quality Assessment / Evaluation of geometric consistency and fine quality; Application / 3D Content Creation / Support for 3D expansion, reconstruction, and editing
License CC BY 4.0

Abstract Overview

3D-ReGen is a unified diffusion-based framework that formulates multiple 3D tasks (enhancement, reconstruction, and editing) as a single regeneration problem: predicting a high-information 3D shape from an initial low-information 3D shape with optional image guidance. The framework builds on the VecSet latent representation and a Diffusion Transformer (DiT). It conditions on the input geometry by encoding it in the same VecSet latent space as the target, preprocessing those tokens with a zero-initialized MLP and positional embeddings, and concatenating them with the noisy target tokens. The authors propose automatic data-construction pipelines that convert a large unpaired collection of approximately 1 million 3D objects into training triplets for each task without extra annotations. Experiments cover compositional 3D enhancement, guided image-to-3D reconstruction from sparse views, and 3D shape editing, evaluated with geometric-consistency and perceptual-quality metrics.

Novelty

The paper's main novelty is a unified 3D regeneration formulation that handles enhancement, reconstruction, and editing within one architecture rather than task-specific models. A second distinctive element is the conditioning design: both the coarse input shape and the target shape are represented in the same VecSet latent space and fused by token concatenation (with zero-initialized MLP preprocessing and positional embeddings), which ablations show outperforms cross-attention alternatives inspired by CLAY and Hunyuan3D-Omni. Additionally, the self-supervised data-generation protocols that create degraded/high-quality training pairs from a generic 3D dataset without task-specific annotations represent a practical contribution.
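The conditioning design described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the two-layer MLP layout, the names `ZeroInitMLP` and `build_joint_tokens`, and the use of a single per-stream offset vector as a stand-in for the positional embeddings are all assumptions made for illustration. The point it shows is that a zero-initialized output layer makes the condition branch contribute nothing beyond its positional embedding at the start of training, while concatenation simply lengthens the token sequence seen by the DiT.

```python
import random


def linear(token, weight, bias):
    """Return token @ weight + bias for one token (weight is in_dim x out_dim)."""
    return [
        sum(x * w for x, w in zip(token, column)) + b
        for column, b in zip(zip(*weight), bias)
    ]


class ZeroInitMLP:
    """Two-layer MLP whose output layer starts at zero (hypothetical layout)."""

    def __init__(self, dim, hidden, rng):
        self.w1 = [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(dim)]
        self.b1 = [0.0] * hidden
        # Zero-initialized output layer: the branch emits zeros before training.
        self.w2 = [[0.0] * dim for _ in range(hidden)]
        self.b2 = [0.0] * dim

    def __call__(self, token):
        hidden = [max(0.0, h) for h in linear(token, self.w1, self.b1)]  # ReLU
        return linear(hidden, self.w2, self.b2)


def build_joint_tokens(noisy_target, condition, mlp, pos_target, pos_condition):
    """Concatenate target and condition tokens along the sequence axis."""
    target_part = [[t + p for t, p in zip(tok, pos_target)] for tok in noisy_target]
    cond_part = [[c + p for c, p in zip(mlp(tok), pos_condition)] for tok in condition]
    return target_part + cond_part


# Toy latents: 3 noisy target tokens and 2 condition tokens of width 4.
rng = random.Random(0)
dim = 4
noisy_target = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]
condition = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]
pos_target = [0.0] * dim      # assumed stream offset for target tokens
pos_condition = [1.0] * dim   # assumed stream offset for condition tokens
mlp = ZeroInitMLP(dim, hidden=8, rng=rng)
joint = build_joint_tokens(noisy_target, condition, mlp, pos_target, pos_condition)
```

In this toy setup `joint` holds five tokens: the three target tokens unchanged, followed by two condition tokens that equal `pos_condition` until the MLP's output layer is trained away from zero.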

Results

In compositional enhancement on 623 decomposed objects from 21 scenes, 3D-ReGen raises ULIP-3D to 0.2626, from 0.2280 for the input and 0.2294 for DetailGen3D, and raises MV-ImageReward to 0.3394, from 0.1716 and 0.1003 respectively. In guided reconstruction on GSO, VGGT+3D-ReGen substantially improves over raw VGGT outputs; with four input views it achieves CD 0.0081, F-score 0.4913, IoU 0.7574, PSNR 24.2754, SSIM 0.9408, and LPIPS 0.0873, matching or outperforming specialized multi-view baselines while using the same number of views or fewer. Ablation studies confirm that the proposed token-concatenation conditioning mechanism outperforms cross-attention and additive alternatives, and that v-prediction diffusion parameterization outperforms a rectified flow baseline.
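For readers unfamiliar with the geometric metrics quoted above, the following is a minimal sketch of Chamfer distance (CD) and F-score on point sets. Conventions differ between papers (squared versus unsquared distances, sum versus mean, choice of threshold), and this summary does not state which one the authors use, so the definitions below are one common variant and the function names are illustrative. The brute-force nearest-neighbour search is only suitable for small point sets.

```python
import math


def nearest_distances(source, target):
    """Distance from each source point to its nearest neighbour in target."""
    return [min(math.dist(p, q) for q in target) for p in source]


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    return (sum(nearest_distances(a, b)) / len(a)
            + sum(nearest_distances(b, a)) / len(b))


def f_score(predicted, reference, threshold):
    """Harmonic mean of precision and recall at a distance threshold."""
    d_pred = nearest_distances(predicted, reference)
    d_ref = nearest_distances(reference, predicted)
    precision = sum(d <= threshold for d in d_pred) / len(d_pred)
    recall = sum(d <= threshold for d in d_ref) / len(d_ref)
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Toy example: the prediction is the reference shifted by 0.1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
predicted = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1), (0.0, 1.0, 0.1)]
cd = chamfer_distance(predicted, reference)
fs_tight = f_score(predicted, reference, threshold=0.05)
fs_loose = f_score(predicted, reference, threshold=0.2)
```

Lower CD is better and higher F-score is better. In the toy example the F-score depends entirely on whether the threshold is below or above the 0.1 offset, which is why reported F-scores are only comparable at a shared threshold.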

Key Points

  1. 3D-ReGen unifies multiple controllable 3D tasks (enhancement, sparse-view reconstruction, editing) by regenerating a detailed shape from coarse, degraded, incomplete, or masked geometry with optional image guidance, using a single architecture based on VecSet latents and a Diffusion Transformer.
  2. The conditioning mechanism encodes both input and target geometry in the same VecSet latent space and concatenates them as joint DiT input with zero-initialized MLP preprocessing; ablations show this outperforms cross-attention alternatives inspired by CLAY and Hunyuan3D-Omni, particularly for dense or noisy 3D conditions.
  3. Training relies on automatically generated degraded/high-quality pairs from a base dataset of approximately 1 million 3D objects, enabling task-specific data construction for enhancement, reconstruction, and editing without manual annotations; existing generation frameworks like TRELLIS are shown to fail on regeneration tasks without such dedicated fine-tuning data.
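The idea of deriving degraded/high-quality pairs from unannotated shapes can be sketched as follows. The specific degradations (half-space masking, random subsampling, Gaussian jitter), their parameters, and the names `degrade` and `make_training_pair` are hypothetical; the summary does not describe the paper's actual pipelines, and image guidance is omitted here.

```python
import random


def degrade(points, keep_ratio=0.25, noise_std=0.01, mask_axis=None, seed=0):
    """Return a coarse, noisy, optionally incomplete copy of a point set."""
    rng = random.Random(seed)
    # Optional masking: drop the half-space where the chosen coordinate is positive.
    visible = [p for p in points if mask_axis is None or p[mask_axis] <= 0.0]
    # Subsample to imitate a low-information input.
    count = min(len(visible), max(1, int(len(visible) * keep_ratio)))
    kept = rng.sample(visible, count)
    # Jitter the surviving points to imitate noisy geometry.
    return [tuple(c + rng.gauss(0.0, noise_std) for c in p) for p in kept]


def make_training_pair(points, **degrade_options):
    """Pair a degraded input with the untouched shape as the regeneration target."""
    return degrade(points, **degrade_options), list(points)


# Toy shape: a 5 x 5 x 5 grid of points in the cube [-1, 1]^3.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
shape = [(x, y, z) for x in grid for y in grid for z in grid]
coarse, target = make_training_pair(shape, keep_ratio=0.2, mask_axis=0, seed=1)
```

Because the target is the original shape, no manual annotation is needed: varying the degradation options per task yields different kinds of inputs from the same base collection.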

This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.