Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
- URL: http://arxiv.org/abs/2412.17808v2
- Date: Tue, 24 Dec 2024 11:02:29 GMT
- Title: Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
- Authors: Rui Chen, Jianfeng Zhang, Yixun Liang, Guan Luo, Weiyu Li, Jiarui Liu, Xiu Li, Xiaoxiao Long, Jiashi Feng, Ping Tan
- Abstract summary: We present Dora-VAE, a novel approach that enhances VAE reconstruction through our proposed sharp edge sampling strategy and a dual cross-attention mechanism.
To systematically evaluate VAE reconstruction quality, we additionally propose Dora-bench, a benchmark that quantifies shape complexity through the density of sharp edges.
- Score: 87.17440422575721
- License:
- Abstract: Recent 3D content generation pipelines commonly employ Variational Autoencoders (VAEs) to encode shapes into compact latent representations for diffusion-based generation. However, the widely adopted uniform point sampling strategy in Shape VAE training often leads to a significant loss of geometric details, limiting the quality of shape reconstruction and downstream generation tasks. We present Dora-VAE, a novel approach that enhances VAE reconstruction through our proposed sharp edge sampling strategy and a dual cross-attention mechanism. By identifying and prioritizing regions with high geometric complexity during training, our method significantly improves the preservation of fine-grained shape features. Such sampling strategy and the dual attention mechanism enable the VAE to focus on crucial geometric details that are typically missed by uniform sampling approaches. To systematically evaluate VAE reconstruction quality, we additionally propose Dora-bench, a benchmark that quantifies shape complexity through the density of sharp edges, introducing a new metric focused on reconstruction accuracy at these salient geometric features. Extensive experiments on the Dora-bench demonstrate that Dora-VAE achieves comparable reconstruction quality to the state-of-the-art dense XCube-VAE while requiring a latent space at least 8$\times$ smaller (1,280 vs. > 10,000 codes). We will release our code and benchmark dataset to facilitate future research in 3D shape modeling.
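The core idea of sharp edge sampling can be pictured with a short sketch. The code below is a hypothetical illustration, not the released Dora-VAE code: it flags mesh edges whose dihedral angle exceeds a threshold and biases surface point sampling toward faces touching those edges. The `trimesh` usage and the `angle_thresh` / `sharp_ratio` parameters are assumptions made for the example, not values from the paper.

```python
# Hypothetical sketch of sharp-edge-aware surface sampling (not the official Dora-VAE code).
# Assumes a triangle mesh loadable by trimesh; angle_thresh and sharp_ratio are
# illustrative hyperparameters, not values taken from the paper.
import numpy as np
import trimesh


def sample_with_sharp_edges(mesh, n_points=16384, angle_thresh=np.deg2rad(30.0), sharp_ratio=0.5):
    """Return surface samples biased toward geometrically complex (sharp-edge) regions."""
    # Dihedral angle between each pair of adjacent faces; large angles indicate sharp edges.
    adj_angles = mesh.face_adjacency_angles      # shape (E,)
    adj_faces = mesh.face_adjacency              # shape (E, 2): face indices sharing an edge

    # Mark every face that touches at least one sharp edge.
    sharp_faces = np.zeros(len(mesh.faces), dtype=bool)
    sharp_faces[adj_faces[adj_angles > angle_thresh].ravel()] = True

    # Standard uniform (area-weighted) samples over the whole surface.
    n_uniform = int(n_points * (1.0 - sharp_ratio))
    uniform_pts, _ = trimesh.sample.sample_surface(mesh, n_uniform)

    # Extra samples restricted to faces adjacent to sharp edges.
    n_sharp = n_points - n_uniform
    if sharp_faces.any():
        sharp_submesh = mesh.submesh([np.where(sharp_faces)[0]], append=True)
        sharp_pts, _ = trimesh.sample.sample_surface(sharp_submesh, n_sharp)
    else:  # smooth shape with no sharp edges: fall back to uniform sampling
        sharp_pts, _ = trimesh.sample.sample_surface(mesh, n_sharp)

    return np.concatenate([uniform_pts, sharp_pts], axis=0)
```

In a training pipeline, points sampled this way (together with their normals) would feed the VAE's point-cloud encoder, so regions of high geometric complexity are represented more densely than under purely uniform sampling; the ratio between uniform and sharp-edge samples is the kind of knob such a strategy tunes.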
Related papers
- CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting [5.8678184183132265]
CDGS is a confidence-aware depth regularization approach developed to enhance 3DGS.
We leverage multi-cue confidence maps of monocular depth estimation and sparse Structure-from-Motion depth to adaptively adjust depth supervision.
Our method demonstrates improved geometric detail preservation in early training stages and achieves competitive performance in both NVS quality and geometric accuracy.
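One hedged reading of "adaptively adjust depth supervision" is a per-pixel confidence weight on the depth loss, sketched below. This is a hypothetical illustration, not the CDGS implementation; the loss form and the way the monocular and SfM cues are combined are assumptions.

```python
# Hypothetical confidence-weighted depth supervision (illustrative, not the CDGS code).
import torch


def confidence_weighted_depth_loss(pred_depth, mono_depth, sfm_depth, mono_conf, sfm_valid):
    """pred_depth: rendered depth from the 3DGS model, (H, W)
    mono_depth: monocular depth estimate, (H, W)
    sfm_depth:  sparse SfM depth, (H, W)
    mono_conf:  per-pixel confidence of the monocular cue in [0, 1], (H, W)
    sfm_valid:  boolean mask of pixels that have an SfM depth, (H, W)
    """
    # Dense but uncertain supervision: down-weight pixels the monocular estimator is unsure about.
    mono_term = (mono_conf * (pred_depth - mono_depth).abs()).mean()

    # Sparse but reliable supervision: apply only where an SfM point projects.
    sfm_term = ((pred_depth - sfm_depth).abs() * sfm_valid).sum() / sfm_valid.sum().clamp(min=1)

    return mono_term + sfm_term
```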
arXiv Detail & Related papers (2025-02-20T16:12:13Z)
- GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction [79.42244344704154]
GausSurf employs geometry guidance from multi-view consistency in texture-rich areas and normal priors in texture-less areas of a scene.
Our method surpasses state-of-the-art methods in terms of reconstruction quality and computation time.
arXiv Detail & Related papers (2024-11-29T03:54:54Z)
- AniSDF: Fused-Granularity Neural Surfaces with Anisotropic Encoding for High-Fidelity 3D Reconstruction [55.69271635843385]
We present AniSDF, a novel approach that learns fused-granularity neural surfaces with physics-based encoding for high-fidelity 3D reconstruction.
Our method substantially improves the quality of SDF-based methods in both geometry reconstruction and novel-view synthesis.
arXiv Detail & Related papers (2024-10-02T03:10:38Z)
- GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
- SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization [6.886220026399106]
We introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS) to tackle challenges in the 3D reconstruction of textureless areas.
We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes.
We propose a refinement strategy that combines spherical-coordinate gradient descent on normals with a pixelwise search interval on depths.
arXiv Detail & Related papers (2024-01-12T05:25:57Z)
- Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors [79.80916315953374]
We propose SSP3D, a semi-supervised framework for 3D reconstruction.
We introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction.
Our approach also performs well when transferring to the real-world Pix3D dataset under a labeling ratio of 10%.
arXiv Detail & Related papers (2022-09-30T11:19:25Z)
- A Geometric Perspective on Variational Autoencoders [0.0]
This paper introduces a new interpretation of the Variational Autoencoder framework by taking a fully geometric point of view.
We show that using this scheme can make a vanilla VAE competitive and even better than more advanced versions on several benchmark datasets.
arXiv Detail & Related papers (2022-09-15T15:32:43Z)
- GLASS: Geometric Latent Augmentation for Shape Spaces [28.533018136138825]
We use geometrically motivated energies to augment and thus boost a sparse collection of example (training) models.
We analyze the Hessian of the as-rigid-as-possible (ARAP) energy to sample from and project to the underlying (local) shape space.
We present multiple examples of interesting and meaningful shape variations even when starting from as few as 3-10 training shapes.
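A rough way to picture this sampling step, under stated assumptions: eigendecompose the deformation-energy Hessian at a training shape and perturb the vertices along its lowest-energy eigenvectors, which span plausible local variations. The sketch below takes a precomputed symmetric Hessian as input (constructing the actual ARAP Hessian is omitted) and is not the GLASS implementation.

```python
# Illustrative sketch: sample shape variations from the low-energy eigenspace of a
# deformation-energy Hessian (e.g. ARAP). Not the GLASS implementation.
import numpy as np


def sample_low_energy_deformations(vertices, hessian, n_modes=8, scale=0.05, n_samples=4):
    """vertices: (V, 3) rest positions; hessian: (3V, 3V) symmetric energy Hessian."""
    _, eigvecs = np.linalg.eigh(hessian)
    # The near-zero modes correspond to rigid motions; skip them and keep the next
    # n_modes eigenvectors, i.e. the deformations that raise the energy the least.
    low_modes = eigvecs[:, 6:6 + n_modes]            # (3V, n_modes)

    samples = []
    for _ in range(n_samples):
        coeffs = np.random.randn(n_modes) * scale    # random weight per mode
        displacement = (low_modes @ coeffs).reshape(-1, 3)
        samples.append(vertices + displacement)
    return samples
```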
arXiv Detail & Related papers (2021-08-06T17:56:23Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- High-Resolution Augmentation for Automatic Template-Based Matching of Human Models [13.45311874573311]
We propose a new approach for 3D shape matching of deformable human shapes.
Our approach is based on the joint adoption of three different tools: an intrinsic spectral matching pipeline, a morphable model, and an extrinsic details refinement.
arXiv Detail & Related papers (2020-09-19T22:41:24Z)