SHaDe: Compact and Consistent Dynamic 3D Reconstruction via Tri-Plane Deformation and Latent Diffusion
- URL: http://arxiv.org/abs/2505.16535v1
- Date: Thu, 22 May 2025 11:25:38 GMT
- Title: SHaDe: Compact and Consistent Dynamic 3D Reconstruction via Tri-Plane Deformation and Latent Diffusion
- Authors: Asrar Alruwayqi
- Abstract summary: We present a novel framework for dynamic 3D scene reconstruction that integrates three key components: an explicit tri-plane deformation field, a view-conditioned canonical field with spherical harmonics (SH) attention, and a temporally-aware latent diffusion prior. Our method encodes 4D scenes using three 2D feature planes that evolve over time, enabling an efficient and compact representation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel framework for dynamic 3D scene reconstruction that integrates three key components: an explicit tri-plane deformation field, a view-conditioned canonical radiance field with spherical harmonics (SH) attention, and a temporally-aware latent diffusion prior. Our method encodes 4D scenes using three orthogonal 2D feature planes that evolve over time, enabling an efficient and compact spatiotemporal representation. These features are explicitly warped into a canonical space via a deformation offset field, eliminating the need for MLP-based motion modeling. In canonical space, we replace traditional MLP decoders with a structured SH-based rendering head that synthesizes view-dependent color via attention over learned frequency bands, improving both interpretability and rendering efficiency. To further enhance fidelity and temporal consistency, we introduce a transformer-guided latent diffusion module that refines the tri-plane and deformation features in a compressed latent space. This generative module denoises scene representations under ambiguous or out-of-distribution (OOD) motion, improving generalization. Our model is trained in two stages: the diffusion module is first pre-trained independently, and then fine-tuned jointly with the full pipeline using a combination of image reconstruction, diffusion denoising, and temporal consistency losses. We demonstrate state-of-the-art results on synthetic benchmarks, surpassing recent methods such as HexPlane and 4D Gaussian Splatting in visual quality, temporal coherence, and robustness to sparse-view dynamic inputs.
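The following sketch illustrates, under assumed names and shapes, how the tri-plane query with explicit deformation offsets and the SH-attention color head described above could be wired up. Everything here (sample_plane, query_canonical_features, sh_attention_color, w_q, the band layout) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of the tri-plane deformation query and SH-attention
# rendering head. All module and parameter names are illustrative
# assumptions, not the paper's code.
import torch
import torch.nn.functional as F


def sample_plane(plane, coords2d):
    """Bilinearly sample a (C, H, W) feature plane at (N, 2) coords in [-1, 1]."""
    grid = coords2d.view(1, -1, 1, 2)                       # (1, N, 1, 2)
    feat = F.grid_sample(plane.unsqueeze(0), grid,
                         align_corners=True)                # (1, C, N, 1)
    return feat.squeeze(0).squeeze(-1).t()                  # (N, C)


def query_canonical_features(planes_t, deform_offsets, xyz):
    """Warp points into canonical space explicitly, then fuse tri-plane features.

    planes_t:       time-dependent planes {'xy', 'xz', 'yz'}, each (C, H, W)
    deform_offsets: (N, 3) offsets from the explicit deformation field
                    (no MLP-based motion model)
    xyz:            (N, 3) query points at time t, normalized to [-1, 1]
    """
    xyz_c = xyz + deform_offsets                            # explicit warp
    return (sample_plane(planes_t['xy'], xyz_c[:, [0, 1]])
            + sample_plane(planes_t['xz'], xyz_c[:, [0, 2]])
            + sample_plane(planes_t['yz'], xyz_c[:, [1, 2]]))


def sh_attention_color(features, view_dirs, sh_basis_fn, w_q, n_bands=4):
    """View-dependent color via attention over SH frequency bands.

    features:    (N, C) canonical features; the first 3 * n_bands**2 channels
                 are read as SH color coefficients (requires C >= 3 * n_bands**2)
    view_dirs:   (N, 3) unit viewing directions
    sh_basis_fn: (N, 3) -> (N, n_bands**2) SH basis evaluation
    w_q:         (C, n_bands) learned projection producing per-band logits
    """
    n, _ = features.shape
    sh_vals = sh_basis_fn(view_dirs)                        # (N, B^2)
    band_attn = (features @ w_q).softmax(dim=-1)            # (N, n_bands)
    # Band l contributes 2l + 1 SH coefficients; broadcast its weight to them.
    per_coeff = torch.cat(
        [band_attn[:, l:l + 1].expand(-1, 2 * l + 1) for l in range(n_bands)],
        dim=-1)                                             # (N, B^2)
    coeffs = features[:, :3 * n_bands ** 2].view(n, 3, -1)  # (N, 3, B^2)
    rgb = (coeffs * (per_coeff * sh_vals).unsqueeze(1)).sum(-1)
    return rgb.sigmoid()                                    # (N, 3)
```

A similar caveat applies to the transformer-guided latent diffusion module: the sketch below assumes a DDPM-style reverse loop over compressed latents of a temporal window; the encoder, denoiser, decoder, and step schedule are all placeholders.

```python
# Hypothetical sketch of the temporally-aware latent diffusion refinement:
# compress tri-plane + deformation features, denoise across a temporal
# window with a transformer, decode back. The simplified reverse step and
# all names (encoder, denoiser, decoder, step_size) are assumptions.
import torch


def step_size(t, beta_min=1e-4, beta_max=2e-2, n_steps=1000):
    # Linear beta schedule; a stand-in for whatever sampler the paper uses.
    return beta_min + (beta_max - beta_min) * (t / n_steps)


def refine_scene_latents(encoder, denoiser, decoder, feats, timesteps):
    """feats: (T, C, H, W) tri-plane/deformation features for T frames."""
    z = encoder(feats)                            # (T, L) compressed latents
    for t in timesteps:                           # descending, e.g. 1000..1
        t_embed = torch.full((z.shape[0],), float(t), device=z.device)
        eps_hat = denoiser(z, t_embed)            # attention across T frames
        z = z - step_size(t) * eps_hat            # simplified reverse update
    return decoder(z)                             # refined (T, C, H, W)
```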
Related papers
- Laplacian Analysis Meets Dynamics Modelling: Gaussian Splatting for 4D Reconstruction [9.911802466255653]
We propose a novel dynamic 3DGS framework with hybrid explicit-implicit functions. Our method demonstrates state-of-the-art performance in reconstructing complex dynamic scenes, achieving better reconstruction fidelity.
arXiv Detail & Related papers (2025-08-07T01:39:29Z) - STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering [15.873329633980015]
Existing 3DGS-based methods for dynamic reconstruction often suffer from spatio-temporal coupling. We propose STDR (Spatio-Temporal Decoupling for Real-time rendering), a plug-and-play module that learns spatio-temporal probability distributions for each scene.
arXiv Detail & Related papers (2025-05-28T14:26:41Z) - Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling [34.238349310770886]
We introduce Sparc3D, a unified framework that combines a sparse deformable marching cubes representation (Sparcubes) with a novel encoder (Sparconv-VAE). Sparc3D achieves state-of-the-art reconstruction fidelity on challenging inputs, including open surfaces, disconnected components, and intricate geometry.
arXiv Detail & Related papers (2025-05-20T15:44:54Z) - LiftRefine: Progressively Refined View Synthesis from 3D Lifting with Volume-Triplane Representations [21.183524347952762]
We propose a new view synthesis method via a 3D neural field from single or few-view input images. Our reconstruction model first lifts one or more input images into 3D space as a volume, which serves as the coarse-scale 3D representation. Our diffusion model then hallucinates missing details in the rendered images from tri-planes.
arXiv Detail & Related papers (2024-12-19T02:23:55Z) - Event-boosted Deformable 3D Gaussians for Dynamic Scene Reconstruction [50.873820265165975]
We introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for dynamic scene reconstruction. We propose a GS-Threshold Joint Modeling strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling. We contribute the first event-inclusive 4D benchmark with synthetic and real-world dynamic scenes, on which our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-11-25T08:23:38Z) - VortSDF: 3D Modeling with Centroidal Voronoi Tesselation on Signed Distance Field [5.573454319150408]
We introduce a volumetric optimization framework that combines explicit SDF fields with a shallow color network, in order to estimate 3D shape properties over tetrahedral grids.
Experimental results with Chamfer statistics validate this approach with unprecedented reconstruction quality on various scenarios such as objects, open scenes, or humans.
arXiv Detail & Related papers (2024-07-29T09:46:39Z) - CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs [65.80187860906115]
We propose a novel approach to improve NeRF's performance with sparse inputs.
We first adopt a voxel-based ray sampling strategy to ensure that the sampled rays intersect with a certain voxel in 3D space.
We then randomly sample additional points within the voxel and apply a Transformer to infer the properties of other points on each ray, which are then incorporated into the volume rendering (a minimal sketch of this sampling scheme appears after this list).
arXiv Detail & Related papers (2024-03-25T15:56:17Z) - Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction [89.53963284958037]
We propose a novel motion-aware enhancement framework for dynamic scene reconstruction.
Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow.
For the prevalent deformation-based paradigm, which presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed.
arXiv Detail & Related papers (2024-03-18T03:46:26Z) - Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking [52.393359791978035]
Motion2VecSets is a 4D diffusion model for dynamic surface reconstruction from point cloud sequences.
We parameterize 4D dynamics with latent sets instead of using global latent codes.
For more temporally-coherent object tracking, we synchronously denoise deformation latent sets and exchange information across multiple frames.
arXiv Detail & Related papers (2024-01-12T15:05:08Z) - StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss (a hedged recap of this equivalence appears after this list).
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z) - EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via
Self-Supervision [85.17951804790515]
EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping.
Our method achieves state-of-the-art performance in sensor simulation.
arXiv Detail & Related papers (2023-11-03T17:59:55Z)
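For the CVT-xRF entry above, here is a minimal sketch of the voxel-based ray sampling plus in-voxel Transformer idea; the class, the function names, and the sampling window are illustrative assumptions rather than the authors' code.

```python
# Hypothetical sketch of CVT-xRF-style in-voxel sampling: on-ray samples
# inside a voxel are joined by extra random in-voxel points, and a small
# Transformer lets the context points inform the ray samples.
import torch
import torch.nn as nn


class InVoxelTransformer(nn.Module):
    """Embed 3D points and run self-attention over them."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(3, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, pts):                         # (B, N, 3)
        return self.encoder(self.embed(pts))        # (B, N, d_model)


def sample_in_voxel(ray_o, ray_d, voxel_center, voxel_size, n_pts=8):
    """On-ray depths near the voxel plus extra random points inside it."""
    t_mid = ((voxel_center - ray_o) * ray_d).sum()  # assumes unit ray_d
    t = t_mid + (torch.rand(n_pts) - 0.5) * voxel_size
    ray_pts = ray_o + t[:, None] * ray_d            # (n_pts, 3) on-ray points
    extra = voxel_center + (torch.rand(n_pts, 3) - 0.5) * voxel_size
    return ray_pts, extra


# Usage: ray points and in-voxel context points attend jointly; the ray-point
# features would then feed volume rendering (not shown).
ray_pts, extra = sample_in_voxel(torch.zeros(3), torch.tensor([0., 0., 1.]),
                                 torch.tensor([0., 0., 2.]), voxel_size=0.25)
feats = InVoxelTransformer()(torch.cat([ray_pts, extra]).unsqueeze(0))
ray_feats = feats[0, :ray_pts.shape[0]]             # features for on-ray points
```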
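For the StableDreamer entry, a hedged recap of the standard SDS gradient (DreamFusion-style notation) and its reading as a supervised L2 loss; the paper's exact formalization may differ.

```latex
% Score Distillation Sampling gradient (DreamFusion-style notation).
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\Big[\, w(t)\,
      \big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,
      \tfrac{\partial x}{\partial \theta} \Big]
% With the one-step denoised estimate
% \hat{x}_0 = (x_t - \sigma_t \hat{\epsilon}_\phi)/\alpha_t and a
% stop-gradient on \hat{x}_0, this matches, up to a time-dependent scale,
% the gradient of a supervised L2 loss toward \hat{x}_0:
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  \propto \mathbb{E}_{t,\epsilon}\!\Big[\,
      \nabla_\theta\, \tfrac{1}{2}\,
      \big\lVert x - \mathrm{sg}\big(\hat{x}_0(x_t;\, y,\, t)\big)\big\rVert_2^2 \Big]
```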