Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2506.03538v1
- Date: Wed, 04 Jun 2025 03:40:33 GMT
- Title: Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting
- Authors: Chengqi Li, Zhihao Shi, Yangdi Lu, Wenbo He, Xiangyu Xu,
- Abstract summary: 3D reconstruction from in-the-wild images remains a challenging task due to inconsistent lighting conditions and transient distractors.<n>Existing methods typically rely on strategies to handle the low-quality training data, which often struggle to produce stable and consistent reconstructions.<n>We propose Asymmetric Dual 3DGS, a novel framework that leverages the convergence nature of these artifacts.
- Score: 30.566082713819117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D reconstruction from in-the-wild images remains a challenging task due to inconsistent lighting conditions and transient distractors. Existing methods typically rely on heuristic strategies to handle the low-quality training data, which often struggle to produce stable and consistent reconstructions, frequently resulting in visual artifacts. In this work, we propose Asymmetric Dual 3DGS, a novel framework that leverages the stochastic nature of these artifacts: they tend to vary across different training runs due to minor randomness. Specifically, our method trains two 3D Gaussian Splatting (3DGS) models in parallel, enforcing a consistency constraint that encourages convergence on reliable scene geometry while suppressing inconsistent artifacts. To prevent the two models from collapsing into similar failure modes due to confirmation bias, we introduce a divergent masking strategy that applies two complementary masks: a multi-cue adaptive mask and a self-supervised soft mask, which leads to an asymmetric training process of the two models, reducing shared error modes. In addition, to improve the efficiency of model training, we introduce a lightweight variant called Dynamic EMA Proxy, which replaces one of the two models with a dynamically updated Exponential Moving Average (EMA) proxy, and employs an alternating masking strategy to preserve divergence. Extensive experiments on challenging real-world datasets demonstrate that our method consistently outperforms existing approaches while achieving high efficiency. Codes and trained models will be released.
Related papers
- RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS [79.15416002879239]
3D Gaussian Splatting has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling.<n>Existing methods struggle with accurately modeling scenes affected by transient objects, leading to artifacts in the rendered images.<n>We propose RobustSplat, a robust solution based on two critical designs.
arXiv Detail & Related papers (2025-06-03T11:13:48Z) - Learning to Align and Refine: A Foundation-to-Diffusion Framework for Occlusion-Robust Two-Hand Reconstruction [50.952228546326516]
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures.<n>Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts.<n>We propose a dual-stage Foundation-to-Diffusion framework that precisely align 2D prior guidance from vision foundation models.
arXiv Detail & Related papers (2025-03-22T14:42:27Z) - A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images.<n>Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z) - RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning [27.4552892119823]
inconsistencies in multi-view snapshots frequently introduce noise and artifacts along object boundaries, undermining the 3D reconstruction process.<n>We leverage 3D Gaussian Splatting (3DGS) for 3D reconstruction, and explicitly integrate uncertainty-aware learning into the reconstruction process.<n>We apply adaptive pixel-wise loss weighting to regularize the models, reducing reconstruction intensity in high-uncertainty regions.
arXiv Detail & Related papers (2024-11-28T02:19:28Z) - DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes [71.61083731844282]
We present DeSiRe-GS, a self-supervised gaussian splatting representation.
It enables effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios.
arXiv Detail & Related papers (2024-11-18T05:49:16Z) - PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting [16.333566122541022]
We propose a new method called PLGS that enables 3DGS to generate consistent panoptic segmentation masks from noisy 2D segmentation masks.<n>Our method outperforms previous state-of-the-art methods in terms of both segmentation quality and speed.
arXiv Detail & Related papers (2024-10-23T02:05:05Z) - LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes [39.687526103092445]
We introduce a novel method to uplift 2D image features into Gaussian Splatting representations of 3D scenes.<n>We obtain competitive segmentation results using generic DINOv2 features, despite DINOv2 not being trained on millions of annotated segmentation masks like SAM.<n>When applied to CLIP features, our method demonstrates strong performance in open-vocabulary object localization tasks.
arXiv Detail & Related papers (2024-10-18T13:44:29Z) - StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
gait recognition in the wild is a more practical problem that has attracted the attention of the community of multimedia and computer vision.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z) - Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online
Adaptation [87.85851771425325]
We consider a new problem of adapting a human mesh reconstruction model to out-of-domain streaming videos.
We tackle this problem through online adaptation, gradually correcting the model bias during testing.
We propose the Dynamic Bilevel Online Adaptation algorithm (DynaBOA)
arXiv Detail & Related papers (2021-11-07T07:23:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.