Masks make discriminative models great again!
- URL: http://arxiv.org/abs/2507.00916v1
- Date: Tue, 01 Jul 2025 16:22:23 GMT
- Title: Masks make discriminative models great again!
- Authors: Tianshi Cao, Marie-Julie Rakotosaona, Ben Poole, Federico Tombari, Michael Niemeyer
- Abstract summary: Image2GS is a novel approach to reconstructing 3D scenes from a single image. We create a more deterministic task suitable for discriminative models. Our findings highlight the fundamental struggle discriminative models face when fitting unseen regions.
- Score: 63.877419140833844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Image2GS, a novel approach that addresses the challenging problem of reconstructing photorealistic 3D scenes from a single image by focusing specifically on the image-to-3D lifting component of the reconstruction process. By decoupling the lifting problem (converting an image to a 3D model representing what is visible) from the completion problem (hallucinating content not present in the input), we create a more deterministic task suitable for discriminative models. Our method employs visibility masks derived from optimized 3D Gaussian splats to exclude areas not visible from the source view during training. This masked training strategy significantly improves reconstruction quality in visible regions compared to strong baselines. Notably, despite being trained only on masked regions, Image2GS remains competitive with state-of-the-art discriminative models trained on full target images when evaluated on complete scenes. Our findings highlight the fundamental struggle discriminative models face when fitting unseen regions and demonstrate the advantages of addressing image-to-3D lifting as a distinct problem with specialized techniques.
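The abstract describes the key training trick, masking the reconstruction loss to pixels visible from the source view, but the listing includes no code. Below is a minimal sketch of that idea in PyTorch; the tensor names and the choice of an L1 photometric term are assumptions, and the visibility mask is assumed to be precomputed from the optimized Gaussian splats as described above.

```python
import torch
import torch.nn.functional as F

def masked_lifting_loss(pred_img, target_img, visibility_mask):
    """Photometric loss restricted to regions visible from the source view.

    pred_img, target_img: (B, 3, H, W) rendering of the predicted splats at a
    target viewpoint, and the ground-truth target image.
    visibility_mask:      (B, 1, H, W) binary mask, 1 where the target pixel
    is visible from the source view (precomputed from optimized splats).
    """
    per_pixel = F.l1_loss(pred_img, target_img, reduction="none")  # (B, 3, H, W)
    masked = per_pixel * visibility_mask                           # zero out unseen pixels
    # Normalize by the number of supervised values so views with little
    # visible overlap still receive a comparable gradient signal.
    denom = visibility_mask.sum() * pred_img.shape[1] + 1e-8
    return masked.sum() / denom
```

Excluded pixels contribute neither loss nor gradient, so the model is never penalized for content it cannot infer from the input view.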
Related papers
- IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement [15.206470606085341]
We introduce a novel approach that produces inpainted 3D scenes with consistent visual quality and coherent underlying geometry. Specifically, we propose a robust 3D inpainting pipeline that incorporates geometric priors and a multi-view refinement network trained via test-time adaptation. We develop a novel inpainting mask detection technique to derive targeted inpainting masks from object masks, boosting the performance in handling unconstrained scenes.
arXiv Detail & Related papers (2025-03-06T14:50:17Z)
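The summary mentions deriving targeted inpainting masks from object masks but does not spell out the detection technique. As a hedged illustration only, the common baseline such methods improve upon is simple mask dilation, sketched here in PyTorch; the function name and dilation radius are hypothetical, not IMFine's actual procedure.

```python
import torch.nn.functional as F

def object_to_inpainting_mask(object_mask, dilation_px=8):
    """Naive baseline: grow a binary object mask into an inpainting mask.

    object_mask: (B, 1, H, W) with 1 on the object to remove. Max-pooling with
    stride 1 acts as morphological dilation, so the inpainting region also
    covers soft boundaries and shadow fringes the tight object mask misses.
    """
    k = 2 * dilation_px + 1
    return F.max_pool2d(object_mask.float(), kernel_size=k,
                        stride=1, padding=dilation_px)
```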
- SMIRK: 3D Facial Expressions through Analysis-by-Neural-Synthesis [30.2749903946587]
SMIRK (Spatial Modeling for Image-based Reconstruction of Kinesics) faithfully reconstructs expressive 3D faces from images. We identify two key limitations in existing methods: shortcomings in their self-supervised training formulation, and a lack of expression diversity in the training images. Our qualitative, quantitative, and particularly our perceptual evaluations demonstrate that SMIRK achieves new state-of-the-art performance on accurate expression reconstruction.
arXiv Detail & Related papers (2024-04-05T14:00:07Z)
- MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency [120.9499803967496]
We propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve the representative structured points. Our method can concentrate on modeling regional geometry and suffers less ambiguity in masked reconstruction. By combining informative-preserved reconstruction on masked areas with consistency self-distillation from unmasked areas, we obtain a unified framework called MM-3DScene.
arXiv Detail & Related papers (2022-12-20T01:53:40Z)
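MM-3DScene's objective pairs two terms: masked reconstruction on informative points and self-distilled consistency on unmasked points. A minimal sketch of such a two-branch loss follows; the encoder signature, the EMA teacher, the `reconstruct` head, and the MSE terms are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def masked_modeling_loss(student, teacher, points, mask):
    """Two-branch objective: reconstruct masked points, stay consistent on the rest.

    points: (B, N, 3) input point cloud coordinates.
    mask:   (B, N) boolean, True where a point is masked out of the student input.
    student, teacher: encoders mapping points to (B, N, D) features; the teacher
    is assumed to be an EMA copy of the student (no gradients flow through it).
    """
    # Hypothetical signature: the encoder replaces masked points with a learned token.
    feats_s = student(points, mask)
    with torch.no_grad():
        feats_t = teacher(points)            # teacher sees the full, unmasked cloud

    # Branch 1: reconstruct the coordinates of masked points from context.
    recon = student.reconstruct(feats_s)     # hypothetical head, (B, N, 3)
    recon_loss = F.mse_loss(recon[mask], points[mask])

    # Branch 2: self-distilled consistency on unmasked points.
    cons_loss = F.mse_loss(feats_s[~mask], feats_t[~mask])

    return recon_loss + cons_loss
```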
- Non-Deterministic Face Mask Removal Based On 3D Priors [3.8502825594372703]
The proposed approach integrates a multi-task 3D face reconstruction module with a face inpainting module.
By gradually controlling the 3D shape parameters, our method generates high-quality dynamic inpainting results with different expressions and mouth movements.
arXiv Detail & Related papers (2022-02-20T16:27:44Z)
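The dynamic results come from sweeping the 3D shape parameters fed to the inpainting module. A trivial illustration, assuming 3DMM-style expression coefficient vectors (all names hypothetical, and linear interpolation is just one simple way to "gradually control" the parameters):

```python
import numpy as np

def sweep_expression(neutral_coeffs, target_coeffs, steps=10):
    """Yield expression coefficient vectors interpolated from neutral to target.

    Feeding each interpolated vector to the 3D reconstruction and inpainting
    modules produces a dynamic sequence of unmasked faces whose expression
    and mouth movement change gradually, as described above.
    """
    for t in np.linspace(0.0, 1.0, steps):
        yield (1.0 - t) * neutral_coeffs + t * target_coeffs
```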
- AvatarMe++: Facial Shape and BRDF Inference with Photorealistic Rendering-Aware GANs [119.23922747230193]
We introduce the first method that is able to reconstruct render-ready 3D facial geometry and BRDF from a single "in-the-wild" image.
Our method outperforms existing approaches by a significant margin and reconstructs high-resolution 3D faces from a single low-resolution image.
arXiv Detail & Related papers (2021-12-11T11:36:30Z)
- Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection [65.92058628082322]
Non-parametric face modeling aims to reconstruct 3D faces from images alone, without shape assumptions.
This paper presents a novel Learning to Aggregate and Personalize framework for unsupervised robust 3D face modeling.
arXiv Detail & Related papers (2021-06-15T03:10:17Z)
- OSTeC: One-Shot Texture Completion [86.23018402732748]
We propose an unsupervised approach for one-shot 3D facial texture completion.
The proposed approach rotates an input image in 3D and fills in the unseen regions by reconstructing the rotated image in a 2D face generator.
We frontalize the target image by projecting the completed texture into the generator.
arXiv Detail & Related papers (2020-12-30T23:53:26Z)
- Learning Pose-invariant 3D Object Reconstruction from Single-view Images [61.98279201609436]
In this paper, we explore a more realistic setup of learning 3D shape from only single-view images.
The major difficulty lies in the insufficient constraints provided by single-view images.
We propose an effective adversarial domain confusion method to learn pose-disentangled compact shape space.
arXiv Detail & Related papers (2020-04-03T02:47:35Z)
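This last summary names an adversarial domain confusion method for learning a pose-disentangled shape space without detailing it. The standard way to implement domain confusion is a gradient reversal layer, sketched below in PyTorch; treating pose bins as the "domains" and the function names are assumptions for illustration, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def pose_confusion_loss(shape_code, pose_classifier, pose_labels, lambd=1.0):
    """Adversarial domain confusion over pose bins (illustrative names).

    The classifier is trained to predict pose from the shape code, while the
    reversed gradient pushes the encoder to remove pose cues, leaving a
    pose-disentangled shape representation.
    """
    logits = pose_classifier(GradReverse.apply(shape_code, lambd))
    return F.cross_entropy(logits, pose_labels)
```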
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.