GeoComplete: Geometry-Aware Diffusion for Reference-Driven Image Completion
- URL: http://arxiv.org/abs/2510.03110v1
- Date: Fri, 03 Oct 2025 15:38:12 GMT
- Title: GeoComplete: Geometry-Aware Diffusion for Reference-Driven Image Completion
- Authors: Beibei Lin, Tingting Chen, Robby T. Tan
- Abstract summary: We propose a novel framework that incorporates explicit 3D structural guidance to enforce geometric consistency in completed regions. Experiments show that GeoComplete achieves a 17.1 PSNR improvement over state-of-the-art methods.
- Score: 36.02469602451232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reference-driven image completion, which restores missing regions in a target view using additional images, is particularly challenging when the target view differs significantly from the references. Existing generative methods rely solely on diffusion priors and, without geometric cues such as camera pose or depth, often produce misaligned or implausible content. We propose GeoComplete, a novel framework that incorporates explicit 3D structural guidance to enforce geometric consistency in the completed regions, setting it apart from prior image-only approaches. GeoComplete introduces two key ideas: conditioning the diffusion process on projected point clouds to infuse geometric information, and applying target-aware masking to guide the model toward relevant reference cues. The framework features a dual-branch diffusion architecture. One branch synthesizes the missing regions from the masked target, while the other extracts geometric features from the projected point cloud. Joint self-attention across branches ensures coherent and accurate completion. To address regions visible in references but absent in the target, we project the target view into each reference to detect occluded areas, which are then masked during training. This target-aware masking directs the model to focus on useful cues, enhancing performance in difficult scenarios. By integrating a geometry-aware dual-branch diffusion architecture with a target-aware masking strategy, GeoComplete offers a unified and robust solution for geometry-conditioned image completion. Experiments show that GeoComplete achieves a 17.1 PSNR improvement over state-of-the-art methods, significantly boosting geometric accuracy while maintaining high visual quality.
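The target-aware masking described in the abstract can be sketched with basic pinhole-camera geometry: back-project the target view's depth into 3D, reproject those points into a reference camera, and mark reference pixels on which no target-view surface point lands as occluded. The sketch below is an illustrative reconstruction, not the authors' released code; the function names, the depth-tolerance visibility test, and the availability of per-view depth and relative pose are all assumptions.

```python
import numpy as np

def project_points(depth, K, T_tgt_to_ref):
    """Back-project target-view depth to 3D, then transform the points
    into the reference camera frame. Conventions here are illustrative."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3)
    rays = (np.linalg.inv(K) @ pix.T).T                # camera rays
    pts = rays * depth.reshape(-1, 1)                  # 3D points, target frame
    pts_h = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)
    return (T_tgt_to_ref @ pts_h.T).T[:, :3]           # 3D points, reference frame

def target_aware_mask(tgt_depth, ref_depth, K, T_tgt_to_ref, tol=0.05):
    """Mark reference pixels NOT visible from the target view -- a
    simplified stand-in for the paper's occlusion detection."""
    h, w = ref_depth.shape
    pts_ref = project_points(tgt_depth, K, T_tgt_to_ref)
    z = pts_ref[:, 2]
    uv = (K @ pts_ref.T).T
    u = np.round(uv[:, 0] / np.maximum(uv[:, 2], 1e-6)).astype(int)
    v = np.round(uv[:, 1] / np.maximum(uv[:, 2], 1e-6)).astype(int)
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # A reference pixel counts as "seen from the target" if some projected
    # target point lands on it with depth close to the reference depth.
    match = ok & (np.abs(z - ref_depth[np.clip(v, 0, h - 1),
                                       np.clip(u, 0, w - 1)]) < tol)
    visible = np.zeros((h, w), dtype=bool)
    visible[v[match], u[match]] = True
    return ~visible  # True = occluded w.r.t. the target -> masked in training
```

With identical target and reference depth under an identity pose, every reference pixel is visible and the mask is empty; pushing the reference surface away beyond the tolerance masks everything, which is the behavior the target-aware masking strategy relies on to hide unhelpful reference regions.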
Related papers
- GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving [55.14836667214487]
GeoFocus is a novel framework comprising two core modules. GeoFocus achieves a 4.7% accuracy improvement over leading specialized models. It demonstrates superior robustness in MATHVERSE under diverse visual conditions.
arXiv Detail & Related papers (2026-02-09T11:15:01Z) - Geo-Code: A Code Framework for Reverse Code Generation from Geometric Images Based on Two-Stage Multi-Agent Evolution [22.312869477454864]
We propose Geo-coder -- the first inverse programming framework for geometric images based on a multi-agent system. Our method innovatively decouples the process into geometric modeling via pixel-wise anchoring and metric-driven code evolution. Experiments demonstrate that Geo-coder achieves a substantial lead in both geometric reconstruction accuracy and visual consistency.
arXiv Detail & Related papers (2026-02-08T00:48:49Z) - Thinking with Geometry: Active Geometry Integration for Spatial Reasoning [68.59084007360615]
We propose GeoThinker, a framework that shifts the paradigm from passive fusion to active perception. Instead of feature mixing, GeoThinker enables the model to selectively retrieve geometric evidence conditioned on its internal reasoning demands. Our results indicate that the ability to actively integrate spatial structures is essential for next-generation spatial intelligence.
arXiv Detail & Related papers (2026-02-05T18:59:32Z) - Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion [57.09673862519791]
This paper introduces JGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation. Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
arXiv Detail & Related papers (2026-01-01T12:48:56Z) - G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior [53.762256749551284]
We identify accurate geometry as the fundamental prerequisite for effectively exploiting generative models to enhance 3D scene reconstruction. We incorporate this geometry guidance throughout the generative pipeline to improve visibility mask estimation, guide novel view selection, and enhance multi-view consistency when inpainting with video diffusion models. Our method naturally supports single-view inputs and unposed videos, with strong generalizability in both indoor and outdoor scenarios.
arXiv Detail & Related papers (2025-10-14T03:06:28Z) - JRN-Geo: A Joint Perception Network based on RGB and Normal images for Cross-view Geo-localization [26.250213248316342]
Cross-view geo-localization plays a critical role in Unmanned Aerial Vehicle (UAV) localization and navigation. Existing methods predominantly rely on semantic features from RGB images. We introduce a Joint perception network to integrate RGB and Normal images.
arXiv Detail & Related papers (2025-09-06T12:11:51Z) - Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation [62.87088388345378]
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. The method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images. Cross-modal attention distillation is proposed to ensure accurate alignment between generated images and geometry.
arXiv Detail & Related papers (2025-06-13T16:19:00Z) - Geometry-Editable and Appearance-Preserving Object Composition [67.98806888489385]
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties. Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation. We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion model that first leverages semantic embeddings to implicitly capture desired geometric transformations.
arXiv Detail & Related papers (2025-05-27T09:05:28Z) - Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution [55.9977636042469]
We propose a novel framework, termed geometry-decoupled network (GDNet), for compressed depth map super-resolution. It decouples the high-quality depth map reconstruction process by handling global and detailed geometric features separately. Our solution significantly outperforms current methods in terms of geometric consistency and detail recovery.
arXiv Detail & Related papers (2024-11-05T16:37:30Z) - Revisiting Near/Remote Sensing with Geospatial Attention [24.565068569913382]
This work addresses the task of overhead image segmentation when auxiliary ground-level images are available.
Recent work has shown that performing joint inference over these two modalities, often called near/remote sensing, can yield significant accuracy improvements.
We introduce the concept of geospatial attention, a geometry-aware attention mechanism that explicitly considers the geospatial relationship between the pixels in a ground-level image and a geographic location.
arXiv Detail & Related papers (2022-04-04T19:19:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.