GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction
- URL: http://arxiv.org/abs/2511.12204v3
- Date: Wed, 19 Nov 2025 08:17:25 GMT
- Title: GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction
- Authors: Jiaqi Wu, Yaosen Chen, Shuyuan Zhu
- Abstract summary: Multi-view image generation holds significant application value in computer vision. Existing methods, which rely on extending single images, face notable computational challenges in maintaining cross-view consistency. We propose the Geometry-guided Multi-View Diffusion Model, which incorporates mechanisms for extracting multi-view geometric information.
- Score: 15.701540201818192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view image generation holds significant application value in computer vision, particularly in domains like 3D reconstruction, virtual reality, and augmented reality. Most existing methods, which rely on extending single images, face notable computational challenges in maintaining cross-view consistency and generating high-resolution outputs. To address these issues, we propose the Geometry-guided Multi-View Diffusion Model, which incorporates mechanisms for extracting multi-view geometric information and adjusting the intensity of geometric features to generate images that are both consistent across views and rich in detail. Specifically, we design a multi-view geometry information extraction module that leverages depth maps, normal maps, and foreground segmentation masks to construct a shared geometric structure, ensuring shape and structural consistency across different views. To enhance consistency and detail restoration during generation, we develop a decoupled geometry-enhanced attention mechanism that strengthens feature focus on key geometric details, thereby improving overall image quality and detail preservation. Furthermore, we apply an adaptive learning strategy that fine-tunes the model to better capture spatial relationships and visual coherence between the generated views, ensuring realistic results. Our model also incorporates an iterative refinement process that progressively improves the output quality through multiple stages of image generation. Finally, a dynamic geometry information intensity adjustment mechanism is proposed to adaptively regulate the influence of geometric data, optimizing overall quality while ensuring the naturalness of generated images. More details can be found on the project page: https://sobeymil.github.io/GeoMVD.com.
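The abstract's decoupled geometry-enhanced attention and dynamic intensity adjustment can be illustrated with a minimal sketch. This is an assumption about how such a mechanism might look, not the authors' implementation: attention scores over image tokens are augmented by a geometry-affinity bias computed from depth/normal/mask-derived features, scaled by an adjustable intensity factor. All names (`geometry_guided_attention`, `intensity`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def geometry_guided_attention(img_feats, geo_feats, intensity=0.5):
    """Self-attention over image tokens with an additive geometry bias.

    img_feats: (n_tokens, d) image features
    geo_feats: (n_tokens, d) features derived from depth/normal/mask maps
    intensity: scalar weight on the geometry bias (cf. the paper's
               dynamic geometry information intensity adjustment)
    """
    d = img_feats.shape[-1]
    scores = img_feats @ img_feats.T / np.sqrt(d)    # plain attention scores
    geo_bias = geo_feats @ geo_feats.T / np.sqrt(d)  # geometry affinity between tokens
    attn = softmax(scores + intensity * geo_bias, axis=-1)
    return attn @ img_feats                          # geometry-informed aggregation

# Toy example: 4 tokens with 8-dim features
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
geo = rng.normal(size=(4, 8))
out = geometry_guided_attention(img, geo, intensity=0.5)
```

Setting `intensity=0` recovers plain self-attention, which is one way a model could "adaptively regulate the influence of geometric data" as the abstract describes.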
Related papers
- Geo-Code: A Code Framework for Reverse Code Generation from Geometric Images Based on Two-Stage Multi-Agent Evolution [22.312869477454864]
We propose Geo-coder -- the first inverse programming framework for geometric images based on a multi-agent system. Our method innovatively decouples the process into geometric modeling via pixel-wise anchoring and metric-driven code evolution. Experiments demonstrate that Geo-coder achieves a substantial lead in both geometric reconstruction accuracy and visual consistency.
arXiv Detail & Related papers (2026-02-08T00:48:49Z) - GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation [68.02988074681427]
Previous works leveraging video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content. In this paper, we renovate the pipeline of image-to-3D scene generation by unlocking the potential of geometry models. Our GeoWorld can generate high-fidelity 3D scenes from a single image and a given camera trajectory, outperforming prior methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2025-11-28T13:55:45Z) - Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation [62.87088388345378]
We introduce a diffusion-based framework that performs aligned novel-view image and geometry generation via a warping-and-inpainting methodology. The method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images. Cross-modal attention distillation is proposed to ensure accurate alignment between generated images and geometry.
arXiv Detail & Related papers (2025-06-13T16:19:00Z) - Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis [12.160537328404622]
DRA-Ctrl provides new insights into reusing resource-intensive video models. DRA-Ctrl lays the foundation for future unified generative models across visual modalities.
arXiv Detail & Related papers (2025-05-29T10:34:45Z) - Multi-view dense image matching with similarity learning and geometry priors [0.0]
MV-DeepSimNets is a suite of deep neural networks designed for multi-view similarity learning. Our approach incorporates an online geometry prior to characterize pixel relationships. Our method's geometric preconditioning effectively adapts epipolar-based features for enhanced multi-view reconstruction.
arXiv Detail & Related papers (2025-05-16T13:55:40Z) - GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach that predicts high-quality assets with 512K Gaussians from 21 input images using only 11 GB of GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z) - Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z) - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework, Generalizable Model-based Neural Radiance Fields (GM-NeRF), to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z) - Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers [8.781861951759948]
This paper presents Geo-SIC, the first deep learning model to learn deformable shapes in a deformation space for an improved performance of image classification.
We introduce a newly designed framework that simultaneously derives features from both image and latent shape spaces with large intra-class variations.
We develop a boosted classification network, equipped with an unsupervised learning of geometric shape representations.
arXiv Detail & Related papers (2022-10-25T01:55:17Z) - Learning Formation of Physically-Based Face Attributes [16.55993873730069]
Based on a combined dataset of 4,000 high-resolution facial scans, we introduce a non-linear morphable face model.
Our deep learning based generative model learns to correlate albedo and geometry, which ensures the anatomical correctness of the generated assets.
arXiv Detail & Related papers (2020-04-02T07:01:30Z) - Fine-grained Image-to-Image Transformation towards Visual Recognition [102.51124181873101]
We aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image.
We adopt a model based on generative adversarial networks to disentangle the identity related and unrelated factors of an image.
Experiments on the CompCars and Multi-PIE datasets demonstrate that our model preserves the identity of the generated images much better than the state-of-the-art image-to-image transformation models.
arXiv Detail & Related papers (2020-01-12T05:26:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.