ControlVP: Interactive Geometric Refinement of AI-Generated Images with Consistent Vanishing Points
- URL: http://arxiv.org/abs/2512.07504v1
- Date: Mon, 08 Dec 2025 12:38:11 GMT
- Title: ControlVP: Interactive Geometric Refinement of AI-Generated Images with Consistent Vanishing Points
- Authors: Ryota Okumura, Kaede Shiohara, Toshihiko Yamasaki
- Abstract summary: We propose ControlVP, a user-guided framework for correcting vanishing point inconsistencies in generated images. Our approach extends a pre-trained diffusion model by incorporating structural guidance derived from building contours. Our method enhances global geometric consistency while maintaining visual fidelity comparable to the baselines.
- Score: 32.23473666846317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent text-to-image models, such as Stable Diffusion, have achieved impressive visual quality, yet they often suffer from geometric inconsistencies that undermine the structural realism of generated scenes. One prominent issue is vanishing point inconsistency, where projections of parallel lines fail to converge correctly in 2D space. This leads to structurally implausible geometry that degrades spatial realism, especially in architectural scenes. We propose ControlVP, a user-guided framework for correcting vanishing point inconsistencies in generated images. Our approach extends a pre-trained diffusion model by incorporating structural guidance derived from building contours. We also introduce geometric constraints that explicitly encourage alignment between image edges and perspective cues. Our method enhances global geometric consistency while maintaining visual fidelity comparable to the baselines. This capability is particularly valuable for applications that require accurate spatial structure, such as image-to-3D reconstruction. The dataset and source code are available at https://github.com/RyotaOkumura/ControlVP .
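The vanishing point inconsistency described in the abstract can be illustrated with a little projective geometry. The following is a minimal sketch, not the ControlVP method: it assumes a set of 2D line segments that are meant to be projections of parallel 3D lines, intersects them pairwise in homogeneous coordinates, and reports how far the intersections scatter. The function names (`line_through`, `intersection`, `vp_spread`) and the sample segments are hypothetical and chosen only for illustration.

```python
from itertools import combinations

import numpy as np


def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])


def intersection(l1, l2):
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    x, y, w = np.cross(l1, l2)
    if abs(w) < 1e-9:
        return None
    return np.array([x / w, y / w])


def vp_spread(segments):
    """Largest distance of any pairwise intersection from the mean intersection.

    A value near zero means the segments converge to a single vanishing point;
    a large value indicates inconsistent perspective.
    """
    lines = [line_through(p, q) for p, q in segments]
    points = [intersection(a, b) for a, b in combinations(lines, 2)]
    points = np.array([pt for pt in points if pt is not None])
    if len(points) < 2:
        return 0.0
    return float(np.linalg.norm(points - points.mean(axis=0), axis=1).max())


# Hypothetical segments: the first set converges at (100, 50), the second does not.
consistent = [((0, 0), (50, 25)), ((0, 100), (50, 75)), ((0, 50), (10, 50))]
inconsistent = [((0, 0), (50, 25)), ((0, 100), (50, 75)), ((0, 50), (10, 60))]

print(vp_spread(consistent))
print(vp_spread(inconsistent))
```

A scatter measure of this kind is only one simple way to quantify the problem; the paper's own geometric constraints and evaluation metrics may differ.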
Related papers
- Geometry-Aware Rotary Position Embedding for Consistent Video World Model [48.914346802616414]
ViewRope is a geometry-aware encoding that injects camera-ray directions directly into video transformer self-attention layers. Geometry-Aware Frame-Sparse Attention exploits these geometric cues to selectively attend to relevant historical frames. Our results demonstrate that ViewRope substantially improves long-term consistency while reducing computational costs.
arXiv Detail & Related papers (2026-02-08T08:01:16Z)
- GPA-VGGT: Adapting VGGT to Large Scale Localization by Self-Supervised Learning with Geometry and Physics Aware Loss [15.633839321933385]
Recent advancements in Visual Geometry Grounded Transformer (VGGT) models have shown great promise in camera pose estimation and 3D reconstruction. These models typically rely on ground truth labels for training, posing challenges when adapting to unlabeled and unseen scenes. We propose a self-supervised framework to train VGGT with unlabeled data, thereby enhancing its localization capability in large-scale environments.
arXiv Detail & Related papers (2026-01-23T16:46:59Z)
- Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing [63.141976759536625]
We propose Interp3D, a training-free framework for textured 3D morphing. It harnesses generative priors and adopts a progressive alignment principle to ensure both geometric fidelity and texture coherence. For comprehensive evaluations, we construct a dedicated dataset, Interp3DData, with graded difficulty levels and assess generation results from fidelity, transition smoothness, and plausibility.
arXiv Detail & Related papers (2026-01-20T16:03:22Z)
- GeoVideo: Introducing Geometric Regularization into Video Generation Model [46.38507581500745]
We introduce geometric regularization losses into video generation by augmenting latent diffusion models with per-frame depth prediction. Our method bridges the gap between appearance generation and 3D structure modeling, leading to improved structural coherence, temporal shape consistency, and physical plausibility.
arXiv Detail & Related papers (2025-12-03T05:11:57Z)
- TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction [57.46712611558817]
3D vision foundation models have shown strong generalization in reconstructing key 3D attributes from uncalibrated images through a single feed-forward pass. Recent strategies align consecutive predictions by solving global transformation, yet our analysis reveals their fundamental limitations in assumption validity, local alignment scope, and robustness under noisy geometry. We propose a higher-DOF and long-term alignment framework based on Thin Plate Spline, leveraging globally propagated control points to correct spatially varying inconsistencies.
arXiv Detail & Related papers (2025-12-02T02:22:20Z)
- VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment [48.147381011235446]
3D Gaussian Splatting has recently emerged as an efficient solution for real-time novel view synthesis. We propose a novel method that enhances the geometric representation of 3D Gaussians through view alignment. Our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis.
arXiv Detail & Related papers (2025-10-13T14:44:50Z)
- CAGE: Continuity-Aware edGE Network Unlocks Robust Floorplan Reconstruction [24.09888364478496]
We present CAGE, a robust framework for reconstructing vector floorplans directly from point-cloud density maps. CAGE achieves state-of-the-art performance, with F1 scores of 99.1% (rooms), 91.7% (corners), and 89.3% (angles).
arXiv Detail & Related papers (2025-09-18T22:10:37Z)
- Perspective from a Higher Dimension: Can 3D Geometric Priors Help Visual Floorplan Localization? [8.82283453148819]
Self-localization within a building's floorplan has attracted researchers' interest. Since floorplans are minimalist representations of a building's structure, modal and geometric differences between visual perceptions and floorplans pose challenges to this task. Existing methods cleverly utilize 2D geometric features and pose filters to achieve promising performance. This paper views the 2D floorplan localization (FLoc) problem from a higher dimension by injecting 3D geometric priors into the visual FLoc algorithm.
arXiv Detail & Related papers (2025-07-25T01:34:26Z)
- Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation [62.87088388345378]
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. The method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images. Cross-modal attention instillation is proposed to ensure accurate alignment between generated images and geometry.
arXiv Detail & Related papers (2025-06-13T16:19:00Z)
- FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [100.45129752375658]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images. Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv Detail & Related papers (2025-02-17T18:54:05Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.