Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training
- URL: http://arxiv.org/abs/2502.18219v1
- Date: Tue, 25 Feb 2025 14:04:22 GMT
- Title: Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training
- Authors: Botao Ye, Sifei Liu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang
- Abstract summary: Large diffusion models demonstrate remarkable zero-shot capabilities in novel view synthesis from a single image. These models often face challenges in maintaining consistency across novel and reference views. We propose to use epipolar geometry to locate and retrieve overlapping information from the input view. This information is then incorporated into the generation of target views, eliminating the need for training or fine-tuning.
- Score: 102.82553402539139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large diffusion models demonstrate remarkable zero-shot capabilities in novel view synthesis from a single image. However, these models often face challenges in maintaining consistency across novel and reference views. A crucial factor leading to this issue is the limited utilization of contextual information from reference views. Specifically, when there is an overlap in the viewing frustum between two views, it is essential to ensure that the corresponding regions maintain consistency in both geometry and appearance. This observation leads to a simple yet effective approach, where we propose to use epipolar geometry to locate and retrieve overlapping information from the input view. This information is then incorporated into the generation of target views, eliminating the need for training or fine-tuning, as the process requires no learnable parameters. Furthermore, to enhance the overall consistency of generated views, we extend the utilization of epipolar attention to a multi-view setting, allowing retrieval of overlapping information from the input view and other target views. Qualitative and quantitative experimental results demonstrate the effectiveness of our method in significantly improving the consistency of synthesized views without the need for any fine-tuning. Moreover, this enhancement also boosts the performance of downstream applications such as 3D reconstruction. The code is available at https://github.com/botaoye/ConsisSyn.
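The mechanism described in the abstract admits a compact sketch: each target-view pixel queries reference-view features sampled along its epipolar line, so regions visible in both views are retrieved from the reference rather than re-synthesized, and no parameters need to be learned. The code below is a minimal, hypothetical illustration of such parameter-free epipolar attention, not the authors' released implementation (see the repository above); it assumes pinhole intrinsics `K_ref`/`K_tgt`, a relative pose `(R, t)` mapping reference-camera coordinates into the target camera frame, and same-resolution feature maps from the diffusion backbone.

```python
import torch
import torch.nn.functional as F_nn

def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x @ v = t x v."""
    tx, ty, tz = t.tolist()
    return torch.tensor([[0.0, -tz,  ty],
                         [ tz, 0.0, -tx],
                         [-ty,  tx, 0.0]])

def fundamental_matrix(K_ref, K_tgt, R, t):
    """F relating reference pixels x_r and target pixels x_t via x_t^T F x_r = 0.
    (R, t) map reference-camera coordinates into the target camera frame."""
    return torch.linalg.inv(K_tgt).T @ skew(t) @ R @ torch.linalg.inv(K_ref)

def epipolar_attention(tgt_feat, ref_feat, F, n_samples=32):
    """For every target-view pixel, attend to reference-view features sampled
    along its epipolar line (no learnable parameters).
    tgt_feat, ref_feat: (C, H, W) feature maps on the same image grid."""
    C, H, W = tgt_feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    x_t = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()  # (H, W, 3)

    # Epipolar line of each target pixel in the reference image: l = F^T x_t.
    lines = x_t.reshape(-1, 3) @ F                                    # (HW, 3)

    # Sample points along each line by sweeping u over the image width
    # and solving a*u + b*v + c = 0 for v (assumes |b| is not tiny).
    u = torch.linspace(0, W - 1, n_samples)                           # (n,)
    a, b, c = lines[:, :1], lines[:, 1:2], lines[:, 2:]
    v = -(a * u + c) / (b + 1e-8)                                     # (HW, n)

    # Bilinearly sample reference features at the epipolar points.
    grid = torch.stack([u.expand_as(v) / (W - 1) * 2 - 1,
                        v / (H - 1) * 2 - 1], dim=-1)                 # (HW, n, 2)
    kv = F_nn.grid_sample(ref_feat[None], grid[None],
                          align_corners=True)[0]                      # (C, HW, n)

    # Parameter-free attention: the target feature is the query, the sampled
    # reference features act as both keys and values.
    q = tgt_feat.reshape(C, -1).T                                     # (HW, C)
    attn = torch.softmax(torch.einsum("nc,cnk->nk", q, kv) / C ** 0.5, dim=-1)
    out = torch.einsum("nk,cnk->nc", attn, kv).T.reshape(C, H, W)
    return out
```

Sampling a fixed number of points along each epipolar line keeps the attention cost linear in the number of pixels; for the multi-view extension mentioned in the abstract, one would simply concatenate the sampled keys and values across the input view and the other target views before the softmax.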
Related papers
- Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting [4.89907242398523]
We propose renderability field-guided gaussian splatting (RF-GS) for scene view synthesis.
RF-GS quantifies input inhomogeneity through a renderability field, guiding pseudo-view sampling to enhance visual consistency.
Our experiments on simulated and real-world data show that our method outperforms existing approaches in rendering stability.
arXiv Detail & Related papers (2025-04-27T14:41:01Z) - AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction [69.65671384868344]
We propose AR-1-to-3, a novel next-view prediction paradigm based on diffusion models.
We show that our method significantly improves the consistency between the generated views and the input views, producing high-fidelity 3D assets.
arXiv Detail & Related papers (2025-03-17T08:39:10Z) - Consistent Human Image and Video Generation with Spatially Conditioned Diffusion [82.4097906779699]
Consistent human-centric image and video synthesis aims to generate images with new poses while preserving appearance consistency with a given reference image. We frame the task as a spatially-conditioned inpainting problem, where the target image is in-painted to maintain appearance consistency with the reference. This approach enables the reference features to guide the generation of pose-compliant targets within a unified denoising network.
arXiv Detail & Related papers (2024-12-19T05:02:30Z) - Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering [16.382098950820822]
We propose Zero-to-Hero, a novel test-time approach that enhances view synthesis by manipulating attention maps.
We modify the self-attention mechanism to integrate information from the source view, reducing shape distortions.
Results demonstrate substantial improvements in fidelity and consistency, validated on a diverse set of out-of-distribution objects.
arXiv Detail & Related papers (2024-05-29T00:58:22Z) - CMC: Few-shot Novel View Synthesis via Cross-view Multiplane Consistency [18.101763989542828]
We propose a simple yet effective method that explicitly builds depth-aware consistency across input views.
Our key insight is that by forcing the same spatial points to be sampled repeatedly in different input views, we are able to strengthen the interactions between views.
Although simple, extensive experiments demonstrate that our proposed method can achieve better synthesis quality over state-of-the-art methods.
arXiv Detail & Related papers (2024-02-26T09:04:04Z) - Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition [49.26065739704278]
We propose a framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition.
An instance-view data hallucination module hallucinates each sample of a novel class to generate new data.
A prototype-view data hallucination module exploits a semantic-aware measure to estimate the prototype of a novel class.
arXiv Detail & Related papers (2024-01-13T12:32:29Z) - UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z) - Consistent123: Improve Consistency for One Image to 3D Object Synthesis [74.1094516222327]
Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability.
These models have no guarantee of view consistency, limiting the performance for downstream tasks like 3D reconstruction and image-to-3D generation.
We propose Consistent123 to synthesize novel views simultaneously by incorporating additional cross-view attention layers and the shared self-attention mechanism.
arXiv Detail & Related papers (2023-10-12T07:38:28Z) - Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models [16.326276673056334]
Consistent-1-to-3 is a generative framework that significantly mitigates view inconsistency in image-to-3D synthesis.
We decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions.
We propose to employ epipolar-guided attention to incorporate geometry constraints, and multi-view attention to better aggregate multi-view information.
arXiv Detail & Related papers (2023-10-04T17:58:57Z) - 3D Shape Reconstruction from Vision and Touch [62.59044232597045]
In 3D shape reconstruction, the complementary fusion of visual and haptic modalities remains largely unexplored.
We introduce a dataset of simulated touch and vision signals from the interaction between a robotic hand and a large array of 3D objects.
arXiv Detail & Related papers (2020-07-07T20:20:33Z)