CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model
- URL: http://arxiv.org/abs/2511.13121v1
- Date: Mon, 17 Nov 2025 08:20:06 GMT
- Title: CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model
- Authors: Yuqi Zhang, Guanying Chen, Jiaxing Chen, Chuanyu Fu, Chuan Huang, Shuguang Cui
- Abstract summary: Reconstructing 3D scenes and synthesizing novel views from sparse input views is a highly challenging task. Recent advances in video diffusion models have demonstrated strong temporal reasoning capabilities. We present a diffusion-based framework, called CloseUpShot, for close-up novel view synthesis from sparse inputs via point-conditioned video diffusion.
- Score: 50.93869080795228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing 3D scenes and synthesizing novel views from sparse input views is a highly challenging task. Recent advances in video diffusion models have demonstrated strong temporal reasoning capabilities, making them a promising tool for enhancing reconstruction quality under sparse-view settings. However, existing approaches are primarily designed for modest viewpoint variations and struggle to capture fine-grained details in close-up scenarios, where the input information is severely limited. In this paper, we present a diffusion-based framework, called CloseUpShot, for close-up novel view synthesis from sparse inputs via point-conditioned video diffusion. Specifically, we observe that pixel-warping conditioning suffers from severe sparsity and background leakage in close-up settings. To address this, we propose hierarchical warping and occlusion-aware noise suppression, enhancing the quality and completeness of the conditioning images for the video diffusion model. Furthermore, to compensate for the lack of globally consistent 3D constraints in the sparse conditioning inputs, we introduce global structure guidance, which leverages a dense fused point cloud to provide consistent geometric context to the diffusion process. Extensive experiments on multiple datasets demonstrate that our method outperforms existing approaches, especially in close-up novel view synthesis, clearly validating the effectiveness of our design.
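The following is a minimal sketch of the pixel-warping conditioning step described in the abstract (an illustration under assumed camera conventions, not the authors' released code): a fused point cloud is projected into the target camera with painter's-algorithm ordering so that occluded background points cannot leak into the conditioning image, and a validity mask records the holes that occlusion-aware noise suppression and the diffusion prior would have to handle. All function names and conventions here are assumptions.

```python
# Minimal sketch (not the authors' implementation): project a fused point cloud
# into the target camera to obtain a sparse conditioning image plus a validity
# mask. Camera conventions (world-to-camera extrinsics, pinhole intrinsics) and
# all names here are assumptions made for illustration only.
import numpy as np

def warp_point_cloud_to_view(points_xyz, colors, K, w2c, height, width):
    """Return a (H, W, 3) conditioning image and a (H, W) validity mask.

    points_xyz : (N, 3) fused point cloud in world coordinates
    colors     : (N, 3) per-point RGB in [0, 1]
    K          : (3, 3) target-view pinhole intrinsics
    w2c        : (4, 4) target-view world-to-camera extrinsics
    Mask value 0 marks holes/occlusions where the warp provides no guidance,
    i.e. where an occlusion-aware treatment of the noise would be applied.
    """
    # Move points into the target camera frame and drop points behind it.
    pts_h = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    cam = (w2c @ pts_h.T).T[:, :3]
    keep = cam[:, 2] > 1e-6
    cam, colors = cam[keep], colors[keep]

    # Pinhole projection to integer pixel coordinates.
    uvw = (K @ cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = cam[:, 2]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[inside], v[inside], z[inside], colors[inside]

    # Painter's-algorithm ordering: far points are written first and near
    # points last (NumPy keeps the last write for repeated indices), so
    # occluded background cannot leak into the conditioning image.
    order = np.argsort(-z)
    u, v, colors = u[order], v[order], colors[order]

    cond = np.zeros((height, width, 3), dtype=np.float32)
    mask = np.zeros((height, width), dtype=np.float32)
    cond[v, u] = colors
    mask[v, u] = 1.0
    return cond, mask
```

In close-up settings the large magnification makes this warped image extremely sparse, which is exactly the regime that motivates the hierarchical warping and noise-suppression components described in the abstract.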
Related papers
- DT-NVS: Diffusion Transformers for Novel View Synthesis [22.458328201080715]
We propose a 3D-aware diffusion model for generalized novel view synthesis. We make significant contributions to transformer and self-attention architectures to translate images to 3D representations. We show improvements over state-of-the-art 3D-aware diffusion models and deterministic approaches.
arXiv Detail & Related papers (2025-11-11T22:40:00Z)
- MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis [34.793258395288895]
We present MV-Performer, an innovative framework for creating novel view videos from monocular full-body captures. To achieve 360-degree synthesis, we extensively leverage the MVHumanNet dataset and incorporate an informative condition signal. To maintain synchronization in the generated videos, we propose a multi-view human-centric video diffusion model.
arXiv Detail & Related papers (2025-10-08T16:24:22Z)
- WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image [3.4248731707266264]
This paper proposes a novel view-consistent image generation method which utilizes diffusion models without additional modules. Our key idea is to enhance diffusion models with a training-free method that enables adaptive attention manipulation and noise reinitialization. Our method improves view consistency across various diffusion models, demonstrating its broader applicability.
arXiv Detail & Related papers (2025-06-30T05:00:47Z)
- Stable Virtual Camera: Generative View Synthesis with Diffusion Models [51.71244310522393]
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene. Our approach overcomes the limitations of existing methods through a simple model design, an optimized training recipe, and a flexible sampling strategy. Our method can generate high-quality videos lasting up to half a minute with seamless loop closure.
arXiv Detail & Related papers (2025-03-18T17:57:22Z)
- Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training [102.82553402539139]
Large diffusion models demonstrate remarkable zero-shot capabilities in novel view synthesis from a single image. These models often face challenges in maintaining consistency across novel and reference views. We propose to use epipolar geometry to locate and retrieve overlapping information from the input view. This information is then incorporated into the generation of target views, eliminating the need for training or fine-tuning (a toy sketch of this epipolar lookup appears after this list).
arXiv Detail & Related papers (2025-02-25T14:04:22Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model designed to synthesize plausible 3D indoor scenes. We show it outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
- Digging into contrastive learning for robust depth estimation with diffusion models [55.62276027922499]
We propose a novel robust depth estimation method called D4RD.
It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments.
In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
arXiv Detail & Related papers (2024-04-15T14:29:47Z)
- ViewFusion: Towards Multi-View Consistency via Interpolated Denoising [48.02829400913904]
We introduce ViewFusion, a training-free algorithm that can be seamlessly integrated into existing pre-trained diffusion models.
Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for the next view generation.
Our framework successfully extends single-view conditioned models to work in multiple-view conditional settings without any additional fine-tuning.
arXiv Detail & Related papers (2024-02-29T04:21:38Z)
- ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis [47.0052408875896]
ViewFusion is an end-to-end generative approach to novel view synthesis with unparalleled flexibility. Our method is tested on the relatively small Neural 3D Mesh Renderer dataset.
arXiv Detail & Related papers (2024-02-05T11:22:14Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
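As a toy illustration of the epipolar lookup mentioned in the "3D Epipolar Attention" entry above (a sketch under assumed pose and intrinsics conventions, not code from any of the listed papers): for each pixel of the novel view, the candidate reference-view pixels are restricted to its epipolar line, which is the set an epipolar attention layer would retrieve overlapping information from.

```python
# Toy sketch of the epipolar lookup idea (assumed conventions, not code from
# any of the papers listed above).
import numpy as np

def skew(t):
    """Skew-symmetric matrix so that skew(t) @ x == np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K_ref, K_tgt, R, t):
    """Fundamental matrix F with x_tgt^T F x_ref = 0, assuming the relative
    pose maps reference-camera to target-camera coordinates: x_tgt = R @ x_ref + t."""
    E = skew(t) @ R                                        # essential matrix
    return np.linalg.inv(K_tgt).T @ E @ np.linalg.inv(K_ref)

def epipolar_samples(pixel_tgt, F, ref_width, num=32):
    """Sample pixels along the epipolar line (in the reference image) of a
    target-view pixel: the candidate set an epipolar attention layer would
    restrict its keys and values to."""
    x_tgt = np.array([pixel_tgt[0], pixel_tgt[1], 1.0])
    a, b, c = F.T @ x_tgt             # line a*u + b*v + c = 0 in the reference view
    us = np.linspace(0.0, ref_width - 1.0, num)
    vs = -(a * us + c) / (b + 1e-8)   # assumes the line is not near-vertical
    return np.stack([us, vs], axis=1)
```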
This list is automatically generated from the titles and abstracts of the papers on this site.