Related papers: Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis

URL: http://arxiv.org/abs/2501.02913v1
Date: Mon, 06 Jan 2025 10:48:31 GMT
Title: Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis
Authors: Thang-Anh-Quan Nguyen, Nathan Piasco, Luis Roldão, Moussab Bennehar, Dzmitry Tsishkou, Laurent Caraffa, Jean-Philippe Tarel, Roland Brémond
Abstract summary: We present PointmapDiffusion, a novel framework for single-image novel view synthesis. Our method is the first to leverage pointmaps as a conditioning signal, capturing a geometric prior from the reference images to guide the diffusion process. Experiments on diverse real-world datasets demonstrate that PointmapDiffusion achieves high-quality, multi-view consistent results.
Score: 2.612019169899311
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In this paper, we present PointmapDiffusion, a novel framework for single-image novel view synthesis (NVS) that utilizes pre-trained 2D diffusion models. Our method is the first to leverage pointmaps (i.e. rasterized 3D scene coordinates) as a conditioning signal, capturing geometric prior from the reference images to guide the diffusion process. By embedding reference attention blocks and a ControlNet for pointmap features, our model balances between generative capability and geometric consistency, enabling accurate view synthesis across varying viewpoints. Extensive experiments on diverse real-world datasets demonstrate that PointmapDiffusion achieves high-quality, multi-view consistent results with significantly fewer trainable parameters compared to other baselines for single-image NVS tasks.

Related papers

Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation [62.87088388345378]
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. The method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images. Cross-modal attention instillation is proposed to ensure accurate alignment between generated images and geometry.
arXiv Detail & Related papers (2025-06-13T16:19:00Z)
CE-NPBG: Connectivity Enhanced Neural Point-Based Graphics for Novel View Synthesis in Autonomous Driving Scenes [5.719388462440881]
We present CE-NPBG, a new approach for novel view synthesis (NVS) in large-scale autonomous driving scenes. Our method is a neural point-based technique that leverages two modalities: posed images (cameras) and synchronized raw 3D point clouds (LiDAR). By leveraging this connectivity, our method significantly improves rendering quality and enhances run-time and scalability.
arXiv Detail & Related papers (2025-04-28T08:02:02Z)
MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image. Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z)
Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs. Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field. By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z)
Explicit Correspondence Matching for Generalizable Neural Radiance Fields [49.49773108695526]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views. The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views. Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
arXiv Detail & Related papers (2023-04-24T17:46:01Z)
Learning to Render Novel Views from Wide-Baseline Stereo Pairs [26.528667940013598]
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair. Existing approaches to novel view synthesis from sparse observations fail due to recovering incorrect 3D geometry. We propose an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray.
arXiv Detail & Related papers (2023-04-17T17:40:52Z)
Novel View Synthesis with Diffusion Models [56.55571338854636]
We present 3DiM, a diffusion model for 3D novel view synthesis. It is able to translate a single input view into consistent and sharp completions across many views. 3DiM can generate multiple views that are 3D consistent using a novel technique called stochastic conditioning.
arXiv Detail & Related papers (2022-10-06T16:59:56Z)
Leveraging Monocular Disparity Estimation for Single-View Reconstruction [8.583436410810203]
We leverage advances in monocular depth estimation to obtain disparity maps. We transform 2D normalized disparity maps into 3D point clouds by solving an optimization on the relevant camera parameters.
arXiv Detail & Related papers (2022-07-01T03:05:40Z)
A Model for Multi-View Residual Covariances based on Perspective Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups. We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z)
Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations. We leverage meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.