UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features
- URL: http://arxiv.org/abs/2509.04932v1
- Date: Fri, 05 Sep 2025 08:54:57 GMT
- Title: UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features
- Authors: Haowang Cui, Rui Chen, Tao Luo, Rui Li, Jiaze Wang
- Abstract summary: We propose a novel model dubbed UniView, which can leverage reference images from a similar object to provide strong prior information during view synthesis. Our UniView significantly improves novel view synthesis performance and outperforms state-of-the-art methods on challenging datasets.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The task of synthesizing novel views from a single image is highly ill-posed, since unobserved areas admit multiple plausible explanations. Most current methods generate unseen regions from ambiguous priors and interpolation near the input view, which often leads to severe distortions. To address this limitation, we propose a novel model dubbed UniView, which can leverage reference images from a similar object to provide strong prior information during view synthesis. More specifically, we construct a retrieval and augmentation system and employ a multimodal large language model (MLLM) to assist in selecting reference images that meet our requirements. Additionally, a plug-and-play adapter module with multi-level isolation layers is introduced to dynamically generate reference features for the target views. Moreover, to preserve the details of the original input image, we design a decoupled triple attention mechanism that effectively aligns and integrates multi-branch features into the synthesis process. Extensive experiments demonstrate that UniView significantly improves novel view synthesis performance and outperforms state-of-the-art methods on challenging datasets.
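The abstract does not include code, and the exact design of the decoupled triple attention mechanism is not specified here. As a rough illustrative sketch only, the following NumPy snippet shows one way three feature branches (target self-attention, input-image features, and reference features) could each be attended in isolation and then fused; the function names, the three-branch split, and the additive fusion rule are all assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Standard scaled dot-product attention.
    d = q.shape[-1]
    w = softmax(q @ k.T / np.sqrt(d))
    return w @ v

def decoupled_triple_attention(target_q, self_kv, input_kv, ref_kv):
    # Each branch attends separately ("decoupled"), so reference features
    # cannot overwrite input-image features inside a single attention map.
    out_self = attend(target_q, self_kv, self_kv)    # target-view context
    out_input = attend(target_q, input_kv, input_kv) # original input image
    out_ref = attend(target_q, ref_kv, ref_kv)       # retrieved reference
    # Simple additive fusion; the paper's actual fusion rule is not given here.
    return out_self + out_input + out_ref

rng = np.random.default_rng(0)
fused = decoupled_triple_attention(
    rng.normal(size=(5, 16)),   # 5 target queries, dim 16
    rng.normal(size=(5, 16)),   # self branch
    rng.normal(size=(7, 16)),   # input-image branch (7 tokens)
    rng.normal(size=(4, 16)),   # reference branch (4 tokens)
)
print(fused.shape)  # (5, 16)
```

Keeping the branches isolated also makes the reference path plug-and-play: dropping `out_ref` from the sum recovers a model that ignores reference images entirely.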
Related papers
- FROMAT: Multiview Material Appearance Transfer via Few-Shot Self-Attention Adaptation
We present a lightweight adaptation technique for appearance transfer in multiview diffusion models. Our method learns to combine object identity from an input image with appearance cues rendered in a separate reference image, producing multi-view-consistent output.
arXiv Detail & Related papers (2025-12-10T13:06:40Z)
- Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training
Large diffusion models demonstrate remarkable zero-shot capabilities in novel view synthesis from a single image. These models often face challenges in maintaining consistency across novel and reference views. We propose to use epipolar geometry to locate and retrieve overlapping information from the input view. This information is then incorporated into the generation of target views, eliminating the need for training or fine-tuning.
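The epipolar retrieval step above rests on basic two-view geometry: a point x in the input view constrains its correspondences in the target view to the epipolar line l' = F x, where F is the fundamental matrix. The NumPy sketch below illustrates that retrieval idea only; the fundamental matrix (a rectified stereo pair) and the pixel tolerance are illustrative assumptions, not values from the paper.

```python
import numpy as np

def epipolar_line(F, x):
    # Epipolar line in the target view: l' = F @ x (homogeneous coordinates),
    # normalized so that |l' . p| is the point-to-line distance in pixels.
    l = F @ np.array([x[0], x[1], 1.0])
    return l / np.linalg.norm(l[:2])

def retrieve_overlap(F, src_pt, tgt_pts, tol=2.0):
    # Keep target-view locations within `tol` pixels of the epipolar line of
    # `src_pt`; these are candidates for overlapping content between views.
    l = epipolar_line(F, src_pt)
    return [p for p in tgt_pts if abs(l @ np.array([p[0], p[1], 1.0])) < tol]

# Fundamental matrix of a rectified stereo pair (pure horizontal translation):
# corresponding points share a row, so epipolar lines are y' = y.
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
print(retrieve_overlap(F, (10, 5), [(3, 5), (3, 9)]))  # [(3, 5)]
```

Restricting the search to a narrow band around l' is what lets such methods reuse input-view content at inference time without any re-training.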
arXiv Detail & Related papers (2025-02-25T14:04:22Z)
- MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
MOVIS aims to enhance the structural awareness of the view-conditioned diffusion model for multi-object NVS. We introduce an auxiliary task requiring the model to simultaneously predict novel view object masks. Our method exhibits strong generalization capabilities and produces consistent novel view synthesis.
arXiv Detail & Related papers (2024-12-16T05:23:45Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
- UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation
We propose UniMS, a Unified framework for Multimodal Summarization grounded on BART.
We adopt knowledge distillation from a vision-language pretrained model to improve image selection.
Our best model achieves a new state-of-the-art result on a large-scale benchmark dataset.
arXiv Detail & Related papers (2021-09-13T09:36:04Z)
- Example-Guided Image Synthesis across Arbitrary Scenes using Masked Spatial-Channel Attention and Self-Supervision
Example-guided image synthesis has recently been explored to synthesize an image from a semantic label map and an exemplar image.
In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map.
We propose an end-to-end network for joint global and local feature alignment and synthesis.
arXiv Detail & Related papers (2020-04-18T18:17:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.