PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
- URL: http://arxiv.org/abs/2408.02157v1
- Date: Sun, 4 Aug 2024 22:23:10 GMT
- Title: PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
- Authors: Aoming Liu, Zhong Li, Zhang Chen, Nannan Li, Yi Xu, Bryan A. Plummer,
- Abstract summary: PanoFree is a novel method for tuning-free multi-view image generation.
It addresses the key issues of inconsistency and artifacts from error accumulation without the need for fine-tuning.
It demonstrates significant error reduction, improves global consistency, and boosts image quality without extra fine-tuning.
- Score: 37.45462643757252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Immersive scene generation, notably panorama creation, benefits significantly from the adaptation of large pre-trained text-to-image (T2I) models for multi-view image generation. Due to the high cost of acquiring multi-view images, tuning-free generation is preferred. However, existing methods are either limited to simple correspondences or require extensive fine-tuning to capture complex ones. We present PanoFree, a novel method for tuning-free multi-view image generation that supports an extensive array of correspondences. PanoFree sequentially generates multi-view images using iterative warping and inpainting, addressing the key issues of inconsistency and artifacts from error accumulation without the need for fine-tuning. It improves error accumulation by enhancing cross-view awareness and refines the warping and inpainting processes via cross-view guidance, risky area estimation and erasing, and symmetric bidirectional guided generation for loop closure, alongside guidance-based semantic and density control for scene structure preservation. In experiments on Planar, 360{\deg}, and Full Spherical Panoramas, PanoFree demonstrates significant error reduction, improves global consistency, and boosts image quality without extra fine-tuning. Compared to existing methods, PanoFree is up to 5x more efficient in time and 3x more efficient in GPU memory usage, and maintains superior diversity of results (2x better in our user study). PanoFree offers a viable alternative to costly fine-tuning or the use of additional pre-trained models. Project website at https://panofree.github.io/.
Related papers
- ImageFolder: Autoregressive Image Generation with Folded Tokens [51.815319504939396]
Increasing token length is a common approach to improve the image reconstruction quality.
There exists a trade-off between reconstruction and generation quality regarding token length.
We propose Image, a semantic tokenizer that provides spatially aligned image tokens that can be folded during autoregressive modeling.
arXiv Detail & Related papers (2024-10-02T17:06:39Z) - Taming Stable Diffusion for Text to 360° Panorama Image Generation [74.69314801406763]
We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
arXiv Detail & Related papers (2024-04-11T17:46:14Z) - Consolidating Attention Features for Multi-view Image Editing [126.19731971010475]
We focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
We introduce QNeRF, a neural radiance field trained on the internal query features of the edited images.
We refine the process through a progressive, iterative method that better consolidates queries across the diffusion timesteps.
arXiv Detail & Related papers (2024-02-22T18:50:18Z) - Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks [50.822601495422916]
We propose to utilize exposure bracketing photography to unify image restoration and enhancement tasks.
Due to the difficulty in collecting real-world pairs, we suggest a solution that first pre-trains the model with synthetic paired data.
In particular, a temporally modulated recurrent network (TMRNet) and self-supervised adaptation method are proposed.
arXiv Detail & Related papers (2024-01-01T14:14:35Z) - 360-Degree Panorama Generation from Few Unregistered NFoV Images [16.05306624008911]
360$circ$ panoramas are extensively utilized as environmental light sources in computer graphics.
capturing a 360$circ$ $times$ 180$circ$ panorama poses challenges due to specialized and costly equipment.
We propose a novel pipeline called PanoDiff, which efficiently generates complete 360$circ$ panoramas.
arXiv Detail & Related papers (2023-08-28T16:21:51Z) - FastComposer: Tuning-Free Multi-Subject Image Generation with Localized
Attention [37.58569261714206]
Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images.
FastComposer enables efficient, personalized, multi-subject text-to-image generation without fine-tuning.
arXiv Detail & Related papers (2023-05-17T17:59:55Z) - Parallax-Tolerant Unsupervised Deep Image Stitching [57.76737888499145]
We propose UDIS++, a parallax-tolerant unsupervised deep image stitching technique.
First, we propose a robust and flexible warp to model the image registration from global homography to local thin-plate spline motion.
To further eliminate the parallax artifacts, we propose to composite the stitched image seamlessly by unsupervised learning for seam-driven composition masks.
arXiv Detail & Related papers (2023-02-16T10:40:55Z) - MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation [34.61940502872307]
MultiDiffusion is a unified framework that enables versatile and controllable image generation.
We show that MultiDiffusion can be readily applied to generate high quality and diverse images.
arXiv Detail & Related papers (2023-02-16T06:28:29Z) - Unsupervised Cycle-consistent Generative Adversarial Networks for
Pan-sharpening [41.68141846006704]
We propose an unsupervised generative adversarial framework that learns from the full-scale images without the ground truths to alleviate this problem.
We extract the modality-specific features from the PAN and MS images with a two-stream generator, perform fusion in the feature domain, and then reconstruct the pan-sharpened images.
Results demonstrate that the proposed method can greatly improve the pan-sharpening performance on the full-scale images.
arXiv Detail & Related papers (2021-09-20T09:43:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.