PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
- URL: http://arxiv.org/abs/2408.02157v1
- Date: Sun, 4 Aug 2024 22:23:10 GMT
- Title: PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
- Authors: Aoming Liu, Zhong Li, Zhang Chen, Nannan Li, Yi Xu, Bryan A. Plummer
- Abstract summary: PanoFree is a novel method for tuning-free multi-view image generation.
It addresses the key issues of inconsistency and artifacts from error accumulation without the need for fine-tuning.
It demonstrates significant error reduction, improves global consistency, and boosts image quality without extra fine-tuning.
- Score: 37.45462643757252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Immersive scene generation, notably panorama creation, benefits significantly from the adaptation of large pre-trained text-to-image (T2I) models for multi-view image generation. Due to the high cost of acquiring multi-view images, tuning-free generation is preferred. However, existing methods are either limited to simple correspondences or require extensive fine-tuning to capture complex ones. We present PanoFree, a novel method for tuning-free multi-view image generation that supports an extensive array of correspondences. PanoFree sequentially generates multi-view images using iterative warping and inpainting, addressing the key issues of inconsistency and artifacts from error accumulation without the need for fine-tuning. It improves error accumulation by enhancing cross-view awareness and refines the warping and inpainting processes via cross-view guidance, risky area estimation and erasing, and symmetric bidirectional guided generation for loop closure, alongside guidance-based semantic and density control for scene structure preservation. In experiments on Planar, 360°, and Full Spherical Panoramas, PanoFree demonstrates significant error reduction, improves global consistency, and boosts image quality without extra fine-tuning. Compared to existing methods, PanoFree is up to 5x more efficient in time and 3x more efficient in GPU memory usage, and maintains superior diversity of results (2x better in our user study). PanoFree offers a viable alternative to costly fine-tuning or the use of additional pre-trained models. Project website at https://panofree.github.io/.
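The sequential pipeline the abstract describes (warp the previous view toward the next camera, then inpaint only the newly exposed region) can be sketched roughly as below. This is a minimal illustration, not the paper's code: a plain horizontal shift stands in for PanoFree's projective warps, a random fill stands in for the diffusion inpainting model, and all function names are assumptions.

```python
import numpy as np

def warp(view, shift):
    """Warp the previous view to the next camera; here a plain horizontal
    shift stands in for the real projective warp. Returns the warped view
    and a mask marking pixels with no source (to be inpainted)."""
    h, w, c = view.shape
    warped = np.zeros_like(view)
    mask = np.ones((h, w), dtype=bool)   # True = unknown, needs inpainting
    warped[:, :w - shift] = view[:, shift:]
    mask[:, :w - shift] = False
    return warped, mask

def inpaint(warped, mask, rng):
    """Stand-in for the T2I inpainting model: fill only the masked pixels,
    leaving the warped (known) region untouched for cross-view consistency."""
    out = warped.copy()
    out[mask] = rng.random((int(mask.sum()), warped.shape[2]))
    return out

def generate_panorama(num_views=4, size=(8, 16, 3), shift=4, seed=0):
    """Sequentially generate views: each new view reuses the warped content
    of the previous one and inpaints only the newly exposed strip."""
    rng = np.random.default_rng(seed)
    views = [rng.random(size)]           # first view sampled directly
    for _ in range(num_views - 1):
        warped, mask = warp(views[-1], shift)
        views.append(inpaint(warped, mask, rng))
    return views
```

Because the inpainting step never touches the warped region, each consecutive pair of views agrees exactly on its overlap; PanoFree's contributions (cross-view guidance, risky-area erasing, bidirectional loop closure) address the errors that accumulate when the real inpainting model does perturb that overlap.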
Related papers
- CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
We introduce a novel method for generating 360° panoramas from text prompts or images.
We employ multi-view diffusion models to jointly synthesize the six faces of a cubemap.
Our model allows for fine-grained text control, generates high resolution panorama images and generalizes well beyond its training set.
arXiv Detail & Related papers (2025-01-28T18:59:49Z)
- T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Generation
We propose a Training-free Triplet Tuning for Sketch-to-Scene (T3-S2S) generation.
It enhances keyword representation via the prompt balance module, reducing the risk of missing critical instances.
Experiments validate that our triplet tuning approach substantially improves the performance of existing sketch-to-image models.
arXiv Detail & Related papers (2024-12-18T04:01:32Z)
- PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
We introduce PanoLlama, a novel framework that redefines panoramic image generation as a next-token prediction task.
Building on the pre-trained LlamaGen architecture, we generate images in an autoregressive manner and develop an expansion strategy to handle size limitations.
This method aligns with the image token structure in a crop-wise and training-free manner, resulting in high-quality panoramas with minimal seams and maximum scalability.
arXiv Detail & Related papers (2024-11-24T15:06:57Z)
- ImageFolder: Autoregressive Image Generation with Folded Tokens
Increasing token length is a common approach to improving image reconstruction quality.
However, there is a trade-off between reconstruction and generation quality with respect to token length.
We propose ImageFolder, a semantic tokenizer that provides spatially aligned image tokens that can be folded during autoregressive modeling.
arXiv Detail & Related papers (2024-10-02T17:06:39Z)
- Taming Stable Diffusion for Text to 360° Panorama Image Generation
We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
arXiv Detail & Related papers (2024-04-11T17:46:14Z)
- Consolidating Attention Features for Multi-view Image Editing
We focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
We introduce QNeRF, a neural radiance field trained on the internal query features of the edited images.
We refine the process through a progressive, iterative method that better consolidates queries across the diffusion timesteps.
arXiv Detail & Related papers (2024-02-22T18:50:18Z)
- FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images.
FastComposer enables efficient, personalized, multi-subject text-to-image generation without fine-tuning.
arXiv Detail & Related papers (2023-05-17T17:59:55Z)
- MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
MultiDiffusion is a unified framework that enables versatile and controllable image generation.
We show that MultiDiffusion can be readily applied to generate high-quality and diverse images.
arXiv Detail & Related papers (2023-02-16T06:28:29Z)
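The path fusion that MultiDiffusion performs can be sketched as below: each overlapping window is denoised independently, and overlapping predictions are averaged per position (averaging is the closed-form solution of the least-squares fusion objective). This is a hedged 1-D illustration under stated assumptions: `denoise_window` is a stand-in for the pre-trained denoiser, and the function name is not from the paper's code.

```python
import numpy as np

def multidiffusion_step(latent, window, stride, denoise_window):
    """One fusion step in the spirit of MultiDiffusion over a 1-D latent:
    denoise each overlapping window independently, then average the
    per-position predictions so that overlapping windows agree."""
    n = latent.shape[0]
    acc = np.zeros(n)
    cnt = np.zeros(n)
    for start in range(0, n - window + 1, stride):
        pred = denoise_window(latent[start:start + window])
        acc[start:start + window] += pred
        cnt[start:start + window] += 1
    covered = cnt > 0
    out = latent.copy()                  # positions no window touches stay as-is
    out[covered] = acc[covered] / cnt[covered]
    return out
```

Averaging the overlaps is what lets a model trained on fixed-size images produce a wider canvas without retraining, at the cost of one denoiser call per window per step.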
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.