PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
- URL: http://arxiv.org/abs/2408.02157v1
- Date: Sun, 4 Aug 2024 22:23:10 GMT
- Title: PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
- Authors: Aoming Liu, Zhong Li, Zhang Chen, Nannan Li, Yi Xu, Bryan A. Plummer
- Abstract summary: PanoFree is a novel method for tuning-free multi-view image generation.
It addresses the key issues of inconsistency and artifacts from error accumulation without the need for fine-tuning.
It demonstrates significant error reduction, improves global consistency, and boosts image quality without extra fine-tuning.
- Score: 37.45462643757252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Immersive scene generation, notably panorama creation, benefits significantly from the adaptation of large pre-trained text-to-image (T2I) models for multi-view image generation. Due to the high cost of acquiring multi-view images, tuning-free generation is preferred. However, existing methods are either limited to simple correspondences or require extensive fine-tuning to capture complex ones. We present PanoFree, a novel method for tuning-free multi-view image generation that supports an extensive array of correspondences. PanoFree sequentially generates multi-view images using iterative warping and inpainting, addressing the key issues of inconsistency and artifacts from error accumulation without the need for fine-tuning. It improves error accumulation by enhancing cross-view awareness and refines the warping and inpainting processes via cross-view guidance, risky area estimation and erasing, and symmetric bidirectional guided generation for loop closure, alongside guidance-based semantic and density control for scene structure preservation. In experiments on Planar, 360°, and Full Spherical Panoramas, PanoFree demonstrates significant error reduction, improves global consistency, and boosts image quality without extra fine-tuning. Compared to existing methods, PanoFree is up to 5x more efficient in time and 3x more efficient in GPU memory usage, and maintains superior diversity of results (2x better in our user study). PanoFree offers a viable alternative to costly fine-tuning or the use of additional pre-trained models. Project website at https://panofree.github.io/.
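The sequential pipeline the abstract describes (warp the previous view toward the next camera, then inpaint only the newly exposed region) can be sketched roughly as below. This is a minimal illustration, not the paper's code: a plain horizontal shift stands in for PanoFree's projective warps, a random fill stands in for the diffusion inpainting model, and all function names are assumptions.

```python
import numpy as np

def warp(view, shift):
    """Warp the previous view to the next camera; here a plain horizontal
    shift stands in for the real projective warp. Returns the warped view
    and a mask marking pixels with no source (to be inpainted)."""
    h, w, c = view.shape
    warped = np.zeros_like(view)
    mask = np.ones((h, w), dtype=bool)   # True = unknown, needs inpainting
    warped[:, :w - shift] = view[:, shift:]
    mask[:, :w - shift] = False
    return warped, mask

def inpaint(warped, mask, rng):
    """Stand-in for the T2I inpainting model: fill only the masked pixels,
    leaving the warped (known) region untouched for cross-view consistency."""
    out = warped.copy()
    out[mask] = rng.random((int(mask.sum()), warped.shape[2]))
    return out

def generate_panorama(num_views=4, size=(8, 16, 3), shift=4, seed=0):
    """Sequentially generate views: each new view reuses the warped content
    of the previous one and inpaints only the newly exposed strip."""
    rng = np.random.default_rng(seed)
    views = [rng.random(size)]           # first view sampled directly
    for _ in range(num_views - 1):
        warped, mask = warp(views[-1], shift)
        views.append(inpaint(warped, mask, rng))
    return views
```

Because the inpainting step never touches the warped region, each consecutive pair of views agrees exactly on its overlap; PanoFree's contributions (cross-view guidance, risky-area erasing, bidirectional loop closure) address the errors that accumulate when the real inpainting model does perturb that overlap.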
Related papers
- CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
We introduce a novel method for generating 360° panoramas from text prompts or images.
We employ multi-view diffusion models to jointly synthesize the six faces of a cubemap.
Our model allows for fine-grained text control, generates high resolution panorama images and generalizes well beyond its training set.
arXiv Detail & Related papers (2025-01-28T18:59:49Z)
- T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Generation
We propose a Training-free Triplet Tuning for Sketch-to-Scene (T3-S2S) generation.
It enhances keyword representation via the prompt balance module, reducing the risk of missing critical instances.
Experiments validate that our triplet tuning approach substantially improves the performance of existing sketch-to-image models.
arXiv Detail & Related papers (2024-12-18T04:01:32Z)
- PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
We introduce PanoLlama, a novel framework that redefines panoramic image generation as a next-token prediction task.
Building on the pre-trained LlamaGen architecture, we generate images in an autoregressive manner and develop an expansion strategy to handle size limitations.
This method aligns with the image token structure in a crop-wise and training-free manner, resulting in high-quality panoramas with minimal seams and maximum scalability.
arXiv Detail & Related papers (2024-11-24T15:06:57Z)
- ImageFolder: Autoregressive Image Generation with Folded Tokens
Increasing token length is a common approach to improving image reconstruction quality.
However, there is a trade-off between reconstruction and generation quality with respect to token length.
We propose ImageFolder, a semantic tokenizer that provides spatially aligned image tokens that can be folded during autoregressive modeling.
arXiv Detail & Related papers (2024-10-02T17:06:39Z)
- Taming Stable Diffusion for Text to 360° Panorama Image Generation
We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
arXiv Detail & Related papers (2024-04-11T17:46:14Z)
- Consolidating Attention Features for Multi-view Image Editing
We focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views.
We introduce QNeRF, a neural radiance field trained on the internal query features of the edited images.
We refine the process through a progressive, iterative method that better consolidates queries across the diffusion timesteps.
arXiv Detail & Related papers (2024-02-22T18:50:18Z)
- FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images.
FastComposer enables efficient, personalized, multi-subject text-to-image generation without fine-tuning.
arXiv Detail & Related papers (2023-05-17T17:59:55Z)
- MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
MultiDiffusion is a unified framework that enables versatile and controllable image generation.
We show that MultiDiffusion can be readily applied to generate high-quality and diverse images.
arXiv Detail & Related papers (2023-02-16T06:28:29Z)
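The path fusion that MultiDiffusion performs can be sketched as below: each overlapping window is denoised independently, and overlapping predictions are averaged per position (averaging is the closed-form solution of the least-squares fusion objective). This is a hedged 1-D illustration under stated assumptions: `denoise_window` is a stand-in for the pre-trained denoiser, and the function name is not from the paper's code.

```python
import numpy as np

def multidiffusion_step(latent, window, stride, denoise_window):
    """One fusion step in the spirit of MultiDiffusion over a 1-D latent:
    denoise each overlapping window independently, then average the
    per-position predictions so that overlapping windows agree."""
    n = latent.shape[0]
    acc = np.zeros(n)
    cnt = np.zeros(n)
    for start in range(0, n - window + 1, stride):
        pred = denoise_window(latent[start:start + window])
        acc[start:start + window] += pred
        cnt[start:start + window] += 1
    covered = cnt > 0
    out = latent.copy()                  # positions no window touches stay as-is
    out[covered] = acc[covered] / cnt[covered]
    return out
```

Averaging the overlaps is what lets a model trained on fixed-size images produce a wider canvas without retraining, at the cost of one denoiser call per window per step.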
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.