Related papers: TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation

TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation

URL: http://arxiv.org/abs/2506.21681v1
Date: Thu, 26 Jun 2025 18:09:09 GMT
Title: TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation
Authors: Hakan Çapuk, Andrew Bond, Muhammed Burak Kızıl, Emir Göçen, Erkut Erdem, Aykut Erdem,
Abstract summary: TanDiT is a method that synthesizes panoramic scenes by generating grids of tangent-plane images covering the entire 360$circ$ view.<n>To accurately assess panoramic image quality, we also present two specialized metrics, TangentIS and TangentFID.
Score: 12.480249699450535
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in image generation have led to remarkable improvements in synthesizing perspective images. However, these models still struggle with panoramic image generation due to unique challenges, including varying levels of geometric distortion and the requirement for seamless loop-consistency. To address these issues while leveraging the strengths of the existing models, we introduce TanDiT, a method that synthesizes panoramic scenes by generating grids of tangent-plane images covering the entire 360$^\circ$ view. Unlike previous methods relying on multiple diffusion branches, TanDiT utilizes a unified diffusion model trained to produce these tangent-plane images simultaneously within a single denoising iteration. Furthermore, we propose a model-agnostic post-processing step specifically designed to enhance global coherence across the generated panoramas. To accurately assess panoramic image quality, we also present two specialized metrics, TangentIS and TangentFID, and provide a comprehensive benchmark comprising captioned panoramic datasets and standardized evaluation scripts. Extensive experiments demonstrate that our method generalizes effectively beyond its training data, robustly interprets detailed and complex text prompts, and seamlessly integrates with various generative models to yield high-quality, diverse panoramic images.

Related papers

Conditional Panoramic Image Generation via Masked Autoregressive Modeling [35.624070746282186]
We propose a unified framework, Panoramic AutoRegressive model (PAR), which leverages masked autoregressive modeling to address these challenges.<n>To address the inherent discontinuity in existing generative models, we introduce circular padding to enhance spatial coherence.<n>Experiments demonstrate competitive performance in text-to-image generation and panorama outpainting tasks.
arXiv Detail & Related papers (2025-05-22T16:20:12Z)
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [59.257513664564996]
We introduce a novel method for generating 360deg panoramas from text prompts or images.<n>We employ multi-view diffusion models to jointly synthesize the six faces of a cubemap.<n>Our model allows for fine-grained text control, generates high resolution panorama images and generalizes well beyond its training set.
arXiv Detail & Related papers (2025-01-28T18:59:49Z)
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation [12.588962705218103]
This paper introduces the Multi-Scale Diffusion (MSD), an optimized framework that extends the panoramic image generation framework to multiple resolution levels.<n>Our method leverages gradient descent techniques to incorporate structural information from low-resolution images into high-resolution outputs.
arXiv Detail & Related papers (2024-10-24T15:18:51Z)
MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image. Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts. We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z)
Multi-View Unsupervised Image Generation with Cross Attention Guidance [23.07929124170851]
This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets. We identify object poses by clustering the dataset through comparing visibility and locations of specific object parts. Our model, MIRAGE, surpasses prior work in novel view synthesis on real images.
arXiv Detail & Related papers (2023-12-07T14:55:13Z)
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues. Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images. For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
HORIZON: High-Resolution Semantically Controlled Panorama Synthesis [105.55531244750019]
Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds. Recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, but a direct application of these methods to panorama synthesis yields distorted content. We unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling.
arXiv Detail & Related papers (2022-10-10T09:43:26Z)
Cross-View Panorama Image Synthesis [68.35351563852335]
PanoGAN is a novel adversarial feedback GAN framework named. PanoGAN enables high-quality panorama image generation with more convincing details than state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-22T15:59:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.