DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
- URL: http://arxiv.org/abs/2510.11712v1
- Date: Mon, 13 Oct 2025 17:59:15 GMT
- Title: DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
- Authors: Haoran Feng, Dizhe Zhang, Xiangtai Li, Bo Du, Lu Qi
- Abstract summary: DiT360 is a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. Our method achieves better boundary consistency and image fidelity across eleven quantitative metrics.
- Score: 76.82789568988557
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we propose DiT360, a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. For the issues of maintaining geometric fidelity and photorealism in generation quality, we attribute the main reason to the lack of large-scale, high-quality, real-world panoramic data, where such a data-centric view differs from prior methods that focus on model design. Basically, DiT360 has several key modules for inter-domain transformation and intra-domain augmentation, applied at both the pre-VAE image level and the post-VAE token level. At the image level, we incorporate cross-domain knowledge through perspective image guidance and panoramic refinement, which enhance perceptual quality while regularizing diversity and photorealism. At the token level, hybrid supervision is applied across multiple modules, which include circular padding for boundary continuity, yaw loss for rotational robustness, and cube loss for distortion awareness. Extensive experiments on text-to-panorama, inpainting, and outpainting tasks demonstrate that our method achieves better boundary consistency and image fidelity across eleven quantitative metrics. Our code is available at https://github.com/Insta360-Research-Team/DiT360.
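Two of the token-level ideas named in the abstract, circular padding and yaw rotation, rest on the fact that an equirectangular panorama wraps around horizontally. The sketch below is an illustration under stated assumptions, not the DiT360 implementation: the function names `circular_pad_width` and `yaw_rotate` are hypothetical, the input is assumed to be a NumPy array whose last axis is longitude, and the paper's actual yaw loss and cube loss are not reproduced here.

```python
import numpy as np


def circular_pad_width(x: np.ndarray, pad: int) -> np.ndarray:
    """Wrap-pad the last (longitude) axis so the left and right edges become neighbours.

    Hypothetical helper: assumes the last axis of `x` spans the full 360 degrees.
    """
    pad_spec = [(0, 0)] * (x.ndim - 1) + [(pad, pad)]
    return np.pad(x, pad_spec, mode="wrap")


def yaw_rotate(x: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate a panorama about the vertical axis by cyclically shifting its columns.

    Hypothetical helper: the shift is rounded to a whole number of columns.
    """
    width = x.shape[-1]
    shift = int(round(degrees / 360.0 * width))
    return np.roll(x, shift, axis=-1)


if __name__ == "__main__":
    # Toy "latent" with 1 channel, 2 rows, and 4 longitude columns.
    latent = np.arange(8).reshape(1, 2, 4)

    padded = circular_pad_width(latent, pad=1)
    print(padded[0, 0])  # [3 0 1 2 3 0]

    rotated = yaw_rotate(latent, degrees=90)
    print(rotated[0, 0])  # [3 0 1 2]
```

Because a yaw rotation is only a cyclic column shift in this representation, a rotated panorama is still a valid panorama of the same scene, which is presumably what makes rotation usable as a training-time consistency signal.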
Related papers
- DMAligner: Enhancing Image Alignment via Diffusion Model Based View Synthesis [63.59932602411222]
DMAligner is a diffusion-based framework for image alignment through alignment-oriented view synthesis.
We propose a Dynamics-aware Diffusion Training approach for learning conditional image generation.
We develop the Dynamic Scene Image Alignment (DSIA) dataset using Blender, which includes 1,033 indoor and outdoor scenes with over 30K image pairs tailored for image alignment.
arXiv Detail & Related papers (2026-02-26T14:00:07Z)
- World-Shaper: A Unified Framework for 360° Panoramic Editing [57.174341220144605]
Existing perspective-based image editing methods fail to model the spatial structure of panoramas.
We present World-Shaper, a unified geometry-aware framework that bridges panoramic generation and editing within a single editing-centric design.
Our method achieves superior geometric consistency, editing fidelity, and text controllability compared to SOTA methods.
arXiv Detail & Related papers (2026-01-30T19:38:54Z)
- Dual-Projection Fusion for Accurate Upright Panorama Generation in Robotic Vision [9.05196155518077]
This study presents a dual-stream angle-aware generation network that jointly estimates camera inclination angles and reconstructs upright panoramic images.
Experiments on the SUN360 and M3D datasets demonstrate that our method outperforms existing approaches in both inclination estimation and upright panorama generation.
arXiv Detail & Related papers (2025-11-30T14:28:21Z)
- One Flight Over the Gap: A Survey from Perspective to Panoramic Vision [117.80970697177025]
This survey reviews recent panoramic vision techniques with a particular emphasis on the perspective-to-panorama adaptation.
We first revisit the panoramic imaging pipeline and projection methods to build the prior knowledge required for analyzing the structural disparities.
Building on this, we cover 20+ representative tasks drawn from more than 300 research papers in two dimensions.
arXiv Detail & Related papers (2025-09-04T17:59:10Z)
- TanDiT: Tangent-Plane Diffusion Transformer for High-Quality 360° Panorama Generation [12.480249699450535]
TanDiT is a method that synthesizes panoramic scenes by generating grids of tangent-plane images covering the entire 360° view.
To accurately assess panoramic image quality, we also present two specialized metrics, TangentIS and TangentFID.
arXiv Detail & Related papers (2025-06-26T18:09:09Z)
- Splatter-360: Generalizable 360° Gaussian Splatting for Wide-baseline Panoramic Images [52.48351378615057]
Splatter-360 is a novel end-to-end generalizable 3DGS framework to handle wide-baseline panoramic images.
We introduce a 3D-aware bi-projection encoder to mitigate the distortions inherent in panoramic images.
This enables robust 3D-aware feature representations and real-time rendering capabilities.
arXiv Detail & Related papers (2024-12-09T06:58:31Z)
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- OPa-Ma: Text Guided Mamba for 360-degree Image Out-painting [9.870063736691556]
We tackle the recently popular topic of generating 360-degree images given the conventional narrow field of view (NFoV) images.
This task aims to predict the reasonable and consistent surroundings from the NFoV images.
We propose a novel text-guided out-painting framework equipped with a State-Space Model called Mamba.
arXiv Detail & Related papers (2024-07-15T17:23:00Z)
- High-Resolution Depth Estimation for 360-degree Panoramas through Perspective and Panoramic Depth Images Registration [3.4583104874165804]
We propose a novel approach to compute high-resolution (2048x1024 and higher) depths for panoramas.
Our method generates qualitatively better results than existing panorama-based methods, and further outperforms them quantitatively on datasets unseen by these methods.
arXiv Detail & Related papers (2022-10-19T09:25:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.