Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting
- URL: http://arxiv.org/abs/2507.06971v2
- Date: Thu, 10 Jul 2025 01:50:07 GMT
- Title: Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting
- Authors: Fei Teng, Kai Luo, Sheng Wu, Siyu Li, Pujun Guo, Jiale Wei, Kunyu Peng, Jiaming Zhang, Kailun Yang
- Abstract summary: We propose Percep360, the first panoramic generation method for autonomous driving. Percep360 enables coherent generation of panoramic data with control signals based on the stitched panoramic data. We evaluate the effectiveness of the generated images from three perspectives.
- Score: 20.14129939772052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoramic perception holds significant potential for autonomous driving, enabling vehicles to acquire a comprehensive 360° surround view in a single shot. However, autonomous driving is a data-driven task. Complete panoramic data acquisition requires complex sampling systems and annotation pipelines, which are time-consuming and labor-intensive. Although existing street view generation models have demonstrated strong data regeneration capabilities, they can only learn from the fixed data distribution of existing datasets and cannot achieve high-quality, controllable panoramic generation. In this paper, we propose Percep360, the first panoramic generation method for autonomous driving. Percep360 enables coherent generation of panoramic data with control signals based on the stitched panoramic data, focusing on two key aspects: coherence and controllability. Specifically, to overcome the inherent information loss caused by the pinhole sampling process, we propose the Local Scenes Diffusion Method (LSDM). LSDM reformulates panorama generation as a spatially continuous diffusion process, bridging the gaps between different data distributions. Additionally, to achieve controllable generation of panoramic images, we propose a Probabilistic Prompting Method (PPM). PPM dynamically selects the most relevant control cues, enabling controllable panoramic image generation. We evaluate the effectiveness of the generated images from three perspectives: image quality assessment (both no-reference and full-reference), controllability, and utility for real-world Bird's Eye View (BEV) segmentation. Notably, the generated data consistently outperforms the original stitched images on no-reference quality metrics and enhances downstream perception models. The source code will be publicly available at https://github.com/Bryant-Teng/Percep360.
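The abstract describes LSDM (generation as a spatially continuous diffusion process) and PPM (dynamic selection of the most relevant control cues) only at a high level. Below is a minimal, hypothetical PyTorch sketch of what these two ideas could look like in practice; the function names, the circular-padding trick, and the softmax-based cue sampling are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def seamless_denoise_step(latent, denoiser, pad=8):
    """Hypothetical LSDM-flavored step: denoise an equirectangular latent
    with horizontal wrap-around (circular) padding so the left and right
    panorama edges share context and the 360° seam stays continuous.
    This is a generic seam-handling trick, not Percep360's actual LSDM."""
    # latent: (B, C, H, W); pad only the width dimension circularly.
    padded = F.pad(latent, (pad, pad, 0, 0), mode="circular")
    denoised = denoiser(padded)
    return denoised[..., pad:-pad]  # crop back to the original width

def sample_control_cue(cue_scores, temperature=1.0):
    """Hypothetical PPM-flavored selection: turn per-cue relevance scores
    (e.g., for map, box, or text conditions) into a distribution and
    sample the cue to apply at this step. The scoring model is assumed."""
    probs = torch.softmax(cue_scores / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Toy usage with an identity "denoiser" and three candidate cues.
z = torch.randn(1, 4, 64, 128)
z = seamless_denoise_step(z, denoiser=lambda x: x)
cue_idx = sample_control_cue(torch.tensor([2.0, 0.5, 1.0]))
```

Wrap-around padding is one common way to enforce left-right continuity in equirectangular latents; the actual LSDM formulation and the cue-scoring mechanism in the paper may differ substantially.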
Related papers
- 360Anything: Geometry-Free Lifting of Images and Videos to 360° [51.50120114305155]
Existing approaches rely on explicit geometric alignment between the perspective and the equirectangular projection space. We propose 360Anything, a geometry-free framework built upon pre-trained diffusion transformers. Our approach achieves state-of-the-art performance on both image and video perspective-to-360° generation.
arXiv Detail & Related papers (2026-01-22T18:45:59Z)
- SE360: Semantic Edit in 360° Panoramas via Hierarchical Data Construction [14.137976445056466]
SE360 is a novel framework for multi-condition guided object editing in 360° panoramas. At its core is a coarse-to-fine autonomous data generation pipeline that requires no manual intervention. Our experiments demonstrate that our method outperforms existing methods in both visual quality and semantic accuracy.
arXiv Detail & Related papers (2025-12-23T00:24:46Z)
- Dual-Projection Fusion for Accurate Upright Panorama Generation in Robotic Vision [9.05196155518077]
This study presents a dual-stream angle-aware generation network that jointly estimates camera inclination angles and reconstructs upright panoramic images. Experiments on the SUN360 and M3D datasets demonstrate that our method outperforms existing approaches in both inclination estimation and upright panorama generation.
arXiv Detail & Related papers (2025-11-30T14:28:21Z)
- DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training [76.82789568988557]
DiT360 is a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. Our method achieves better boundary consistency and image fidelity across eleven quantitative metrics.
arXiv Detail & Related papers (2025-10-13T17:59:15Z)
- SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation [31.305851707485967]
We introduce SphereDiff, a novel approach for seamless 360-degree panoramic image and video generation using state-of-the-art diffusion models without additional tuning. We extend MultiDiffusion to spherical latent space and propose a spherical latent sampling method to enable direct use of pretrained diffusion models. Our method outperforms existing approaches in generating 360-degree panoramic content while maintaining high fidelity, making it a robust solution for immersive AR/VR applications.
arXiv Detail & Related papers (2025-04-19T19:59:11Z)
- DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z)
- PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models [55.080748327139176]
We introduce PerLDiff, a novel method for effective street view image generation that fully leverages perspective 3D geometric information. PerLDiff employs 3D geometric priors to guide the generation of street view images with precise object-level control within the network learning process. Empirical results show that PerLDiff markedly enhances the precision of controllable generation on the NuScenes and KITTI datasets.
arXiv Detail & Related papers (2024-07-08T16:46:47Z)
- MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes [72.02827211293736]
MagicDrive3D is a novel framework for controllable 3D street scene generation. It supports multi-condition control, including road maps, 3D objects, and text descriptions. It generates diverse, high-quality 3D driving scenes, supports any-view rendering, and enhances downstream tasks like BEV segmentation.
arXiv Detail & Related papers (2024-05-23T12:04:51Z)
- 360-Degree Panorama Generation from Few Unregistered NFoV Images [16.05306624008911]
360° panoramas are extensively utilized as environmental light sources in computer graphics. However, capturing a 360° × 180° panorama poses challenges due to the specialized and costly equipment required. We propose a novel pipeline called PanoDiff, which efficiently generates complete 360° panoramas.
arXiv Detail & Related papers (2023-08-28T16:21:51Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Capturing Omni-Range Context for Omnidirectional Segmentation [29.738065412097598]
We introduce Efficient Concurrent Attention Networks (ECANets) to bridge the gap in terms of FoV and structural distribution between the imaging domains.
We upgrade model training by leveraging multi-source and omni-supervised learning, taking advantage of both densely labeled and unlabeled data. Our novel model, training regimen, and multi-source prediction fusion elevate performance (mIoU) to new state-of-the-art results.
arXiv Detail & Related papers (2021-03-09T19:46:09Z)
- Panoramic Panoptic Segmentation: Towards Complete Surrounding Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic form of scene understanding.
A complete surrounding understanding provides a maximum of information to the agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z)
- A Fixation-based 360° Benchmark Dataset for Salient Object Detection [21.314578493964333]
Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications.
However, salient object detection (SOD) has seldom been explored in 360° images due to the lack of datasets representative of real scenes.
arXiv Detail & Related papers (2020-01-22T11:16:39Z)
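Nearly every paper in the list above operates on the equirectangular projection, which stores a full 360° × 180° view in a single image. As a small grounded reference (standard spherical geometry, not code from any of the papers), here is a NumPy sketch of the pixel-to-sphere mapping and its inverse; the longitude/latitude conventions chosen here are an assumption and vary between papers.

```python
import numpy as np

def equirect_to_sphere(u, v, width, height):
    """Map equirectangular pixel coords (u, v) to unit-sphere directions.
    Convention assumed here: longitude in [-pi, pi), latitude in
    [-pi/2, pi/2], +z up; individual papers may use different frames."""
    lon = (u / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v / height) * np.pi
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=-1)

def sphere_to_equirect(d, width, height):
    """Inverse mapping: unit direction -> equirectangular pixel coords."""
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    lon = np.arctan2(y, x)
    lat = np.arcsin(np.clip(z, -1.0, 1.0))
    u = (lon + np.pi) / (2.0 * np.pi) * width
    v = (np.pi / 2.0 - lat) / np.pi * height
    return u, v

# Quick sanity check: a pixel should round-trip through both mappings.
W, H = 2048, 1024
d = equirect_to_sphere(512.0, 256.0, W, H)
u, v = sphere_to_equirect(d, W, H)  # ~ (512.0, 256.0)
```

Round-tripping a direction through both functions recovers the same pixel up to floating-point error, which is a quick sanity check for whichever convention a given codebase uses.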