One Flight Over the Gap: A Survey from Perspective to Panoramic Vision
- URL: http://arxiv.org/abs/2509.04444v2
- Date: Tue, 09 Sep 2025 15:29:50 GMT
- Title: One Flight Over the Gap: A Survey from Perspective to Panoramic Vision
- Authors: Xin Lin, Xian Ge, Dizhe Zhang, Zhaoliang Wan, Xianshun Wang, Xiangtai Li, Wenjie Jiang, Bo Du, Dacheng Tao, Ming-Hsuan Yang, Lu Qi,
- Abstract summary: This survey reviews recent panoramic vision techniques with a particular emphasis on perspective-to-panorama adaptation. We first revisit the panoramic imaging pipeline and projection methods to build the prior knowledge required for analyzing the structural disparities. Building on this, we cover 20+ representative tasks drawn from more than 300 research papers along two dimensions.
- Score: 117.80970697177025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driven by the demand for spatial intelligence and holistic scene perception, omnidirectional images (ODIs), which provide a complete 360° field of view, are receiving growing attention across diverse applications such as virtual reality, autonomous driving, and embodied robotics. ODIs differ markedly from perspective images in geometric projection, spatial distribution, and boundary continuity, making direct domain adaptation from perspective methods challenging. This survey reviews recent panoramic vision techniques with a particular emphasis on perspective-to-panorama adaptation. We first revisit the panoramic imaging pipeline and projection methods to build the prior knowledge required for analyzing the structural disparities. Then, we summarize three challenges of domain adaptation: severe geometric distortions near the poles, non-uniform sampling in Equirectangular Projection (ERP), and periodic boundary continuity. Building on this, we cover 20+ representative tasks drawn from more than 300 research papers along two dimensions. On one hand, we present a cross-method analysis of representative strategies for addressing panorama-specific challenges across different tasks. On the other hand, we conduct a cross-task comparison and classify panoramic vision into four major categories: visual quality enhancement and assessment, visual understanding, multimodal understanding, and visual generation. In addition, we discuss open challenges and future directions in data, models, and applications that will drive the advancement of panoramic vision research. We hope that our work can provide new insights and forward-looking perspectives to advance the development of panoramic vision technologies. Our project page is https://insta360-research-team.github.io/Survey-of-Panorama
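The three ERP challenges the abstract names (polar distortion, non-uniform sampling, periodic boundaries) all follow from how an equirectangular image maps onto the sphere. A minimal sketch of that mapping, assuming the common convention that row 0 is the north pole and longitude wraps over the image width:

```python
import math

def erp_pixel_to_sphere(u, v, width, height):
    """Map an ERP pixel center (u, v) to (longitude, latitude) in radians.

    Assumed convention: longitude spans [-pi, pi) across the width,
    latitude spans [pi/2, -pi/2] down the height (row 0 = north pole).
    The wrap-around in longitude is the periodic boundary continuity
    the survey describes: column 0 and column width-1 are neighbors.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return lon, lat

def erp_row_weight(v, height):
    """Relative solid angle covered by a pixel in row v.

    Every ERP row has the same pixel count, but rows near the poles
    cover a far smaller slice of the sphere (weight ~ cos(latitude)).
    This is the non-uniform sampling / polar distortion issue.
    """
    _, lat = erp_pixel_to_sphere(0, v, 1, height)
    return math.cos(lat)

h = 512
print(round(erp_row_weight(h // 2, h), 3))  # equator row: weight ~ 1.0
print(round(erp_row_weight(0, h), 4))       # polar row: weight near 0
```

Area-weighting of this kind is what ERP-aware losses and spherical convolutions compensate for; the exact conventions (row orientation, pixel-center offset) vary between codebases, so the functions above are illustrative rather than canonical.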
Related papers
- JoPano: Unified Panorama Generation via Joint Modeling [51.392082596383034]
We propose a joint-face panorama (JoPano) generation approach that unifies the two core tasks within a DiT-based model. We show that JoPano can generate high-quality panoramas for both text-to-panorama and view-to-panorama generation tasks.
arXiv Detail & Related papers (2025-12-07T15:19:26Z) - DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training [76.82789568988557]
DiT360 is a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. Our method achieves better boundary consistency and image fidelity across eleven quantitative metrics.
arXiv Detail & Related papers (2025-10-13T17:59:15Z) - PanoLora: Bridging Perspective and Panoramic Video Generation with LoRA Adaptation [17.498427118787045]
Standard video generation models rely on a single viewpoint with a limited field of view, making them difficult to adapt to panoramic videos. Existing solutions often introduce complex architectures or large-scale training, leading to inefficiency and suboptimal results. We propose treating panoramic video generation as an adaptation problem from perspective views. Our approach efficiently fine-tunes a pretrained video diffusion model using only approximately 1,000 videos while achieving high-quality panoramic generation.
arXiv Detail & Related papers (2025-09-14T05:05:27Z) - ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models [52.87334248847314]
We propose a novel framework utilizing pretrained perspective video models for generating panoramic videos. Specifically, we design a novel panorama representation named ViewPoint map, which possesses global spatial continuity and fine-grained visual details simultaneously. Our method can synthesize highly dynamic and spatially consistent panoramic videos, achieving state-of-the-art performance and surpassing previous methods.
arXiv Detail & Related papers (2025-06-30T04:33:34Z) - Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models [0.0]
We investigate the ability of Vision Language Models to perform visual perspective taking using a novel set of visual tasks inspired by established human tests. Our approach leverages carefully controlled scenes, in which a single humanoid minifigure is paired with a single object. Our analysis suggests a gap between surface-level object recognition and the deeper spatial and perspective reasoning required for complex visual tasks.
arXiv Detail & Related papers (2025-05-03T00:10:41Z) - A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision [5.208806195877025]
In recent years, the availability of consumer-level 360° cameras has made omnidirectional vision more popular. The advance of deep learning (DL) has significantly sparked its research and applications. This paper presents a systematic and comprehensive review and analysis of the recent progress of DL for omnidirectional vision.
arXiv Detail & Related papers (2025-02-11T08:05:11Z) - A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective [71.03621840455754]
Graph Neural Networks (GNNs) have gained momentum in graph representation learning.
Graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation.
This paper presents a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective.
arXiv Detail & Related papers (2022-09-27T08:10:14Z) - Deep Learning for Omnidirectional Vision: A Survey and New Perspectives [7.068031114801553]
This paper presents a systematic and comprehensive review and analysis of the recent progress in deep learning methods for omnidirectional vision.
Our work covers the following main contents: (i) an introduction to the principle of omnidirectional imaging, the convolution methods on the ODI, and datasets that highlight the differences and difficulties compared with 2D planar image data; (ii) a structural and hierarchical taxonomy of the DL methods for omnidirectional vision; and (iii) a summary of the latest novel learning strategies and applications.
arXiv Detail & Related papers (2022-05-21T00:19:56Z) - Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery [80.6282101835164]
We present a new approach for synthesizing a novel street-view panorama given an overhead satellite image.
Our method generates a Google street-view style omnidirectional panorama, as if it were captured from the same geographical location as the center of the satellite patch.
arXiv Detail & Related papers (2021-03-02T10:27:05Z) - Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos [49.217528156417906]
Two types of VR viewing conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama.
We first transform an omnidirectional image to several video representations using different user viewing behaviors under different viewing conditions.
We then leverage advanced 2D full-reference video quality models to compute the perceived quality.
arXiv Detail & Related papers (2020-05-21T10:03:40Z) - An Exploration of Embodied Visual Exploration [97.21890864063872]
Embodied computer vision considers perception for robots in novel, unstructured environments.
We present a taxonomy for existing visual exploration algorithms and create a standard framework for benchmarking them.
We then perform a thorough empirical study of the four state-of-the-art paradigms using the proposed framework.
arXiv Detail & Related papers (2020-01-07T17:40:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.