Related papers: Deep Learning for Omnidirectional Vision: A Survey and New Perspectives

Deep Learning for Omnidirectional Vision: A Survey and New Perspectives

URL: http://arxiv.org/abs/2205.10468v2
Date: Tue, 24 May 2022 08:49:08 GMT
Title: Deep Learning for Omnidirectional Vision: A Survey and New Perspectives
Authors: Hao Ai, Zidong Cao, Jinjing Zhu, Haotian Bai, Yucheng Chen and Lin Wang
Abstract summary: This paper presents a systematic and comprehensive review and analysis of the recent progress in deep learning methods for omnidirectional vision. Our work covers four main contents: (i) An introduction to the principle of omnidirectional imaging, the convolution methods on the ODI, and datasets to highlight the differences and difficulties compared with the 2D planar image data; (ii) A structural and hierarchical taxonomy of the DL methods for omnidirectional vision; and (iii) A summarization of the latest novel learning strategies and applications.
Score: 7.068031114801553
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Omnidirectional image (ODI) data is captured with a 360x180 field-of-view, which is much wider than the pinhole cameras and contains richer spatial information than the conventional planar images. Accordingly, omnidirectional vision has attracted booming attention due to its more advantageous performance in numerous applications, such as autonomous driving and virtual reality. In recent years, the availability of customer-level 360 cameras has made omnidirectional vision more popular, and the advance of deep learning (DL) has significantly sparked its research and applications. This paper presents a systematic and comprehensive review and analysis of the recent progress in DL methods for omnidirectional vision. Our work covers four main contents: (i) An introduction to the principle of omnidirectional imaging, the convolution methods on the ODI, and datasets to highlight the differences and difficulties compared with the 2D planar image data; (ii) A structural and hierarchical taxonomy of the DL methods for omnidirectional vision; (iii) A summarization of the latest novel learning strategies and applications; (iv) An insightful discussion of the challenges and open problems by highlighting the potential research directions to trigger more research in the community.

Related papers

Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies.<n>Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios.<n>Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either the low-capacity model architectures or the reliance on domain-specific and small-scale datasets.
arXiv Detail & Related papers (2025-07-15T17:59:59Z)
A Systematic Investigation on Deep Learning-Based Omnidirectional Image and Video Super-Resolution [30.62413133817583]
This paper presents a systematic review of recent progress in omnidirectional image and video super-resolution.<n>We introduce a new dataset, 360Insta, that comprises authentically degraded omnidirectional images and videos.<n>We conduct comprehensive qualitative and quantitative evaluations of existing methods on both public datasets and our proposed dataset.
arXiv Detail & Related papers (2025-06-07T08:24:44Z)
A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision [5.208806195877025]
In recent years, the availability of customer-level 360 cameras has made omnidirectional vision more popular. The advance of deep learning (DL) has significantly sparked its research and applications. This paper presents a systematic and comprehensive review and analysis of the recent progress of DL for omnidirectional vision.
arXiv Detail & Related papers (2025-02-11T08:05:11Z)
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning tasks for both LiDAR-based segmentation and object detection.
arXiv Detail & Related papers (2025-01-07T18:59:59Z)
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving [44.91443640710085]
VisionPAD is a novel self-supervised pre-training paradigm for vision-centric algorithms in autonomous driving. It reconstructs multi-view representations using only images as supervision. It significantly improves performance in 3D object detection, occupancy prediction and map segmentation.
arXiv Detail & Related papers (2024-11-22T03:59:41Z)
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image [70.02187124865627]
Open-vocabulary 3D object detection (OV-3DDet) aims to localize and recognize both seen and previously unseen object categories within any new 3D scene. We leverage a vision foundation model to provide image-wise guidance for discovering novel classes in 3D scenes. We demonstrate significant improvements in accuracy and generalization, highlighting the potential of foundation models in advancing open-vocabulary 3D object detection.
arXiv Detail & Related papers (2024-07-07T04:50:04Z)
Technique Report of CVPR 2024 PBDL Challenges [211.79824163599872]
Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. Deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop.
arXiv Detail & Related papers (2024-06-15T21:44:17Z)
Vision-based Learning for Drones: A Survey [1.280979348722635]
Drones as advanced cyber-physical systems are undergoing a transformative shift with the advent of vision-based learning. This review offers a comprehensive overview of vision-based learning in drones, emphasizing its pivotal role in enhancing their operational capabilities. We explore various applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous system scenarios.
arXiv Detail & Related papers (2023-12-08T12:57:13Z)
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks [55.81577205593956]
Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously. Deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential.
arXiv Detail & Related papers (2023-02-17T14:19:28Z)
Surround-View Vision-based 3D Detection for Autonomous Driving: A Survey [0.6091702876917281]
We provide a literature survey for the existing Vision Based 3D detection methods, focused on autonomous driving. We have highlighted how the literature and industry trend have moved towards surround-view image based methods and note down thoughts on what special cases this method addresses.
arXiv Detail & Related papers (2023-02-13T19:30:17Z)
3D Object Detection from Images for Autonomous Driving: A Survey [68.33502122185813]
3D object detection from images is one of the fundamental and challenging problems in autonomous driving. More than 200 works have studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications. We provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection.
arXiv Detail & Related papers (2022-02-07T07:12:24Z)
Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview [8.442460766094674]
Object pose detection and tracking has attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Deep learning is the most promising one that has shown better performance than others. This paper presents a comprehensive review of recent progress in object pose detection and tracking that belongs to the deep learning technical route.
arXiv Detail & Related papers (2021-05-29T12:59:29Z)
Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective [69.44384540002358]
We provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem. We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks. We also summarize the pose representation styles, benchmarks, evaluation metrics, and the quantitative performance of popular approaches.
arXiv Detail & Related papers (2021-04-23T11:07:07Z)
Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data. We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.