A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision
- URL: http://arxiv.org/abs/2502.10444v1
- Date: Tue, 11 Feb 2025 08:05:11 GMT
- Title: A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision
- Authors: Hao Ai, Zidong Cao, Lin Wang
- Abstract summary: In recent years, the availability of consumer-level 360° cameras has made omnidirectional vision more popular.
The advance of deep learning (DL) has significantly sparked its research and applications.
This paper presents a systematic and comprehensive review and analysis of the recent progress of DL for omnidirectional vision.
- Abstract: Omnidirectional image (ODI) data is captured with a field-of-view of 360°×180°, which is much wider than that of pinhole cameras and captures richer surrounding-environment details than conventional perspective images. In recent years, the availability of consumer-level 360° cameras has made omnidirectional vision more popular, and the advance of deep learning (DL) has significantly sparked its research and applications. This paper presents a systematic and comprehensive review and analysis of the recent progress of DL for omnidirectional vision. It delineates the distinct challenges and complexities encountered in applying DL to omnidirectional images as opposed to traditional perspective imagery. Our work covers five main parts: (i) A thorough introduction to the principles of omnidirectional imaging and commonly explored projections of ODI; (ii) A methodical review of varied representation learning approaches tailored for ODI; (iii) An in-depth investigation of optimization strategies specific to omnidirectional vision; (iv) A structural and hierarchical taxonomy of the DL methods for the representative omnidirectional vision tasks, from visual enhancement (e.g., image generation and super-resolution) to 3D geometry and motion estimation (e.g., depth and optical flow estimation), alongside discussions of emergent research directions; (v) An overview of cutting-edge applications (e.g., autonomous driving and virtual reality), coupled with a critical discussion of prevailing challenges and open questions, to trigger more research in the community.
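The most common ODI projection referenced in surveys like this one is the equirectangular projection, which maps the full 360°×180° sphere onto a 2D image grid. As a minimal sketch (not taken from the paper), the mapping from an equirectangular pixel coordinate to spherical angles and a 3D unit vector can be written as follows; the function name and axis conventions here are illustrative assumptions:

```python
import numpy as np

def equirect_to_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to spherical angles and a
    3D unit vector. Longitude spans 360 deg across the image width;
    latitude spans 180 deg across the height (assumed convention)."""
    lon = (u / width - 0.5) * 2.0 * np.pi   # longitude in [-pi, pi)
    lat = (0.5 - v / height) * np.pi        # latitude in [-pi/2, pi/2]
    # Unit vector on the sphere (y up, z forward -- an assumed frame).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return lon, lat, np.stack([x, y, z], axis=-1)
```

For example, the center pixel of the image maps to longitude 0, latitude 0, i.e. the forward direction. The nonuniform sampling density implied by this mapping (pixels near the poles cover far less solid angle) is precisely why the survey devotes attention to representation learning and convolution strategies tailored for ODI rather than reusing planar-image methods directly.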
Related papers
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics.
Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs.
We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z) - VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving [44.91443640710085]
VisionPAD is a novel self-supervised pre-training paradigm for vision-centric algorithms in autonomous driving.
It reconstructs multi-view representations using only images as supervision.
It significantly improves performance in 3D object detection, occupancy prediction and map segmentation.
arXiv Detail & Related papers (2024-11-22T03:59:41Z) - Discrete Latent Perspective Learning for Segmentation and Detection [40.9258359611346]
We propose a novel framework, Discrete Latent Perspective Learning (DLPL), for latent multi-perspective fusion learning.
DLPL is a universal perspective learning framework applicable to a variety of scenarios and vision tasks.
arXiv Detail & Related papers (2024-06-15T02:40:49Z) - 3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey [1.3654846342364308]
This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies.
We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats.
We then survey monocular layout and depth inference approaches, highlighting the recent advances in learning-based solutions suited for spherical data.
arXiv Detail & Related papers (2024-01-17T14:57:27Z) - 3D Concept Learning and Reasoning from Multi-View Images [96.3088005719963]
We introduce a new large-scale benchmark for 3D multi-view visual question answering (3DMV-VQA).
This dataset consists of approximately 5k scenes, 600k images, paired with 50k questions.
We propose a novel 3D concept learning and reasoning framework that seamlessly combines neural fields, 2D pre-trained vision-language models, and neural reasoning operators.
arXiv Detail & Related papers (2023-03-20T17:59:49Z) - State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [100.9586977875698]
3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics.
This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views.
arXiv Detail & Related papers (2022-10-27T17:59:53Z) - Deep Learning for Omnidirectional Vision: A Survey and New Perspectives [7.068031114801553]
This paper presents a systematic and comprehensive review and analysis of the recent progress in deep learning methods for omnidirectional vision.
Our work covers three main parts: (i) An introduction to the principle of omnidirectional imaging, the convolution methods on the ODI, and datasets to highlight the differences and difficulties compared with 2D planar image data; (ii) A structural and hierarchical taxonomy of the DL methods for omnidirectional vision; and (iii) A summarization of the latest novel learning strategies and applications.
arXiv Detail & Related papers (2022-05-21T00:19:56Z) - 3D Object Detection from Images for Autonomous Driving: A Survey [68.33502122185813]
3D object detection from images is one of the fundamental and challenging problems in autonomous driving.
More than 200 works have studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications.
We provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection.
arXiv Detail & Related papers (2022-02-07T07:12:24Z) - Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective [69.44384540002358]
We provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem.
We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks.
We also summarize the pose representation styles, benchmarks, evaluation metrics, and the quantitative performance of popular approaches.
arXiv Detail & Related papers (2021-04-23T11:07:07Z) - 3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning [105.49950571267715]
Existing deep learning methods for 3D human shape and pose estimation rely on relatively high-resolution input images.
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
We show that both these new training losses provide robustness when learning 3D shape and pose in a weakly-supervised manner.
arXiv Detail & Related papers (2020-07-27T16:19:52Z) - Seeing Around Corners with Edge-Resolved Transient Imaging [15.44831979669091]
Non-line-of-sight (NLOS) imaging seeks to form images of objects outside the field of view.
Diffuse reflections scatter light in all directions, resulting in weak signals and a loss of directional information.
We propose a method for seeing around corners that derives angular resolution from vertical edges and longitudinal resolution from the temporal response to a pulsed light source.
arXiv Detail & Related papers (2020-02-17T18:33:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.