Related papers: 360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries

URL: http://arxiv.org/abs/2311.17389v3
Date: Fri, 31 May 2024 14:38:08 GMT
Title: 360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Authors: Huajian Huang, Changkun Liu, Yipeng Zhu, Hui Cheng, Tristan Braud, Sai-Kit Yeung
Abstract summary: This paper introduces a new benchmark dataset, 360Loc, composed of 360$^\circ$ images with ground truth poses for visual localization. We propose a virtual camera approach to generate lower-FoV query frames from 360$^\circ$ images. We demonstrate that omnidirectional visual localization is more robust in challenging large-scale scenes with symmetries and repetitive structures.
Score: 24.345954562946385
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Portable 360$^\circ$ cameras are becoming a cheap and efficient tool to establish large visual databases. By capturing omnidirectional views of a scene, these cameras could expedite building environment models that are essential for visual localization. However, such an advantage is often overlooked due to the lack of valuable datasets. This paper introduces a new benchmark dataset, 360Loc, composed of 360$^\circ$ images with ground truth poses for visual localization. We present a practical implementation of 360$^\circ$ mapping combining 360$^\circ$ images with lidar data to generate the ground truth 6DoF poses. 360Loc is the first dataset and benchmark that explores the challenge of cross-device visual positioning, involving 360$^\circ$ reference frames, and query frames from pinhole, ultra-wide FoV fisheye, and 360$^\circ$ cameras. We propose a virtual camera approach to generate lower-FoV query frames from 360$^\circ$ images, which ensures a fair comparison of performance among different query types in visual localization tasks. We also extend this virtual camera approach to feature matching-based and pose regression-based methods to alleviate the performance loss caused by the cross-device domain gap, and evaluate its effectiveness against state-of-the-art baselines. We demonstrate that omnidirectional visual localization is more robust in challenging large-scale scenes with symmetries and repetitive structures. These results provide new insights into 360-camera mapping and omnidirectional visual localization with cross-device queries.
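The virtual camera idea mentioned in the abstract (generating lower-FoV frames from a 360$^\circ$ image) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the axis conventions (x right, y down, z forward), the yaw-only rotation, and the equirectangular layout with longitude 0 at the image center are all assumptions made for illustration. It maps one pixel of a hypothetical virtual pinhole camera to the panorama location it would sample.

```python
import math


def pinhole_pixel_to_equirect(u, v, width, height, fov_deg,
                              pano_w, pano_h, yaw_deg=0.0):
    """Map pixel (u, v) of a hypothetical virtual pinhole camera to
    (x, y) coordinates in an equirectangular panorama.

    Assumed conventions (not taken from the paper): camera axes are
    x right, y down, z forward; fov_deg is the horizontal field of view;
    longitude 0 and latitude 0 sit at the center of the panorama.
    """
    # Focal length in pixels, derived from the horizontal field of view.
    f = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

    # Viewing ray through the pixel, expressed in the camera frame.
    dx = u - width / 2.0
    dy = v - height / 2.0
    dz = f

    # Rotate the ray about the vertical axis by the camera yaw.
    yaw = math.radians(yaw_deg)
    rx = math.cos(yaw) * dx + math.sin(yaw) * dz
    rz = -math.sin(yaw) * dx + math.cos(yaw) * dz

    # Convert the ray to longitude (left/right) and latitude (down is positive).
    lon = math.atan2(rx, rz)
    lat = math.atan2(dy, math.hypot(rx, rz))

    # Convert the angles to panorama pixel coordinates.
    x = (lon / (2.0 * math.pi) + 0.5) * pano_w
    y = (lat / math.pi + 0.5) * pano_h
    return x, y
```

Evaluating this mapping for every pixel of the virtual image and sampling the panorama at the returned coordinates (for example with bilinear interpolation) would yield a perspective crop; a fisheye variant would need a different projection model in place of the pinhole ray.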

Related papers

IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras [53.53895891356167]
We present a novel 3D reconstruction pipeline for 360$^\circ$ cameras for 3D mapping and rendering of indoor environments. Our approach (IM360) leverages the wide field of view of omnidirectional images and integrates the spherical camera model into every core component of the SfM pipeline. We evaluate our pipeline on large-scale indoor scenes from the Matterport3D and Stanford2D3D datasets.
arXiv Detail & Related papers (2025-02-18T05:15:19Z)
From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos [71.22810401256234]
Three-dimensional (3D) understanding of objects and scenes plays a key role in humans' ability to interact with the world. Large-scale synthetic and object-centric 3D datasets have been shown to be effective in training models that have 3D understanding of objects. We introduce 360-1M, a 360 video dataset, and a process for efficiently finding corresponding frames from diverse viewpoints at scale.
arXiv Detail & Related papers (2024-12-10T18:59:44Z)
Splatter-360: Generalizable 360$^{\circ}$ Gaussian Splatting for Wide-baseline Panoramic Images [52.48351378615057]
Splatter-360 is a novel end-to-end generalizable 3DGS framework to handle wide-baseline panoramic images. We introduce a 3D-aware bi-projection encoder to mitigate the distortions inherent in panoramic images. This enables robust 3D-aware feature representations and real-time rendering capabilities.
arXiv Detail & Related papers (2024-12-09T06:58:31Z)
MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views [90.26609689682876]
We introduce MVSplat360, a feed-forward approach for 360$^\circ$ novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations. This setting is inherently ill-posed due to minimal overlap among input views and insufficient visual information provided. Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views.
arXiv Detail & Related papers (2024-11-07T17:59:31Z)
See360: Novel Panoramic View Interpolation [24.965259708297932]
See360 is a versatile and efficient framework for 360 panoramic view interpolation using latent space viewpoint estimation. We show that the proposed method is generic enough to achieve real-time rendering of arbitrary views for four datasets.
arXiv Detail & Related papers (2024-01-07T09:17:32Z)
NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes [59.15910989235392]
We introduce NeO 360, Neural fields for sparse view synthesis of outdoor scenes. NeO 360 is a generalizable method that reconstructs 360$^\circ$ scenes from a single or a few posed RGB images. Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations.
arXiv Detail & Related papers (2023-08-24T17:59:50Z)
360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking [10.87309734945868]
360$^\circ$ images can provide an omnidirectional field of view which is important for stable and long-term scene perception. In this paper, we explore 360$^\circ$ images for visual object tracking and perceive new challenges caused by large distortion. We propose a new large-scale omnidirectional tracking benchmark dataset, 360VOT, in order to facilitate future research.
arXiv Detail & Related papers (2023-07-27T05:32:01Z)
Lazy Visual Localization via Motion Averaging [89.8709956317671]
We show that it is possible to achieve high localization accuracy without reconstructing the scene from the database. Experiments show that our visual localization proposal, LazyLoc, achieves comparable performance against state-of-the-art structure-based methods.
arXiv Detail & Related papers (2023-07-19T13:40:45Z)
Panoramic Vision Transformer for Saliency Detection in 360$^\circ$ Videos [48.54829780502176]
We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using Vision Transformer with deformable convolution, which enables us to plug pretrained models from normal videos into our architecture without additional modules or finetuning. We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision.
arXiv Detail & Related papers (2022-09-19T12:23:34Z)
Field-of-View IoU for Object Detection in 360$^\circ$ Images [36.72543749626039]
We propose two fundamental techniques -- Field-of-View IoU (FoV-IoU) and 360Augmentation for object detection in 360$^\circ$ images. FoV-IoU computes the intersection-over-union of two Field-of-View bounding boxes in a spherical image, which can be used for training, inference, and evaluation. 360Augmentation is a data augmentation technique specific to the 360$^\circ$ object detection task, which randomly rotates a spherical image and solves the bias due to the sphere-to-plane projection.
arXiv Detail & Related papers (2022-02-07T14:01:59Z)
ScanGAN360: A Generative Model of Realistic Scanpaths for 360$^{\circ}$ Images [92.8211658773467]
We present ScanGAN360, a new generative adversarial approach to generate scanpaths for 360$^\circ$ images. We accomplish this by leveraging the use of a spherical adaptation of dynamic-time warping as a loss function. The quality of our scanpaths outperforms competing approaches by a large margin and is almost on par with the human baseline.
arXiv Detail & Related papers (2021-03-25T15:34:18Z)
A Fixation-based 360$^\circ$ Benchmark Dataset for Salient Object Detection [21.314578493964333]
Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications. Salient object detection (SOD), however, has seldom been explored in 360$^\circ$ images due to the lack of datasets representative of real scenes.
arXiv Detail & Related papers (2020-01-22T11:16:39Z)
Visual Question Answering on 360$^\circ$ Images [96.00046925811515]
VQA 360 is a novel task of visual question answering on 360 images. We collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types.
arXiv Detail & Related papers (2020-01-10T08:18:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.