Graph-CoVis: GNN-based Multi-view Panorama Global Pose Estimation
- URL: http://arxiv.org/abs/2304.13201v1
- Date: Wed, 26 Apr 2023 00:04:50 GMT
- Title: Graph-CoVis: GNN-based Multi-view Panorama Global Pose Estimation
- Authors: Negar Nejatishahidin, Will Hutchcroft, Manjunath Narayana, Ivaylo Boyadzhiev, Yuguang Li, Naji Khosravan, Jana Kosecka, Sing Bing Kang
- Abstract summary: Graph-CoVis is a novel Graph Neural Network based architecture that jointly learns the co-visible structure and global motion.
We show that our model performs competitively to state-of-the-art approaches.
- Score: 11.8322612639007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we address the problem of wide-baseline camera pose estimation
from a group of 360$^\circ$ panoramas under upright-camera assumption. Recent
work has demonstrated the merit of deep-learning for end-to-end direct relative
pose regression in 360$^\circ$ panorama pairs [11]. To exploit the benefits of
multi-view logic in a learning-based framework, we introduce Graph-CoVis, which
non-trivially extends CoVisPose [11] from relative two-view to global
multi-view spherical camera pose estimation. Graph-CoVis is a novel Graph
Neural Network based architecture that jointly learns the co-visible structure
and global motion in an end-to-end and fully-supervised approach. Using the
ZInD [4] dataset, which features real homes presenting wide-baselines,
occlusion, and limited visual overlap, we show that our model performs
competitively to state-of-the-art approaches.
Related papers
- Cross-View World Models [3.7896239978609434]
We introduce Cross-View World Models (XVWM), trained with a cross-view prediction objective. We train on synchronized multi-view gameplay data from Aimlabs. Our results show that multi-view consistency provides a strong learning signal for spatially grounded representations.
arXiv Detail & Related papers (2026-02-07T00:02:15Z) - Omnidirectional Spatial Modeling from Correlated Panoramas [4.75637997496421]
Existing omnidirectional methods achieve scene understanding within a single frame while neglecting cross-frame correlated panoramas. We introduce CFpano, the first benchmark dataset dedicated to visual question answering over cross-frame correlated panoramas. We present a multi-modal large language model (MLLM) fine-tuned with Group Relative Policy Optimization (GRPO) and a set of tailored reward functions for robust and consistent reasoning with cross-frame correlated panoramas.
arXiv Detail & Related papers (2025-09-02T10:14:55Z) - Alligat0R: Pre-Training Through Co-Visibility Segmentation for Relative Camera Pose Regression [23.65253469577653]
We introduce Alligat0R, a novel pre-training approach that reformulates cross-view learning as a co-visibility segmentation task.
Our method predicts whether each pixel in one image is co-visible in the second image, occluded, or outside the field of view (FOV).
To support this, we present Cub3, a large-scale dataset with 2.5 million image pairs and dense co-visibility annotations.
arXiv Detail & Related papers (2025-03-10T17:29:48Z) - 3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey [1.3654846342364308]
This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies.
We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats.
We then survey monocular layout and depth inference approaches, highlighting the recent advances in learning-based solutions suited for spherical data.
arXiv Detail & Related papers (2024-01-17T14:57:27Z) - Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z) - PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline
Panoramas [54.4948540627471]
We propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas.
Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion.
Results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods.
arXiv Detail & Related papers (2023-06-02T13:35:07Z) - E-Graph: Minimal Solution for Rigid Rotation with Extensibility Graphs [61.552125054227595]
A new minimal solution is proposed to solve relative rotation estimation between two images without overlapping areas.
Based on E-Graph, the rotation estimation problem becomes simpler and more elegant.
We embed our rotation estimation strategy into a complete camera tracking and mapping system which obtains 6-DoF camera poses and a dense 3D mesh model.
arXiv Detail & Related papers (2022-07-20T16:11:48Z) - Ollivier-Ricci Curvature For Head Pose Estimation From a Single Image [10.842428621768667]
This paper aims to estimate head pose from a single image by applying notions of network curvature.
In this work, using the geometric notion of Ollivier-Ricci curvature (ORC) on weighted graphs as input to the XGBoost regression model, we show that the intrinsic geometric basis of ORC offers a natural approach.
arXiv Detail & Related papers (2022-04-27T15:20:26Z) - Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
And we propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z) - Panoramic Panoptic Segmentation: Towards Complete Surrounding Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic scene understanding.
A complete surrounding understanding provides a maximum of information to the agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z) - Perspective Plane Program Induction from a Single Image [85.28956922100305]
We study the inverse graphics problem of inferring a holistic representation for natural images.
We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image.
Our proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to efficiently solve the problem.
arXiv Detail & Related papers (2020-06-25T21:18:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.