T-Graph: Enhancing Sparse-view Camera Pose Estimation by Pairwise Translation Graph
- URL: http://arxiv.org/abs/2505.01207v1
- Date: Fri, 02 May 2025 11:50:48 GMT
- Title: T-Graph: Enhancing Sparse-view Camera Pose Estimation by Pairwise Translation Graph
- Authors: Qingyu Xian, Weiqin Jiao, Hao Cheng, Berend Jan van der Zwaag, Yanqiu Huang,
- Abstract summary: T-Graph is a lightweight, plug-and-play module to enhance camera pose estimation in sparse-view settings. It takes paired image features as input and maps them through a Multilayer Perceptron (MLP). It then constructs a fully connected translation graph, where nodes represent cameras and edges encode their translation relationships.
- Score: 3.3301244688278078
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sparse-view camera pose estimation, which aims to estimate the 6-Degree-of-Freedom (6-DoF) poses from a limited number of images captured from different viewpoints, is a fundamental yet challenging problem in remote sensing applications. Existing methods often overlook the translation information between each pair of viewpoints, leading to suboptimal performance in sparse-view scenarios. To address this limitation, we introduce T-Graph, a lightweight, plug-and-play module to enhance camera pose estimation in sparse-view settings. T-Graph takes paired image features as input and maps them through a Multilayer Perceptron (MLP). It then constructs a fully connected translation graph, where nodes represent cameras and edges encode their translation relationships. It can be seamlessly integrated into existing models as an additional branch in parallel with the original prediction, maintaining efficiency and ease of use. Furthermore, we introduce two pairwise translation representations, relative-t and pair-t, formulated under different local coordinate systems. While relative-t captures intuitive spatial relationships, pair-t offers a rotation-disentangled alternative. The two representations contribute to enhanced adaptability across diverse application scenarios, further improving our module's robustness. Extensive experiments on two state-of-the-art methods (RelPose++ and Forge) using public datasets (CO3D and IMC PhotoTourism) validate both the effectiveness and generalizability of T-Graph. The results demonstrate consistent improvements across various metrics, notably camera center accuracy, which improves by 1% to 6% from 2 to 8 viewpoints.
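The abstract's pipeline (paired image features → MLP → fully connected translation graph over cameras) can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the feature dimension, hidden width, and MLP shape are hypothetical placeholders chosen only to show the graph construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, w1, b1, w2, b2):
    """Tiny two-layer MLP: one ReLU hidden layer, linear output."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

# Hypothetical dimensions: per-image feature size and MLP hidden width.
feat_dim, hidden = 16, 32
w1 = rng.standard_normal((2 * feat_dim, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.standard_normal((hidden, 3)) * 0.1  # 3-D translation per edge
b2 = np.zeros(3)

n_views = 4
features = rng.standard_normal((n_views, feat_dim))

# Fully connected translation graph: nodes are cameras; each directed
# edge (i, j) carries an MLP-predicted pairwise translation.
edges = {}
for i in range(n_views):
    for j in range(n_views):
        if i == j:
            continue
        pair = np.concatenate([features[i], features[j]])
        edges[(i, j)] = mlp_forward(pair, w1, b1, w2, b2)

print(len(edges))           # 12 directed edges for 4 cameras
print(edges[(0, 1)].shape)  # (3,)
```

In the paper this branch runs in parallel with the host model's original pose head; the sketch only shows why the edge count grows quadratically with the number of views, which is affordable in the 2-8 view regime the paper targets.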
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z)
- FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation [30.710296843150832]
Estimating relative camera poses between images has been a central problem in computer vision.
We show how to combine the best of both methods; our approach yields results that are both precise and robust.
A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators.
arXiv Detail & Related papers (2024-03-05T18:59:51Z)
- RelPose++: Recovering 6D Poses from Sparse-view Observations [66.6922660401558]
We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images).
We build on the recent RelPose framework which learns a network that infers distributions over relative rotations over image pairs.
Our final system results in large improvements in 6D pose prediction over prior art on both seen and unseen object categories.
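RelPose++ reasons about relative rotations between image pairs. The standard composition of a pairwise relative pose from two absolute camera poses can be checked with a short sketch; the world-to-camera convention (x_cam = R @ x_world + t) is an assumption here, not a claim about RelPose++'s internals.

```python
import numpy as np

def relative_pose(R1, t1, R2, t2):
    """Relative pose mapping camera-1 coordinates to camera-2 coordinates,
    assuming the world-to-camera convention x_cam = R @ x_world + t."""
    R_rel = R2 @ R1.T
    t_rel = t2 - R_rel @ t1
    return R_rel, t_rel

# Example: camera 1 at the world origin; camera 2 rotated 90 degrees
# about z and shifted along x.
R1, t1 = np.eye(3), np.zeros(3)
R2 = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
t2 = np.array([1.0, 0.0, 0.0])

R_rel, t_rel = relative_pose(R1, t1, R2, t2)

# Sanity check: a world point must reach camera-2 coordinates the same
# way directly and via camera 1 plus the relative transform.
x_world = np.array([0.5, -0.2, 2.0])
x1 = R1 @ x_world + t1
x2_direct = R2 @ x_world + t2
x2_via_rel = R_rel @ x1 + t_rel
print(np.allclose(x2_direct, x2_via_rel))  # True
```

Predicting such pairwise quantities, rather than absolute poses directly, is what lets sparse-view methods like RelPose++ exploit every image pair in a small set.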
arXiv Detail & Related papers (2023-05-08T17:59:58Z)
- LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals [9.201550006194994]
Learnable matchers often underperform when only small regions of co-visibility exist between image pairs.
We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks.
We show that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs.
arXiv Detail & Related papers (2023-03-22T17:46:27Z)
- Leveraging Image Matching Toward End-to-End Relative Camera Pose Regression [13.233301155616616]
This paper proposes a generalizable, end-to-end deep learning-based method for relative pose regression between two images.
Inspired by the classical pipeline, our method leverages Image Matching (IM) as a pre-trained task for relative pose regression.
We evaluate our method on several datasets and show that it outperforms previous end-to-end methods.
arXiv Detail & Related papers (2022-11-27T22:01:47Z)
- E-Graph: Minimal Solution for Rigid Rotation with Extensibility Graphs [61.552125054227595]
A new minimal solution is proposed to solve relative rotation estimation between two images without overlapping areas.
Based on E-Graph, the rotation estimation problem becomes simpler and more elegant.
We embed our rotation estimation strategy into a complete camera tracking and mapping system which obtains 6-DoF camera poses and a dense 3D mesh model.
arXiv Detail & Related papers (2022-07-20T16:11:48Z)
- PoserNet: Refining Relative Camera Poses Exploiting Object Detections [14.611595909419297]
We use objectness regions to guide the pose estimation problem rather than explicit semantic object detections.
We propose Pose Refiner Network (PoserNet), a lightweight Graph Network that refines approximate pairwise relative camera poses.
We evaluate on the 7-Scenes dataset across varied sizes of graphs and show how this process can be beneficial to optimisation-based Motion Averaging algorithms.
arXiv Detail & Related papers (2022-07-19T17:58:33Z)
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
And we propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- GeoGraph: Learning graph-based multi-view object detection with geometric cues end-to-end [10.349116753411742]
We propose an end-to-end learnable approach that detects static urban objects from multiple views.
Our method relies on a Graph Neural Network (GNN) to detect all objects and output their geographic positions.
Our GNN simultaneously models relative pose and image evidence, and is further able to deal with an arbitrary number of input views.
arXiv Detail & Related papers (2020-03-23T09:40:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.