Related papers: End2End Multi-View Feature Matching with Differentiable Pose Optimization

URL: http://arxiv.org/abs/2205.01694v3
Date: Mon, 11 Sep 2023 10:06:19 GMT
Title: End2End Multi-View Feature Matching with Differentiable Pose Optimization
Authors: Barbara Roessle and Matthias Nießner
Abstract summary: We propose a graph attention network to predict image correspondences along with confidence weights. The resulting matches serve as weighted constraints in a differentiable pose estimation. We integrate information from multiple views by spanning the graph across multiple frames to predict the matches all at once.
Score: 2.311583680973075
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Erroneous feature matches have severe impact on subsequent camera pose estimation and often require additional, time-costly measures, like RANSAC, for outlier rejection. Our method tackles this challenge by addressing feature matching and pose optimization jointly. To this end, we propose a graph attention network to predict image correspondences along with confidence weights. The resulting matches serve as weighted constraints in a differentiable pose estimation. Training feature matching with gradients from pose optimization naturally learns to down-weight outliers and boosts pose estimation on image pairs compared to SuperGlue by 6.7% on ScanNet. At the same time, it reduces the pose estimation time by over 50% and renders RANSAC iterations unnecessary. Moreover, we integrate information from multiple views by spanning the graph across multiple frames to predict the matches all at once. Multi-view matching combined with end-to-end training improves the pose estimation metrics on Matterport3D by 18.5% compared to SuperGlue.
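The idea of using matches as weighted constraints can be illustrated with a toy example. The sketch below is not the authors' pipeline: it assumes a 2D rigid alignment problem, hand-set confidence weights instead of network predictions, and a hypothetical helper name `weighted_rigid_2d`. It only shows how a low confidence weight removes an outlier's influence on a closed-form weighted least-squares pose estimate, without any RANSAC-style sampling.

```python
import math


def weighted_rigid_2d(src, dst, weights):
    """Weighted least-squares 2D rotation and translation mapping src onto dst.

    Hypothetical helper for illustration only; minimizes
    sum_i w_i * |R(theta) @ src_i + t - dst_i|^2 in closed form.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")

    # Weighted centroids of both point sets.
    cx_s = sum(w * p[0] for w, p in zip(weights, src)) / total
    cy_s = sum(w * p[1] for w, p in zip(weights, src)) / total
    cx_d = sum(w * q[0] for w, q in zip(weights, dst)) / total
    cy_d = sum(w * q[1] for w, q in zip(weights, dst)) / total

    # Weighted dot and cross terms of the centered correspondences.
    dot = 0.0
    cross = 0.0
    for w, (xs, ys), (xd, yd) in zip(weights, src, dst):
        ax, ay = xs - cx_s, ys - cy_s
        bx, by = xd - cx_d, yd - cy_d
        dot += w * (ax * bx + ay * by)
        cross += w * (ax * by - ay * bx)

    # The optimal angle maximizes cos(theta) * dot + sin(theta) * cross.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)


def make_example(true_theta=0.3, true_t=(1.0, -2.0)):
    """Build five correspondences; the last one is a gross outlier."""
    src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (1.0, 3.0)]
    c, s = math.cos(true_theta), math.sin(true_theta)
    dst = [(c * x - s * y + true_t[0], s * x + c * y + true_t[1]) for x, y in src]
    dst[-1] = (dst[-1][0] + 5.0, dst[-1][1] - 4.0)  # corrupt the last match
    return src, dst


src, dst = make_example()
# Uniform weights: the outlier biases the estimate.
theta_uniform, _ = weighted_rigid_2d(src, dst, [1.0] * 5)
# Confidence weights that suppress the outlier: the true pose is recovered.
theta_weighted, t_weighted = weighted_rigid_2d(src, dst, [1.0, 1.0, 1.0, 1.0, 0.0])
print(abs(theta_uniform - 0.3), abs(theta_weighted - 0.3))
```

In the paper's setting the weights come from the matching network and are trained through the pose optimization; here they are fixed by hand purely to show the effect of down-weighting.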

Related papers

COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation [58.47973015036709]
3D pose estimation from sparse multi-views is a critical task for action recognition, sports analysis, and human-robot interaction. We propose COMPOSE, a novel framework that formulates multi-view pose correspondence matching as a hypergraph problem. COMPOSE achieves improvements of up to 23% in average precision over previous optimization-based methods and up to 11% over self-supervised end-to-end learned methods.
arXiv Detail & Related papers (2026-01-14T18:50:17Z)
End-to-End Multi-Person Pose Estimation with Pose-Aware Video Transformer [7.19764062839405]
We present a fully end-to-end framework for multi-person 2D pose estimation in videos. A key challenge is to associate individuals across frames under complex and overlapping temporal trajectories. We introduce a novel Pose-Aware Video Transformer Network (PAVE-Net), which features a spatial encoder to model intra-frame relations and atemporal decoder pose.
arXiv Detail & Related papers (2025-11-17T10:19:35Z)
AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views [57.13066710710485]
AnySplat is a feed-forward network for novel view synthesis from uncalibrated image collections. A single forward pass yields a set of 3D Gaussian primitives encoding both scene geometry and appearance. In extensive zero-shot evaluations, AnySplat matches the quality of pose-aware baselines in both sparse- and dense-view scenarios.
arXiv Detail & Related papers (2025-05-29T17:49:56Z)
T-Graph: Enhancing Sparse-view Camera Pose Estimation by Pairwise Translation Graph [3.3301244688278078]
T-Graph is a lightweight, plug-and-play module to enhance camera pose estimation in sparse-view settings. It takes paired image features as input and maps them through a Multilayer Perceptron (MLP). It then constructs a fully connected translation graph, where nodes represent cameras and edges encode their translation relationships.
arXiv Detail & Related papers (2025-05-02T11:50:48Z)
One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding. It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps. OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z)
PRISM: PRogressive dependency maxImization for Scale-invariant image Matching [4.9521269535586185]
We propose PRogressive dependency maxImization for Scale-invariant image Matching (PRISM). Our method's superior matching performance and generalization capability are confirmed by leading accuracy across various evaluation benchmarks and downstream tasks.
arXiv Detail & Related papers (2024-08-07T07:35:17Z)
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses. We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass. Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections [19.211193336526346]
We propose a Pose-refined Rotation Averaging Graph Optimization (PRAGO) method for differentiably estimating camera poses from a set of images. Our method reconstructs the rotational pose, and in turn, the absolute pose, in a differentiable manner, benefiting from the optimization of a sequence of geometrical tasks. We show that PRAGO is able to outperform non-differentiable solvers on small and sparse scenes extracted from 7-Scenes, achieving a relative improvement of 21% for rotations while achieving similar translation estimates.
arXiv Detail & Related papers (2024-03-13T14:42:55Z)
AffineGlue: Joint Matching and Robust Estimation [74.04609046690913]
We propose AffineGlue, a method for joint two-view feature matching and robust estimation. AffineGlue selects potential matches from one-to-many correspondences to estimate minimal models. Guided matching is then used to find matches consistent with the model, suffering less from the ambiguities of one-to-one matches.
arXiv Detail & Related papers (2023-07-28T08:05:36Z)
IMP: Iterative Matching and Pose Estimation with Adaptive Pooling [34.36397639248686]
We propose an efficient IMP, called EIMP, to dynamically discard keypoints without potential matches. Experiments on YFCC100M, ScanNet, and Aachen Day-Night datasets demonstrate that the proposed method outperforms previous approaches in terms of accuracy and efficiency.
arXiv Detail & Related papers (2023-04-28T13:25:50Z)
PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model-free one-shot object pose estimator. We create a new training pipeline for object-to-image matching based on a three-view system. To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame. Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information. Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z)
AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild [77.43884383743872]
We present AdaFuse, an adaptive multiview fusion method to enhance the features in occluded views. We extensively evaluate the approach on three public datasets including Human3.6M, Total Capture and CMU Panoptic. We also create a large-scale synthetic dataset, Occlusion-Person, which allows us to perform numerical evaluation on the occluded joints.
arXiv Detail & Related papers (2020-10-26T03:19:46Z)
Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos [32.43899916477434]
We propose an approach that relies on keypoint correspondences for associating persons in videos. Instead of training the network for estimating keypoint correspondences on video data, it is trained on large-scale image datasets for human pose estimation. Our approach achieves state-of-the-art results for multi-frame pose estimation and multi-person pose tracking on the PoseTrack 2017 and PoseTrack 2018 datasets.
arXiv Detail & Related papers (2020-04-27T09:02:24Z)
Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario. We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.