Related papers: From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection

From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection

URL: http://arxiv.org/abs/2602.20630v3
Date: Tue, 03 Mar 2026 05:53:21 GMT
Title: From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection
Authors: Yepeng Liu, Hao Li, Liwen Yang, Fangzhen Li, Xudi Ge, Yuliang Gu, kuang Gao, Bing Wang, Guang Chen, Hangjun Ye, Yongchao Xu,
Abstract summary: Keypoint-based matching is a fundamental component of modern 3D vision systems, such as Structure-from-Motion (SfM) and SLAM.<n>We introduce TraqPoint, a novel, end-to-end Reinforcement Learning framework designed to optimize keypoints directly on image sequences.<n>Our core innovation is a track-aware reward mechanism that jointly encourages the consistency and distinctiveness of keypoints across multiple views.
Score: 23.384541298514574
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Keypoint-based matching is a fundamental component of modern 3D vision systems, such as Structure-from-Motion (SfM) and SLAM. Most existing learning-based methods are trained on image pairs, a paradigm that fails to explicitly optimize for the long-term trackability of keypoints across sequences under challenging viewpoint and illumination changes. In this paper, we reframe keypoint detection as a sequential decision-making problem. We introduce TraqPoint, a novel, end-to-end Reinforcement Learning (RL) framework designed to optimize the \textbf{Tra}ck-\textbf{q}uality (Traq) of keypoints directly on image sequences. Our core innovation is a track-aware reward mechanism that jointly encourages the consistency and distinctiveness of keypoints across multiple views, guided by a policy gradient method. Extensive evaluations on sparse matching benchmarks, including relative pose estimation and 3D reconstruction, demonstrate that TraqPoint significantly outperforms some state-of-the-art (SOTA) keypoint detection and description methods.

Related papers

FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM [50.9765003472032]
FoundationSLAM is a learning-based monocular dense SLAM system for accurate and robust tracking and mapping.<n>Our core idea is to bridge flow estimation with reasoning by leveraging the guidance from foundation depth models.
arXiv Detail & Related papers (2025-12-31T17:57:45Z)
GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring [9.322937309882022]
Keypoints come with a score permitting to rank them according to their quality. While learned keypoints often exhibit better properties than handcrafted ones, their scores are not easily interpretable. We propose a framework that can refine, and at the same time characterize with an interpretable score, the keypoints extracted by any method.
arXiv Detail & Related papers (2024-08-30T09:39:59Z)
D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer [14.056531181678467]
We introduce a saliency-guided transtextbfformer, referred to as textitD3Former, which entails the joint learning of repeatable textbfDetectors and feature-enhanced textbfDescriptors. Our proposed method consistently outperforms state-of-the-art point cloud matching methods.
arXiv Detail & Related papers (2023-12-20T12:19:17Z)
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features [64.39691149255717]
Keypoint detection on 3D shapes requires semantic and geometric awareness while demanding high localization accuracy. We employ a keypoint candidate optimization module which aims to match the average observed distribution of keypoints on the shape. The resulting approach achieves a new state of the art for few-shot keypoint detection on the KeyPointNet dataset.
arXiv Detail & Related papers (2023-11-29T21:58:41Z)
Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching [74.75284453828017]
Open-Vocabulary Keypoint Detection (OVKD) task is innovatively designed to use text prompts for identifying arbitrary keypoints across any species. We have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM) This framework combines vision and language models, creating an interplay between language features and local keypoint visual features.
arXiv Detail & Related papers (2023-10-08T07:42:41Z)
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration [28.96448680048584]
KeyPoint Positioning System (KeyPosS) is first framework to deduce exact landmark coordinates by triangulating distances between points of interest and anchor points predicted by a fully convolutional network. Experiments on four datasets demonstrate state-of-the-art performance, with KeyPosS outperforming existing methods in low-resolution settings despite minimal computational overhead.
arXiv Detail & Related papers (2023-05-25T19:30:21Z)
From Keypoints to Object Landmarks via Self-Training Correspondence: A novel approach to Unsupervised Landmark Discovery [37.78933209094847]
This paper proposes a novel paradigm for the unsupervised learning of object landmark detectors. We validate our method on a variety of difficult datasets, including LS3D, BBCPose, Human3.6M and PennAction.
arXiv Detail & Related papers (2022-05-31T15:44:29Z)
Self-Supervised Equivariant Learning for Oriented Keypoint Detection [35.94215211409985]
We introduce a self-supervised learning framework using rotation-equivariant CNNs to learn to detect robust oriented keypoints. We propose a dense orientation alignment loss by an image pair generated by synthetic transformations for training a histogram-based orientation map. Our method outperforms the previous methods on an image matching benchmark and a camera pose estimation benchmark.
arXiv Detail & Related papers (2022-04-19T02:26:07Z)
SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA) Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling. In practice, SASA shows to be effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints [80.60538408386016]
Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry. We propose an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection.
arXiv Detail & Related papers (2020-07-29T21:41:31Z)
Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints. Inspired by this, we propose a cascaded context mixer ( CCM) which efficiently integrates spatial and channel context information. To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy. We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.