Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
- URL: http://arxiv.org/abs/2503.21140v1
- Date: Thu, 27 Mar 2025 04:09:13 GMT
- Title: Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
- Authors: Junjie Chen, Weilong Chen, Yifan Zuo, Yuming Fang
- Abstract summary: Category-agnostic pose estimation aims to locate keypoints on query images according to a few annotated support images for arbitrary novel classes. We propose a novel yet concise framework, which recurrently mines FGSA features from both support and query images.
- Score: 33.204232825380394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Category-agnostic pose estimation aims to locate keypoints on query images according to a few annotated support images for arbitrary novel classes. Existing methods generally extract support features via heatmap pooling and obtain interacted features from support and query via cross-attention. As a result, these works neglect to mine fine-grained and structure-aware (FGSA) features from both support and query images, which are crucial for pixel-level keypoint localization. To this end, we propose a novel yet concise framework that recurrently mines FGSA features from both support and query images. Specifically, we design an FGSA mining module based on the deformable attention mechanism. On the one hand, we mine fine-grained features by applying deformable attention heads over multi-scale feature maps. On the other hand, we mine structure-aware features by offsetting the reference points of keypoints to their linked keypoints. With this module, we recurrently mine FGSA features from support and query images, and thus obtain better support features and query estimations. In addition, we propose to use mixup keypoints to pad various classes to a unified keypoint number, which can provide richer supervision than the zero padding used in existing works. We conduct extensive experiments and in-depth studies on the large-scale MP-100 dataset and outperform the SOTA method by a large margin (+3.2% PCK@0.05). Code is available at https://github.com/chenbys/FMMP.
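The abstract contrasts two ways of padding classes with different keypoint counts to one unified length. The first is zero padding, where padded slots are masked out and give no supervision. The second is mixup keypoints. The abstract does not say how mixup keypoints are built.

The sketch below makes one assumption: each padded slot is a convex combination of the coordinates of two randomly chosen real keypoints, using a random weight λ. Under that assumption, every slot has a valid target that can be supervised. The function names `zero_pad` and `mixup_pad` are hypothetical, not taken from the FMMP code. In a full model, the matching support features would presumably be mixed with the same λ, but that part is omitted here.

```python
import random
from typing import List, Tuple

# (x, y) keypoint coordinates, e.g. normalized to [0, 1] image space.
Point = Tuple[float, float]


def zero_pad(keypoints: List[Point], target: int) -> Tuple[List[Point], List[int]]:
    """Zero-padding baseline: pad with (0, 0) and mask the padded slots out."""
    n = len(keypoints)
    if n > target:
        raise ValueError("class has more keypoints than the unified length")
    padded = list(keypoints) + [(0.0, 0.0)] * (target - n)
    # mask = 1 for real keypoints, 0 for padding (padded slots give no loss).
    mask = [1] * n + [0] * (target - n)
    return padded, mask


def mixup_pad(keypoints: List[Point], target: int,
              rng: random.Random) -> Tuple[List[Point], List[int]]:
    """Hypothetical mixup padding (an assumption, not the paper's exact rule):
    every padded slot is a convex combination of two random real keypoints."""
    n = len(keypoints)
    if n == 0 or n > target:
        raise ValueError("need between 1 and `target` real keypoints")
    padded = list(keypoints)
    for _ in range(target - n):
        i, j = rng.randrange(n), rng.randrange(n)
        lam = rng.random()  # mixing weight lambda in [0, 1)
        (xi, yi), (xj, yj) = keypoints[i], keypoints[j]
        padded.append((lam * xi + (1.0 - lam) * xj,
                       lam * yi + (1.0 - lam) * yj))
    # Every slot now has a target, so the whole unified length is supervised.
    mask = [1] * target
    return padded, mask


if __name__ == "__main__":
    rng = random.Random(0)
    kps = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.4)]
    print(zero_pad(kps, 5))
    print(mixup_pad(kps, 5, rng))
```

The two functions differ only in the padded slots. Zero padding leaves those slots with mask 0, so they contribute nothing to the loss. The mixup variant places each synthetic point inside the bounding box of the real keypoints. This is the kind of extra supervision the abstract attributes to mixup padding.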
Related papers
- Hybrid Mamba for Few-Shot Segmentation [54.562050590453225]
Few-shot segmentation (FSS) methods use cross attention to fuse support foreground (FG) into query features, regardless of the quadratic complexity.
We aim to devise a cross (attention-like) Mamba to capture inter-sequence dependencies for FSS.
A simple idea is to scan on support features to selectively compress them into the hidden state, which is then used as the initial hidden state to sequentially scan query features.
(2024-09-29T08:51:14Z)
- SCAPE: A Simple and Strong Category-Agnostic Pose Estimator [6.705257644513057]
Category-Agnostic Pose Estimation (CAPE) aims to localize keypoints on an object of any category given few exemplars in an in-context manner.
We introduce two key modules: a global keypoint feature perceptor to inject global semantic information into support keypoints, and a keypoint attention refiner to enhance inter-node correlation between keypoints.
SCAPE outperforms prior art by 2.2 and 1.3 PCK under the 1-shot and 5-shot settings, while offering faster inference and a lighter model.
(2024-07-18T13:02:57Z)
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
A popular similarity-based feature upsampling pipeline has been proposed, which utilizes a high-resolution feature as guidance.
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
(2024-07-02T14:12:21Z)
- Meta-Point Learning and Refining for Category-Agnostic Pose Estimation [46.98479393474727]
Category-agnostic pose estimation (CAPE) aims to predict keypoints for arbitrary classes given a few support images annotated with keypoints.
We propose a novel framework for CAPE based on potential keypoints (named meta-points).
(2024-03-20T14:54:33Z)
- Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching [74.75284453828017]
The Open-Vocabulary Keypoint Detection (OVKD) task is designed to use text prompts to identify arbitrary keypoints across any species.
We have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM).
This framework combines vision and language models, creating an interplay between language features and local keypoint visual features.
(2023-10-08T07:42:41Z)
- Self-Calibrated Cross Attention Network for Few-Shot Segmentation [65.20559109791756]
We design a self-calibrated cross attention (SCCA) block for efficient patch-based attention.
SCCA groups the patches from the same query image and the aligned patches from the support image as keys and values (K&V).
In this way, the query BG features are fused with matched BG features in support FG, and thus the aforementioned issues will be mitigated.
(2023-08-18T04:41:50Z)
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
(2022-03-25T06:14:51Z)
- Few-Shot Segmentation via Cycle-Consistent Transformer [74.49307213431952]
We focus on utilizing pixel-wise relationships between support and target images to facilitate the few-shot semantic segmentation task.
We propose using a novel cycle-consistent attention mechanism to filter out possible harmful support features.
Our proposed CyCTR leads to remarkable improvement compared to previous state-of-the-art methods.
(2021-06-04T07:57:48Z)
- SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation [14.419517737536706]
Recent deep neural network based FSS methods leverage high-dimensional feature similarity between the foreground features of the support images and the query image features.
We propose to jointly predict the support and query masks to force the support features to share characteristics with the query features.
Our method achieves state-of-the-art results for one-shot and five-shot segmentation on the PASCAL-5i dataset.
(2020-04-30T17:56:48Z)