Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
- URL: http://arxiv.org/abs/2310.05056v3
- Date: Mon, 11 Dec 2023 11:08:16 GMT
- Title: Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching
- Authors: Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo,
Yu Qiao, Kaipeng Zhang
- Abstract summary: The Open-Vocabulary Keypoint Detection (OVKD) task is designed to use text prompts for identifying arbitrary keypoints across any species.
We have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM).
This framework combines vision and language models, creating an interplay between language features and local keypoint visual features.
- Score: 77.97246496316515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current image-based keypoint detection methods for animal (including human)
bodies and faces are generally divided into fully-supervised and few-shot
class-agnostic approaches. The former typically relies on laborious and
time-consuming manual annotations, posing considerable challenges in expanding
keypoint detection to a broader range of keypoint categories and animal
species. The latter, though less dependent on extensive manual input, still
requires annotated support images for reference during testing.
To realize zero-shot keypoint detection without any prior annotation, we
introduce the Open-Vocabulary Keypoint Detection (OVKD) task, which is
innovatively designed to use text prompts for identifying arbitrary keypoints
across any species. In pursuit of this goal, we have developed a novel
framework named Open-Vocabulary Keypoint Detection with Semantic-feature
Matching (KDSM). This framework synergistically combines vision and language
models, creating an interplay between language features and local keypoint
visual features. KDSM enhances its capabilities by integrating Domain
Distribution Matrix Matching (DDMM) and other special modules, such as the
Vision-Keypoint Relational Awareness (VKRA) module, improving the framework's
generalizability and overall performance. Our comprehensive experiments
demonstrate that KDSM significantly outperforms the baseline and achieves
remarkable success on the OVKD task. Impressively, our method, operating in a
zero-shot fashion, still yields results comparable to state-of-the-art few-shot
class-agnostic keypoint detection methods. We will make the source code
publicly accessible.
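The matching step the abstract describes, scoring text-prompt embeddings against local visual features to localize keypoints, can be sketched as follows. This is a minimal illustration of semantic-feature matching only, not the paper's actual KDSM, DDMM, or VKRA modules; the function name, array shapes, and the cosine-similarity scoring are assumptions.

```python
import numpy as np

def match_keypoints(text_emb, visual_feats):
    """Score each spatial location against each keypoint's text embedding.

    text_emb:     (K, D) one embedding per keypoint text prompt
    visual_feats: (H, W, D) local visual features from an image encoder
    Returns (K, 2) predicted (row, col) per keypoint and (K, H, W) heatmaps.
    """
    # Cosine similarity: L2-normalize both sides, then take dot products.
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    v = visual_feats / np.linalg.norm(visual_feats, axis=-1, keepdims=True)
    heatmaps = np.einsum("kd,hwd->khw", t, v)           # (K, H, W) similarity maps
    flat = heatmaps.reshape(len(t), -1).argmax(axis=1)  # best location per keypoint
    coords = np.stack(np.unravel_index(flat, heatmaps.shape[1:]), axis=1)
    return coords, heatmaps
```

In practice the language and vision encoders would come from a pretrained vision-language model, and the heatmaps would feed a learned decoding head rather than a bare argmax.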
Related papers
- SCAPE: A Simple and Strong Category-Agnostic Pose Estimator [6.705257644513057]
Category-Agnostic Pose Estimation (CAPE) aims to localize keypoints on an object of any category given a few exemplars in an in-context manner.
We introduce two key modules: a global keypoint feature perceptor to inject global semantic information into support keypoints, and a keypoint attention refiner to enhance inter-node correlation between keypoints.
SCAPE outperforms prior arts by 2.2 and 1.3 PCK under 1-shot and 5-shot settings with faster inference speed and lighter model capacity.
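PCK (Percentage of Correct Keypoints), the metric quoted above, counts a prediction as correct when it lies within a fraction alpha of a per-image reference scale from the ground truth. A minimal sketch, with the function name, array shapes, and the bounding-box-based scale assumed:

```python
import numpy as np

def pck(pred, gt, bbox_sizes, alpha=0.2):
    """Percentage of Correct Keypoints.

    pred, gt:   (N, K, 2) predicted / ground-truth (x, y) per image and keypoint
    bbox_sizes: (N,) reference scale per image (e.g. longest bounding-box side)
    A keypoint counts as correct when its error is within alpha * bbox size.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)    # (N, K) Euclidean errors
    thresh = alpha * bbox_sizes[:, None]          # (N, 1) per-image threshold
    return float((dists <= thresh).mean() * 100)  # percentage over all keypoints
```

So "outperforms by 2.2 PCK" means 2.2 more percentage points of keypoints fall inside the threshold.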
arXiv Detail & Related papers (2024-07-18T13:02:57Z)
- Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation [3.976851945232775]
Current approaches for sign language recognition rely on RGB video inputs, which are vulnerable to fluctuations in the background.
We propose a multi-stream keypoint attention network to model sequences of keypoints produced by a readily available keypoint estimator.
We carry out comprehensive experiments on well-known benchmarks like Phoenix-2014, Phoenix-2014T, and CSL-Daily to showcase the efficacy of our methodology.
arXiv Detail & Related papers (2024-05-09T10:58:37Z)
- Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection [9.788417605537965]
We introduce a novel end-to-end open vocabulary HOI detection framework with conditional multi-level decoding and fine-grained semantic enhancement.
Our proposed method achieves state-of-the-art results in open vocabulary HOI detection.
arXiv Detail & Related papers (2024-04-09T10:27:22Z)
- KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All [26.506535205897443]
We introduce a novel key-query learning strategy to enhance prompt matching efficiency and address the challenge of shifting features.
Our method empowers the model to achieve results surpassing those of current state-of-the-art approaches by a large margin of up to 20%.
arXiv Detail & Related papers (2023-11-26T20:35:19Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z)
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two self-evident prior knowledge.
Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- Few-shot Keypoint Detection with Uncertainty Learning for Unseen Species [28.307200505494126]
We propose a versatile Few-shot Keypoint Detection (FSKD) pipeline, which can detect a varying number of keypoints of different kinds.
Our FSKD involves main and auxiliary keypoint representation learning, similarity learning, and keypoint localization.
We show the effectiveness of our FSKD on (i) novel keypoint detection for unseen species, and (ii) few-shot Fine-Grained Visual Recognition (FGVR) and (iii) Semantic Alignment (SA) downstream tasks.
arXiv Detail & Related papers (2021-12-12T08:39:47Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer (CCM) which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.