CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
- URL: http://arxiv.org/abs/2406.00384v1
- Date: Sat, 1 Jun 2024 09:50:13 GMT
- Title: CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
- Authors: Matan Rusanovsky, Or Hirschorn, Shai Avidan,
- Abstract summary: Category-agnostic pose estimation (CAPE) aims to facilitate keypoint localization for diverse object categories.
Our work departs from conventional CAPE methods, by adopting a text-based approach instead of the support image.
We validate our novel approach using the MP-100 benchmark, a comprehensive dataset spanning over 100 categories and 18,000 images.
- Score: 10.951186766576173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conventional 2D pose estimation models are constrained by their design to specific object categories. This limits their applicability to predefined objects. To overcome these limitations, category-agnostic pose estimation (CAPE) emerged as a solution. CAPE aims to facilitate keypoint localization for diverse object categories using a unified model, which can generalize from minimal annotated support images. Recent CAPE works have produced object poses based on arbitrary keypoint definitions annotated on a user-provided support image. Our work departs from conventional CAPE methods, which require a support image, by adopting a text-based approach instead of the support image. Specifically, we use a pose-graph, where nodes represent keypoints that are described with text. This representation takes advantage of the abstraction of text descriptions and the structure imposed by the graph. Our approach effectively breaks symmetry, preserves structure, and improves occlusion handling. We validate our novel approach using the MP-100 benchmark, a comprehensive dataset spanning over 100 categories and 18,000 images. Under a 1-shot setting, our solution achieves a notable performance boost of 1.07\%, establishing a new state-of-the-art for CAPE. Additionally, we enrich the dataset by providing text description annotations, further enhancing its utility for future research.
Related papers
- Edge Weight Prediction For Category-Agnostic Pose Estimation [12.308036453869033]
Category-Agnostic Pose Estimation (CAPE) localizes keypoints across diverse object categories with a single model.
We introduce EdgeCape, a novel framework that overcomes limitations by predicting the graph's edge weights.
We show that this improves the model's ability to capture global spatial dependencies.
arXiv Detail & Related papers (2024-11-25T18:53:09Z) - A Graph-Based Approach for Category-Agnostic Pose Estimation [12.308036453869033]
Category-agnostic pose estimation (CAPE) was introduced to enable keypoint localization for arbitrary object categories.
We present a significant departure from conventional CAPE techniques, which treat keypoints as isolated entities, by treating the input pose data as a graph.
Our solution boosts performance by 0.98% under a 1-shot setting, achieving a new state-of-the-art for CAPE.
arXiv Detail & Related papers (2023-11-29T18:44:12Z) - Self-supervised Few-shot Learning for Semantic Segmentation: An
Annotation-free Approach [4.855689194518905]
Few-shot semantic segmentation (FSS) offers immense potential in the field of medical image analysis.
Existing FSS techniques heavily rely on annotated semantic classes, rendering them unsuitable for medical images.
We propose a novel self-supervised FSS framework that does not rely on any annotation. Instead, it adaptively estimates the query mask by leveraging the eigenvectors obtained from the support images.
arXiv Detail & Related papers (2023-07-26T18:33:30Z) - A Low-Shot Object Counting Network With Iterative Prototype Adaptation [14.650207945870598]
We consider low-shot counting of arbitrary semantic categories in the image using only few annotated exemplars (few-shot) or no exemplars (no-shot)
Existing methods extract queries by feature pooling which neglects the shape information (e.g., size and aspect) and leads to a reduced object localization accuracy and count estimates.
We propose a Low-shot Object Counting network with iterative prototype Adaptation (LOCA)
arXiv Detail & Related papers (2022-11-15T15:39:23Z) - Pose for Everything: Towards Category-Agnostic Pose Estimation [93.07415325374761]
Category-Agnostic Pose Estimation (CAPE) aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images.
We also introduce Multi-category Pose (MP-100) dataset, which is a 2D pose dataset of 100 object categories containing over 20K instances and is well-designed for developing CAPE algorithms.
arXiv Detail & Related papers (2022-07-21T09:40:54Z) - No Token Left Behind: Explainability-Aided Image Classification and
Generation [79.4957965474334]
We present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input.
Our method yields an improvement in the recognition rate, without additional training or fine-tuning.
arXiv Detail & Related papers (2022-04-11T07:16:39Z) - SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive
Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most of advanced solutions exploit a metric learning framework that performs segmentation through matching each pixel to a learned foreground prototype.
This framework suffers from biased classification due to incomplete construction of sample pairs with the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z) - Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine objects detail along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires textitexplicitly modeling the object textitbody and textitedge, which correspond to the high and low frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z) - A Few-Shot Sequential Approach for Object Counting [63.82757025821265]
We introduce a class attention mechanism that sequentially attends to objects in the image and extracts their relevant features.
The proposed technique is trained on point-level annotations and uses a novel loss function that disentangles class-dependent and class-agnostic aspects of the model.
We present our results on a variety of object-counting/detection datasets, including FSOD and MS COCO.
arXiv Detail & Related papers (2020-07-03T18:23:39Z) - Self-Supervised Tuning for Few-Shot Segmentation [82.32143982269892]
Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples.
Existing meta-learning method tends to fail in generating category-specifically discriminative descriptor when the visual features extracted from support images are marginalized in embedding space.
This paper presents an adaptive framework tuning, in which the distribution of latent features across different episodes is dynamically adjusted based on a self-segmentation scheme.
arXiv Detail & Related papers (2020-04-12T03:53:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.