Pose for Everything: Towards Category-Agnostic Pose Estimation
- URL: http://arxiv.org/abs/2207.10387v1
- Date: Thu, 21 Jul 2022 09:40:54 GMT
- Title: Pose for Everything: Towards Category-Agnostic Pose Estimation
- Authors: Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang,
Ping Luo, Xiaogang Wang
- Abstract summary: Category-Agnostic Pose Estimation (CAPE) aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images.
We also introduce Multi-category Pose (MP-100) dataset, which is a 2D pose dataset of 100 object categories containing over 20K instances and is well-designed for developing CAPE algorithms.
- Score: 93.07415325374761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing works on 2D pose estimation mainly focus on a certain category, e.g.
human, animal, and vehicle. However, there are lots of application scenarios
that require detecting the poses/keypoints of the unseen class of objects. In
this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE),
which aims to create a pose estimation model capable of detecting the pose of
any class of object given only a few samples with keypoint definition. To
achieve this goal, we formulate the pose estimation problem as a keypoint
matching problem and design a novel CAPE framework, termed POse Matching
Network (POMNet). A transformer-based Keypoint Interaction Module (KIM) is
proposed to capture both the interactions among different keypoints and the
relationship between the support and query images. We also introduce
Multi-category Pose (MP-100) dataset, which is a 2D pose dataset of 100 object
categories containing over 20K instances and is well-designed for developing
CAPE algorithms. Experiments show that our method outperforms other baseline
approaches by a large margin. Codes and data are available at
https://github.com/luminxu/Pose-for-Everything.
Related papers
- CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation [3.5379836919221566]
Estimating rigid objects' poses is one of the fundamental problems in computer vision.
This paper presents a novel approach, CVAM-Pose, for multi-object monocular pose estimation.
arXiv Detail & Related papers (2024-10-11T17:26:27Z) - SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z) - FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z) - A Graph-Based Approach for Category-Agnostic Pose Estimation [12.308036453869033]
Category-agnostic pose estimation (CAPE) was introduced to enable keypoint localization for arbitrary object categories.
We present a significant departure from conventional CAPE techniques, which treat keypoints as isolated entities, by treating the input pose data as a graph.
Our solution boosts performance by 0.98% under a 1-shot setting, achieving a new state-of-the-art for CAPE.
arXiv Detail & Related papers (2023-11-29T18:44:12Z) - PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z) - Rethinking Keypoint Representations: Modeling Keypoints and Poses as
Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z) - 6D Object Pose Estimation using Keypoints and Part Affinity Fields [24.126513851779936]
The task of 6D object pose estimation from RGB images is an important requirement for autonomous service robots to be able to interact with the real world.
We present a two-step pipeline for estimating the 6 DoF translation and orientation of known objects.
arXiv Detail & Related papers (2021-07-05T14:41:19Z) - Pose-based Modular Network for Human-Object Interaction Detection [5.6397911482914385]
We contribute a Pose-based Modular Network (PMN) which explores the absolute pose features and relative spatial pose features to improve HOI detection.
To evaluate our proposed method, we combine the module with the state-of-the-art model named VS-GATs and obtain significant improvement on two public benchmarks.
arXiv Detail & Related papers (2020-08-05T10:56:09Z) - Point-Set Anchors for Object Detection, Instance Segmentation and Pose
Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z) - Self-supervised Keypoint Correspondences for Multi-Person Pose
Estimation and Tracking in Videos [32.43899916477434]
We propose an approach that relies on keypoint correspondences for associating persons in videos.
Instead of training the network for estimating keypoint correspondences on video data, it is trained on a large scale image datasets for human pose estimation.
Our approach achieves state-of-the-art results for multi-frame pose estimation and multi-person pose tracking on the PosTrack $2017$ and PoseTrack $2018$ data sets.
arXiv Detail & Related papers (2020-04-27T09:02:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.