Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation
- URL: http://arxiv.org/abs/2505.12130v1
- Date: Sat, 17 May 2025 20:05:34 GMT
- Title: Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation
- Authors: Niaz Ahmad, Jawad Khan, Kang G. Shin, Youngmoon Lee, Guanghui Wang
- Abstract summary: Keypoints as Dynamic Centroid (KDC) is a new centroid-based representation for unified human pose estimation and instance-level segmentation. KDC adopts a bottom-up paradigm to generate keypoint heatmaps for both easily distinguishable and complex keypoints. It exploits high-confidence keypoints as dynamic centroids in the embedding space to generate MaskCentroids.
- Score: 19.109607441709418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dynamic movement of the human body presents a fundamental challenge for human pose estimation and body segmentation. State-of-the-art approaches primarily rely on combining keypoint heatmaps with segmentation masks but often struggle in scenarios involving overlapping joints or rapidly changing poses during instance-level segmentation. To address these limitations, we propose Keypoints as Dynamic Centroid (KDC), a new centroid-based representation for unified human pose estimation and instance-level segmentation. KDC adopts a bottom-up paradigm to generate keypoint heatmaps for both easily distinguishable and complex keypoints and improves keypoint detection and confidence scores by introducing KeyCentroids using a keypoint disk. It leverages high-confidence keypoints as dynamic centroids in the embedding space to generate MaskCentroids, allowing for swift clustering of pixels to specific human instances during rapid body movements in live environments. Our experimental evaluations on the CrowdPose, OCHuman, and COCO benchmarks demonstrate KDC's effectiveness and generalizability in challenging scenarios in terms of both accuracy and runtime performance. The implementation is available at: https://sites.google.com/view/niazahmad/projects/kdc.
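The clustering idea described in the abstract (high-confidence keypoints acting as centroids in an embedding space, with pixels grouped around them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_centroids` and `assign_pixels`, the confidence threshold `min_conf`, the distance cutoff `max_dist`, and the toy 2-D embeddings are all assumptions made for illustration, and plain nearest-centroid assignment stands in for whatever clustering KDC actually uses.

```python
import math


def select_centroids(keypoints, min_conf=0.5):
    """Pick one centroid per instance from its most confident keypoint.

    keypoints: list of (instance_id, confidence, embedding) tuples.
    min_conf: hypothetical threshold; weaker keypoints are ignored.
    Returns a dict mapping instance_id to the chosen embedding.
    """
    best = {}
    for inst_id, conf, emb in keypoints:
        if conf < min_conf:
            continue
        if inst_id not in best or conf > best[inst_id][0]:
            best[inst_id] = (conf, emb)
    return {inst_id: emb for inst_id, (conf, emb) in best.items()}


def assign_pixels(pixel_embeddings, centroids, max_dist=1.0):
    """Label each pixel with the nearest centroid's instance id.

    Pixels farther than max_dist (hypothetical cutoff) from every
    centroid are labelled None, i.e. treated as background.
    """
    labels = []
    for emb in pixel_embeddings:
        best_id, best_dist = None, max_dist
        for inst_id, cen in centroids.items():
            d = math.dist(emb, cen)  # Euclidean distance
            if d <= best_dist:
                best_id, best_dist = inst_id, d
        labels.append(best_id)
    return labels


# Toy data: two person instances, made-up 2-D embeddings.
keypoints = [
    ("person_0", 0.9, (0.0, 0.0)),
    ("person_0", 0.3, (9.0, 9.0)),  # low confidence, ignored
    ("person_1", 0.8, (5.0, 5.0)),
]
pixels = [(0.2, -0.1), (4.8, 5.3), (2.5, 2.5), (0.5, 0.5)]

centroids = select_centroids(keypoints)
labels = assign_pixels(pixels, centroids)
print(labels)  # ['person_0', 'person_1', None, 'person_0']
```

Because the centroids are recomputed from the detected keypoints of each frame, they move with the body, which is the sense in which the sketch treats them as "dynamic".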
Related papers
- You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping [119.41166438439313]
YOEO is a single-stage method that outputs instance segmentation and NPCS representations in an end-to-end manner. We use a unified network to generate point-wise semantic labels and centroid offsets, allowing points from the same part instance to vote for the same centroid. We also deploy our synthetically-trained model in a real-world setting, providing real-time visual feedback at 200Hz.
arXiv Detail & Related papers (2025-06-06T03:49:20Z) - VISUALCENT: Visual Human Analysis using Dynamic Centroid Representation [8.486534291290559]
We introduce VISUALCENT, a unified human pose and instance segmentation framework that addresses generalizability and scalability limitations in multi-person visual human analysis. For the unified segmentation task, an explicit keypoint is defined as a dynamic centroid called MaskCentroid to swiftly cluster pixels to a specific human instance during rapid changes in human body movement or in significantly occluded environments. Experimental results on the COCO and OCHuman datasets demonstrate VISUALCENT's accuracy and real-time performance advantages, outperforming existing methods in mAP scores and frames per second.
arXiv Detail & Related papers (2025-04-26T21:58:56Z) - Robust Human Registration with Body Part Segmentation on Noisy Point Clouds [73.00876572870787]
We introduce a hybrid approach that incorporates body-part segmentation into the mesh fitting process. Our method first assigns body part labels to individual points, which then guide a two-step SMPL-X fitting. We demonstrate that the fitted human mesh can refine body part labels, leading to improved segmentation.
arXiv Detail & Related papers (2025-04-04T17:17:33Z) - Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching [74.75284453828017]
The Open-Vocabulary Keypoint Detection (OVKD) task is designed to use text prompts to identify arbitrary keypoints across any species.
We have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM).
This framework combines vision and language models, creating an interplay between language features and local keypoint visual features.
arXiv Detail & Related papers (2023-10-08T07:42:41Z) - MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model [27.849059115252008]
We propose a novel framework of single-stage instance-aware pose estimation by modeling the joint distribution of human keypoints.
Our MDPose achieves state-of-the-art performance by successfully learning the high-dimensional joint distribution of human keypoints.
arXiv Detail & Related papers (2023-02-17T08:29:33Z) - Spatiotemporal k-means [39.98633724527769]
We propose a spatiotemporal clustering method called spatiotemporal k-means (STkM) that is able to analyze multi-scale clusters.
We show how STkM can be extended to more complex machine learning tasks, particularly unsupervised region of interest detection and tracking in videos.
arXiv Detail & Related papers (2022-11-10T04:40:31Z) - Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z) - The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation [0.0]
CenterGroup is an attention-based framework to estimate human poses from a set of identity-agnostic keypoints and person center predictions in an image.
Our method obtains state-of-the-art performance with up to 2.5x faster inference time than competing bottom-up methods.
arXiv Detail & Related papers (2021-10-11T10:22:04Z) - Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework, which was previously inferior to the keypoint detection and grouping framework.
We present a simple yet effective approach, named disentangled keypoint regression (DEKR).
We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z) - Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.