The Center of Attention: Center-Keypoint Grouping via Attention for
Multi-Person Pose Estimation
- URL: http://arxiv.org/abs/2110.05132v1
- Date: Mon, 11 Oct 2021 10:22:04 GMT
- Authors: Guillem Brasó, Nikita Kister, Laura Leal-Taixé
- Abstract summary: CenterGroup is an attention-based framework to estimate human poses from a set of identity-agnostic keypoints and person center predictions in an image.
Our method obtains state-of-the-art performance with up to 2.5x faster inference time than competing bottom-up methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce CenterGroup, an attention-based framework to estimate human
poses from a set of identity-agnostic keypoints and person center predictions
in an image. Our approach uses a transformer to obtain context-aware embeddings
for all detected keypoints and centers and then applies multi-head attention to
directly group joints into their corresponding person centers. While most
bottom-up methods rely on non-learnable clustering at inference, CenterGroup
uses a fully differentiable attention mechanism that we train end-to-end
together with our keypoint detector. As a result, our method obtains
state-of-the-art performance with up to 2.5x faster inference time than
competing bottom-up methods. Our code is available at
https://github.com/dvl-tum/center-group .
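The grouping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, a single joint type, and hand-written toy embeddings in place of the transformer encoder output. The names `group_keypoints`, `center_emb`, `keypoint_emb`, and `keypoint_xy` are hypothetical. Each person center acts as a query that attends over all detected keypoints, and the attention weights yield both a soft assignment and an attention-weighted joint location.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def group_keypoints(center_emb, keypoint_emb, keypoint_xy):
    """For each center (query), attend over all keypoints (keys) and
    return (attention weights, attention-weighted (x, y) location)."""
    scale = math.sqrt(len(center_emb[0]))  # scaled dot-product attention
    results = []
    for q in center_emb:
        weights = softmax([dot(q, k) / scale for k in keypoint_emb])
        x = sum(w * p[0] for w, p in zip(weights, keypoint_xy))
        y = sum(w * p[1] for w, p in zip(weights, keypoint_xy))
        results.append((weights, (x, y)))
    return results

# Toy data (assumed): two person centers, three keypoints of one joint type.
centers = [[4.0, 0.0], [0.0, 4.0]]
keypoints = [[5.0, 0.0], [0.0, 5.0], [0.1, 0.1]]
locations = [(10.0, 20.0), (50.0, 60.0), (30.0, 30.0)]

grouped = group_keypoints(centers, keypoints, locations)
# Hard assignment: index of the keypoint with the largest weight per center.
assignments = [max(range(len(w)), key=w.__getitem__) for w, _ in grouped]
print(assignments)
```

Because every operation here is a softmax or a weighted sum, the grouping is differentiable with respect to the embeddings, which is the property the abstract contrasts with non-learnable clustering at inference.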
Related papers
- Self-Enhanced Image Clustering with Cross-Modal Semantic Consistency [57.961869351897384]
We propose a framework based on cross-modal semantic consistency for efficient image clustering. Our framework first builds a strong foundation via Cross-Modal Semantic Consistency. In the first stage, we train lightweight clustering heads to align with the rich semantics of the pre-trained model. In the second stage, we introduce a Self-Enhanced fine-tuning strategy.
arXiv Detail & Related papers (2025-08-02T08:12:57Z) - Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation [19.109607441709418]
Keypoints as Dynamic Centroid (KDC) is a new centroid-based representation for unified human pose estimation and instance-level segmentation. KDC adopts a bottom-up paradigm to generate keypoint heatmaps for both easily distinguishable and complex keypoints. It exploits high-confidence keypoints as dynamic centroids in the embedding space to generate MaskCentroids.
arXiv Detail & Related papers (2025-05-17T20:05:34Z) - How to optimize K-means? [8.206124331448931]
Center-based clustering algorithms (e.g., K-means) are popular for clustering tasks, but they usually struggle to achieve high accuracy on complex datasets.
We believe the main reason is that traditional center-based clustering algorithms identify only one clustering center in each cluster.
We propose a general optimization method called ECAC, and it can optimize different center-based clustering algorithms.
arXiv Detail & Related papers (2025-03-25T03:37:52Z) - Dense Center-Direction Regression for Object Counting and Localization with Point Supervision [1.9526430269580954]
We propose a novel approach termed CeDiRNet for point-supervised learning.
It uses a dense regression of directions pointing towards the nearest object centers.
We show that it outperforms the existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-26T17:49:27Z) - Improved Face Representation via Joint Label Classification and
Supervised Contrastive Clustering [5.874142059884521]
Face clustering tasks can learn hierarchical semantic information from large-scale data.
This paper proposes a joint optimization task of label classification and supervised contrastive clustering to introduce the cluster knowledge to the traditional face recognition task.
arXiv Detail & Related papers (2023-12-07T03:55:20Z) - CenterNet++ for Object Detection [174.59360147041673]
Bottom-up approaches are as competitive as top-down ones and enjoy higher recall.
Our approach, named CenterNet, detects each object as a triplet of keypoints (the top-left and bottom-right corners and the center keypoint).
On the MS-COCO dataset, CenterNet with Res2Net-101 and Swin-Transformer achieves APs of 53.7% and 57.1%, respectively.
arXiv Detail & Related papers (2022-04-18T16:45:53Z) - Instance-weighted Central Similarity for Multi-label Image Retrieval [66.23348499938278]
We propose Instance-weighted Central Similarity (ICS) to automatically learn the center weight corresponding to a hash code.
Our method achieves state-of-the-art performance on image retrieval benchmarks, and in particular improves the mAP by 1.6%-6.4% on the MS COCO dataset.
arXiv Detail & Related papers (2021-08-11T15:18:18Z) - Consensus Control for Decentralized Deep Learning [72.50487751271069]
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart.
Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
arXiv Detail & Related papers (2021-02-09T13:58:33Z) - Attention-Based Clustering: Learning a Kernel from Context [0.0]
We propose Attention-Based Clustering (ABC), a neural architecture based on the attention mechanism.
ABC is designed to learn latent representations that adapt to context within an input set.
We present competitive results for clustering Omniglot characters and include analytical evidence of the effectiveness of an attention-based approach for clustering.
arXiv Detail & Related papers (2020-10-02T15:06:06Z) - A Self-Training Approach for Point-Supervised Object Detection and
Counting in Crowds [54.73161039445703]
We propose a novel self-training approach that enables a typical object detector to be trained with only point-level annotations.
During training, we utilize the available point annotations to supervise the estimation of the center points of objects.
Experimental results show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks.
arXiv Detail & Related papers (2020-07-25T02:14:42Z) - Differentiable Hierarchical Graph Grouping for Multi-Person Pose
Estimation [95.72606536493548]
Multi-person pose estimation is challenging because it localizes body keypoints for multiple persons simultaneously.
We propose a novel differentiable Hierarchical Graph Grouping (HGG) method to learn the graph grouping in the bottom-up multi-person pose estimation task.
arXiv Detail & Related papers (2020-07-23T08:46:22Z) - Point-Set Anchors for Object Detection, Instance Segmentation and Pose
Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)