Global Relation Modeling and Refinement for Bottom-Up Human Pose
Estimation
- URL: http://arxiv.org/abs/2303.14888v1
- Date: Mon, 27 Mar 2023 02:54:08 GMT
- Title: Global Relation Modeling and Refinement for Bottom-Up Human Pose
Estimation
- Authors: Ruoqi Yin, Jianqin Yin
- Abstract summary: We propose a convolutional neural network for bottom-up human pose estimation.
Our model has the ability to focus on different granularity from local to global regions.
Our results on the COCO and CrowdPose datasets demonstrate that it is an efficient framework for multi-person pose estimation.
- Score: 4.24515544235173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on the bottom-up paradigm in multi-person pose
estimation (MPPE). Most previous bottom-up methods consider the relation
of instances to identify different body parts during post-processing, while
neglecting to model the relations among instances or with the environment in
the feature learning process. In addition, most existing works rely on
upsampling and downsampling operations. During sampling, the resampled
features can become misaligned with the source features, resulting in
deviations in the keypoint features learned by the model.
To overcome the above limitations, we propose a convolutional neural network
for bottom-up human pose estimation. It involves two basic modules: (i) The
Global Relation Modeling (GRM) module globally learns relations (e.g.,
environment context, instance interaction information) among image regions by
fusing multi-stage features in the feature learning process. It is combined
with a spatial-channel attention mechanism, which focuses on achieving
adaptability in the spatial and channel dimensions. (ii) The Multi-branch
Feature Align (MFA) module aggregates features from multiple branches to align
the fused features and obtain a refined local keypoint representation. Our
model can focus on different granularities, from local to global regions, which
significantly boosts multi-person pose estimation performance. Our results on the COCO
and CrowdPose datasets demonstrate that it is an efficient framework for
multi-person pose estimation.
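The abstract describes the GRM module only at a high level: multi-stage features are fused and then passed through spatial-channel attention. The following is a rough, parameter-free sketch of that general idea in plain Python, not the authors' implementation. All function names (`fuse_stages`, `channel_attention`, `spatial_attention`, `grm_like_block`) are hypothetical, fusion by averaging is an assumption, and the sigmoid-of-average gates stand in for the learned layers a real network would use. The MFA module is not modeled here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse_stages(stages):
    """Average same-shaped C x H x W feature maps (assumed fusion rule)."""
    n = len(stages)
    C, H, W = len(stages[0]), len(stages[0][0]), len(stages[0][0][0])
    return [[[sum(s[c][i][j] for s in stages) / n for j in range(W)]
             for i in range(H)] for c in range(C)]

def channel_attention(feat):
    """Scale each channel by sigmoid(global average of that channel)."""
    out = []
    for ch in feat:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)  # stand-in for a learned channel gate
        out.append([[v * gate for v in row] for row in ch])
    return out

def spatial_attention(feat):
    """Scale each location by sigmoid(mean over channels at that location)."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    gate = [[sigmoid(sum(feat[c][i][j] for c in range(C)) / C)
             for j in range(W)] for i in range(H)]
    return [[[feat[c][i][j] * gate[i][j] for j in range(W)]
             for i in range(H)] for c in range(C)]

def grm_like_block(stages):
    """Fuse stage features, then apply channel and spatial gating in turn."""
    fused = fuse_stages(stages)
    return spatial_attention(channel_attention(fused))

# Toy input: two stages, each with 2 channels of 2x2 features.
stage_a = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
stage_b = [[[3.0, 2.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
refined = grm_like_block([stage_a, stage_b])
```

Because both gates lie strictly between 0 and 1, this toy block only rescales features: the output keeps the input shape, and an all-zero channel stays zero.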
Related papers
- Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation [50.31351006532924]
Human pose estimation (HPE) has received increasing attention recently due to its wide application in motion analysis, virtual reality, healthcare, etc.
It suffers from the lack of labeled diverse real-world datasets due to the time- and labor-intensive annotation.
We introduce a novel framework that capitalizes on both representation aggregation and segregation for domain adaptive human pose estimation.
arXiv Detail & Related papers (2024-12-29T17:59:45Z)
- USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation [24.90512145836643]
We introduce a Unified Skeleton-based Dense Representation Learning framework based on feature decorrelation.
We show that our approach significantly outperforms the current state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2024-12-12T12:20:27Z)
- GMFL-Net: A Global Multi-geometric Feature Learning Network for Repetitive Action Counting [4.117416395116726]
We propose a simple but efficient Global Multi-geometric Feature Learning Network (GMFL-Net).
Specifically, we design a MIA-Module that aims to improve information representation by fusing multi-geometric features.
We also design a GBFL-Module that enhances the inter-dependencies between point-wise and channel-wise elements.
arXiv Detail & Related papers (2024-08-31T02:18:26Z)
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to track the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation [16.32910684198013]
We present DiffPose, a novel diffusion architecture that formulates video-based human pose estimation as a conditional heatmap generation problem.
We show two unique characteristics of DiffPose on the pose estimation task: (i) the ability to combine multiple sets of pose estimates to improve prediction accuracy, particularly for challenging joints, and (ii) the ability to adjust the number of iterative steps for feature refinement without retraining the model.
arXiv Detail & Related papers (2023-07-31T14:00:23Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on 3 fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)