Global Relation Modeling and Refinement for Bottom-Up Human Pose
Estimation
- URL: http://arxiv.org/abs/2303.14888v1
- Date: Mon, 27 Mar 2023 02:54:08 GMT
- Title: Global Relation Modeling and Refinement for Bottom-Up Human Pose
Estimation
- Authors: Ruoqi Yin, Jianqin Yin
- Abstract summary: We propose a convolutional neural network for bottom-up human pose estimation.
Our model can focus on different granularities, from local to global regions.
Our results on the COCO and CrowdPose datasets demonstrate that it is an efficient framework for multi-person pose estimation.
- Score: 4.24515544235173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on the bottom-up paradigm in multi-person pose
estimation (MPPE). Most previous bottom-up methods consider the relation
of instances to identify different body parts during post-processing, while
neglecting to model the relation among instances or the environment in the feature
learning process. In addition, most existing works adopt upsampling and
downsampling operations. During sampling, the resampled features can become
misaligned with the source features, resulting in deviations in
the keypoint features learned by the model.
To overcome the above limitations, we propose a convolutional neural network
for bottom-up human pose estimation. It involves two basic modules: (i) Global
Relation Modeling (GRM) module globally learns relations (e.g., environment
context, instance interaction information) among regions of the image by fusing
multi-stage features in the feature learning process. It is combined with a
spatial-channel attention mechanism, which provides adaptability in the
spatial and channel dimensions. (ii) Multi-branch Feature Align (MFA) module
aggregates features from multiple branches to align the fused features and obtain
a refined local keypoint representation. Our model can focus on
different granularities, from local to global regions, which significantly boosts
the performance of multi-person pose estimation. Our results on the COCO
and CrowdPose datasets demonstrate that it is an efficient framework for
multi-person pose estimation.
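The misalignment that the abstract attributes to downsampling and upsampling can be illustrated with a toy example. The sketch below is not the authors' code: it uses a 1-D list as a stand-in for a keypoint heatmap, and the helper names (`avg_pool_1d`, `upsample_nearest_1d`, `one_hot`, `round_trip`) are hypothetical. It assumes stride-2 average pooling followed by nearest-neighbour upsampling, and shows that two different source peak positions become indistinguishable after the round trip.

```python
# Illustrative sketch only: a 1-D stand-in for a keypoint heatmap.
# All helper names are hypothetical and do not come from the paper.

def avg_pool_1d(signal, stride=2):
    """Downsample by averaging non-overlapping windows of length `stride`."""
    return [sum(signal[i:i + stride]) / stride
            for i in range(0, len(signal) - stride + 1, stride)]

def upsample_nearest_1d(signal, factor=2):
    """Upsample by repeating every sample `factor` times."""
    return [value for value in signal for _ in range(factor)]

def one_hot(length, index):
    """Heatmap with a single unit peak at `index`."""
    return [1.0 if i == index else 0.0 for i in range(length)]

def round_trip(signal, stride=2):
    """Downsample, then upsample back to the original resolution."""
    return upsample_nearest_1d(avg_pool_1d(signal, stride), stride)

# Two heatmaps whose peaks sit one position apart in the source resolution.
peak_at_4 = round_trip(one_hot(8, 4))
peak_at_5 = round_trip(one_hot(8, 5))

print(peak_at_4)
print(peak_at_5)
# After the round trip the two maps are identical, so the exact source
# position of the peak can no longer be recovered from the resampled map.
print(peak_at_4 == peak_at_5)
```

In this toy setting the peak location is only known up to the pooling stride, which is one simple way to see why resampled features may deviate from the source features they are later fused with.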
Related papers
- GMFL-Net: A Global Multi-geometric Feature Learning Network for Repetitive Action Counting [4.117416395116726]
We propose a simple but efficient Global Multi-geometric Feature Learning Network (GMFL-Net).
Specifically, we design a MIA-Module that aims to improve information representation by fusing multi-geometric features.
We also design a GBFL-Module that enhances the inter-dependencies between point-wise and channel-wise elements.
arXiv Detail & Related papers (2024-08-31T02:18:26Z)
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to track the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation [16.32910684198013]
We present DiffPose, a novel diffusion architecture that formulates video-based human pose estimation as a conditional heatmap generation problem.
We show two unique characteristics of DiffPose on the pose estimation task: (i) the ability to combine multiple sets of pose estimates to improve prediction accuracy, particularly for challenging joints, and (ii) the ability to adjust the number of iterative steps for feature refinement without retraining the model.
arXiv Detail & Related papers (2023-07-31T14:00:23Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on 3 fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z)
- Joint Self-Attention and Scale-Aggregation for Self-Calibrated Deraining Network [13.628218953897946]
In this paper, we propose an effective algorithm, called JDNet, to solve the single image deraining problem.
By carefully designing the Scale-Aggregation and Self-Attention modules with Self-Calibrated convolution, the proposed model achieves better deraining results.
arXiv Detail & Related papers (2020-08-06T17:04:34Z)
- Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.