Multi-Person Pose Estimation with Enhanced Feature Aggregation and
Selection
- URL: http://arxiv.org/abs/2003.10238v1
- Date: Fri, 20 Mar 2020 08:33:25 GMT
- Title: Multi-Person Pose Estimation with Enhanced Feature Aggregation and
Selection
- Authors: Xixia Xu, Qi Zou, Xue Lin
- Abstract summary: We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method handles crowded, cluttered and occluded scenes well.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
- Score: 33.15192824888279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel Enhanced Feature Aggregation and Selection network
(EFASNet) for multi-person 2D human pose estimation. Thanks to its enhanced
feature representation, our method handles crowded, cluttered and occluded
scenes well. More specifically, a Feature Aggregation and Selection Module (FASM),
which constructs hierarchical multi-scale feature aggregation and makes the
aggregated features discriminative, is proposed to get more accurate
fine-grained representation, leading to more precise joint locations. Then, we
perform a simple Feature Fusion (FF) strategy which effectively fuses
high-resolution spatial features and low-resolution semantic features to obtain
more reliable context information for well-estimated joints. Finally, we build
a Dense Upsampling Convolution (DUC) module to generate more precise
predictions, which can recover missing joint details that are usually
unavailable in common upsampling processes. As a result, the predicted keypoint
heatmaps are more accurate. Comprehensive experiments demonstrate that the
proposed approach outperforms the state-of-the-art methods and achieves
superior performance on three benchmark datasets: the recent large-scale
CrowdPose dataset, the COCO keypoint detection dataset and the MPII Human Pose dataset.
Our code will be released upon acceptance.
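
The abstract does not give the DUC module's configuration, so the following is only a minimal sketch of the general dense upsampling idea: a convolution (omitted here) predicts r*r channels per output map at low resolution, and those channels are then rearranged into a single map that is r times larger in each dimension. The function names, the channel ordering and the toy values are assumptions made for illustration, not the authors' implementation.

```python
def dense_upsample(features, num_maps, r):
    """Rearrange (num_maps * r * r) low-res channels into num_maps high-res maps.

    features: list of channels, each an h x w list of lists.
    The channel ordering (map, then row offset, then column offset) is an
    assumption; the paper's abstract does not specify it.
    """
    if len(features) != num_maps * r * r:
        raise ValueError("expected num_maps * r * r input channels")
    h, w = len(features[0]), len(features[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(num_maps)]
    for c in range(num_maps):
        for dy in range(r):
            for dx in range(r):
                # Each input channel fills one (dy, dx) sub-pixel position.
                src = features[c * r * r + dy * r + dx]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + dy][x * r + dx] = src[y][x]
    return out


def argmax_2d(heatmap):
    """Return (row, col) of the largest value in a 2D heatmap."""
    best = max((v, y, x) for y, row in enumerate(heatmap) for x, v in enumerate(row))
    return best[1], best[2]


# Toy input: 4 channels of size 1 x 2, i.e. one keypoint map with r = 2.
low_res = [
    [[0.1, 0.2]],
    [[0.3, 0.9]],
    [[0.0, 0.4]],
    [[0.5, 0.6]],
]
upsampled = dense_upsample(low_res, num_maps=1, r=2)
joint = argmax_2d(upsampled[0])
print(upsampled[0])  # 2 x 4 heatmap
print(joint)         # (row, col) of the heatmap peak
```

Because every output pixel comes from its own predicted channel rather than from interpolating neighbours, the upsampled heatmap can carry detail that interpolation-based upsampling would not produce; the peak of each heatmap is then read off as the joint location.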
Related papers
- CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition [10.045163723630159]
CHASE operates as a sample-adaptive normalization method to mitigate inter-entity distribution discrepancies.
Our approach seamlessly adapts to single-entity backbones and boosts their performance in multi-entity scenarios.
(2024-10-09T17:55:43Z)
- GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection [23.872633359324098]
We propose a novel Global-Local Collaborative Optimization Network, called GLCONet.
In this paper, we first design a collaborative optimization strategy to simultaneously model the local details and global long-range relationships.
Experiments demonstrate that the proposed GLCONet method with different backbones can effectively activate potentially significant pixels in an image.
(2024-09-15T02:26:17Z)
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
A popular similarity-based feature upsampling pipeline has been proposed, which utilizes a high-resolution feature as guidance.
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
(2024-07-02T14:12:21Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
(2023-10-18T07:30:08Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
(2023-09-22T06:55:41Z)
- BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU.
We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors.
We show that our method achieves superior results to state-of-the-art approaches.
(2022-06-25T13:13:37Z)
- C$^{4}$Net: Contextual Compression and Complementary Combination Network for Salient Object Detection [0.0]
We show that feature concatenation works better than other combination methods like multiplication or addition.
Also, joint feature learning gives better results, because of the information sharing during their processing.
(2021-10-22T16:14:10Z)
- DexDeepFM: Ensemble Diversity Enhanced Extreme Deep Factorization Machine Model [8.73107818888638]
An ensemble diversity enhanced extreme deep factorization machine model (DexDeepFM) is proposed.
Experiments on two public real-world datasets show the superiority of the proposed model.
(2021-04-05T14:06:32Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
(2021-03-22T20:36:34Z)
- SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction with Self-Projection Optimization [52.20602782690776]
It is expensive and tedious to obtain large scale paired sparse-dense point sets for training from real scanned sparse data.
We propose a self-supervised point cloud upsampling network, named SPU-Net, to capture the inherent upsampling patterns of points lying on the underlying object surface.
We conduct various experiments on both synthetic and real-scanned datasets, and the results demonstrate that we achieve comparable performance to the state-of-the-art supervised methods.
(2020-12-08T14:14:09Z)
- DHOG: Deep Hierarchical Object Grouping [0.0]
We show that greedy or local methods of maximising mutual information (such as gradient optimisation) discover local optima of the mutual information criterion.
We introduce deep hierarchical object grouping (DHOG) that computes a number of distinct discrete representations of images in a hierarchical order.
We find that these representations align better with the downstream task of grouping into underlying object classes.
(2020-03-13T14:11:48Z)