Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection
- URL: http://arxiv.org/abs/2003.10238v1
- Date: Fri, 20 Mar 2020 08:33:25 GMT
- Title: Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection
- Authors: Xixia Xu, Qi Zou, Xue Lin
- Abstract summary: We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method handles crowded, cluttered, and occluded scenes well.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
- Score: 33.15192824888279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel Enhanced Feature Aggregation and Selection network
(EFASNet) for multi-person 2D human pose estimation. Thanks to its enhanced
feature representation, our method handles crowded, cluttered, and occluded
scenes well. More specifically, a Feature Aggregation and Selection Module (FASM),
which constructs hierarchical multi-scale feature aggregation and makes the
aggregated features discriminative, is proposed to get more accurate
fine-grained representation, leading to more precise joint locations. Then, we
perform a simple Feature Fusion (FF) strategy which effectively fuses
high-resolution spatial features and low-resolution semantic features to obtain
more reliable context information for well-estimated joints. Finally, we build
a Dense Upsampling Convolution (DUC) module to generate more precise
prediction, recovering missing joint details that are usually lost in common
upsampling processes. As a result, the predicted keypoint
heatmaps are more accurate. Comprehensive experiments demonstrate that the
proposed approach outperforms state-of-the-art methods and achieves superior
performance on three benchmark datasets: the recent large-scale dataset
CrowdPose, the COCO keypoint detection dataset and the MPII Human Pose dataset.
Our code will be released upon acceptance.
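The DUC module described above follows the general dense-upsampling-convolution idea: a convolution first expands the channel dimension by a factor of r*r, and a pixel-shuffle-style rearrangement then recovers full-resolution heatmaps without interpolation. The following is a minimal NumPy sketch of that rearrangement step only; the shapes, factor r, and function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dense_upsample(feat: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    This is the pixel-shuffle rearrangement at the heart of a DUC-style
    head: after a conv expands channels to C*r*r, this reshape produces
    full-resolution output without any interpolation.
    """
    crr, h, w = feat.shape
    c = crr // (r * r)
    assert c * r * r == crr, "channel count must be divisible by r*r"
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    x = feat.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)

# Toy example: 4 channels encode one keypoint heatmap at half resolution (r=2).
low_res = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
high_res = dense_upsample(low_res, r=2)
print(high_res.shape)  # (1, 6, 6)
```

Because every output pixel comes from a learned channel rather than from bilinear interpolation, fine joint details can in principle be preserved through the upsampling step.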
Related papers
- CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition [10.045163723630159]
CHASE operates as a sample-adaptive normalization method to mitigate inter-entity distribution discrepancies.
Our approach seamlessly adapts to single-entity backbones and boosts their performance in multi-entity scenarios.
arXiv Detail & Related papers (2024-10-09T17:55:43Z)
- Enhanced Semantic Segmentation for Large-Scale and Imbalanced Point Clouds [6.253217784798542]
Small objects are prone to under-sampling or misclassification due to their low occurrence frequency.
We propose the Multilateral Cascading Network (MCNet) for large-scale and sample-imbalanced point cloud scenes.
arXiv Detail & Related papers (2024-09-21T02:23:01Z)
- GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection [23.872633359324098]
We propose a novel Global-Local Collaborative Optimization Network, called GLCONet.
In this paper, we first design a collaborative optimization strategy to simultaneously model the local details and global long-range relationships.
Experiments demonstrate that the proposed GLCONet method with different backbones can effectively activate potentially significant pixels in an image.
arXiv Detail & Related papers (2024-09-15T02:26:17Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
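One standard instance of the "fair distance using selected samples" question is the Fréchet distance between Gaussians fit to feature embeddings of real and generated sets (the quantity underlying FID). The sketch below is illustrative of that family of metrics, not necessarily the exact protocol the paper studies; the toy 4-D features stand in for real embedding vectors.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, cov1, mu2, cov2):
    """Frechet distance between two Gaussians fit to embedding sets:
    ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2))."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary numerical noise
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

# Toy 4-D "embeddings": identical sets score ~0, a mean-shifted set scores ~1.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(5000, 4))
b = rng.normal(0.5, 1.0, size=(5000, 4))
d_same = frechet_distance(a.mean(0), np.cov(a, rowvar=False),
                          a.mean(0), np.cov(a, rowvar=False))
d_diff = frechet_distance(a.mean(0), np.cov(a, rowvar=False),
                          b.mean(0), np.cov(b, rowvar=False))
print(d_same, d_diff)
```

The factors the paper analyzes (choice of representation space, number of instances per set) correspond here to how the embeddings are computed and how many rows of `a` and `b` are used to estimate the means and covariances.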
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU.
We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors.
We show that our method achieves superior results to state-of-the-art approaches.
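The decomposition step described above (breaking a target sampling factor into smaller sub-factors applied in sequence) can be sketched as a simple factorization routine. The choice of allowed sub-factors here is an illustrative assumption, not taken from the paper.

```python
def decompose_factor(r: int, atoms=(4, 3, 2)) -> list:
    """Break a target up/downsampling factor into a cascade of smaller
    sub-steps, greedily preferring the largest allowed sub-factor."""
    steps = []
    while r > 1:
        for a in atoms:
            if r % a == 0:
                steps.append(a)
                r //= a
                break
        else:
            steps.append(r)  # prime factor larger than any atom
            r = 1
    return steps

print(decompose_factor(16))  # [4, 4]
print(decompose_factor(12))  # [4, 3]
```

Each sub-factor then corresponds to one up- or downsampling sub-step in the pipeline, so a 16x upsample becomes two easier 4x stages.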
arXiv Detail & Related papers (2022-06-25T13:13:37Z)
- C$^{4}$Net: Contextual Compression and Complementary Combination Network for Salient Object Detection [0.0]
We show that feature concatenation works better than other combination methods like multiplication or addition.
Also, joint feature learning gives better results because of information sharing during processing.
arXiv Detail & Related papers (2021-10-22T16:14:10Z)
- DexDeepFM: Ensemble Diversity Enhanced Extreme Deep Factorization Machine Model [8.73107818888638]
An ensemble diversity enhanced extreme deep factorization machine model (DexDeepFM) is proposed.
Experiments on two public real-world datasets show the superiority of the proposed model.
arXiv Detail & Related papers (2021-04-05T14:06:32Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction with Self-Projection Optimization [52.20602782690776]
It is expensive and tedious to obtain large-scale paired sparse-dense point sets for training from real scanned sparse data.
We propose a self-supervised point cloud upsampling network, named SPU-Net, to capture the inherent upsampling patterns of points lying on the underlying object surface.
We conduct various experiments on both synthetic and real-scanned datasets, and the results demonstrate that we achieve comparable performance to the state-of-the-art supervised methods.
arXiv Detail & Related papers (2020-12-08T14:14:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.