A Global to Local Double Embedding Method for Multi-person Pose
Estimation
- URL: http://arxiv.org/abs/2102.07318v2
- Date: Tue, 16 Feb 2021 03:55:59 GMT
- Title: A Global to Local Double Embedding Method for Multi-person Pose
Estimation
- Authors: Yiming Xu, Jiaxin Li, Yiheng Peng, Yan Ding and Hua-Liang Wei
- Abstract summary: We present a novel method to simplify the pipeline by implementing person detection and joints detection simultaneously.
We propose a Double Embedding (DE) method to complete the multi-person pose estimation task in a global-to-local way.
We achieve competitive results on the MSCOCO, MPII and CrowdPose benchmarks, demonstrating the effectiveness and generalization ability of our method.
- Score: 10.05687757555923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-person pose estimation is a fundamental and challenging problem for many
computer vision tasks. Most existing methods can be broadly categorized into
two classes: top-down and bottom-up methods. Both types of methods involve two
stages, namely person detection and joint detection. Conventionally, the two
stages are implemented separately, without considering the interactions between
them, and this inevitably causes intrinsic issues. In this paper, we present a
novel method that simplifies the pipeline by performing person detection and
joint detection simultaneously. We propose a Double Embedding (DE) method to
complete the multi-person pose estimation task in a global-to-local way. DE
consists of Global Embedding (GE) and Local Embedding (LE). GE encodes
different person instances and processes information covering the whole image,
while LE encodes the local limb information. GE handles person detection in a
top-down manner, while LE connects the remaining joints sequentially,
performing joint grouping and information processing in a bottom-up manner.
Based on LE, we design the Mutual Refine Machine (MRM) to reduce the prediction
difficulty in complex scenarios. MRM effectively realizes information
communication between keypoints and further improves accuracy. We achieve
competitive results on the MSCOCO, MPII and CrowdPose benchmarks, demonstrating
the effectiveness and generalization ability of our method.
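A minimal, hypothetical PyTorch sketch of how a double-embedding head along these lines could be organized is given below: per-joint heatmaps for keypoint detection, a Global Embedding map for separating person instances over the whole image, and per-joint Local Embedding maps for limb-level grouping. The module and parameter names (DoubleEmbeddingHead, embed_dim, and so on) are assumptions for illustration, not the authors' released implementation.

```python
# Minimal, hypothetical sketch of a global-to-local double-embedding head.
# Assumes a PyTorch setting; names (DoubleEmbeddingHead, embed_dim, ...) are
# illustrative and do not reproduce the paper's released code.
import torch
import torch.nn as nn

class DoubleEmbeddingHead(nn.Module):
    def __init__(self, backbone_channels=256, num_joints=17, embed_dim=32):
        super().__init__()
        # Per-joint heatmaps for keypoint detection.
        self.heatmap = nn.Conv2d(backbone_channels, num_joints, kernel_size=1)
        # Global Embedding (GE): a per-pixel instance tag used to separate
        # person instances over the whole image (top-down style grouping).
        self.global_embed = nn.Conv2d(backbone_channels, embed_dim, kernel_size=1)
        # Local Embedding (LE): per-joint tags used to connect neighbouring
        # joints sequentially (bottom-up style limb grouping).
        self.local_embed = nn.Conv2d(backbone_channels, num_joints * embed_dim,
                                     kernel_size=1)
        self.embed_dim = embed_dim

    def forward(self, features):
        b, _, h, w = features.shape
        heatmaps = self.heatmap(features)                 # (B, J, H, W)
        ge = self.global_embed(features)                  # (B, D, H, W)
        le = self.local_embed(features).view(b, -1, self.embed_dim, h, w)  # (B, J, D, H, W)
        return heatmaps, ge, le

# Usage with dummy backbone features (batch of 2, 256 channels, 64x48 map).
heatmaps, ge, le = DoubleEmbeddingHead()(torch.randn(2, 256, 64, 48))
```

In this reading, decoding would assign each detected joint to the person instance whose GE tag it matches best, then use the LE tags to refine limb-level associations; that grouping step is deliberately left out of the sketch.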
Related papers
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation [24.973118696495977]
This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose.
It unifies the contextual learning between human-level (global) and keypoint-level (local) information.
For the first time, as a fully end-to-end framework with an L1 regression loss, ED-Pose surpasses heatmap-based top-down methods under the same backbone.
arXiv Detail & Related papers (2023-02-03T08:18:34Z) - FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection [4.534713782093219]
A novel end-to-end transformer-based framework (FGAHOI) is proposed to alleviate the above problems.
FGAHOI comprises three dedicated components, namely multi-scale sampling (MSS), hierarchical spatial-aware merging (HSAM), and a task-aware merging mechanism (TAM).
arXiv Detail & Related papers (2023-01-08T03:53:50Z) - Motor Imagery Decoding Using Ensemble Curriculum Learning and
Collaborative Training [11.157243900163376]
Multi-subject EEG datasets present several kinds of domain shifts.
These domain shifts impede robust cross-subject generalization.
We propose a two-stage model ensemble architecture built with multiple feature extractors.
We demonstrate that our model ensembling approach combines the powers of curriculum learning and collaborative training.
arXiv Detail & Related papers (2022-11-21T13:45:44Z) - Joint Multi-Person Body Detection and Orientation Estimation via One
Unified Embedding [24.96237908232171]
We propose a single-stage, end-to-end trainable framework for tackling the HBOE problem in multi-person scenes.
By integrating the prediction of bounding boxes and direction angles in one embedding, our method can jointly estimate the location and orientation of all bodies in one image.
arXiv Detail & Related papers (2022-10-27T16:22:50Z) - DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation [24.082220581799156]
We propose a novel Dual-Pipeline Integrated Transformer (DPIT) for human pose estimation.
DPIT consists of two branches: the bottom-up branch deals with the whole image to capture global visual information, while the top-down branch extracts local, person-level features.
The feature representations extracted from the bottom-up and top-down branches are fed into the transformer encoder to fuse the global and local knowledge interactively (a minimal sketch of this kind of two-branch fusion is given after this list).
arXiv Detail & Related papers (2022-09-02T10:18:26Z) - Graph Convolutional Module for Temporal Action Localization in Videos [142.5947904572949]
We claim that the relations between action units play an important role in action localization.
A more powerful action detector should not only capture the local content of each action unit but also allow a wider field of view on the context related to it.
We propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods.
arXiv Detail & Related papers (2021-12-01T06:36:59Z) - Decoupled and Memory-Reinforced Networks: Towards Effective Feature
Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z) - Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z) - DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
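As referenced in the DPIT entry above, the following is a rough sketch, under assumed shapes and module names, of how a bottom-up (whole-image) branch and a top-down (person-crop) branch could be fused by a transformer encoder. It illustrates the general idea only and is not the DPIT implementation.

```python
# Hypothetical two-branch fusion: a bottom-up branch over the whole image and
# a top-down branch over a person crop, fused by a transformer encoder.
# All module names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self, dim=256, num_layers=2, num_heads=8):
        super().__init__()
        # Stand-ins for the two convolutional feature extractors.
        self.bottom_up = nn.Conv2d(3, dim, kernel_size=3, stride=4, padding=1)
        self.top_down = nn.Conv2d(3, dim, kernel_size=3, stride=4, padding=1)
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                                   batch_first=True)
        self.fusion = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, image, person_crop):
        # Flatten each branch's feature map into a token sequence.
        g = self.bottom_up(image).flatten(2).transpose(1, 2)       # (B, Ng, D) global tokens
        l = self.top_down(person_crop).flatten(2).transpose(1, 2)  # (B, Nl, D) local tokens
        tokens = torch.cat([g, l], dim=1)                          # concatenate global + local
        return self.fusion(tokens)                                 # fuse interactively

# Usage with dummy inputs: a full image and one cropped person region.
fused = TwoBranchFusion()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 32, 32))
```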
This list is automatically generated from the titles and abstracts of the papers on this site.