Single-stage Multi-human Parsing via Point Sets and Center-based Offsets
- URL: http://arxiv.org/abs/2304.11356v1
- Date: Sat, 22 Apr 2023 09:30:50 GMT
- Title: Single-stage Multi-human Parsing via Point Sets and Center-based Offsets
- Authors: Jiaming Chu, Lei Jin, Junliang Xing and Jian Zhao
- Abstract summary: We present a high-performance Single-stage Multi-human Parsing architecture that decouples the multi-human parsing problem into two fine-grained sub-problems.
The proposed method requires fewer training epochs and a less complex model architecture.
- Score: 28.70266615856546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work studies the multi-human parsing problem. Existing methods, either
following top-down or bottom-up two-stage paradigms, usually involve expensive
computational costs. We instead present a high-performance Single-stage
Multi-human Parsing (SMP) deep architecture that decouples the multi-human
parsing problem into two fine-grained sub-problems, i.e., locating the human
body and parts. SMP leverages the point features in the barycenter positions to
obtain their segmentation and then generates a series of offsets from the
barycenter of the human body to the barycenters of parts, thus performing human
body and parts matching without the grouping process. Within the SMP
architecture, we propose a Refined Feature Retain module to extract the global
feature of instances through generated mask attention and a Mask of Interest
Reclassify module as a trainable plug-in module to refine the classification
results with the predicted segmentation. Extensive experiments on the MHPv2.0
dataset demonstrate the effectiveness and efficiency of the proposed method,
surpassing the state-of-the-art method by 2.1% in AP^p_50, 1.0% in AP^p_vol,
and 1.2% in PCP_50. In particular, the proposed method requires fewer
training epochs and a less complex model architecture. We will release our
source codes, pretrained models, and online demos to facilitate further
studies.
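To make the center-based matching concrete, here is a minimal sketch of the grouping-free assignment the abstract describes: each detected part is assigned to the human whose predicted body-to-part offset lands closest to the part's barycenter. The function names, tensor shapes, and demo data are illustrative assumptions, not the SMP implementation.

```python
import numpy as np

def match_parts_to_bodies(body_centers, part_offsets, part_centers, part_labels):
    """Assign each detected part to a human body via center-based offsets.

    body_centers: (N, 2) barycenters of N detected human bodies.
    part_offsets: (N, P, 2) predicted offsets from each body barycenter to the
                  barycenter of each of the P part categories.
    part_centers: (M, 2) barycenters of M detected part instances.
    part_labels:  (M,) part-category index in [0, P) for each part instance.
    Returns:      (M,) index of the matched body for each part instance.
    """
    # Predicted part barycenters implied by each body: (N, P, 2).
    predicted = body_centers[:, None, :] + part_offsets

    assignments = np.empty(len(part_centers), dtype=np.int64)
    for i, (center, label) in enumerate(zip(part_centers, part_labels)):
        # Distance from this part's barycenter to every body's prediction
        # for the same part category; the nearest body wins.
        dists = np.linalg.norm(predicted[:, label, :] - center, axis=-1)
        assignments[i] = int(dists.argmin())
    return assignments


# Toy demo: parts placed near the barycenters predicted by their true owners.
rng = np.random.default_rng(0)
bodies = rng.uniform(0, 100, size=(3, 2))          # 3 humans
offsets = rng.uniform(-10, 10, size=(3, 5, 2))     # 5 part categories
owners = np.array([0, 1, 2, 1])
labels = np.array([0, 2, 4, 1])
parts = bodies[owners] + offsets[owners, labels] + rng.normal(0, 0.5, size=(4, 2))
print(match_parts_to_bodies(bodies, offsets, parts, labels))  # usually recovers owners
```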
Related papers
- AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joint in the image and encode a fine-grained local feature.
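As a rough illustration of this token-probing idea, the sketch below uses learnable human and joint query tokens that attend to image features through a transformer decoder. The token counts, decoder depth, and feature interface are assumptions and do not reproduce the AiOS architecture.

```python
import torch
import torch.nn as nn

class TokenProbe(nn.Module):
    """Learnable human/joint query tokens probing image features with a
    transformer decoder (a generic sketch of token-based probing; sizes and
    the shared decoder are assumptions, not the AiOS design)."""

    def __init__(self, dim=256, n_humans=16, n_joints=17, depth=2):
        super().__init__()
        self.human_queries = nn.Parameter(torch.randn(n_humans, dim))
        self.joint_queries = nn.Parameter(torch.randn(n_humans * n_joints, dim))
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)

    def forward(self, image_feats):  # image_feats: (batch, tokens, dim)
        b = image_feats.size(0)
        # Human tokens probe per-instance global features, joint tokens probe
        # fine-grained local features; here one decoder serves both for brevity.
        humans = self.decoder(self.human_queries.expand(b, -1, -1), image_feats)
        joints = self.decoder(self.joint_queries.expand(b, -1, -1), image_feats)
        return humans, joints

probe = TokenProbe()
h, j = probe(torch.randn(2, 196, 256))   # assumed 14x14 grid of image tokens
print(h.shape, j.shape)                  # (2, 16, 256) (2, 272, 256)
```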
arXiv Detail & Related papers (2024-03-26T17:59:23Z) - Subject-Independent Deep Architecture for EEG-based Motor Imagery
Classification [0.5439020425819]
Motor imagery (MI) classification based on electroencephalogram (EEG) signals is a widely-used technique in non-invasive brain-computer interface (BCI) systems.
We propose a novel subject-independent semi-supervised deep architecture (SSDA)
The proposed SSDA consists of two parts: an unsupervised and a supervised element.
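The summary only says that SSDA pairs an unsupervised element with a supervised one. A common way to realize such a pairing is a shared encoder trained with a reconstruction loss on unlabeled EEG plus a classification loss on labeled trials, sketched below; the layer sizes, loss weighting, and the autoencoder choice are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class SemiSupervisedEEG(nn.Module):
    """Generic shared-encoder model: an unsupervised reconstruction branch plus
    a supervised classification head (an assumed stand-in for SSDA's two parts)."""

    def __init__(self, n_channels=22, n_samples=256, n_classes=4, hidden=128):
        super().__init__()
        in_dim = n_channels * n_samples
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, in_dim)        # unsupervised element
        self.classifier = nn.Linear(hidden, n_classes)  # supervised element

    def forward(self, x):                               # x: (batch, channels, samples)
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

def training_step(model, labeled, labels, unlabeled, alpha=0.5):
    """One combined update: cross-entropy on labeled trials plus
    reconstruction on unlabeled trials, weighted by alpha (assumed)."""
    _, logits = model(labeled)
    sup_loss = nn.functional.cross_entropy(logits, labels)
    recon, _ = model(unlabeled)
    unsup_loss = nn.functional.mse_loss(recon, unlabeled.flatten(1))
    return sup_loss + alpha * unsup_loss
```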
arXiv Detail & Related papers (2024-01-27T23:05:51Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
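As a toy illustration of searching a masking ratio with reinforcement learning, the sketch below treats each candidate ratio as an arm of an epsilon-greedy bandit and rewards a choice with the negative reconstruction loss of one masked-modeling step. The reward definition, candidate ratios, and the `mim_step` interface are assumptions; the paper's decision-based MIM is considerably richer.

```python
import random

def select_mask_ratio(mim_step, ratios=(0.25, 0.5, 0.75), steps=300, eps=0.1):
    """Epsilon-greedy bandit over candidate masking ratios.

    mim_step(ratio) -> reconstruction loss of one masked-modeling step at that
    ratio (an assumed stand-in for the real training step).
    Returns the ratio with the best running value estimate.
    """
    values = {r: 0.0 for r in ratios}   # running mean of rewards per ratio
    counts = {r: 0 for r in ratios}

    for _ in range(steps):
        if random.random() < eps:                       # explore
            r = random.choice(ratios)
        else:                                           # exploit best estimate
            r = max(values, key=values.get)
        reward = -mim_step(r)                           # lower loss -> higher reward
        counts[r] += 1
        values[r] += (reward - values[r]) / counts[r]   # incremental mean update
    return max(values, key=values.get)


# Purely illustrative fake loss landscape that prefers a ratio near 0.5.
best = select_mask_ratio(lambda r: (r - 0.5) ** 2 + random.gauss(0, 0.01))
print(best)
```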
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - Tuning Pre-trained Model via Moment Probing [62.445281364055795]
We propose a novel Moment Probing (MP) method to explore the potential of linear probing (LP).
MP trains a linear classification head on the mean of the final features.
Our MP significantly outperforms LP and is competitive with counterparts at less training cost.
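The summary describes a linear classification head built on the mean of the final features; a minimal sketch of that mean-based head is below. The name suggests the full Moment Probing method exploits richer feature statistics, so treat this as a reduced illustration with assumed dimensions.

```python
import torch
import torch.nn as nn

class MeanProbingHead(nn.Module):
    """Linear classifier on the mean of a frozen backbone's final token
    features (a reduced, mean-only sketch; dimensions are assumptions)."""

    def __init__(self, feat_dim=768, n_classes=1000):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_classes)

    def forward(self, tokens):          # tokens: (batch, num_tokens, feat_dim)
        mean_feat = tokens.mean(dim=1)  # first moment of the final features
        return self.fc(mean_feat)

# Usage: token features come from a frozen pre-trained backbone (assumed shape).
head = MeanProbingHead()
logits = head(torch.randn(4, 197, 768))  # e.g. ViT-B/16 token features
print(logits.shape)                      # torch.Size([4, 1000])
```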
arXiv Detail & Related papers (2023-07-21T04:15:02Z) - Global Relation Modeling and Refinement for Bottom-Up Human Pose
Estimation [4.24515544235173]
We propose a convolutional neural network for bottom-up human pose estimation.
Our model can focus on different granularities, from local to global regions.
Our results on the COCO and CrowdPose datasets demonstrate that it is an efficient framework for multi-person pose estimation.
arXiv Detail & Related papers (2023-03-27T02:54:08Z) - MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model [27.849059115252008]
We propose a novel framework of single-stage instance-aware pose estimation by modeling the joint distribution of human keypoints.
Our MDPose achieves state-of-the-art performance by successfully learning the high-dimensional joint distribution of human keypoints.
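To illustrate modeling keypoints with a mixture density model, here is a generic mixture-density head over flattened keypoint coordinates, trained by negative log-likelihood. The component count, diagonal Gaussian components, and feature interface are assumptions and do not reflect MDPose's actual design.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class KeypointMixtureHead(nn.Module):
    """Mixture-density head over flattened keypoint coordinates (generic sketch)."""

    def __init__(self, feat_dim=256, n_keypoints=17, n_components=8):
        super().__init__()
        self.k, self.d = n_components, n_keypoints * 2
        self.logits = nn.Linear(feat_dim, self.k)               # mixture weights
        self.means = nn.Linear(feat_dim, self.k * self.d)       # component means
        self.log_scales = nn.Linear(feat_dim, self.k * self.d)  # component scales

    def forward(self, feats):  # feats: (batch, feat_dim)
        b = feats.size(0)
        mix = D.Categorical(logits=self.logits(feats))
        comp = D.Independent(
            D.Normal(self.means(feats).view(b, self.k, self.d),
                     self.log_scales(feats).view(b, self.k, self.d).exp()),
            1,
        )
        return D.MixtureSameFamily(mix, comp)

# Negative log-likelihood of ground-truth keypoints under the predicted mixture.
head = KeypointMixtureHead()
dist = head(torch.randn(4, 256))
gt = torch.rand(4, 17 * 2)            # normalized (x, y) per keypoint, flattened
loss = -dist.log_prob(gt).mean()
print(loss.item())
```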
arXiv Detail & Related papers (2023-02-17T08:29:33Z) - Back to MLP: A Simple Baseline for Human Motion Prediction [59.18776744541904]
This paper tackles the problem of human motion prediction, consisting in forecasting future body poses from historically observed sequences.
We show that the performance of these approaches can be surpassed by a light-weight, purely MLP-based architecture with only 0.14M parameters.
An exhaustive evaluation on Human3.6M, AMASS and 3DPW datasets shows that our method, which we dub siMLPe, consistently outperforms all other approaches.
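To give a flavor of how small such a baseline can be, the sketch below is a generic MLP that forecasts future poses by mixing along the time axis; siMLPe's specific design (transforms, normalization, layer count) is not reproduced here, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class SimpleMotionMLP(nn.Module):
    """Light-weight MLP baseline for motion prediction: forecast t_out future
    poses from t_in observed poses by mixing along the time axis (generic sketch)."""

    def __init__(self, t_in=50, t_out=25, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(t_in, hidden), nn.ReLU(),
            nn.Linear(hidden, t_out),
        )

    def forward(self, poses):            # poses: (batch, t_in, joints * 3)
        x = poses.transpose(1, 2)        # mix over time: (batch, joints*3, t_in)
        y = self.net(x)                  # (batch, joints*3, t_out)
        return y.transpose(1, 2)         # (batch, t_out, joints * 3)

model = SimpleMotionMLP()
future = model(torch.randn(2, 50, 66))   # 22 joints x 3D coordinates (assumed)
print(future.shape)                      # torch.Size([2, 25, 66])
```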
arXiv Detail & Related papers (2022-07-04T16:35:58Z) - I^2R-Net: Intra- and Inter-Human Relation Network for Multi-Person Pose
Estimation [30.204633647947293]
We present the Intra- and Inter-Human Relation Networks (I2R-Net) for Multi-Person Pose Estimation.
First, the Intra-Human Relation Module operates on a single person and aims to capture Intra-Human dependencies.
Second, the Inter-Human Relation Module considers the relation between multiple instances and focuses on capturing Inter-Human interactions.
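A rough sketch of this two-module idea: self-attention among one person's joint tokens (intra-human), followed by self-attention among pooled person-level tokens (inter-human). Token construction and dimensions are assumptions; this is not the I2R-Net architecture, only an illustration.

```python
import torch
import torch.nn as nn

class IntraInterRelation(nn.Module):
    """Intra-human then inter-human relation modeling with self-attention
    (generic sketch; sizes and pooling are assumptions)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, joint_tokens):  # (batch, persons, joints, dim)
        b, p, j, d = joint_tokens.shape
        # Intra-human: attend within each person's own set of joint tokens.
        x = joint_tokens.reshape(b * p, j, d)
        x = x + self.intra(x, x, x)[0]
        # Inter-human: pool each person to one token and attend across persons.
        person_tokens = x.reshape(b, p, j, d).mean(dim=2)          # (batch, persons, dim)
        person_tokens = person_tokens + self.inter(person_tokens, person_tokens, person_tokens)[0]
        # Broadcast the person-level context back to every joint token.
        return x.reshape(b, p, j, d) + person_tokens[:, :, None, :]

module = IntraInterRelation()
out = module(torch.randn(2, 3, 17, 256))   # 2 images, 3 persons, 17 joints
print(out.shape)                           # torch.Size([2, 3, 17, 256])
```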
arXiv Detail & Related papers (2022-06-22T07:44:41Z) - Differentiable Multi-Granularity Human Representation Learning for
Instance-Aware Human Semantic Parsing [131.97475877877608]
A new bottom-up regime is proposed to learn category-level human semantic segmentation and multi-person pose estimation in a joint and end-to-end manner.
It is a compact, efficient and powerful framework that exploits structural information over different human granularities.
Experiments on three instance-aware human datasets show that our model outperforms other bottom-up alternatives with much more efficient inference.
arXiv Detail & Related papers (2021-03-08T06:55:00Z) - A Global to Local Double Embedding Method for Multi-person Pose
Estimation [10.05687757555923]
We present a novel method that simplifies the pipeline by performing person detection and joint detection simultaneously.
We propose a Double Embedding (DE) method to complete the multi-person pose estimation task in a global-to-local way.
We achieve the competitive results on benchmarks MSCOCO, MPII and CrowdPose, demonstrating the effectiveness and generalization ability of our method.
arXiv Detail & Related papers (2021-02-15T03:13:38Z) - Multi-task Learning with Coarse Priors for Robust Part-aware Person
Re-identification [79.33809815035127]
The Multi-task Part-aware Network (MPN) is designed to extract semantically aligned part-level features from pedestrian images.
MPN solves the body part misalignment problem via multi-task learning (MTL) in the training stage.
MPN consistently outperforms state-of-the-art approaches by significant margins.
arXiv Detail & Related papers (2020-03-18T07:10:44Z)