Differentiable Multi-Granularity Human Representation Learning for
Instance-Aware Human Semantic Parsing
- URL: http://arxiv.org/abs/2103.04570v1
- Date: Mon, 8 Mar 2021 06:55:00 GMT
- Title: Differentiable Multi-Granularity Human Representation Learning for
Instance-Aware Human Semantic Parsing
- Authors: Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc Van Gool
- Abstract summary: A new bottom-up regime is proposed to learn category-level human semantic segmentation and multi-person pose estimation in a joint and end-to-end manner.
It is a compact, efficient and powerful framework that exploits structural information over different human granularities.
Experiments on three instance-aware human datasets show that our model outperforms other bottom-up alternatives with much more efficient inference.
- Score: 131.97475877877608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To address the challenging task of instance-aware human part parsing, a new
bottom-up regime is proposed to learn category-level human semantic
segmentation as well as multi-person pose estimation in a joint and end-to-end
manner. It is a compact, efficient and powerful framework that exploits
structural information over different human granularities and eases the
difficulty of person partitioning. Specifically, a dense-to-sparse projection
field, which allows explicitly associating dense human semantics with sparse
keypoints, is learnt and progressively improved over the network feature
pyramid for robustness. Then, the difficult pixel grouping problem is cast as
an easier, multi-person joint assembling task. By formulating joint association
as maximum-weight bipartite matching, a differentiable solution is developed to
exploit projected gradient descent and Dykstra's cyclic projection algorithm.
This makes our method end-to-end trainable and allows back-propagating the
grouping error to directly supervise multi-granularity human representation
learning. This is distinguished from current bottom-up human parsers or pose
estimators which require sophisticated post-processing or heuristic greedy
algorithms. Experiments on three instance-aware human parsing datasets show
that our model outperforms other bottom-up alternatives with much more
efficient inference.
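To make the differentiable grouping step concrete, below is a minimal NumPy sketch of relaxed maximum-weight bipartite matching solved by projected gradient ascent, with the feasibility projection computed by Dykstra's cyclic projection algorithm. It is an illustration under simplifying assumptions (a square joint-to-person affinity matrix, a doubly-stochastic relaxation of the assignment constraints, a fixed step size), not the authors' implementation; in the paper the unrolled iterations would run inside an automatic-differentiation framework so the grouping error can be back-propagated into the representation.

```python
import numpy as np

def dykstra_project(X, n_cycles=50):
    """Project X onto a doubly-stochastic relaxation of assignment matrices
    (rows sum to 1, columns sum to 1, entries non-negative) using Dykstra's
    cyclic projection algorithm: cycle through the Euclidean projection onto
    each constraint set, adding back a per-set correction term so the iterates
    converge to the projection onto the intersection of the sets."""
    n, m = X.shape
    projections = [
        lambda Z: Z - (Z.sum(axis=1, keepdims=True) - 1.0) / m,  # row sums = 1
        lambda Z: Z - (Z.sum(axis=0, keepdims=True) - 1.0) / n,  # column sums = 1
        lambda Z: np.clip(Z, 0.0, None),                         # non-negativity
    ]
    corrections = [np.zeros_like(X) for _ in projections]
    for _ in range(n_cycles):
        for i, proj in enumerate(projections):
            Z = X + corrections[i]
            X_new = proj(Z)
            corrections[i] = Z - X_new
            X = X_new
    return X

def soft_bipartite_matching(scores, n_steps=100, step_size=0.05):
    """Approximate maximum-weight bipartite matching by projected gradient
    ascent on the linear objective <scores, X> over relaxed assignments X."""
    X = np.full_like(scores, 1.0 / scores.shape[1])  # uniform soft assignment
    for _ in range(n_steps):
        X = X + step_size * scores   # gradient of <scores, X> w.r.t. X is scores
        X = dykstra_project(X)       # project back onto the relaxed feasible set
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(4, 4))             # toy joint-to-person affinities
    X = soft_bipartite_matching(scores)
    print(np.round(X, 2))                        # near doubly-stochastic soft assignment
    print("hard assignment:", X.argmax(axis=1))  # row-wise argmax for a discrete grouping
```

Because every operation in the loop (matrix additions, mean subtractions, clipping) is differentiable almost everywhere, the same unrolled procedure can be written in an autodiff framework to supervise the score matrix end-to-end, which is the property the abstract highlights.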
Related papers
- Deep Learning for Human Parsing: A Survey [54.812353922568995]
We provide an analysis of state-of-the-art human parsing methods, covering a broad spectrum of pioneering works for semantic human parsing.
We introduce five insightful categories, among them: (1) structure-driven architectures, which exploit the relationship of different human parts and the inherent hierarchical structure of the human body; (2) graph-based networks, which capture global information to achieve an efficient and complete human body analysis; (3) context-aware networks, which explore useful contexts across all pixels to characterize each pixel's class; and (4) LSTM-based methods, which combine short-distance and long-distance spatial dependencies to better exploit abundant local and global contexts.
arXiv Detail & Related papers (2023-01-29T10:54:56Z) - Unsupervised Learning on 3D Point Clouds by Clustering and Contrasting [11.64827192421785]
Unsupervised representation learning is a promising direction to auto-extract features without human intervention.
This paper proposes a general unsupervised approach, named ConClu, to perform the learning of point-wise and global features.
arXiv Detail & Related papers (2022-02-05T12:54:17Z) - End-to-end One-shot Human Parsing [91.5113227694443]
The one-shot human parsing (OSHP) task requires parsing humans into an open set of classes defined by any test example.
An End-to-end One-shot human Parsing Network (EOP-Net) is proposed.
EOP-Net outperforms representative one-shot segmentation models by large margins.
arXiv Detail & Related papers (2021-05-04T01:35:50Z) - Group-Skeleton-Based Human Action Recognition in Complex Events [15.649778891665468]
We propose a novel group-skeleton-based human action recognition method in complex events.
This method first utilizes multi-scale spatial-temporal graph convolutional networks (MS-G3Ds) to extract skeleton features from multiple persons.
Results on the HiEve dataset show that our method can give superior performance compared to other state-of-the-art methods.
arXiv Detail & Related papers (2020-11-26T13:19:14Z) - Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task-specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains helps improve learning performance on each task (a minimal illustrative sketch of this kind of parameter coupling appears after this list).
arXiv Detail & Related papers (2020-10-24T21:35:57Z) - Differentiable Hierarchical Graph Grouping for Multi-Person Pose
Estimation [95.72606536493548]
Multi-person pose estimation is challenging because it localizes body keypoints for multiple persons simultaneously.
We propose a novel differentiable Hierarchical Graph Grouping (HGG) method to learn the graph grouping in the bottom-up multi-person pose estimation task.
arXiv Detail & Related papers (2020-07-23T08:46:22Z) - Hierarchical Human Parsing with Typed Part-Relation Reasoning [179.64978033077222]
How to model human structures is the central theme in this task.
We seek to simultaneously exploit the representational capacity of deep graph networks and the hierarchical human structures.
arXiv Detail & Related papers (2020-03-10T16:45:41Z) - Focus on Semantic Consistency for Cross-domain Crowd Understanding [34.560447389853614]
Some domain adaptation algorithms try to sidestep costly manual annotation by training models with synthetic data.
We found that a mass of estimation errors in the background areas impedes the performance of the existing methods.
In this paper, we propose a domain adaptation method to eliminate these background errors.
arXiv Detail & Related papers (2020-02-20T08:51:05Z)
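As a footnote to the multi-task cross-learning entry above, here is a minimal sketch, assuming per-task linear regressors whose weights are pulled toward their common mean by a quadratic coupling penalty; the coupling strength and the plain gradient-descent solver are illustrative choices, not the formulation used in that paper.

```python
import numpy as np

def cross_learning_fit(tasks, coupling=1.0, lr=0.05, n_steps=500):
    """Fit one linear regressor per task while penalizing each task's weights
    for drifting away from the current across-task mean, so related tasks
    'stay close to each other' and data from one domain helps the others.

    tasks: list of (X, y) pairs sharing the same feature dimension."""
    d = tasks[0][0].shape[1]
    W = np.zeros((len(tasks), d))               # one weight vector per task
    for _ in range(n_steps):
        w_mean = W.mean(axis=0)                 # shared anchor, recomputed each step
        for t, (X, y) in enumerate(tasks):
            residual = X @ W[t] - y
            grad = X.T @ residual / len(y)      # least-squares gradient for task t
            grad += coupling * (W[t] - w_mean)  # coupling term pulls W[t] toward the mean
            W[t] -= lr * grad
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_shared = rng.normal(size=3)
    tasks = []
    for _ in range(4):  # four related regression tasks with slightly different targets
        X = rng.normal(size=(50, 3))
        y = X @ (w_shared + 0.1 * rng.normal(size=3)) + 0.05 * rng.normal(size=50)
        tasks.append((X, y))
    print(np.round(cross_learning_fit(tasks), 2))  # task weights end up close to each other
```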