Hierarchical Human Parsing with Typed Part-Relation Reasoning
- URL: http://arxiv.org/abs/2003.04845v2
- Date: Wed, 11 Mar 2020 10:14:43 GMT
- Title: Hierarchical Human Parsing with Typed Part-Relation Reasoning
- Authors: Wenguan Wang, Hailong Zhu, Jifeng Dai, Yanwei Pang, Jianbing Shen, and Ling Shao
- Abstract summary: How to model human structures is the central theme in this task.
We seek to simultaneously exploit the representational capacity of deep graph networks and the hierarchical human structures.
- Score: 179.64978033077222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human parsing aims at pixel-wise semantic understanding of humans. Because
human bodies are inherently hierarchically structured, how to model human
structures is the central theme of this task. Focusing on this, we seek to
simultaneously exploit the representational capacity of deep graph networks and
the hierarchical human structures. In particular, we make the following two
contributions. First, three kinds of part relations, i.e., decomposition,
composition, and dependency, are, for the first time, completely and precisely
described by three distinct relation networks. This is in stark contrast to
previous parsers, which focus on only a portion of the relations and adopt a
type-agnostic relation modeling strategy. More expressive relation information
can be captured by explicitly constraining the parameters of the relation
networks to satisfy the specific characteristics of the different relations.
Second, previous parsers largely ignore the need for an approximation algorithm
over the loopy human hierarchy, while we instead perform iterative reasoning by
integrating generic message-passing networks with their edge-typed,
convolutional counterparts. With these efforts, our parser lays the foundation
for more sophisticated and flexible reasoning over human relation patterns.
Comprehensive experiments on five datasets demonstrate that our parser sets a
new state-of-the-art on each.
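The idea of edge-typed, iterative message passing over a loopy part hierarchy can be illustrated with a small sketch. This is not the authors' architecture: the toy hierarchy, the node names, the scalar per-type weights (standing in for the three relation networks), and the `tanh` update rule are all assumptions made for illustration, and plain Python lists replace the convolutional feature maps the abstract refers to.

```python
import math

# Hypothetical toy hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "upper_body": "full_body",
    "lower_body": "full_body",
    "head": "upper_body",
    "torso": "upper_body",
    "leg": "lower_body",
}
# Hypothetical dependency links between parts at the same level.
DEPENDENCY = [("upper_body", "lower_body"), ("head", "torso")]


def build_typed_edges():
    """Return directed edges as (source, target, relation_type) triples."""
    edges = []
    for child, parent in PARENT.items():
        edges.append((parent, child, "decomposition"))  # whole -> part
        edges.append((child, parent, "composition"))    # part -> whole
    for a, b in DEPENDENCY:
        edges.append((a, b, "dependency"))
        edges.append((b, a, "dependency"))
    return edges


def message_passing(features, edges, type_weights, steps=2):
    """Run a fixed number of synchronous, edge-typed update steps."""
    state = {node: list(vec) for node, vec in features.items()}
    for _ in range(steps):
        incoming = {node: [] for node in state}
        for src, dst, rel in edges:
            w = type_weights[rel]  # parameters are selected by relation type
            incoming[dst].append([w * x for x in state[src]])
        new_state = {}
        for node, vec in state.items():
            msgs = incoming[node]
            if msgs:
                # Average the incoming messages dimension by dimension.
                mean = [sum(col) / len(msgs) for col in zip(*msgs)]
            else:
                mean = [0.0] * len(vec)
            # Assumed update rule: squash the old state plus the mean message.
            new_state[node] = [math.tanh(x + m) for x, m in zip(vec, mean)]
        state = new_state
    return state


nodes = set(PARENT) | set(PARENT.values())
# Arbitrary two-dimensional starting features for each node.
features = {
    name: [0.1 * (i + 1), -0.05 * (i + 1)]
    for i, name in enumerate(sorted(nodes))
}
# One scalar per relation type; a real model would learn a network per type.
type_weights = {"decomposition": 0.5, "composition": 0.8, "dependency": 0.3}
edges = build_typed_edges()
result = message_passing(features, edges, type_weights, steps=2)
```

Because the graph contains cycles (parent and child exchange messages in both directions, and dependency edges link siblings), the updates are repeated for a fixed number of steps rather than resolved in a single pass, which is the sense in which the reasoning is approximate and iterative.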
Related papers
- Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations.
Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
- Deep Learning for Human Parsing: A Survey [54.812353922568995]
We provide an analysis of state-of-the-art human parsing methods, covering a broad spectrum of pioneering works for semantic human parsing.
We introduce five insightful categories, including: (1) structure-driven architectures, which exploit the relationships between different human parts and the inherent hierarchical structure of the human body; (2) graph-based networks, which capture global information to achieve an efficient and complete human body analysis; (3) context-aware networks, which explore useful contexts across all pixels to characterize a pixel of the corresponding class; and (4) LSTM-based methods, which combine short-distance and long-distance spatial dependencies to better exploit abundant local and global contexts.
arXiv Detail & Related papers (2023-01-29T10:54:56Z)
- Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing [131.97475877877608]
A new bottom-up regime is proposed to learn category-level human semantic segmentation and multi-person pose estimation in a joint and end-to-end manner.
It is a compact, efficient and powerful framework that exploits structural information over different human granularities.
Experiments on three instance-aware human datasets show that our model outperforms other bottom-up alternatives with much more efficient inference.
arXiv Detail & Related papers (2021-03-08T06:55:00Z)
- Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts.
We learn relation prototypes as an implicit factor between entities.
We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z)
- Correlating Edge, Pose with Parsing [35.27973063976257]
This paper studies how human semantic boundaries and keypoint locations can jointly improve human parsing.
We propose a Correlation Parsing Machine (CorrPM) employing a heterogeneous non-local block to discover the spatial affinity among feature maps from the edge, pose and parsing.
arXiv Detail & Related papers (2020-05-04T12:39:13Z)
- Learning Compositional Neural Information Fusion for Human Parsing [181.48380078517525]
We formulate the approach as a neural information fusion framework.
Our model assembles the information from three inference processes over the hierarchy.
The whole model is end-to-end differentiable, explicitly modeling information flows and structures.
arXiv Detail & Related papers (2020-01-19T10:35:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.