Deep Learning for Human Parsing: A Survey
- URL: http://arxiv.org/abs/2301.12416v1
- Date: Sun, 29 Jan 2023 10:54:56 GMT
- Title: Deep Learning for Human Parsing: A Survey
- Authors: Xiaomei Zhang, Xiangyu Zhu, Ming Tang, Zhen Lei
- Abstract summary: We provide an analysis of state-of-the-art human parsing methods, covering a broad spectrum of pioneering works for semantic human parsing.
We introduce five insightful categories: (1) structure-driven architectures exploit the relationships between different human parts and the inherent hierarchical structure of the human body, (2) graph-based networks capture global information to achieve an efficient and complete human body analysis, (3) context-aware networks explore useful contexts across all pixels to characterize a pixel of the corresponding class, (4) LSTM-based methods combine short-distance and long-distance spatial dependencies to better exploit abundant local and global contexts, and (5) combined auxiliary information approaches use related tasks or supervision to improve network performance.
- Score: 54.812353922568995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human parsing is a key topic in image processing with many applications, such
as surveillance analysis, human-robot interaction, person search, and clothing
category classification, among many others. Recently, due to the success of
deep learning in computer vision, there are a number of works aimed at
developing human parsing algorithms using deep learning models. As many methods have
been proposed, a comprehensive survey of this topic is of great importance. In
this survey, we provide an analysis of state-of-the-art human parsing methods,
covering a broad spectrum of pioneering works for semantic human parsing. We
introduce five insightful categories: (1) structure-driven architectures
exploit the relationship of different human parts and the inherent hierarchical
structure of a human body, (2) graph-based networks capture the global
information to achieve an efficient and complete human body analysis, (3)
context-aware networks explore useful contexts across all pixels to characterize
a pixel of the corresponding class, (4) LSTM-based methods can combine
short-distance and long-distance spatial dependencies to better exploit
abundant local and global contexts, and (5) combined auxiliary information
approaches use related tasks or supervision to improve network performance. We
also discuss the advantages/disadvantages of the methods in each category and
the relationships between methods in different categories, examine the most
widely used datasets, report performances, and discuss promising future
research directions in this area.
Related papers
- Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models [0.65268245109828]
We introduce the notion of contextual diversity for active learning (CDAL).
We propose a data repair algorithm to curate contextually fair data to reduce model bias.
We are developing an image retrieval system for wildlife camera trap images and a reliable warning system for poor-quality rural roads.
arXiv Detail & Related papers (2024-11-04T09:43:33Z)
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models [55.20626448358655]
This study explores universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs).
Our design includes an HO Prompt-guided Decoder (HOPD), which facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image.
For open-category interaction recognition, our method supports either of two input types: interaction phrase or interpretive sentence.
arXiv Detail & Related papers (2023-11-07T08:27:32Z)
- Deep Learning Technique for Human Parsing: A Survey and Outlook [5.236995853909988]
In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing.
We put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research.
We point out a set of under-investigated open issues in this field and suggest new directions for future study.
arXiv Detail & Related papers (2023-01-01T12:39:57Z)
- A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection [14.900704382194013]
We propose a skeleton-aware graph convolutional network for human-object interaction detection, named SGCN4HOI.
Our network exploits the spatial connections between human keypoints and object keypoints to capture their fine-grained structural interactions via graph convolutions.
It fuses such geometric features with visual features and spatial configuration features obtained from human-object pairs.
arXiv Detail & Related papers (2022-07-11T15:20:18Z)
- 2D Human Pose Estimation: A Survey [16.56050212383859]
Human pose estimation aims at localizing human anatomical keypoints or body parts in the input data.
Deep learning techniques allow learning feature representations directly from the data.
In this paper, we reap the recent achievements of 2D human pose estimation methods and present a comprehensive survey.
arXiv Detail & Related papers (2022-04-15T08:09:43Z)
- Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing [131.97475877877608]
A new bottom-up regime is proposed to learn category-level human semantic segmentation and multi-person pose estimation in a joint and end-to-end manner.
It is a compact, efficient and powerful framework that exploits structural information over different human granularities.
Experiments on three instance-aware human datasets show that our model outperforms other bottom-up alternatives with much more efficient inference.
arXiv Detail & Related papers (2021-03-08T06:55:00Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large-scale interaction-centric benchmark, TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
- Hierarchical Human Parsing with Typed Part-Relation Reasoning [179.64978033077222]
How to model human structures is the central theme in this task.
We seek to simultaneously exploit the representational capacity of deep graph networks and the hierarchical human structures.
arXiv Detail & Related papers (2020-03-10T16:45:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.