CDGNet: Class Distribution Guided Network for Human Parsing
- URL: http://arxiv.org/abs/2111.14173v1
- Date: Sun, 28 Nov 2021 15:18:53 GMT
- Title: CDGNet: Class Distribution Guided Network for Human Parsing
- Authors: Kunliang Liu, Ouk Choi, Jianming Wang, Wonjun Hwang
- Abstract summary: We construct instance class distributions by accumulating the original human parsing labels in the horizontal and vertical directions.
The two guided features are combined into a spatial guidance map, which is superimposed onto the baseline network by multiplication and concatenation.
- Score: 7.779985252025487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of human parsing is to partition a human in an image into
its constituent parts by labeling each pixel of the human image with a class.
Since the human body comprises hierarchically structured parts, each body part
has its own characteristic position distribution: a human head is unlikely to
appear below the feet, and arms are likely to be near the torso. Inspired by
this observation, we construct instance class distributions by accumulating the
original human parsing labels in the horizontal and vertical directions and use
them as supervision signals. Guided by these horizontal and vertical class
distribution labels, the network learns to exploit the intrinsic position
distribution of each class. We combine the two guided features into a spatial
guidance map, which is superimposed onto the baseline network by multiplication
and concatenation to distinguish human parts precisely. Extensive experiments
on three well-known benchmarks, the LIP, ATR, and CIHP databases, demonstrate
the effectiveness and superiority of our method.
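The accumulation of parsing labels into horizontal and vertical class distributions can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name, the use of NumPy, and the per-class max normalization are assumptions made for clarity.

```python
import numpy as np

def class_distribution_labels(parsing, num_classes):
    """Accumulate a (H, W) integer human-parsing label map along the
    horizontal and vertical axes to obtain per-class position
    distributions, usable as supervision signals.

    Returns (horizontal, vertical): arrays of shape (num_classes, W)
    and (num_classes, H), each class row scaled to [0, 1].
    """
    # One-hot encode the label map: (H, W) -> (H, W, num_classes)
    one_hot = np.eye(num_classes, dtype=np.float32)[parsing]
    # Count class occurrences per image column and per image row
    horizontal = one_hot.sum(axis=0).T  # (num_classes, W)
    vertical = one_hot.sum(axis=1).T    # (num_classes, H)
    # Normalize each class's distribution by its peak (guard against /0)
    horizontal /= np.maximum(horizontal.max(axis=1, keepdims=True), 1.0)
    vertical /= np.maximum(vertical.max(axis=1, keepdims=True), 1.0)
    return horizontal, vertical
```

A network branch predicting these 1D distributions can then be trained against them, and its features combined into the 2D spatial guidance map described above.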
Related papers
- Understanding the Role of Pathways in a Deep Neural Network [4.456675543894722]
We analyze a convolutional neural network (CNN) trained in the classification task and present an algorithm to extract the diffusion pathways of individual pixels.
We find that the few largest pathways of an individual pixel from an image tend to cross the feature maps in each layer that is important for classification.
arXiv Detail & Related papers (2024-02-28T07:53:19Z)
- Cross-view and Cross-pose Completion for 3D Human Understanding [22.787947086152315]
We propose a pre-training approach based on self-supervised learning that works on human-centric data using only images.
We pre-train a model for body-centric tasks and one for hand-centric tasks.
With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks.
arXiv Detail & Related papers (2023-11-15T16:51:18Z)
- Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains [25.083197183341007]
This paper presents a novel training paradigm to train a powerful human parsing network.
Two common application scenarios are addressed, termed universal parsing and dedicated parsing.
Experimental results demonstrate SST can effectively achieve promising universal human parsing performance.
arXiv Detail & Related papers (2023-04-09T02:44:29Z)
- Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels [54.58539616385138]
We introduce a novel semi-supervised 2D medical image segmentation framework termed Mine yOur owN Anatomy (MONA)
First, prior work argues that every pixel equally matters to the model training; we observe empirically that this alone is unlikely to define meaningful anatomical features.
Second, we construct a set of objectives that encourage the model to be capable of decomposing medical images into a collection of anatomical features.
arXiv Detail & Related papers (2022-09-27T15:50:31Z)
- KTN: Knowledge Transfer Network for Learning Multi-person 2D-3D Correspondences [77.56222946832237]
We present a novel framework to detect the densepose of multiple people in an image.
The proposed method, which we refer to as the Knowledge Transfer Network (KTN), tackles two main problems.
It simultaneously maintains feature resolution and suppresses background pixels, a strategy that results in a substantial increase in accuracy.
arXiv Detail & Related papers (2022-06-21T03:11:37Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences [60.89437526374286]
Prior art either assumes small motion between frames or relies on local descriptors, which cannot handle large motion or visually ambiguous body parts.
We propose a deep learning framework that maps each pixel to a feature space, where the feature distances reflect the geodesic distances among pixels.
Without any semantic annotation, the proposed embeddings automatically learn to differentiate visually similar parts and align different subjects into a unified feature space.
arXiv Detail & Related papers (2021-03-29T12:43:44Z)
- Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis [58.05389586712485]
We tackle human image synthesis, including human motion imitation, appearance transfer, and novel view synthesis.
In this paper, we propose a 3D body mesh recovery module to disentangle the pose and shape.
We also build a new dataset, namely iPER dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis.
arXiv Detail & Related papers (2020-11-18T02:57:47Z)
- Identity-Guided Human Semantic Parsing for Person Re-Identification [42.705908907250986]
We propose the identity-guided human semantic parsing approach (ISP) to locate both human body parts and personal belongings at the pixel level for aligned person re-ID.
arXiv Detail & Related papers (2020-07-27T12:12:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.