RepParser: End-to-End Multiple Human Parsing with Representative Parts
- URL: http://arxiv.org/abs/2208.12908v1
- Date: Sat, 27 Aug 2022 02:22:24 GMT
- Title: RepParser: End-to-End Multiple Human Parsing with Representative Parts
- Authors: Xiaojia Chen, Xuanhan Wang, Lianli Gao, Jingkuan Song
- Abstract summary: We present an end-to-end multiple human parsing framework using representative parts, termed RepParser.
RepParser solves multiple human parsing in a new single-stage manner without resorting to person detection or post-grouping.
- Score: 74.31841289680563
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing methods of multiple human parsing usually adopt a two-stage
strategy (typically top-down and bottom-up), which suffers from either strong
dependence on prior detection or high computational redundancy during
post-grouping. In this work, we present an end-to-end multiple human parsing
framework using representative parts, termed RepParser. Different from
mainstream methods, RepParser solves multiple human parsing in a new
single-stage manner without resorting to person detection or post-grouping. To
this end, RepParser decouples the parsing pipeline into instance-aware kernel
generation and part-aware human parsing, which are responsible for instance
separation and instance-specific part segmentation, respectively. In
particular, we empower the parsing pipeline with representative parts, since
they are characterized by instance-aware keypoints and can be utilized to
dynamically parse each person instance. Specifically, representative parts are
obtained by jointly localizing centers of instances and estimating keypoints
of body part regions. After that, we dynamically predict instance-aware
convolution kernels from representative parts, thus encoding person-part
context into each kernel, which is responsible for casting an image feature
into an instance-specific representation. Furthermore, a multi-branch
structure is adopted to divide each instance-specific representation into
several part-aware representations for separate part segmentation. In this
way, RepParser focuses on person instances under the guidance of representative
parts and directly outputs parsing results for each person instance, thus
eliminating the need for prior detection or post-grouping. Extensive
experiments on two challenging benchmarks demonstrate that our proposed
RepParser is a simple yet effective framework and achieves very competitive
performance.
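The abstract's central mechanism, predicting an instance-specific convolution kernel from representative-part features and applying it to the shared image feature map, can be sketched in a few lines. This is a minimal, hedged illustration of the idea only: all shapes, variable names, and the simple mean-pooled "person-part context" are assumptions for clarity, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

C, H, W = 8, 4, 4   # feature channels and spatial size (illustrative)
K = 3               # representative parts per instance (assumed)
P = 2               # body-part classes to segment (assumed)

# Shared image feature map produced by some backbone.
feature_map = rng.standard_normal((C, H, W))

# Representative-part features: one C-dim vector per instance-aware
# keypoint (e.g. the instance center and body-part-region keypoints).
part_feats = rng.standard_normal((K, C))

# A kernel generator maps the pooled person-part context to the
# parameters of a 1x1 dynamic convolution: weights (P, C) + biases (P,).
context = part_feats.mean(axis=0)            # (C,) person-part context
W_gen = rng.standard_normal((P * (C + 1), C)) * 0.1
params = W_gen @ context                     # (P*(C+1),) predicted params
kernels = params[: P * C].reshape(P, C)      # instance-aware 1x1 kernels
biases = params[P * C:]

# Applying the instance-specific convolution casts the shared feature
# map into per-part logits for this one instance -- no detection box,
# no post-grouping.
flat = feature_map.reshape(C, H * W)         # (C, H*W)
logits = kernels @ flat + biases[:, None]    # (P, H*W)
part_masks = (logits > 0).reshape(P, H, W)   # binary part predictions

print(part_masks.shape)
```

The multi-branch structure mentioned in the abstract would correspond to repeating the kernel-generation step per part branch, so each branch segments one part-aware representation separately.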
Related papers
- DROP: Decouple Re-Identification and Human Parsing with Task-specific
Features for Occluded Person Re-identification [15.910080319118498]
The paper introduces the Decouple Re-identificatiOn and human Parsing (DROP) method for occluded person re-identification (ReID)
Unlike mainstream approaches using global features for simultaneous multi-task learning of ReID and human parsing, DROP argues that the inferior performance of the former is due to distinct requirements for ReID and human parsing features.
Experimental results highlight the efficacy of DROP, especially achieving a Rank-1 accuracy of 76.8% on Occluded-Duke, surpassing two mainstream methods.
arXiv Detail & Related papers (2024-01-31T17:54:43Z) - DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive
Segmentation Transformer [58.95404214273222]
Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth for training.
We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries.
Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image.
arXiv Detail & Related papers (2023-04-13T16:57:02Z) - Fine-grained Few-shot Recognition by Deep Object Parsing [43.61794876834115]
We parse a test instance by inferring the K parts, where each part occupies a distinct location in the feature space.
We recognize test instances by comparing its active templates and the relative geometry of its part locations.
arXiv Detail & Related papers (2022-07-14T17:59:05Z) - AIParsing: Anchor-free Instance-level Human Parsing [98.80740676794254]
We design an instance-level human parsing network that is anchor-free and operates at the pixel level.
It consists of two simple sub-networks: an anchor-free detection head for bounding box predictions and an edge-guided parsing head for human segmentation.
Our method achieves the best global-level and instance-level performance over state-of-the-art one-stage top-down alternatives.
arXiv Detail & Related papers (2022-07-14T12:19:32Z) - Technical Report: Disentangled Action Parsing Networks for Accurate
Part-level Action Parsing [65.87931036949458]
Part-level Action Parsing aims at part state parsing for boosting action recognition in videos.
We present a simple yet effective approach, named disentangled action parsing (DAP)
arXiv Detail & Related papers (2021-11-05T02:29:32Z) - X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented
Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z) - Nondiscriminatory Treatment: a straightforward framework for multi-human
parsing [14.254424142949741]
Multi-human parsing aims to segment every body part of every human instance.
We present an end-to-end and box-free pipeline from a new and more human-intuitive perspective.
Experiments show that our network performs superiorly against state-of-the-art methods.
arXiv Detail & Related papers (2021-01-26T16:31:21Z) - Iterative Utterance Segmentation for Neural Semantic Parsing [38.344720207846905]
We present a novel framework for boosting neural semantic parsing via iterative utterance segmentation.
One key advantage is that this framework does not require any handcrafted utterance segmentation or additional labeled data for the segmenter.
On data that require compositional generalization, our framework brings significant accuracy gains: Geo 63.1 to 81.2, Formulas 59.7 to 72.7, ComplexWebQuestions 27.1 to 56.3.
arXiv Detail & Related papers (2020-12-13T09:46:24Z) - A Simple Global Neural Discourse Parser [61.728994693410954]
We propose a simple chart-based neural discourse parser that does not require any manually crafted features and is based on learned span representations only.
We empirically demonstrate that our model achieves the best performance among global parsers, and comparable performance to state-of-the-art greedy parsers.
arXiv Detail & Related papers (2020-09-02T19:28:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.