Clicking Matters: Towards Interactive Human Parsing
- URL: http://arxiv.org/abs/2111.06162v1
- Date: Thu, 11 Nov 2021 11:47:53 GMT
- Title: Clicking Matters: Towards Interactive Human Parsing
- Authors: Yutong Gao, Liqian Liang, Congyan Lang, Songhe Feng, Yidong Li,
Yunchao Wei
- Abstract summary: This work is the first attempt to tackle the human parsing task under the interactive setting.
Our IHP solution achieves 85% mIoU on the LIP benchmark, 80% mIoU on PASCAL-Person-Part and CIHP, and 75% mIoU on Helen, with only 1.95, 3.02, 2.84, and 1.09 clicks per class, respectively.
- Score: 60.35351491254932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we focus on Interactive Human Parsing (IHP), which aims to
segment a human image into multiple human body parts with guidance from users'
interactions. This new task inherits the class-aware property of human parsing,
which traditional interactive image segmentation approaches, being generally
class-agnostic, cannot handle well. To tackle this new task, we first
exploit user clicks to identify different human parts in the given image. These
clicks are subsequently transformed into semantic-aware localization maps,
which are concatenated with the RGB image to form the input of the segmentation
network and generate the initial parsing result. To enable the network to
better perceive the user's purpose during the correction process, we
investigate several principal refinement strategies and reveal that
random-sampling-based click augmentation is the most effective at improving
corrections. Furthermore, we propose a semantic-perceiving
loss (SP-loss) to augment the training, which can effectively exploit the
semantic relationships of clicks for better optimization. To the best of our
knowledge, this work is the first attempt to tackle the human parsing task
under the interactive setting. Our IHP solution achieves 85% mIoU on the LIP
benchmark, 80% mIoU on PASCAL-Person-Part and CIHP, and 75% mIoU on Helen, with
only 1.95, 3.02, 2.84, and 1.09 clicks per class, respectively. These results
demonstrate that high-quality human parsing masks can be acquired with only a
little human effort. We hope this work can motivate more researchers
to develop data-efficient solutions to IHP in the future.
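The abstract describes the core mechanics (click encoding, input concatenation, and random-sampling-based click augmentation) without implementation detail, so here is a minimal sketch of one plausible reading, not the authors' code: clicks are rendered as per-class Gaussian disks (a common encoding in interactive segmentation), the localization maps are channel-concatenated with the RGB image, and corrective clicks are sampled uniformly from the mislabeled region. All function names, the Gaussian encoding, and sigma are assumptions; the SP-loss is omitted because the abstract does not specify its form.

```python
import numpy as np

def encode_clicks(clicks, num_classes, height, width, sigma=10.0):
    """Render user clicks as semantic-aware localization maps: one
    channel per body-part class, with a 2D Gaussian at each click.
    `clicks` is a list of (row, col, class_id) tuples. (Assumed encoding.)"""
    maps = np.zeros((num_classes, height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for row, col, cls in clicks:
        g = np.exp(-((ys - row) ** 2 + (xs - col) ** 2) / (2.0 * sigma ** 2))
        maps[cls] = np.maximum(maps[cls], g)   # keep the strongest response
    return maps

def build_network_input(rgb, clicks, num_classes):
    """Concatenate the RGB image (3, H, W) with the localization maps
    along the channel axis -> (3 + num_classes, H, W)."""
    _, h, w = rgb.shape
    return np.concatenate([rgb, encode_clicks(clicks, num_classes, h, w)], axis=0)

def sample_correction_click(pred, target, rng):
    """Random-sampling-based click augmentation: draw a corrective click
    uniformly from the mislabeled region and tag it with the true class."""
    wrong = np.argwhere(pred != target)
    if len(wrong) == 0:
        return None                            # nothing left to correct
    row, col = wrong[rng.integers(len(wrong))]
    return int(row), int(col), int(target[row, col])

# Toy usage: 20 part classes on a 256x256 image, two initial clicks.
rng = np.random.default_rng(0)
rgb = rng.random((3, 256, 256)).astype(np.float32)
net_input = build_network_input(rgb, [(64, 80, 2), (180, 128, 7)], num_classes=20)
print(net_input.shape)                         # (23, 256, 256)
```

In a refinement loop, the sampled corrective click would be re-encoded with encode_clicks and the forward pass repeated, mimicking a user correcting the current parsing result.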
Related papers
- Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation [73.89509052503222]
This paper presents a simple but performant semi-supervised semantic segmentation approach, called CorrMatch.
We observe that the correlation maps not only enable clustering pixels of the same category easily but also contain good shape information.
We conduct pixel propagation by modeling pairwise pixel similarities, spreading high-confidence pixels and uncovering more of them (see the sketch after this entry).
Then, we perform region propagation to enhance the pseudo labels with accurate class-agnostic masks extracted from the correlation maps.
arXiv Detail & Related papers (2023-06-07T10:02:29Z)
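The CorrMatch summary above compresses its pixel-propagation step into one sentence; the following is a speculative sketch of the general idea, not CorrMatch's actual algorithm: labels of high-confidence pixels are spread to uncertain pixels whose features correlate strongly with them. The thresholds and function names are illustrative assumptions.

```python
import numpy as np

def correlation_map(feats):
    """Pairwise cosine similarities between per-pixel features (N, C) -> (N, N)."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return f @ f.T

def propagate_labels(feats, labels, conf, conf_thresh=0.9, sim_thresh=0.8):
    """Spread labels of high-confidence pixels to strongly correlated,
    low-confidence ones; a crude stand-in for pixel propagation."""
    seeds = np.where(conf >= conf_thresh)[0]
    out = labels.copy()
    if seeds.size == 0:
        return out                       # no reliable pixels to spread from
    corr = correlation_map(feats)
    for i in np.where(conf < conf_thresh)[0]:
        sims = corr[i, seeds]
        best = sims.argmax()
        if sims[best] >= sim_thresh:
            out[i] = labels[seeds[best]] # adopt the most similar seed's label
    return out

# Toy usage: 100 pixels with 16-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))
labels = rng.integers(0, 5, size=100)
conf = rng.random(100)
print((propagate_labels(feats, labels, conf) != labels).sum(), "pixels relabeled")
```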
- ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection [20.983998911754792]
Two-stage Human-Object Interaction (HOI) detectors suffer from lower performance than one-stage methods.
We propose Vision Transformer based Pose-Conditioned Self-Loop Graph (ViPLO) to resolve these problems.
ViPLO achieves the state-of-the-art results on two public benchmarks.
arXiv Detail & Related papers (2023-04-17T09:44:54Z)
- Interactive segmentation using U-Net with weight map and dynamic user interactions [0.0]
We propose a novel interactive segmentation framework, where user clicks are dynamically adapted in size based on the current segmentation mask.
The clicked regions form a weight map that enters training through a novel weighted loss function (see the sketch after this entry).
Applying dynamic user click sizes increases the overall accuracy by 5.60% and 10.39%, respectively, using only a single user interaction.
arXiv Detail & Related papers (2021-11-18T15:08:11Z)
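The summary above only hints at how the click-derived weight map enters training. Below is a minimal sketch of one plausible reading, assuming a per-pixel weighted binary cross-entropy in which disks around clicks are up-weighted; the weighting scheme, radius adaptation, and all names are assumptions, not the paper's actual formulation.

```python
import numpy as np

def click_weight_map(shape, clicks, radius, base=1.0, boost=5.0):
    """Weight map that up-weights disks around user clicks; the radius
    could be adapted to the size of the current segmentation error."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    w = np.full(shape, base, dtype=np.float32)
    for row, col in clicks:
        w[(ys - row) ** 2 + (xs - col) ** 2 <= radius ** 2] = boost
    return w

def weighted_bce(pred, target, weights, eps=1e-7):
    """Per-pixel binary cross-entropy scaled by the click weight map."""
    pred = np.clip(pred, eps, 1.0 - eps)
    bce = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float((weights * bce).sum() / weights.sum())

# Toy usage: one click at (32, 48) with a radius tied to the error blob.
rng = np.random.default_rng(0)
pred, target = rng.random((64, 64)), (rng.random((64, 64)) > 0.5).astype(float)
w = click_weight_map((64, 64), [(32, 48)], radius=8)
print(weighted_bce(pred, target, w))
```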
- FAIRS -- Soft Focus Generator and Attention for Robust Object Segmentation from Extreme Points [70.65563691392987]
We present a new approach to generate object segmentation from user inputs in the form of extreme points and corrective clicks.
We demonstrate our method's ability to generate high-quality training data as well as its scalability in incorporating extreme points, guiding clicks, and corrective clicks in a principled manner.
arXiv Detail & Related papers (2020-04-04T22:25:47Z)
- Learning Attentive Pairwise Interaction for Fine-Grained Classification [53.66543841939087]
We propose a simple but effective Attentive Pairwise Interaction Network (API-Net) for fine-grained classification.
API-Net first learns a mutual feature vector to capture semantic differences in the input pair.
It then compares this mutual vector with individual vectors to generate gates for each input image.
We conduct extensive experiments on five popular benchmarks in fine-grained classification.
arXiv Detail & Related papers (2020-02-24T12:17:56Z)