Leveraging Multi-View Weak Supervision for Occlusion-Aware Multi-Human Parsing
- URL: http://arxiv.org/abs/2509.10093v1
- Date: Fri, 12 Sep 2025 09:36:23 GMT
- Title: Leveraging Multi-View Weak Supervision for Occlusion-Aware Multi-Human Parsing
- Authors: Laura Bragagnolo, Matteo Terreran, Leonardo Barcellona, Stefano Ghidoni
- Abstract summary: Multi-human parsing is the task of segmenting human body parts while associating each part to the person it belongs to. We propose a novel training framework exploiting multi-view information to improve multi-human parsing models under occlusions.
- Score: 7.013740268460309
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-human parsing is the task of segmenting human body parts while associating each part to the person it belongs to, combining instance-level and part-level information for fine-grained human understanding. In this work, we demonstrate that, while state-of-the-art approaches achieved notable results on public datasets, they struggle considerably in segmenting people with overlapping bodies. From the intuition that overlapping people may appear separated from a different point of view, we propose a novel training framework exploiting multi-view information to improve multi-human parsing models under occlusions. Our method integrates such knowledge during the training process, introducing a novel approach based on weak supervision on human instances and a multi-view consistency loss. Given the lack of suitable datasets in the literature, we propose a semi-automatic annotation strategy to generate human instance segmentation masks from multi-view RGB+D data and 3D human skeletons. The experiments demonstrate that the approach can achieve up to a 4.20% relative improvement on human parsing over the baseline model in occlusion scenarios.
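The abstract does not define the multi-view consistency loss; a minimal sketch of the idea, assuming instance masks from a second view have already been projected into the current view via the RGB+D geometry, is to penalize cross-view disagreement with an IoU-based term (the function names and exact form are illustrative, not the paper's):

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks (flat lists of 0/1)."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    union = sum(a | b for a, b in zip(mask_a, mask_b))
    return inter / union if union else 1.0

def multiview_consistency_loss(pred_mask, projected_mask):
    """Penalize disagreement between the predicted instance mask in one
    view and the same person's mask projected from another view.
    Loss is 0 when the views agree perfectly, 1 when they are disjoint."""
    return 1.0 - iou(pred_mask, projected_mask)
```

In a training loop this term would be added to the usual parsing loss, so a person occluded in one view still receives a supervisory signal from views where they are visible.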
Related papers
- HumanCrafter: Synergizing Generalizable Human Reconstruction and Semantic 3D Segmentation [51.27178551863772]
We propose a unified framework that enables the joint modeling of appearance and human-part semantics from a single image. HumanCrafter surpasses existing state-of-the-art methods in both 3D human-part segmentation and 3D human reconstruction from a single image.
arXiv Detail & Related papers (2025-11-01T09:29:36Z) - Human Multi-View Synthesis from a Single-View Model: Transferred Body and Face Representations [7.448124739584319]
We propose an innovative framework that leverages transferred body and facial representations for multi-view human synthesis. Specifically, we use a single-view model pretrained on a large-scale human dataset to develop a multi-view body representation. Our approach outperforms the current state-of-the-art methods, achieving superior performance in multi-view human synthesis.
arXiv Detail & Related papers (2024-12-04T04:02:17Z) - AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joints in the image and encode a fine-grained local feature.
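The token-probing idea above amounts to a learned query attending over image features. A minimal pure-Python sketch of single-query scaled dot-product attention (the mechanism AiOS-style tokens rely on; the function and shapes here are illustrative assumptions, not the paper's implementation):

```python
import math

def attend(query, keys, values):
    """One query token attends over flattened image features.
    query: vector of length d; keys: list of d-vectors; values: list of
    vectors. Returns the attention-weighted sum of the value vectors,
    i.e. a feature pooled at the location the token 'probes'."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

With uniform scores the output is simply the mean of the value vectors; a trained human token would instead concentrate its weights on the features of one person.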
arXiv Detail & Related papers (2024-03-26T17:59:23Z) - Deep Learning for Human Parsing: A Survey [54.812353922568995]
We provide an analysis of state-of-the-art human parsing methods, covering a broad spectrum of pioneering works for semantic human parsing.
We introduce five insightful categories, including: (1) structure-driven architectures exploit the relationship of different human parts and the inherent hierarchical structure of a human body, (2) graph-based networks capture the global information to achieve an efficient and complete human body analysis, (3) context-aware networks explore useful context across all pixels to characterize the class of each pixel, and (4) LSTM-based methods can combine short-distance and long-distance spatial dependencies to better exploit abundant local and global contexts.
arXiv Detail & Related papers (2023-01-29T10:54:56Z) - Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z) - View-Invariant Skeleton-based Action Recognition via Global-Local Contrastive Learning [15.271862140292837]
We propose a new view-invariant representation learning approach, without any manual action labeling, for skeleton-based human action recognition.
We leverage the multi-view skeleton data simultaneously taken for the same person in the network training, by maximizing the mutual information between the representations extracted from different views.
We show that the proposed method is robust to the view difference of the input skeleton data and significantly boosts the performance of unsupervised skeleton-based human action recognition methods.
arXiv Detail & Related papers (2022-09-23T15:00:57Z) - Human De-occlusion: Invisible Perception and Recovery for Humans [26.404444296924243]
We tackle the problem of human de-occlusion which reasons about occluded segmentation masks and invisible appearance content of humans.
In particular, a two-stage framework is proposed to estimate the invisible portions and recover the content inside.
Our method performs over the state-of-the-art techniques in both tasks of mask completion and content recovery.
arXiv Detail & Related papers (2021-03-22T05:54:58Z) - Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing [131.97475877877608]
A new bottom-up regime is proposed to learn category-level human semantic segmentation and multi-person pose estimation in a joint and end-to-end manner.
It is a compact, efficient and powerful framework that exploits structural information over different human granularities.
Experiments on three instance-aware human datasets show that our model outperforms other bottom-up alternatives with much more efficient inference.
arXiv Detail & Related papers (2021-03-08T06:55:00Z) - HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation [54.23770284299979]
This paper introduces a novel form of supervision - Hierarchical Multi-person Ordinal Relations (HMOR)
HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically.
An integrated top-down model is designed to leverage these ordinal relations in the learning process.
The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets.
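Supervising ordinal depth relations between people, as HMOR does for depths (and angles), can be sketched as a pairwise ranking loss; the hinge formulation and function below are illustrative assumptions, not HMOR's exact objective:

```python
def ordinal_depth_loss(pred_depths, gt_depths, margin=0.0):
    """Ranking-style sketch of an ordinal depth relation loss: for every
    pair of people, penalize a predicted depth ordering that contradicts
    the ground-truth front/behind relation."""
    loss, pairs = 0.0, 0
    n = len(pred_depths)
    for i in range(n):
        for j in range(i + 1, n):
            # +1 if person i is truly farther than person j, else -1.
            sign = 1.0 if gt_depths[i] > gt_depths[j] else -1.0
            # Hinge on the signed predicted depth gap.
            loss += max(0.0, margin - sign * (pred_depths[i] - pred_depths[j]))
            pairs += 1
    return loss / pairs if pairs else 0.0
```

Note the supervision needs only the *ordering* of depths, not metric values, which is what makes such relations cheap to annotate.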
arXiv Detail & Related papers (2020-08-01T07:53:27Z) - SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representation into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
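The graph → cluster → compress pipeline above can be sketched end to end. This toy version swaps in simpler stand-ins that are clearly not SummPip's components: Jaccard word overlap instead of linguistic + deep-representation edges, connected components instead of spectral clustering, and shortest-sentence selection instead of cluster compression:

```python
def sentence_graph(sentences, threshold=0.2):
    """Boolean adjacency by Jaccard word overlap (a crude stand-in for
    SummPip's linguistic + deep-representation sentence graph)."""
    def jaccard(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0
    n = len(sentences)
    return [[i != j and jaccard(sentences[i], sentences[j]) >= threshold
             for j in range(n)] for i in range(n)]

def cluster_and_compress(sentences, threshold=0.2):
    """Cluster sentences via connected components of the graph (a
    simplification of the spectral clustering step), then 'compress'
    each cluster by keeping its shortest sentence."""
    adj = sentence_graph(sentences, threshold)
    seen, summary = set(), []
    for start in range(len(sentences)):
        if start in seen:
            continue
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            component.append(u)
            stack.extend(v for v in range(len(sentences)) if adj[u][v])
        summary.append(min((sentences[i] for i in component), key=len))
    return summary
```

One summary sentence per cluster falls out naturally; the real system would fuse each cluster's sentences rather than pick one.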
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.