DMAT: A Dynamic Mask-Aware Transformer for Human De-occlusion
- URL: http://arxiv.org/abs/2402.04558v1
- Date: Wed, 7 Feb 2024 03:36:41 GMT
- Title: DMAT: A Dynamic Mask-Aware Transformer for Human De-occlusion
- Authors: Guoqiang Liang, Jiahao Hu, Qingyue Wang, Shizhou Zhang
- Abstract summary: Human de-occlusion aims to infer the appearance of invisible human parts from an occluded image.
This paper proposes a dynamic mask-aware transformer (DMAT), which dynamically augments information from human regions.
Experiments on the AHP dataset demonstrate its superior performance compared to recent state-of-the-art methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human de-occlusion, which aims to infer the appearance of invisible human
parts from an occluded image, has great value in many human-related tasks, such
as person re-identification and intention inference. To address this task, this paper
proposes a dynamic mask-aware transformer (DMAT), which dynamically augments
information from human regions and weakens that from occlusion. First, to
enhance token representation, we design an expanded convolution head with
enlarged kernels, which captures more valid local context and mitigates the
influence of surrounding occlusion. To concentrate on the visible human parts,
we propose a novel dynamic multi-head human-mask guided attention mechanism
by integrating multiple masks, which prevents the de-occluded regions
from blending into the background. Besides, a region upsampling strategy is
utilized to alleviate the impact of occlusion on interpolated images. During
model learning, an amodal loss is developed to further emphasize the recovery
of human regions, which also improves the model's convergence. Extensive
experiments on the AHP dataset demonstrate its superior performance compared to
recent state-of-the-art methods.
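The core idea of mask-guided attention, as described in the abstract, can be illustrated with a minimal sketch: a per-token visibility mask adds a large negative bias to the attention logits of occluded keys, so each query attends mostly to visible human tokens. This is a hedged, simplified single-head version in NumPy, not the authors' implementation; all names, shapes, and the bias mechanism are illustrative assumptions.

```python
# Illustrative sketch (NOT the DMAT authors' code): single-head
# attention biased by a visibility mask. vis_mask has 1 for visible
# human tokens and 0 for occluded ones; occluded keys receive a large
# negative logit bias, so their attention weight underflows to ~0.
import numpy as np

def mask_guided_attention(q, k, v, vis_mask, neg=-1e9):
    """q, k, v: (T, d) token features; vis_mask: (T,) with 1 = visible."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                  # (T, T) similarities
    logits = logits + neg * (1.0 - vis_mask)       # suppress occluded keys
    logits -= logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                                   # (T, d) attended features

rng = np.random.default_rng(0)
T, d = 6, 4
q = k = v = rng.standard_normal((T, d))
vis = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 1.0])    # tokens 2 and 4 occluded
out = mask_guided_attention(q, k, v, vis)
```

Because the bias drives occluded-key weights to zero, the output is unchanged even if the occluded tokens' values are perturbed, which is the sense in which the attention "dynamically weakens" information from occlusion.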
Related papers
- DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery.
It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-04-01T18:59:13Z) - AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step.
We first employ a human token to probe a human location in the image and encode global features for each instance.
Then, we introduce a joint-related token to probe the human joint in the image and encode a fine-grained local feature.
arXiv Detail & Related papers (2024-03-26T17:59:23Z) - Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z) - Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where [63.61248884015162]
We aim to alleviate the burden of including masking operation into the contrastive-learning framework for convolutional neural networks.
We propose to explicitly take the saliency constraint into consideration in which the masked regions are more evenly distributed among the foreground and background.
arXiv Detail & Related papers (2023-09-22T09:58:38Z) - NeuralFusion: Neural Volumetric Rendering under Human-object Interactions [46.70371238621842]
We propose a neural approach for volumetric human-object capture and rendering using sparse consumer RGBD sensors.
For geometry modeling, we propose a neural implicit inference scheme with non-rigid key-volume fusion.
We also introduce a layer-wise human-object texture rendering scheme, which combines volumetric and image-based rendering in both spatial and temporal domains.
arXiv Detail & Related papers (2022-02-25T17:10:07Z) - RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream [27.600873320989276]
High-quality 4D reconstruction of human performance with complex interactions to various objects is essential in real-world scenarios.
Recent advances still fail to provide reliable performance reconstruction.
We propose RobustFusion, a robust volumetric performance reconstruction system for human-object interaction scenarios.
arXiv Detail & Related papers (2021-04-30T08:41:45Z) - Human De-occlusion: Invisible Perception and Recovery for Humans [26.404444296924243]
We tackle the problem of human de-occlusion which reasons about occluded segmentation masks and invisible appearance content of humans.
In particular, a two-stage framework is proposed to estimate the invisible portions and recover the content inside.
Our method outperforms state-of-the-art techniques in both tasks of mask completion and content recovery.
arXiv Detail & Related papers (2021-03-22T05:54:58Z) - Adversarial Semantic Data Augmentation for Human Pose Estimation [96.75411357541438]
We propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts with various semantic granularity.
We also propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamically predict tailored pasting configurations.
State-of-the-art results are achieved on challenging benchmarks.
arXiv Detail & Related papers (2020-08-03T07:56:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.