Dynamic Patch-aware Enrichment Transformer for Occluded Person
Re-Identification
- URL: http://arxiv.org/abs/2402.10435v1
- Date: Fri, 16 Feb 2024 03:53:30 GMT
- Title: Dynamic Patch-aware Enrichment Transformer for Occluded Person
Re-Identification
- Authors: Xin Zhang, Keren Fu, and Qijun Zhao
- Abstract summary: We present an end-to-end solution known as the Dynamic Patch-aware Enrichment Transformer (DPEFormer).
This model effectively distinguishes human body information from occlusions automatically and dynamically.
To ensure that DPSM and the entire DPEFormer can effectively learn with only identity labels, we also propose a Realistic Occlusion Augmentation (ROA) strategy.
- Score: 14.219232629274186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person re-identification (re-ID) continues to pose a significant challenge,
particularly in scenarios involving occlusions. Prior approaches aimed at
tackling occlusions have predominantly focused on aligning physical body
features through the utilization of external semantic cues. However, these
methods tend to be intricate and susceptible to noise. To address the
aforementioned challenges, we present an innovative end-to-end solution known
as the Dynamic Patch-aware Enrichment Transformer (DPEFormer). This model
effectively distinguishes human body information from occlusions automatically
and dynamically, eliminating the need for external detectors or precise image
alignment. Specifically, we introduce a dynamic patch token selection module
(DPSM). DPSM utilizes a label-guided proxy token as an intermediary to identify
informative occlusion-free tokens. These tokens are then selected for deriving
subsequent local part features. To facilitate the seamless integration of
global classification features with the finely detailed local features selected
by DPSM, we introduce a novel feature blending module (FBM). FBM enhances
feature representation through the complementary nature of information and the
exploitation of part diversity. Furthermore, to ensure that DPSM and the entire
DPEFormer can effectively learn with only identity labels, we also propose a
Realistic Occlusion Augmentation (ROA) strategy. This strategy leverages the
recent advances in the Segment Anything Model (SAM). As a result, it generates
occlusion images that closely resemble real-world occlusions, greatly enhancing
the subsequent contrastive learning process. Experiments on occluded and
holistic re-ID benchmarks signify a substantial advancement of DPEFormer over
existing state-of-the-art approaches. The code will be made publicly available.
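The core idea behind DPSM — ranking patch tokens by their affinity to a label-guided proxy token and keeping only the informative, occlusion-free ones — can be illustrated with a minimal, framework-free sketch. The cosine-similarity scoring and fixed top-k rule below are illustrative assumptions; the paper's actual module is learned end-to-end inside the transformer.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_tokens(patch_tokens, proxy, k):
    """Rank patch tokens by similarity to a proxy token and keep the
    top-k, mimicking the idea of discarding likely-occluded patches.
    Returns the kept tokens and their indices in spatial order."""
    ranked = sorted(
        range(len(patch_tokens)),
        key=lambda i: cosine(patch_tokens[i], proxy),
        reverse=True,
    )
    keep = sorted(ranked[:k])  # preserve the patches' spatial order
    return [patch_tokens[i] for i in keep], keep
```

For example, with a proxy pointing along the first feature axis, patches aligned with it are kept and orthogonal or opposed ones (the stand-ins for occluders here) are dropped; the selected tokens would then feed the local part-feature branch.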
Related papers
- ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification [34.38227097059117]
We propose a Prompt-guided Feature Disentangling method (ProFD) to generate well-aligned part features.
ProFD first designs part-specific prompts and utilizes noisy segmentation masks to preliminarily align visual and textual embeddings.
We employ a self-distillation strategy, retaining pre-trained knowledge of CLIP to mitigate over-fitting.
arXiv Detail & Related papers (2024-09-30T08:31:14Z)
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
- Robust Ensemble Person Re-Identification via Orthogonal Fusion with Occlusion Handling [4.431087385310259]
Occlusion remains one of the major challenges in person re-identification (ReID).
We propose a deep ensemble model that harnesses both CNN and Transformer architectures to generate robust feature representations.
arXiv Detail & Related papers (2024-03-29T18:38:59Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- Feature Completion Transformer for Occluded Person Re-identification [25.159974510754992]
Occluded person re-identification (Re-ID) is a challenging problem due to the information loss caused by occluders.
We propose a Feature Completion Transformer (FCFormer) to implicitly complement the semantic information of occluded parts in the feature space.
FCFormer achieves superior performance and outperforms the state-of-the-art methods by significant margins on occluded datasets.
arXiv Detail & Related papers (2023-03-03T01:12:57Z)
- Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification [78.08536797239893]
We propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two novel designed proxy embedding modules.
MSTAT consists of three stages to encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips.
We show that MSTAT can achieve state-of-the-art accuracies on various standard benchmarks.
arXiv Detail & Related papers (2023-01-02T05:17:31Z)
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two self-evident prior knowledge.
Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- Dual-Refinement: Joint Label and Feature Refinement for Unsupervised Domain Adaptive Person Re-Identification [51.98150752331922]
Unsupervised domain adaptive (UDA) person re-identification (re-ID) is a challenging task due to the lack of labels for the target domain data.
We propose a novel approach, called Dual-Refinement, that jointly refines pseudo labels at the off-line clustering phase and features at the on-line training phase.
Our method outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-12-26T07:35:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.