TSDW: A Tri-Stream Dynamic Weight Network for Cloth-Changing Person Re-Identification
- URL: http://arxiv.org/abs/2503.00477v1
- Date: Sat, 01 Mar 2025 13:04:49 GMT
- Title: TSDW: A Tri-Stream Dynamic Weight Network for Cloth-Changing Person Re-Identification
- Authors: Ruiqi He, Zihan Wang, Xiang Zhou
- Abstract summary: Cloth-Changing Person Re-identification aims to solve the challenge of identifying individuals across different temporal-spatial scenarios. Existing ReID research primarily relies on face recognition, semantic recognition, and clothing-irrelevant feature identification. We propose a Tri-Stream Dynamic Weight Network (TSDW) that requires only images.
- Score: 10.51699935302901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cloth-Changing Person Re-identification (CC-ReID) aims to solve the challenge of identifying individuals across different temporal-spatial scenarios, viewpoints, and clothing variations. This field is gaining increasing attention in big data research and public security domains. Existing ReID research primarily relies on face recognition, gait and semantic recognition, and clothing-irrelevant feature identification, which perform relatively well in scenarios with high-quality clothing-change videos and images. However, these approaches depend on either single features or simple combinations of multiple features, making further performance improvements difficult. Additionally, limitations such as missing facial information, challenges in gait extraction, and inconsistent camera parameters restrict the broader application of CC-ReID. To address these limitations, we propose a Tri-Stream Dynamic Weight Network (TSDW) that requires only images. This dynamic weighting network consists of three parallel feature streams: facial features, head-limb features, and global features. Each stream specializes in extracting its designated features, after which a gating network dynamically fuses the streams according to their confidence levels. The three parallel feature streams enhance recognition performance and reduce the impact of any single feature's failure, thereby improving model robustness. Extensive experiments on benchmark datasets (e.g., PRCC, Celeb-reID, VC-Clothes) demonstrate that our method significantly outperforms existing state-of-the-art approaches.
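The abstract describes a gating network that weights three specialist embeddings by their confidence. The following is a minimal sketch of such gated fusion, not the authors' released implementation: the module name TriStreamGatedFusion, the embedding size, and the single-layer gate are all illustrative assumptions, and the actual TSDW gating network may be structured differently.

```python
import torch
import torch.nn as nn

class TriStreamGatedFusion(nn.Module):
    """Illustrative gated fusion of three feature streams (hypothetical
    names and sizes; not the authors' exact architecture)."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # One gate logit per stream, predicted from the concatenated features.
        self.gate = nn.Linear(3 * feat_dim, 3)

    def forward(self, face: torch.Tensor, head_limb: torch.Tensor,
                global_feat: torch.Tensor) -> torch.Tensor:
        # Each input is a (batch, feat_dim) embedding from one specialist stream.
        stacked = torch.stack([face, head_limb, global_feat], dim=1)    # (B, 3, D)
        weights = torch.softmax(self.gate(stacked.flatten(1)), dim=-1)  # (B, 3)
        # Confidence-weighted sum: a failing stream (e.g., no visible face)
        # can be down-weighted instead of corrupting the fused embedding.
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)             # (B, D)

# Usage: fuse three 512-d embeddings for a batch of 8 images.
fusion = TriStreamGatedFusion(512)
f, h, g = (torch.randn(8, 512) for _ in range(3))
fused = fusion(f, h, g)  # shape: (8, 512)
```

The softmax weighting is what yields the robustness claimed above: when one stream is unreliable, its weight can shrink toward zero while the other two streams still carry the identity signal.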
Related papers
- FusionSegReID: Advancing Person Re-Identification with Multimodal Retrieval and Precise Segmentation [42.980289787679084]
Person re-identification (ReID) plays a critical role in applications like security surveillance and criminal investigations by matching individuals across large image galleries captured by non-overlapping cameras.
Traditional ReID methods rely on unimodal inputs, typically images, but face limitations due to challenges like occlusions, lighting changes, and pose variations.
This paper presents FusionSegReID, a multimodal model that combines both image and text inputs for enhanced ReID performance.
arXiv Detail & Related papers (2025-03-27T15:14:03Z) - Cross-Modal Synergies: Unveiling the Potential of Motion-Aware Fusion Networks in Handling Dynamic and Static ReID Scenarios [4.635813517641097]
We introduce an innovative Motion-Aware Fusion (MOTAR-FUSE) network that utilizes motion cues derived from static imagery to significantly enhance ReID capabilities. A unique aspect of our approach is the integration of a motion consistency task, which empowers the motion-aware transformer to adeptly capture the dynamics of human motion.
arXiv Detail & Related papers (2025-02-02T04:37:25Z) - PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification [73.64560354556498]
Vision Transformer (ViT) tends to overfit on the most distinct regions of training data, limiting its generalizability and its attention to holistic object features.
We present PartFormer, an innovative adaptation of ViT designed to overcome the limitations in object Re-ID tasks.
Our framework significantly outperforms the state of the art by 2.4% mAP on the most challenging MSMT17 dataset.
arXiv Detail & Related papers (2024-08-29T16:31:05Z) - Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image [87.00660347447494]
Recent advancements in Neural Surface Reconstruction (NSR) have significantly improved multi-view reconstruction when coupled with volume rendering.
We propose an investigation into feature-level consistent loss, aiming to harness valuable feature priors from diverse pretext visual tasks.
Our results, analyzed on DTU and EPFL, reveal that feature priors from image matching and multi-view stereo datasets outperform other pretext tasks.
arXiv Detail & Related papers (2024-08-04T16:09:46Z) - DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation [84.0586749616249]
This paper presents DiffFAE, a one-stage, highly efficient diffusion-based framework tailored for high-fidelity Facial Appearance Editing.
For high-fidelity transfer of query attributes, we adopt Space-sensitive Physical Customization (SPC), which ensures fidelity and generalization ability.
To preserve source attributes, we introduce the Region-responsive Semantic Composition (RSC) module.
This module is guided to learn decoupled source-regarding features, thereby better preserving the identity and alleviating artifacts from non-facial attributes such as hair, clothes, and background.
arXiv Detail & Related papers (2024-03-26T12:53:10Z) - Feature Disentanglement Learning with Switching and Aggregation for Video-based Person Re-Identification [9.068045610800667]
In video person re-identification (Re-ID), the network must consistently extract features of the target person from successive frames.
Existing methods tend to focus only on how to use temporal information, which often leads to networks being fooled by similar appearances and identical backgrounds.
We propose a Disentanglement, Switching and Aggregation Network (DSANet), which separates identity-representing features from camera-characteristic features and pays more attention to identity information.
arXiv Detail & Related papers (2022-12-16T04:27:56Z) - Semantic Consistency and Identity Mapping Multi-Component Generative Adversarial Network for Person Re-Identification [39.605062525247135]
We propose a semantic consistency and identity mapping multi-component generative adversarial network (SC-IMGAN), which provides style adaptation from one domain to many.
Our proposed method outperforms state-of-the-art techniques on six challenging person Re-ID datasets.
arXiv Detail & Related papers (2021-04-28T14:12:29Z) - Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics.
Existing shallow learning-based and deep learning-based methods focus on mapping high-resolution (HR) and low-resolution (LR) face pairs into a joint feature space.
In this study, we aim to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
arXiv Detail & Related papers (2021-03-25T14:03:42Z) - Generalized Iris Presentation Attack Detection Algorithm under Cross-Database Settings [63.90855798947425]
Presentation attacks pose major challenges to most biometric modalities.
We propose a generalized deep learning-based presentation attack detection network, MVANet.
It is inspired by the simplicity and success of hybrid algorithms that fuse multiple detection networks.
arXiv Detail & Related papers (2020-10-25T22:42:27Z) - Challenge-Aware RGBT Tracking [32.88141817679821]
We propose a novel challenge-aware neural network to handle the modality-shared challenges and the modality-specific ones.
We show that our method operates at real-time speed while performing well against state-of-the-art methods on three benchmark datasets.
arXiv Detail & Related papers (2020-07-26T15:11:44Z) - Attribute-aware Identity-hard Triplet Loss for Video-based Person Re-identification [51.110453988705395]
Video-based person re-identification (Re-ID) is an important computer vision task.
We introduce a new metric learning method called Attribute-aware Identity-hard Triplet Loss (AITL); a generic batch-hard triplet-loss sketch follows this entry.
To achieve a complete model of video-based person Re-ID, a multi-task framework with an Attribute-driven Spatio-Temporal Attention (ASTA) mechanism is also proposed.
arXiv Detail & Related papers (2020-06-13T09:15:38Z)
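The AITL formulation itself is not reproduced in this summary. For background, here is a conventional batch-hard ("identity-hard") triplet loss in the style of Hermans et al. (2017), the baseline that such attribute-aware losses typically extend; the function name and margin value are illustrative assumptions.

```python
import torch

def batch_hard_triplet_loss(emb: torch.Tensor, labels: torch.Tensor,
                            margin: float = 0.3) -> torch.Tensor:
    """Batch-hard triplet loss: for each anchor, pick its farthest
    positive and closest negative within the mini-batch."""
    dist = torch.cdist(emb, emb)                        # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # (B, B) same-identity mask
    # Hardest positive: farthest sample sharing the anchor's identity.
    hardest_pos = (dist * same.float()).max(dim=1).values
    # Hardest negative: closest sample with a different identity
    # (same-identity pairs are pushed out of the min by a large constant).
    hardest_neg = (dist + same.float() * 1e9).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

# Usage: 16 embeddings drawn from 4 identities.
emb = torch.nn.functional.normalize(torch.randn(16, 256), dim=1)
labels = torch.randint(0, 4, (16,))
loss = batch_hard_triplet_loss(emb, labels)
```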