Dynamic Enhancement Network for Partial Multi-modality Person
Re-identification
- URL: http://arxiv.org/abs/2305.15762v1
- Date: Thu, 25 May 2023 06:22:01 GMT
- Title: Dynamic Enhancement Network for Partial Multi-modality Person
Re-identification
- Authors: Aihua Zheng, Ziling He, Zi Wang, Chenglong Li, Jin Tang
- Abstract summary: We design a novel dynamic enhancement network (DENet), which allows arbitrary modalities to be missing while maintaining the representation ability of multiple modalities.
Since the missing state may change, we design a dynamic enhancement module that adaptively enhances modality features according to the missing state.
- Score: 52.70235136651996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many existing multi-modality studies are based on the assumption of modality
integrity. However, missing arbitrary modalities is very common in real life, and
the problem has received little study despite its importance for multi-modality
person re-identification (Re-ID). To this end, we design a novel dynamic
enhancement network (DENet) for partial multi-modality person Re-ID, which allows
arbitrary modalities to be missing while maintaining the representation ability
of multiple modalities. Specifically, the multi-modal representation of RGB,
near-infrared (NIR) and thermal-infrared (TIR) images is learned by three
branches, in which the information of missing modalities is recovered by a
feature transformation module. Since the missing state may change over time, we
design a dynamic enhancement module that adaptively enhances modality features
according to the missing state, improving the multi-modality representation.
Extensive experiments on the multi-modality person Re-ID dataset RGBNT201 and
the vehicle Re-ID dataset RGBNT100, in comparison with state-of-the-art methods,
verify the effectiveness of our method in complex and changeable environments.
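As a rough illustration of the pipeline sketched in the abstract (three modality branches, recovery of missing-modality features, and missing-state-aware enhancement), a minimal PyTorch sketch is given below. All module names, feature sizes, and the concrete recovery/gating logic are assumptions made for illustration; they are not the authors' released implementation.

```python
# Hypothetical sketch of a partial multi-modality Re-ID forward pass.
# Branch backbones, the feature transformation step, and the gating module
# are simplified stand-ins, not the DENet architecture itself.
import torch
import torch.nn as nn


class PartialMultiModalReID(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        mods = ("rgb", "nir", "tir")
        # One lightweight encoder per modality (stand-in for real backbones).
        self.branches = nn.ModuleDict({
            m: nn.Sequential(nn.Conv2d(3, feat_dim, 3, stride=2, padding=1),
                             nn.AdaptiveAvgPool2d(1), nn.Flatten())
            for m in mods})
        # Feature transformation (assumption): estimate a missing modality's
        # feature from the mean of the available modality features.
        self.transform = nn.ModuleDict({m: nn.Linear(feat_dim, feat_dim) for m in mods})
        # Dynamic enhancement (assumption): predict per-modality weights from the
        # concatenated features plus the binary missing-state vector.
        self.gate = nn.Sequential(nn.Linear(3 * feat_dim + 3, 3), nn.Softmax(dim=-1))

    def forward(self, images, present):
        # images: dict modality -> (B, 3, H, W); present: dict modality -> bool.
        # At least one modality must be present.
        mods = ("rgb", "nir", "tir")
        feats = {m: self.branches[m](images[m]) for m in mods if present[m]}
        avg = torch.stack(list(feats.values())).mean(dim=0)
        for m in mods:
            if not present[m]:
                feats[m] = self.transform[m](avg)   # recover missing modality
        f = torch.cat([feats[m] for m in mods], dim=-1)
        state = f.new_tensor([float(present[m]) for m in mods]).expand(f.size(0), -1)
        w = self.gate(torch.cat([f, state], dim=-1))            # (B, 3) weights
        fused = sum(w[:, i:i + 1] * feats[m] for i, m in enumerate(mods))
        return fused                                            # Re-ID embedding
```

For example, if the TIR camera is unavailable for a batch, the model would be called with images for "rgb" and "nir" only and present = {"rgb": True, "nir": True, "tir": False}; the TIR feature is then reconstructed from the available branches and the gate re-weights all three features according to that missing state.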
Related papers
- All in One Framework for Multimodal Re-identification in the Wild [58.380708329455466]
A multimodal learning paradigm for ReID is introduced, referred to as All-in-One (AIO).
AIO harnesses a frozen pre-trained big model as an encoder, enabling effective multimodal retrieval without additional fine-tuning.
Experiments on cross-modal and multimodal ReID reveal that AIO not only adeptly handles various modal data but also excels in challenging contexts.
arXiv Detail & Related papers (2024-05-08T01:04:36Z)
- Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification [64.36210786350568]
We propose a novel learning framework named EDITOR to select diverse tokens from vision Transformers for multi-modal object ReID.
Our framework can generate more discriminative features for multi-modal object ReID.
arXiv Detail & Related papers (2024-03-15T12:44:35Z)
- Modality Unifying Network for Visible-Infrared Person Re-Identification [24.186989535051623]
Visible-infrared person re-identification (VI-ReID) is a challenging task due to large cross-modality discrepancies and intra-class variations.
Existing methods mainly focus on learning modality-shared representations by embedding different modalities into the same feature space.
We propose a novel Modality Unifying Network (MUN) to explore a robust auxiliary modality for VI-ReID.
arXiv Detail & Related papers (2023-09-12T14:22:22Z)
- Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data [10.816003787786766]
Visible-infrared person re-identification (V-I ReID) seeks to match images of individuals captured over a distributed network of RGB and IR cameras.
State-of-the-art V-I ReID models cannot leverage corrupted modality information to sustain a high level of accuracy.
We propose an efficient model for multimodal V-I ReID that preserves modality-specific knowledge for improved robustness to corrupted multimodal images.
arXiv Detail & Related papers (2023-04-29T18:18:59Z)
- Learning Multimodal Data Augmentation in Feature Space [65.54623807628536]
LeMDA is an easy-to-use method that automatically learns to jointly augment multimodal data in feature space.
We show that LeMDA can profoundly improve the performance of multimodal deep learning architectures.
arXiv Detail & Related papers (2022-12-29T20:39:36Z)
- Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data [10.816003787786766]
We propose a specialized DA strategy for V-I person ReID models.
Our strategy diminishes the impact of corruption on the accuracy of deep person ReID models.
Results indicate that using our strategy, V-I ReID models can exploit both shared and individual modality knowledge.
arXiv Detail & Related papers (2022-11-22T00:29:55Z)
- CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification [38.96033760300123]
We propose a cross-modality transformer-based method (CMTR) for the visible-infrared person re-identification task.
We design the novel modality embeddings, which are fused with token embeddings to encode modalities' information.
Our proposed CMTR model's performance significantly surpasses existing outstanding CNN-based methods.
arXiv Detail & Related papers (2021-10-18T03:12:59Z)
- Accurate and Lightweight Image Super-Resolution with Model-Guided Deep Unfolding Network [63.69237156340457]
We present and advocate an explainable approach toward SISR named model-guided deep unfolding network (MoG-DUN).
MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations).
The superiority of the proposed MoG-DUN method over existing state-of-the-art image super-resolution methods, including RCAN, SRDNF, and SRFBN, is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
arXiv Detail & Related papers (2020-09-14T08:23:37Z)
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family (a minimal sketch of the central-difference idea follows this list), and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z)
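As background for the last entry, the sketch below illustrates the basic central difference convolution operation, here in its 3D form, which the 3D-CDC family builds on. The mixing factor theta and layer sizes are illustrative assumptions; the paper's actual 3D-CDC variants (e.g. temporal-only differences) differ in detail.

```python
# Generic sketch of a 3D central difference convolution: blends a vanilla 3D
# convolution with a central-difference term. Not the paper's exact 3D-CDC variants.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CDC3d(nn.Module):
    # y = conv3d(x) - theta * (center voxel value times the summed kernel weights)
    def __init__(self, in_ch, out_ch, kernel_size=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.theta = theta

    def forward(self, x):                       # x: (B, C, T, H, W)
        out = self.conv(x)                      # vanilla 3D convolution
        if self.theta > 0:
            # Subtract x(center) * sum(kernel) from each response, implemented
            # as a 1x1x1 convolution with the spatially summed kernel weights.
            kernel_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
            out = out - self.theta * F.conv3d(x, kernel_sum)
        return out
```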