Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using
Corrupted Multimodal Data
- URL: http://arxiv.org/abs/2305.00320v1
- Date: Sat, 29 Apr 2023 18:18:59 GMT
- Title: Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using
Corrupted Multimodal Data
- Authors: Arthur Josi, Mahdi Alehdaghi, Rafael M. O. Cruz, Eric Granger
- Abstract summary: Visible-infrared person re-identification (V-I ReID) seeks to match images of individuals captured over a distributed network of RGB and IR cameras.
State-of-the-art V-I ReID models cannot leverage corrupted modality information to sustain a high level of accuracy.
We propose an efficient model for multimodal V-I ReID that preserves modality-specific knowledge for improved robustness to corrupted multimodal images.
- Score: 10.816003787786766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visible-infrared person re-identification (V-I ReID) seeks to match images of
individuals captured over a distributed network of RGB and IR cameras. The task
is challenging due to the significant differences between V and I modalities,
especially under real-world conditions, where images are corrupted by, e.g.,
blur, noise, and weather. Indeed, state-of-the-art V-I ReID models cannot leverage
corrupted modality information to sustain a high level of accuracy. In this
paper, we propose an efficient model for multimodal V-I ReID -- named
Multimodal Middle Stream Fusion (MMSF) -- that preserves modality-specific
knowledge for improved robustness to corrupted multimodal images. In addition,
three state-of-the-art attention-based multimodal fusion models are adapted to
address corrupted multimodal data in V-I ReID, allowing the importance of each
modality to be balanced dynamically. Recently, evaluation protocols have been proposed to
assess the robustness of ReID models under challenging real-world scenarios.
However, these protocols are limited to unimodal V settings. For realistic
evaluation of multimodal (and cross-modal) V-I person ReID models, we propose
new challenging corrupted datasets for scenarios where V and I cameras are
co-located (CL) and not co-located (NCL). Finally, the benefits of our Masking
and Local Multimodal Data Augmentation (ML-MDA) strategy are explored to
improve the robustness of ReID models to multimodal corruption. Our experiments
on clean and corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD
datasets indicate which multimodal V-I ReID models are more likely to
perform well in real-world operational conditions. In particular, our ML-MDA is
an important strategy for a V-I person ReID system to sustain high accuracy and
robustness when processing corrupted multimodal images. Also, our multimodal
ReID model MMSF outperforms every method under CL and NCL camera scenarios.
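The abstract does not spell out the MMSF architecture; as a rough, hedged sketch of the middle-stream idea (modality-specific V and I streams kept alongside a shared stream, with an attention score that can down-weight a corrupted modality), the following PyTorch example uses small placeholder encoders. All module names, dimensions, and the identity count are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a middle-stream fusion model for V-I ReID.
# Not the authors' MMSF implementation; encoders, dimensions, and the
# attention weighting are illustrative placeholders.
import torch
import torch.nn as nn


class MiddleStreamFusion(nn.Module):
    def __init__(self, feat_dim=512, num_ids=395):  # num_ids is a placeholder
        super().__init__()
        # Modality-specific streams preserve V-only and I-only knowledge.
        self.visible_stream = self._make_encoder(feat_dim)
        self.infrared_stream = self._make_encoder(feat_dim)
        # A shared "middle" stream models knowledge common to both modalities.
        self.middle_stream = self._make_encoder(feat_dim)
        # Attention head that scores each stream so a corrupted modality
        # can be down-weighted dynamically.
        self.attn = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_ids)

    @staticmethod
    def _make_encoder(feat_dim):
        return nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, visible, infrared):
        f_v = self.visible_stream(visible)               # (B, D)
        f_i = self.infrared_stream(infrared)             # (B, D)
        # The middle stream sees both modalities (here, simply averaged pixels).
        f_m = self.middle_stream((visible + infrared) / 2)
        streams = torch.stack([f_v, f_m, f_i], dim=1)    # (B, 3, D)
        weights = torch.softmax(self.attn(streams), dim=1)
        fused = (weights * streams).sum(dim=1)           # (B, D)
        return self.classifier(fused), fused


if __name__ == "__main__":
    model = MiddleStreamFusion()
    v = torch.randn(4, 3, 128, 64)   # RGB crops
    i = torch.randn(4, 3, 128, 64)   # IR crops replicated to 3 channels
    logits, embedding = model(v, i)
    print(logits.shape, embedding.shape)
```

The modality-specific streams are what would let such a model keep useful features when one input is corrupted, while the learned weights decide how much each stream contributes to the fused embedding.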
Related papers
- RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z)
- All in One Framework for Multimodal Re-identification in the Wild [58.380708329455466]
A multimodal learning paradigm for ReID is introduced, referred to as All-in-One (AIO).
AIO harnesses a frozen pre-trained big model as an encoder, enabling effective multimodal retrieval without additional fine-tuning.
Experiments on cross-modal and multimodal ReID reveal that AIO not only adeptly handles various modal data but also excels in challenging contexts.
arXiv Detail & Related papers (2024-05-08T01:04:36Z)
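The AIO encoder and retrieval pipeline are only described at a high level above; the sketch below illustrates the generic pattern of multimodal retrieval with a frozen pre-trained encoder (no fine-tuning), ranking gallery images by cosine similarity. The torchvision backbone and preprocessing are placeholder assumptions, not the AIO framework itself.

```python
# Generic frozen-encoder retrieval sketch, loosely in the spirit of AIO.
# The torchvision backbone and the simple cosine ranking are assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

encoder = resnet50(weights=ResNet50_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()   # keep the 2048-d pooled features
encoder.eval()                     # frozen: no fine-tuning


@torch.no_grad()
def embed(images):
    """images: (N, 3, H, W) tensors from any modality mapped to 3 channels."""
    return F.normalize(encoder(images), dim=1)


def rank_gallery(query, gallery):
    """Return gallery indices sorted by cosine similarity to each query."""
    sims = embed(query) @ embed(gallery).T      # (Nq, Ng)
    return sims.argsort(dim=1, descending=True)


if __name__ == "__main__":
    q = torch.randn(2, 3, 256, 128)   # e.g. visible queries
    g = torch.randn(8, 3, 256, 128)   # e.g. infrared gallery
    print(rank_gallery(q, g)[:, :5])  # top-5 gallery indices per query
```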
- Cross-Modality Perturbation Synergy Attack for Person Re-identification [66.48494594909123]
The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
arXiv Detail & Related papers (2024-01-18T15:56:23Z)
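The synergy attack itself is not detailed in this summary; as a generic, hedged illustration of a universal (image-agnostic) perturbation, the sketch below accumulates signed gradients of a placeholder identity loss over a batch and clips the shared perturbation to an L-infinity budget. The model, loss, and budget are assumptions, not the paper's cross-modality objective.

```python
# Generic universal-perturbation sketch (signed-gradient accumulation).
# The model, loss, and epsilon are placeholders; this is not the paper's
# cross-modality synergy attack.
import torch


def universal_perturbation(model, images, labels, eps=8 / 255, steps=10, alpha=1 / 255):
    """Learn one additive perturbation shared by all images in the batch."""
    delta = torch.zeros_like(images[:1], requires_grad=True)  # (1, C, H, W)
    for _ in range(steps):
        logits = model((images + delta).clamp(0, 1))
        # Maximizing the ID loss pushes embeddings away from their identities.
        loss = torch.nn.functional.cross_entropy(logits, labels)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)     # keep the perturbation small
            delta.grad.zero_()
    return delta.detach()
```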
- Deep Unfolding Convolutional Dictionary Model for Multi-Contrast MRI Super-resolution and Reconstruction [23.779641808300596]
We propose a multi-contrast convolutional dictionary (MC-CDic) model under the guidance of the optimization algorithm.
We employ the proximal gradient algorithm to optimize the model and unroll the iterative steps into a deep CDic model.
Experimental results demonstrate the superior performance of the proposed MC-CDic model against existing SOTA methods.
arXiv Detail & Related papers (2023-09-03T13:18:59Z)
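The MC-CDic formulation is not reproduced here; as a generic illustration of unrolling a proximal gradient algorithm into a network, the sketch below turns a few ISTA iterations (a gradient step on a data-fidelity term followed by soft-thresholding) into layers with learnable step sizes and thresholds. It is a hedged stand-in, not the MC-CDic model.

```python
# Generic "deep unfolding" sketch: K proximal-gradient (ISTA) iterations
# become K layers with learnable step sizes and thresholds.
import torch
import torch.nn as nn


def soft_threshold(x, tau):
    """Proximal operator of the l1 norm."""
    return torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)


class UnrolledISTA(nn.Module):
    def __init__(self, num_layers=5, channels=32):
        super().__init__()
        self.num_layers = num_layers
        # Learnable analysis/synthesis convolutions play the role of the dictionary.
        self.analysis = nn.Conv2d(1, channels, 3, padding=1, bias=False)
        self.synthesis = nn.Conv2d(channels, 1, 3, padding=1, bias=False)
        self.step = nn.Parameter(torch.full((num_layers,), 0.1))
        self.tau = nn.Parameter(torch.full((num_layers,), 0.01))

    def forward(self, y):
        z = torch.zeros_like(self.analysis(y))          # sparse codes
        for k in range(self.num_layers):
            residual = self.synthesis(z) - y            # data-fidelity gradient
            z = z - self.step[k] * self.analysis(residual)
            z = soft_threshold(z, self.tau[k])          # proximal (sparsity) step
        return self.synthesis(z)                        # reconstructed image


if __name__ == "__main__":
    net = UnrolledISTA()
    y = torch.randn(2, 1, 64, 64)        # degraded measurements
    print(net(y).shape)                  # torch.Size([2, 1, 64, 64])
```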
- Dynamic Enhancement Network for Partial Multi-modality Person Re-identification [52.70235136651996]
We design a novel dynamic enhancement network (DENet), which allows missing arbitrary modalities while maintaining the representation ability of multiple modalities.
Since the missing state might be changeable, we design a dynamic enhancement module, which dynamically enhances modality features according to the missing state in an adaptive manner.
arXiv Detail & Related papers (2023-05-25T06:22:01Z)
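How the dynamic enhancement module is built is not spelled out above; a minimal sketch of the underlying idea, gating per-modality features with weights computed from a missing-state indicator, is shown below. Module names, shapes, and the gating network are assumptions, not the DENet design.

```python
# Minimal sketch of state-conditioned feature gating, loosely inspired by
# adapting to missing modalities; not the DENet module itself.
import torch
import torch.nn as nn


class StateConditionedGate(nn.Module):
    """Re-weights per-modality features given a binary availability mask."""

    def __init__(self, num_modalities=3, feat_dim=256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(num_modalities, 64), nn.ReLU(),
            nn.Linear(64, num_modalities), nn.Sigmoid(),
        )
        self.proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, feats, available):
        # feats: (B, M, D) per-modality features; available: (B, M) in {0, 1}.
        weights = self.gate(available.float()) * available.float()  # zero out missing ones
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)
        return self.proj(fused)


if __name__ == "__main__":
    gate = StateConditionedGate()
    feats = torch.randn(4, 3, 256)
    available = torch.tensor([[1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0]])
    print(gate(feats, available).shape)   # torch.Size([4, 256])
```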
- FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing [88.6654909354382]
We present a pure transformer-based framework, dubbed the Flexible Modal Vision Transformer (FM-ViT), for face anti-spoofing.
FM-ViT can flexibly target any single-modal (i.e., RGB) attack scenarios with the help of available multi-modal data.
Experiments demonstrate that the single model trained based on FM-ViT can not only flexibly evaluate different modal samples, but also outperforms existing single-modal frameworks by a large margin.
arXiv Detail & Related papers (2023-05-05T04:28:48Z)
- Learning Progressive Modality-shared Transformers for Effective Visible-Infrared Person Re-identification [27.75907274034702]
We propose a novel deep learning framework named Progressive Modality-shared Transformer (PMT) for effective VI-ReID.
To reduce the negative effect of modality gaps, we first take the gray-scale images as an auxiliary modality and propose a progressive learning strategy.
To cope with the problem of large intra-class differences and small inter-class differences, we propose a Discriminative Center Loss.
arXiv Detail & Related papers (2022-12-01T02:20:16Z)
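The exact form of the Discriminative Center Loss is not given in this summary; the sketch below combines a standard center loss (pull features toward their class centers) with a margin term that pushes different centers apart, one common way to enlarge inter-class distances. It is an illustrative stand-in, not the PMT loss.

```python
# Illustrative center loss with an added center-separation term; a stand-in
# for the general idea, not the Discriminative Center Loss from the PMT paper.
import torch
import torch.nn as nn


class CenterSeparationLoss(nn.Module):
    def __init__(self, num_classes, feat_dim, margin=1.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin

    def forward(self, feats, labels):
        # Intra-class term: pull each feature toward its class center.
        intra = ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()
        # Inter-class term: push pairs of centers at least `margin` apart.
        dists = torch.cdist(self.centers, self.centers)          # (C, C)
        off_diag = ~torch.eye(len(self.centers), dtype=torch.bool)
        inter = torch.clamp(self.margin - dists[off_diag], min=0).mean()
        return intra + inter


if __name__ == "__main__":
    loss_fn = CenterSeparationLoss(num_classes=10, feat_dim=128)
    feats = torch.randn(32, 128)
    labels = torch.randint(0, 10, (32,))
    print(loss_fn(feats, labels).item())
```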
- Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data [10.816003787786766]
We propose a specialized DA strategy for V-I person ReID models.
Our strategy diminishes the impact of corruption on the accuracy of deep person ReID models.
Results indicate that using our strategy, V-I ReID models can exploit both shared and individual modality knowledge.
arXiv Detail & Related papers (2022-11-22T00:29:55Z)
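The concrete ML-MDA transforms are not described in this summary; as a rough sketch of masking-style multimodal augmentation, the function below randomly erases local patches in each modality or drops a modality entirely, so a model learns to fall back on the cleaner stream. The probabilities and patch sizes are arbitrary choices, not the authors' settings.

```python
# Rough sketch of masking-style multimodal augmentation in the spirit of
# ML-MDA; probabilities and patch sizes are arbitrary, not the paper's.
import random
import torch


def mask_multimodal_pair(visible, infrared, p_local=0.5, p_drop=0.2, num_patches=4):
    """visible, infrared: (C, H, W) tensors for one person image pair."""
    v, i = visible.clone(), infrared.clone()
    for img in (v, i):
        # Occasionally simulate a fully corrupted / unavailable modality.
        if random.random() < p_drop:
            img.zero_()
            continue
        # Otherwise erase a few local patches to mimic local corruption.
        if random.random() < p_local:
            _, h, w = img.shape
            for _ in range(num_patches):
                ph, pw = h // 8, w // 8
                y0 = random.randint(0, h - ph)
                x0 = random.randint(0, w - pw)
                img[:, y0:y0 + ph, x0:x0 + pw] = 0.0
    return v, i


if __name__ == "__main__":
    v = torch.rand(3, 288, 144)
    i = torch.rand(3, 288, 144)
    v_aug, i_aug = mask_multimodal_pair(v, i)
    print(v_aug.shape, i_aug.shape)
```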
- Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
Multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z)