Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using
Corrupted Multimodal Data
- URL: http://arxiv.org/abs/2305.00320v1
- Date: Sat, 29 Apr 2023 18:18:59 GMT
- Title: Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using
Corrupted Multimodal Data
- Authors: Arthur Josi, Mahdi Alehdaghi, Rafael M. O. Cruz, Eric Granger
- Abstract summary: Visible-infrared person re-identification (V-I ReID) seeks to match images of individuals captured over a distributed network of RGB and IR cameras.
State-of-the-art V-I ReID models cannot leverage corrupted modality information to sustain a high level of accuracy.
We propose an efficient model for multimodal V-I ReID that preserves modality-specific knowledge for improved robustness to corrupted multimodal images.
- Score: 10.816003787786766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visible-infrared person re-identification (V-I ReID) seeks to match images of
individuals captured over a distributed network of RGB and IR cameras. The task
is challenging due to the significant differences between V and I modalities,
especially under real-world conditions, where images are corrupted by, e.g.,
blur, noise, and weather. Indeed, state-of-the-art V-I ReID models cannot leverage
corrupted modality information to sustain a high level of accuracy. In this
paper, we propose an efficient model for multimodal V-I ReID -- named
Multimodal Middle Stream Fusion (MMSF) -- that preserves modality-specific
knowledge for improved robustness to corrupted multimodal images. In addition,
three state-of-the-art attention-based multimodal fusion models are adapted to
address corrupted multimodal data in V-I ReID, allowing the importance of each
modality to be balanced dynamically. Recently, evaluation protocols have been proposed to
assess the robustness of ReID models under challenging real-world scenarios.
However, these protocols are limited to unimodal V settings. For realistic
evaluation of multimodal (and cross-modal) V-I person ReID models, we propose
new challenging corrupted datasets for scenarios where V and I cameras are
co-located (CL) and not co-located (NCL). Finally, the benefits of our Masking
and Local Multimodal Data Augmentation (ML-MDA) strategy are explored to
improve the robustness of ReID models to multimodal corruption. Our experiments
on clean and corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD
datasets indicate which multimodal V-I ReID models are more likely to
perform well in real-world operational conditions. In particular, our ML-MDA is
an important strategy for a V-I person ReID system to sustain high accuracy and
robustness when processing corrupted multimodal images. Also, our multimodal
ReID model MMSF outperforms every method under CL and NCL camera scenarios.
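The abstract describes MMSF as a middle-stream fusion design that keeps modality-specific streams alongside a shared stream so that specific knowledge survives when one input is corrupted. The PyTorch sketch below only illustrates that general idea; the class name, backbone choice, and fusion depth are assumptions, not the architecture from the paper.

# Hypothetical sketch of a three-stream "middle fusion" model for V-I ReID.
# Layer choices are illustrative; the actual MMSF design (backbones, fusion
# depth, losses) is defined in the paper, not here.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Small conv block standing in for a backbone stage."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class MiddleStreamFusionSketch(nn.Module):
    """Keeps V-only and I-only streams (modality-specific knowledge) plus a
    shared middle stream that fuses their intermediate features."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.v_stream = nn.Sequential(conv_block(3, 64), conv_block(64, 128))
        self.i_stream = nn.Sequential(conv_block(3, 64), conv_block(64, 128))
        # Middle stream consumes the concatenated intermediate features.
        self.m_stream = nn.Sequential(conv_block(256, 256), conv_block(256, 256))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.v_head = nn.Linear(128, feat_dim)
        self.i_head = nn.Linear(128, feat_dim)
        self.m_head = nn.Linear(256, feat_dim)

    def forward(self, x_v, x_i):
        f_v = self.v_stream(x_v)                       # visible-specific features
        f_i = self.i_stream(x_i)                       # infrared-specific features
        f_m = self.m_stream(torch.cat([f_v, f_i], 1))  # shared (fused) features
        embed = lambda f, head: head(self.pool(f).flatten(1))
        # The final descriptor keeps both modality-specific and shared knowledge.
        return torch.cat(
            [embed(f_v, self.v_head), embed(f_i, self.i_head), embed(f_m, self.m_head)],
            dim=1,
        )

# Usage: paired V and I crops of the same person, e.g. 256x128 pedestrian images.
v = torch.randn(2, 3, 256, 128)
i = torch.randn(2, 3, 256, 128)
print(MiddleStreamFusionSketch()(v, i).shape)  # torch.Size([2, 768])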
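The adapted attention-based fusion models are said to balance each modality's importance dynamically. A minimal gated-fusion sketch of that idea follows, assuming a generic soft-attention gate rather than the specific mechanisms adapted in the paper:

# Minimal sketch of dynamic modality weighting via a learned soft gate.
# The gating design is a generic assumption, not the paper's fusion models.
import torch
import torch.nn as nn

class ModalityGateSketch(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Predict one weight per modality from the concatenated embeddings.
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=1))

    def forward(self, f_v, f_i):
        w = self.gate(torch.cat([f_v, f_i], dim=1))  # (batch, 2) modality weights
        # A corrupted modality should ideally receive a lower weight.
        return w[:, :1] * f_v + w[:, 1:] * f_i

fused = ModalityGateSketch()(torch.randn(4, 256), torch.randn(4, 256))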
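ML-MDA combines masking with local multimodal data augmentation to harden models against corrupted inputs. The rough sketch below uses assumed probabilities, patch sizes, and fill values; none of these hyperparameters come from the abstract.

# Illustrative sketch in the spirit of ML-MDA: occasionally blank out one
# modality and erase random local patches so the model learns to cope with
# missing or corrupted inputs. All hyperparameters here are assumptions.
import random
import torch

def ml_mda_sketch(x_v, x_i, p_mask=0.2, p_patch=0.5, n_patches=3, patch=32):
    """Augment a paired (visible, infrared) batch. Returns augmented copies."""
    x_v, x_i = x_v.clone(), x_i.clone()

    # Modality masking: with some probability, zero one modality so the model
    # must rely on the remaining one.
    r = random.random()
    if r < p_mask:
        x_v.zero_()
    elif r < 2 * p_mask:
        x_i.zero_()

    # Local augmentation: erase random rectangular patches (random-erasing
    # style) independently in each modality to mimic local corruptions.
    for x in (x_v, x_i):
        if random.random() < p_patch:
            _, _, h, w = x.shape
            for _ in range(n_patches):
                top = random.randint(0, h - patch)
                left = random.randint(0, w - patch)
                x[:, :, top:top + patch, left:left + patch] = torch.rand(1).item()
    return x_v, x_i

v_aug, i_aug = ml_mda_sketch(torch.randn(4, 3, 256, 128), torch.randn(4, 3, 256, 128))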
Related papers
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach.
MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student.
We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z) - From Cross-Modal to Mixed-Modal Visible-Infrared Re-Identification [11.324518300593983]
Current VI-ReID methods focus on cross-modality matching, but real-world applications often involve mixed galleries containing both V and I images.
This setting is challenging because gallery images from the same modality may have smaller domain gaps yet correspond to different identities.
This paper introduces a novel mixed-modal ReID setting, where galleries contain data from both modalities.
arXiv Detail & Related papers (2025-01-23T01:28:05Z) - MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching [54.740256498985026]
Keypoint detection and description methods often struggle with multimodal data.
We propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching.
arXiv Detail & Related papers (2025-01-20T06:56:30Z) - All in One Framework for Multimodal Re-identification in the Wild [58.380708329455466]
A multimodal learning paradigm for ReID is introduced, referred to as All-in-One (AIO).
AIO harnesses a frozen pre-trained big model as an encoder, enabling effective multimodal retrieval without additional fine-tuning.
Experiments on cross-modal and multimodal ReID reveal that AIO not only adeptly handles various modal data but also excels in challenging contexts.
arXiv Detail & Related papers (2024-05-08T01:04:36Z) - Cross-Modality Perturbation Synergy Attack for Person Re-identification [66.48494594909123]
The main challenge in cross-modality ReID lies in effectively dealing with visual differences between different modalities.
Existing attack methods have primarily focused on the characteristics of the visible image modality.
This study proposes a universal perturbation attack specifically designed for cross-modality ReID.
arXiv Detail & Related papers (2024-01-18T15:56:23Z) - Deep Unfolding Convolutional Dictionary Model for Multi-Contrast MRI
Super-resolution and Reconstruction [23.779641808300596]
We propose a multi-contrast convolutional dictionary (MC-CDic) model under the guidance of the optimization algorithm.
We employ the proximal gradient algorithm to optimize the model and unroll the iterative steps into a deep CDic model.
Experimental results demonstrate the superior performance of the proposed MC-CDic model against existing SOTA methods.
arXiv Detail & Related papers (2023-09-03T13:18:59Z) - Dynamic Enhancement Network for Partial Multi-modality Person
Re-identification [52.70235136651996]
We design a novel dynamic enhancement network (DENet), which allows missing arbitrary modalities while maintaining the representation ability of multiple modalities.
Since the missing state may vary, we design a dynamic enhancement module that adaptively enhances modality features according to which modalities are missing.
arXiv Detail & Related papers (2023-05-25T06:22:01Z) - FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing [88.6654909354382]
We present a pure transformer-based framework, dubbed the Flexible Modal Vision Transformer (FM-ViT) for face anti-spoofing.
FM-ViT can flexibly target any single-modal (i.e., RGB) attack scenarios with the help of available multi-modal data.
Experiments demonstrate that the single model trained based on FM-ViT can not only flexibly evaluate different modal samples, but also outperforms existing single-modal frameworks by a large margin.
arXiv Detail & Related papers (2023-05-05T04:28:48Z) - Multimodal Data Augmentation for Visual-Infrared Person ReID with
Corrupted Data [10.816003787786766]
We propose a specialized DA strategy for V-I person ReID models.
Our strategy diminishes the impact of corruption on the accuracy of deep person ReID models.
Results indicate that using our strategy, V-I ReID models can exploit both shared and individual modality knowledge.
arXiv Detail & Related papers (2022-11-22T00:29:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.