Multi-scale Semantic Correlation Mining for Visible-Infrared Person
Re-Identification
- URL: http://arxiv.org/abs/2311.14395v1
- Date: Fri, 24 Nov 2023 10:23:57 GMT
- Title: Multi-scale Semantic Correlation Mining for Visible-Infrared Person
Re-Identification
- Authors: Ke Cheng, Xuecheng Hua, Hu Lu, Juanjuan Tu, Yuanquan Wang, Shitong
Wang
- Abstract summary: MSCMNet is proposed to comprehensively exploit semantic features at multiple scales.
It simultaneously keeps modality information loss in feature extraction as small as possible.
Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that the proposed MSCMNet achieves the highest accuracy.
- Score: 19.49945790485511
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The main challenge in the Visible-Infrared Person Re-Identification (VI-ReID)
task lies in how to extract discriminative features from different modalities
for matching purposes. While existing works primarily focus on minimizing
modal discrepancies, they fail to leverage modality information thoroughly. To
solve this problem, a Multi-scale Semantic Correlation Mining
network (MSCMNet) is proposed to comprehensively exploit semantic features at
multiple scales while keeping modality information loss in feature extraction
as small as possible. The proposed network contains three novel
components. Firstly, after taking into account the effective utilization of
modality information, the Multi-scale Information Correlation Mining Block
(MIMB) is designed to explore semantic correlations across multiple scales.
Secondly, in order to enrich the semantic information that MIMB can utilize, a
quadruple-stream feature extractor (QFE) with non-shared parameters is
specifically designed to extract information from different dimensions of the
dataset. Finally, the Quadruple Center Triplet Loss (QCT) is further proposed
to address the information discrepancy in the comprehensive features. Extensive
experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that the
proposed MSCMNet achieves the highest accuracy.
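The abstract names the Quadruple Center Triplet Loss only at the level of its role, so the following is a minimal sketch, assuming QCT amounts to a center-based triplet loss averaged over the four streams; the batching scheme, center computation, and margin value are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a center-based triplet loss over four feature
# streams, reconstructed from the abstract alone (not the paper's code).
import torch
import torch.nn.functional as F

def center_triplet_loss(features, labels, margin=0.3):
    """Pull each sample toward its own identity's center and push it at
    least `margin` farther away from the nearest other-identity center."""
    classes = labels.unique()
    # Per-identity centers: mean feature of each identity in the batch.
    centers = torch.stack([features[labels == c].mean(dim=0) for c in classes])
    dists = torch.cdist(features, centers)                  # (batch, num_ids)
    pos_mask = labels.unsqueeze(1) == classes.unsqueeze(0)  # (batch, num_ids)
    d_pos = dists[pos_mask]                                 # own-center distance
    d_neg = dists.masked_fill(pos_mask, float("inf")).min(dim=1).values
    return F.relu(d_pos - d_neg + margin).mean()

def quadruple_center_triplet_loss(stream_features, labels, margin=0.3):
    """Average the center triplet loss over the four feature streams."""
    return sum(center_triplet_loss(f, labels, margin)
               for f in stream_features) / len(stream_features)
```

With four stream outputs f1..f4 of shape (batch, dim) and integer identity labels, quadruple_center_triplet_loss([f1, f2, f3, f4], labels) returns a scalar that, per stream, pulls each sample toward its own identity center while pushing it from the hardest other-identity center.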
Related papers
- WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification [8.88666439137662]
We introduce the Wide-Ranging Information Mining Network (WRIM-Net), which mainly comprises a Multi-dimension Interactive Information Mining (MIIM) module and an Auxiliary-Information-based Contrastive Learning (AICL) approach.
Thanks to its low computational complexity, a separate MIIM module can be positioned in shallow layers, enabling the network to better mine modality-specific multi-dimension information.
We conduct extensive experiments not only on the well-known SYSU-MM01 and RegDB datasets but also on the latest large-scale cross-modality LLCM dataset.
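The entry does not spell out the AICL objective; as a generic reference point, contrastive approaches in this family typically build on the standard InfoNCE loss, sketched below (a textbook formulation with assumed in-batch negatives, not WRIM-Net's actual code).

```python
# Generic InfoNCE contrastive loss, shown only as the standard building
# block that contrastive objectives like AICL typically start from.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.07):
    """Anchor i should match positives[i] against all other samples
    in the batch (in-batch negatives)."""
    a = F.normalize(anchors, dim=1)
    p = F.normalize(positives, dim=1)
    logits = a @ p.t() / temperature                 # (N, N) similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)          # diagonal = positives
```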
arXiv Detail & Related papers (2024-08-20T08:06:16Z) - Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events [29.86323896541765]
In large-scale disaster events, the planning of optimal rescue routes depends on the object detection ability at the disaster scene.
Existing methods, which are typically based on the RGB modality, struggle to distinguish targets with similar colors and textures in crowded environments.
We propose a multimodal collaboration network for dense and occluded vehicle detection, MuDet.
arXiv Detail & Related papers (2024-05-14T00:51:15Z) - Multimodal Informative ViT: Information Aggregation and Distribution for
Hyperspectral and LiDAR Classification [25.254816993934746]
Multimodal Informative ViT (MIViT) is a system with an innovative information aggregate-distributing mechanism.
MIViT reduces redundancy in the empirical distribution of each modality's separate and fused features.
Our results show that MIViT's bidirectional aggregate-distributing mechanism is highly effective.
arXiv Detail & Related papers (2024-01-06T09:53:33Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - ESDMR-Net: A Lightweight Network With Expand-Squeeze and Dual Multiscale
Residual Connections for Medical Image Segmentation [7.921517156237902]
This paper presents an expand-squeeze dual multiscale residual network (ESDMR-Net).
It is a fully convolutional network that is well-suited for resource-constrained computing hardware such as mobile devices.
We present experiments on seven datasets from five distinct applications.
arXiv Detail & Related papers (2023-12-17T02:15:49Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
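For context on what the reinforcement-learning agent searches over, a plain random patch-masking step with a tunable ratio might look like the sketch below; the patch size, ratio, and function name are illustrative assumptions rather than the paper's implementation.

```python
# Minimal random patch masking; `mask_ratio` is the knob a decision-based
# MIM with RL would learn to set (illustrative, not the paper's code).
import torch

def random_patch_mask(images, patch_size=16, mask_ratio=0.6):
    """Zero out a random subset of non-overlapping patches per image."""
    b, _, h, w = images.shape
    ph, pw = h // patch_size, w // patch_size
    n_masked = int(ph * pw * mask_ratio)
    out = images.clone()
    for i in range(b):
        for j in torch.randperm(ph * pw)[:n_masked].tolist():
            r, c = divmod(j, pw)
            out[i, :, r * patch_size:(r + 1) * patch_size,
                c * patch_size:(c + 1) * patch_size] = 0.0
    return out
```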
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - Learning Cross-modality Information Bottleneck Representation for
Heterogeneous Person Re-Identification [61.49219876388174]
Visible-Infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance.
Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared modalities.
We present a novel mutual information and modality consensus network, namely CMInfoNet, to extract modality-invariant identity features.
arXiv Detail & Related papers (2023-08-29T06:55:42Z) - Factorized Contrastive Learning: Going Beyond Multi-view Redundancy [116.25342513407173]
This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy.
On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-06-08T15:17:04Z) - Deep feature selection-and-fusion for RGB-D semantic segmentation [8.831857715361624]
This work proposes a unified and efficient feature selection-and-fusion network (FSFNet).
FSFNet contains a symmetric cross-modality residual fusion module used for explicit fusion of multi-modality information.
Compared with the state-of-the-art methods, experimental evaluations demonstrate that the proposed model achieves competitive performance on two public datasets.
arXiv Detail & Related papers (2021-05-10T04:02:32Z) - FairMOT: On the Fairness of Detection and Re-Identification in Multiple
Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach, termed FairMOT, based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
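Schematically, this detection-plus-re-identification design places an identity-embedding head in parallel with the detection head on a shared feature map; the sketch below illustrates that layout under assumed channel sizes and is not the official implementation.

```python
# Schematic of parallel detection and re-ID heads on shared features
# (a hypothetical sketch of the design, with assumed channel sizes).
import torch
import torch.nn as nn

class DetectReIDHeads(nn.Module):
    """Two parallel 1x1-conv heads on a shared backbone feature map: a
    center heatmap for detection and a per-pixel embedding for re-ID."""
    def __init__(self, in_ch=64, num_classes=1, emb_dim=128):
        super().__init__()
        self.heatmap = nn.Conv2d(in_ch, num_classes, kernel_size=1)
        self.embed = nn.Conv2d(in_ch, emb_dim, kernel_size=1)

    def forward(self, feat):
        # Heatmap peaks give object centers; the embeddings at those
        # peaks serve as identity features for cross-frame association.
        return self.heatmap(feat).sigmoid(), self.embed(feat)
```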
arXiv Detail & Related papers (2020-04-04T08:18:00Z) - Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach can selectively ignore various noises and automatically focus on appropriate crowd scales.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)