Flare-Aware Cross-modal Enhancement Network for Multi-spectral Vehicle
Re-identification
- URL: http://arxiv.org/abs/2305.13659v1
- Date: Tue, 23 May 2023 04:04:24 GMT
- Title: Flare-Aware Cross-modal Enhancement Network for Multi-spectral Vehicle
Re-identification
- Authors: Aihua Zheng, Zhiqi Ma, Zi Wang, Chenglong Li
- Abstract summary: In harsh environments, the discriminative cues in RGB and NIR modalities are often lost due to strong flares from vehicle lamps or sunlight.
We propose a Flare-Aware Cross-modal Enhancement Network that adaptively restores flare-corrupted RGB and NIR features with guidance from the flare-immunized thermal infrared spectrum.
- Score: 29.48387524901101
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-spectral vehicle re-identification aims to address the challenge of
identifying vehicles in complex lighting conditions by incorporating
complementary visible and infrared information. However, in harsh environments,
the discriminative cues in RGB and NIR modalities are often lost due to strong
flares from vehicle lamps or sunlight, and existing multi-modal fusion methods
are limited in their ability to recover these important cues. To address this
problem, we propose a Flare-Aware Cross-modal Enhancement Network that
adaptively restores flare-corrupted RGB and NIR features with guidance from the
flare-immunized thermal infrared spectrum. First, to reduce the influence of
locally degraded appearance due to intense flare, we propose a Mutual Flare
Mask Prediction module to jointly obtain flare-corrupted masks in RGB and NIR
modalities in a self-supervised manner. Second, to use the flare-immunized TI
information to enhance the masked RGB and NIR, we propose a Flare-Aware
Cross-modal Enhancement module that adaptively guides feature extraction of
masked RGB and NIR spectra with prior flare-immunized knowledge from the TI
spectrum. Third, to extract common informative semantic information from RGB
and NIR, we propose an Inter-modality Consistency loss that enforces semantic
consistency between the two modalities. Finally, to evaluate the proposed
FACENet in handling intense flare, we introduce a new multi-spectral vehicle
re-ID dataset, called WMVEID863, with additional challenges such as motion
blur, significant background changes, and particularly intense flare
degradation. Comprehensive experiments on both the newly collected dataset and
public benchmark multi-spectral vehicle re-ID datasets demonstrate the superior
performance of the proposed FACENet compared to state-of-the-art methods,
especially in handling strong flares. The code and dataset will be released
soon.
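To make the flare-handling idea concrete, the following is a minimal PyTorch sketch of TI-guided enhancement in the spirit of the abstract's Mutual Flare Mask Prediction and Flare-Aware Cross-modal Enhancement modules: a soft flare mask is predicted per modality, and thermal-infrared features are used to re-synthesize the masked regions. The module names, shapes, and the simple gating scheme are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only; layer choices and gating are assumptions for exposition.
import torch
import torch.nn as nn


class FlareMaskHead(nn.Module):
    """Predicts a soft flare mask from a single-modality feature map (assumed design)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Values near 1 mark flare-corrupted regions.
        return torch.sigmoid(self.conv(feat))


class TIGuidedEnhancement(nn.Module):
    """Replaces masked RGB/NIR responses with features modulated by the TI spectrum."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat, ti_feat, mask):
        # TI features act as flare-immunized guidance inside the masked regions.
        guided = self.fuse(torch.cat([feat, ti_feat], dim=1))
        return feat * (1.0 - mask) + guided * mask


if __name__ == "__main__":
    rgb, nir, ti = (torch.randn(2, 256, 16, 16) for _ in range(3))
    mask_head = FlareMaskHead(256)
    enhance = TIGuidedEnhancement(256)
    rgb_enhanced = enhance(rgb, ti, mask_head(rgb))
    nir_enhanced = enhance(nir, ti, mask_head(nir))
    print(rgb_enhanced.shape, nir_enhanced.shape)  # torch.Size([2, 256, 16, 16]) x2
```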
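The Inter-modality Consistency loss is described only at a high level in the abstract; a plausible reading is a penalty on semantic disagreement between RGB and NIR embeddings of the same vehicle. The cosine-distance form below is an assumption, shown purely to illustrate the kind of objective involved.

```python
# Hedged sketch of an inter-modality consistency objective; the paper may use a
# different formulation (e.g., KL divergence over class predictions).
import torch
import torch.nn.functional as F


def inter_modality_consistency(rgb_emb: torch.Tensor, nir_emb: torch.Tensor) -> torch.Tensor:
    """rgb_emb, nir_emb: (batch, dim) embeddings of the same identities."""
    rgb_emb = F.normalize(rgb_emb, dim=1)
    nir_emb = F.normalize(nir_emb, dim=1)
    # 1 - cosine similarity, averaged over the batch.
    return (1.0 - (rgb_emb * nir_emb).sum(dim=1)).mean()


if __name__ == "__main__":
    rgb = torch.randn(8, 512)
    nir = torch.randn(8, 512)
    print(inter_modality_consistency(rgb, nir).item())
```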
Related papers
- NIR-Assisted Image Denoising: A Selective Fusion Approach and A Real-World Benchmark Dataset [53.79524776100983]
Leveraging near-infrared (NIR) images to assist visible RGB image denoising shows the potential to address this issue.
Existing works still struggle with taking advantage of NIR information effectively for real-world image denoising.
We propose an efficient Selective Fusion Module (SFM) that can be plugged into advanced denoising networks in a plug-and-play manner.
arXiv Detail & Related papers (2024-04-12T14:54:26Z) - Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection [20.12812979315803]
Object detection utilizing both visible (RGB) and thermal infrared (IR) imagery has garnered extensive attention.
Most existing multi-modal object detection methods directly input the RGB and IR images into deep neural networks.
We propose a novel coarse-to-fine perspective to purify and fuse features from both modalities.
arXiv Detail & Related papers (2024-01-19T14:49:42Z) - Frequency Domain Nuances Mining for Visible-Infrared Person
Re-identification [75.87443138635432]
Existing methods mainly exploit the spatial information while ignoring the discriminative frequency information.
We propose a novel Frequency Domain Nuances Mining (FDNM) method to explore the cross-modality frequency domain information.
Our method outperforms the second-best method by 5.2% in Rank-1 accuracy and 5.8% in mAP on the SYSU-MM01 dataset.
arXiv Detail & Related papers (2024-01-04T09:19:54Z) - Frequency Domain Modality-invariant Feature Learning for
Visible-infrared Person Re-Identification [79.9402521412239]
We propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective.
Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) and the Phrase-Preserving Normalization (PPNorm).
arXiv Detail & Related papers (2024-01-03T17:11:27Z) - Hypergraph-Guided Disentangled Spectrum Transformer Networks for
Near-Infrared Facial Expression Recognition [31.783671943393344]
We make the first attempt at deep NIR facial expression recognition and propose a novel method called the near-infrared facial expression transformer (NFER-Former).
NFER-Former disentangles the expression information and spectrum information from the input image, so that the expression features can be extracted without the interference of spectrum variation.
We have constructed a large NIR-VIS Facial Expression dataset that includes 360 subjects to better validate the efficiency of NFER-Former.
arXiv Detail & Related papers (2023-12-10T15:15:50Z) - Diverse Embedding Expansion Network and Low-Light Cross-Modality
Benchmark for Visible-Infrared Person Re-identification [26.71900654115498]
We propose a novel augmentation network in the embedding space, called the diverse embedding expansion network (DEEN).
The proposed DEEN can effectively generate diverse embeddings to learn the informative feature representations.
We provide a low-light cross-modality (LLCM) dataset, which contains 46,767 bounding boxes of 1,064 identities captured by 9 RGB/IR cameras.
arXiv Detail & Related papers (2023-03-25T14:24:56Z) - Unsupervised Misaligned Infrared and Visible Image Fusion via
Cross-Modality Image Generation and Registration [59.02821429555375]
We present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion.
To better fuse the registered infrared images and visible images, we present a Feature Interaction Fusion Module (IFM).
arXiv Detail & Related papers (2022-05-24T07:51:57Z) - SFANet: A Spectrum-aware Feature Augmentation Network for
Visible-Infrared Person Re-Identification [12.566284647658053]
We propose a novel spectrum-aware feature augmentation network named SFANet for the cross-modality matching problem.
By learning with grayscale-spectrum images, our model can noticeably reduce modality discrepancy and capture inner structural relations.
At the feature level, we improve the conventional two-stream network by balancing the number of modality-specific and shareable convolutional blocks.
arXiv Detail & Related papers (2021-02-24T08:57:32Z) - Learning Selective Mutual Attention and Contrast for RGB-D Saliency
Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models use the feature fusion strategy but are limited by the low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z) - Drone-based RGB-Infrared Cross-Modality Vehicle Detection via
Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)