Cross-Spectral Body Recognition with Side Information Embedding: Benchmarks on LLCM and Analyzing Range-Induced Occlusions on IJB-MDF
- URL: http://arxiv.org/abs/2506.08953v1
- Date: Tue, 10 Jun 2025 16:20:52 GMT
- Title: Cross-Spectral Body Recognition with Side Information Embedding: Benchmarks on LLCM and Analyzing Range-Induced Occlusions on IJB-MDF
- Authors: Anirudh Nanduri, Siyuan Huang, Rama Chellappa,
- Abstract summary: Vision Transformers (ViTs) have demonstrated impressive performance across a wide range of biometric tasks, including face and body recognition.<n>In this work, we adapt a ViT model pretrained on visible (VIS) imagery to the challenging problem of cross-spectral body recognition.<n>Building on this idea, we integrate Side Information Embedding (SIE) and examine the impact of encoding domain and camera information to enhance cross-spectral matching.<n>Surprisingly, our results show that encoding only camera information - without explicitly incorporating domain information - achieves state-of-the-art performance on the LLCM dataset.
- Score: 51.36007967653781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Transformers (ViTs) have demonstrated impressive performance across a wide range of biometric tasks, including face and body recognition. In this work, we adapt a ViT model pretrained on visible (VIS) imagery to the challenging problem of cross-spectral body recognition, which involves matching images captured in the visible and infrared (IR) domains. Recent ViT architectures have explored incorporating additional embeddings beyond traditional positional embeddings. Building on this idea, we integrate Side Information Embedding (SIE) and examine the impact of encoding domain and camera information to enhance cross-spectral matching. Surprisingly, our results show that encoding only camera information - without explicitly incorporating domain information - achieves state-of-the-art performance on the LLCM dataset. While occlusion handling has been extensively studied in visible-spectrum person re-identification (Re-ID), occlusions in visible-infrared (VI) Re-ID remain largely underexplored - primarily because existing VI-ReID datasets, such as LLCM, SYSU-MM01, and RegDB, predominantly feature full-body, unoccluded images. To address this gap, we analyze the impact of range-induced occlusions using the IARPA Janus Benchmark Multi-Domain Face (IJB-MDF) dataset, which provides a diverse set of visible and infrared images captured at various distances, enabling cross-range, cross-spectral evaluations.
Related papers
- AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [58.67129770371016]
We propose a novel IRSTD framework that reimagines the IRSTD paradigm by incorporating textual metadata for scene-aware optimization.<n>AuxDet consistently outperforms state-of-the-art methods, validating the critical role of auxiliary information in improving robustness and accuracy.
arXiv Detail & Related papers (2025-05-21T07:02:05Z) - Multi-Domain Biometric Recognition using Body Embeddings [51.36007967653781]
We show that body embeddings perform better than face embeddings in medium-wave infrared (MWIR) and long-wave infrared (LWIR) domains.<n>We leverage a vision transformer architecture to establish benchmark results on the IJB-MDF dataset.<n>We also show that finetuning a body model, pretrained exclusively on VIS data, with a simple combination of cross-entropy and triplet losses achieves state-of-the-art mAP scores.
arXiv Detail & Related papers (2025-03-13T22:38:18Z) - Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption [65.06388526722186]
Infrared-visible image fusion is a critical task in computer vision.<n>There is a lack of recent comprehensive surveys that address this rapidly expanding domain.<n>We introduce a multi-dimensional framework to elucidate common learning-based IVIF methods.
arXiv Detail & Related papers (2025-01-18T13:17:34Z) - Rethinking the Evaluation of Visible and Infrared Image Fusion [39.53356881392218]
Visible and Infrared Image Fusion (VIF) has garnered significant interest across a wide range of high-level vision tasks.
This paper proposes a semantic-oriented Evaluation Approach (SEA) to assess VIF methods.
arXiv Detail & Related papers (2024-10-09T12:12:08Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - An Interactively Reinforced Paradigm for Joint Infrared-Visible Image
Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems.
Through empirical analysis, infrared and visible image fusion (IVIF) enables hard-to-find objects apparent.
multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture.
arXiv Detail & Related papers (2023-05-17T06:48:35Z) - Diverse Embedding Expansion Network and Low-Light Cross-Modality
Benchmark for Visible-Infrared Person Re-identification [26.71900654115498]
We propose a novel augmentation network in the embedding space, called diverse embedding expansion network (DEEN)
The proposed DEEN can effectively generate diverse embeddings to learn the informative feature representations.
We provide a low-light cross-modality (LLCM) dataset, which contains 46,767 bounding boxes of 1,064 identities captured by 9 RGB/IR cameras.
arXiv Detail & Related papers (2023-03-25T14:24:56Z) - SA-DNet: A on-demand semantic object registration network adapting to
non-rigid deformation [3.3843451892622576]
We propose a Semantic-Aware on-Demand registration network (SA-DNet) to confine the feature matching process to the semantic region of interest.
Our method adapts better to the presence of non-rigid distortions in the images and provides semantically well-registered images.
arXiv Detail & Related papers (2022-10-18T14:41:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.