SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems
- URL: http://arxiv.org/abs/2504.15728v1
- Date: Tue, 22 Apr 2025 09:22:11 GMT
- Title: SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems
- Authors: Manjunath D, Aniruddh Sikdar, Prajwal Gurunath, Sumanth Udupa, Suresh Sundaram
- Abstract summary: Domain-adaptive thermal object detection plays a key role in facilitating visible (RGB)-to-thermal (IR) adaptation. The inherent limitations of IR images, such as the lack of color and texture cues, pose challenges for RGB-trained models. We propose Semantic-Aware Gray color Augmentation (SAGA), a novel strategy for mitigating color bias and bridging the domain gap.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Domain-adaptive thermal object detection plays a key role in facilitating visible (RGB)-to-thermal (IR) adaptation by reducing the need for co-registered image pairs and minimizing reliance on large annotated IR datasets. However, inherent limitations of IR images, such as the lack of color and texture cues, pose challenges for RGB-trained models, leading to increased false positives and poor-quality pseudo-labels. To address this, we propose Semantic-Aware Gray color Augmentation (SAGA), a novel strategy for mitigating color bias and bridging the domain gap by extracting object-level features relevant to IR images. Additionally, to validate the proposed SAGA for drone imagery, we introduce IndraEye, a multi-sensor (RGB-IR) dataset designed for diverse applications. The dataset contains 5,612 images with 145,666 instances, captured from diverse angles, altitudes, backgrounds, and times of day, offering valuable opportunities for multimodal learning, domain adaptation for object detection and segmentation, and exploration of sensor-specific strengths and weaknesses. IndraEye aims to enhance the development of more robust and accurate aerial perception systems, especially in challenging environments. Experimental results show that SAGA significantly improves RGB-to-IR adaptation for autonomous driving and the IndraEye dataset, achieving consistent gains of +0.4% to +7.6% mAP when integrated with state-of-the-art domain adaptation techniques. The dataset and code are available at https://github.com/airliisc/IndraEye.
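The abstract does not spell out how SAGA operates at the pixel level; the sketch below is one plausible reading, assuming object-level annotations (bounding boxes) are available and that "gray color augmentation" means replacing object regions with their grayscale equivalents so an RGB-trained detector stops relying on color cues. The function name and the per-object probability `p` are illustrative, not from the paper.

```python
# Illustrative semantic-aware gray augmentation (NOT the authors' exact algorithm):
# grayscale only the annotated object regions of an RGB image, leaving the
# background untouched, to suppress color bias before RGB-to-IR adaptation.
import cv2
import numpy as np

def semantic_gray_augment(image_bgr, boxes, p=0.5, rng=None):
    """image_bgr: HxWx3 uint8 (OpenCV BGR); boxes: iterable of (x1, y1, x2, y2)."""
    rng = rng or np.random.default_rng()
    out = image_bgr.copy()
    for x1, y1, x2, y2 in boxes:
        if rng.random() > p:          # apply per object with probability p
            continue
        patch = out[y1:y2, x1:x2]
        gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
        out[y1:y2, x1:x2] = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)  # keep 3 channels
    return out
```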
Related papers
- RASMD: RGB And SWIR Multispectral Driving Dataset for Robust Perception in Adverse Conditions
Short-wave infrared (SWIR) imaging offers several advantages over NIR and LWIR. Current autonomous driving algorithms rely heavily on the visible spectrum, which is prone to performance degradation in adverse conditions. We introduce the RGB and SWIR Multispectral Driving dataset, which comprises 100,000 synchronized and spatially aligned RGB-SWIR image pairs.
arXiv Detail & Related papers (2025-04-10T09:54:57Z)
- Multi-Domain Biometric Recognition using Body Embeddings
We show that body embeddings perform better than face embeddings in the medium-wave infrared (MWIR) and long-wave infrared (LWIR) domains. We leverage a vision transformer architecture to establish benchmark results on the IJB-MDF dataset. We also show that fine-tuning a body model, pretrained exclusively on VIS data, with a simple combination of cross-entropy and triplet losses achieves state-of-the-art mAP scores (a minimal sketch of such a loss follows this entry).
arXiv Detail & Related papers (2025-03-13T22:38:18Z)
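The summary only names the two losses; how they are weighted is not stated. A minimal sketch, assuming a plain weighted sum (the weight `w` and margin are placeholders):

```python
# Hypothetical combination of cross-entropy (on identity logits) and triplet loss
# (on embeddings), as the summary above describes; the weighting is assumed.
import torch.nn as nn

ce = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=0.3)

def combined_loss(logits, labels, anchor, positive, negative, w=1.0):
    return ce(logits, labels) + w * triplet(anchor, positive, negative)
```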
- Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate transmission line (TL) detection. Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB), which corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions (a sketch of deformable alignment follows this entry).
arXiv Detail & Related papers (2025-01-25T06:21:06Z)
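FAB's internal design is not given in the summary; the following is a guess at the general pattern, where offsets predicted from both feature maps drive a deformable convolution that resamples the IR features onto the decoder's grid. All names and shapes here are illustrative.

```python
# Hypothetical feature-alignment module in the spirit of HMMEN's FAB (assumed
# design): predict per-position sampling offsets from the concatenated features,
# then let a deformable convolution warp the IR features toward the decoder output.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FeatureAlign(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # 2 offsets (dy, dx) per kernel sampling location
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, decoder_feat, ir_feat):
        offsets = self.offset_pred(torch.cat([decoder_feat, ir_feat], dim=1))
        return self.deform(ir_feat, offsets)  # IR features resampled into alignment
```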
- IndraEye: Infrared Electro-Optical UAV-based Perception Dataset for Robust Downstream Tasks
In this paper, we introduce the IndraEye dataset, a multi-sensor (EO-IR) dataset designed for various tasks.
It includes 5,612 images with 145,666 instances, encompassing multiple viewing angles, altitudes, seven backgrounds, and different times of the day across the Indian subcontinent.
The dataset opens up several research opportunities, such as multimodal learning, domain adaptation for object detection and segmentation, and exploration of sensor-specific strengths and weaknesses.
arXiv Detail & Related papers (2024-10-28T12:12:28Z)
- Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation
We propose a novel image-to-image translation framework, Pix2Next, to generate high-quality Near-Infrared (NIR) images from RGB inputs.
A multi-scale PatchGAN discriminator ensures realistic image generation at various levels of detail, while carefully designed loss functions couple global context understanding with local feature preservation (a minimal multi-scale PatchGAN sketch follows this entry).
The proposed approach enables the scaling up of NIR datasets without additional data acquisition or annotation efforts, potentially accelerating advancements in NIR-based computer vision applications.
arXiv Detail & Related papers (2024-09-25T07:51:47Z)
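The summary does not give the discriminator's depth or the number of scales; the sketch below uses a common PatchGAN layout with three scales, all of which are assumptions rather than Pix2Next's actual configuration.

```python
# Generic multi-scale PatchGAN (assumed layout, not Pix2Next's exact one): each
# discriminator emits a grid of per-patch real/fake logits; identical copies run
# on progressively downsampled inputs to cover coarser detail levels.
import torch
import torch.nn as nn

def patch_discriminator(in_ch=3, base=64):
    def block(cin, cout, stride):
        return nn.Sequential(nn.Conv2d(cin, cout, 4, stride, 1),
                             nn.InstanceNorm2d(cout), nn.LeakyReLU(0.2))
    return nn.Sequential(
        nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2),
        block(base, base * 2, 2),
        block(base * 2, base * 4, 2),
        block(base * 4, base * 8, 1),
        nn.Conv2d(base * 8, 1, 4, 1, 1),  # per-patch logits, not one scalar
    )

class MultiScaleDiscriminator(nn.Module):
    def __init__(self, num_scales=3):
        super().__init__()
        self.discs = nn.ModuleList(patch_discriminator() for _ in range(num_scales))
        self.down = nn.AvgPool2d(3, stride=2, padding=1)

    def forward(self, x):
        logits = []
        for d in self.discs:
            logits.append(d(x))
            x = self.down(x)  # coarser view for the next discriminator
        return logits
```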
- Near-Infrared and Low-Rank Adaptation of Vision Transformers in Remote Sensing
Plant health can be monitored dynamically using multispectral sensors that measure near-infrared (NIR) reflectance.
Despite this potential, obtaining and annotating high-resolution NIR images poses a significant challenge for training deep neural networks.
This study investigates the potential benefits of using vision transformer (ViT) backbones pre-trained in the RGB domain, with low-rank adaptation (LoRA) for downstream tasks in the NIR domain (a minimal LoRA sketch follows this entry).
arXiv Detail & Related papers (2024-05-28T07:24:07Z)
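Low-rank adaptation has a standard formulation: a frozen pretrained weight is augmented with a trainable low-rank product. The rank, scaling, and which layers are wrapped are choices the summary does not specify, so everything below is a generic sketch rather than the paper's setup.

```python
# Generic LoRA wrapper around a frozen linear layer: y = Wx + (alpha/r) * B A x,
# where only A and B (rank r) are trained. Rank and alpha are assumed values.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # RGB-pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```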
- RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications
We focus on depth data synthesis and develop a range-aware RGB-D data simulation pipeline (RaSim).
In particular, high-fidelity depth data is generated by imitating the imaging principle of real-world sensors.
RaSim can be directly applied to real-world scenarios without any fine-tuning and excels at downstream RGB-D perception tasks.
arXiv Detail & Related papers (2024-04-05T08:52:32Z)
- ShadowSense: Unsupervised Domain Adaptation and Feature Fusion for Shadow-Agnostic Tree Crown Detection from RGB-Thermal Drone Imagery
This paper presents a novel method for detecting shadowed tree crowns from remote sensing data.
The proposed method (ShadowSense) is entirely self-supervised, leveraging domain adversarial training without source domain annotations (a gradient-reversal sketch follows this entry).
It then fuses complementary information of both modalities to effectively improve upon the predictions of an RGB-trained detector.
arXiv Detail & Related papers (2023-10-24T22:01:14Z)
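Domain adversarial training is typically implemented with a gradient reversal layer (GRL); whether ShadowSense uses exactly this operator is not stated in the summary, so treat the following as the standard mechanism rather than the paper's verbatim code.

```python
# Standard gradient reversal layer: identity in the forward pass, negated (and
# scaled) gradient in the backward pass, so the backbone learns features that
# confuse a domain classifier attached behind this layer.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # None: no gradient w.r.t. lam

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)
```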
- Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection using Meta-Learning
Infrared (IR) cameras are robust under adverse illumination and lighting conditions.
We propose an algorithm-agnostic meta-learning framework to improve existing UDA methods.
We produce a state-of-the-art thermal detector for the KAIST and DSIAC datasets.
arXiv Detail & Related papers (2021-10-07T02:28:18Z)
- MobileSal: Extremely Efficient RGB-D Salient Object Detection
This paper introduces a novel network, MobileSal, which focuses on efficient RGB-D salient object detection (SOD).
We propose an implicit depth restoration (IDR) technique to strengthen the feature representation capability of mobile networks for RGB-D SOD.
With IDR and compact pyramid refinement (CPR) incorporated, MobileSal performs favorably against state-of-the-art methods on seven challenging RGB-D SOD datasets.
arXiv Detail & Related papers (2020-12-24T04:36:42Z)
- Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)