Related papers: Near-Infrared and Low-Rank Adaptation of Vision Transformers in Remote Sensing

Near-Infrared and Low-Rank Adaptation of Vision Transformers in Remote Sensing

URL: http://arxiv.org/abs/2405.17901v1
Date: Tue, 28 May 2024 07:24:07 GMT
Title: Near-Infrared and Low-Rank Adaptation of Vision Transformers in Remote Sensing
Authors: Irem Ulku, O. Ozgur Tanriover, Erdem Akagündüz,
Abstract summary: Plant health can be monitored dynamically using multispectral sensors that measure Near-Infrared reflectance (NIR) Despite this potential, obtaining and annotating high-resolution NIR images poses a significant challenge for training deep neural networks. This study investigates the potential benefits of using vision transformer (ViT) backbones pre-trained in the RGB domain, with low-rank adaptation for downstream tasks in the NIR domain.
Score: 3.2088888904556123
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Plant health can be monitored dynamically using multispectral sensors that measure Near-Infrared reflectance (NIR). Despite this potential, obtaining and annotating high-resolution NIR images poses a significant challenge for training deep neural networks. Typically, large networks pre-trained on the RGB domain are utilized to fine-tune infrared images. This practice introduces a domain shift issue because of the differing visual traits between RGB and NIR images.As an alternative to fine-tuning, a method called low-rank adaptation (LoRA) enables more efficient training by optimizing rank-decomposition matrices while keeping the original network weights frozen. However, existing parameter-efficient adaptation strategies for remote sensing images focus on RGB images and overlook domain shift issues in the NIR domain. Therefore, this study investigates the potential benefits of using vision transformer (ViT) backbones pre-trained in the RGB domain, with low-rank adaptation for downstream tasks in the NIR domain. Extensive experiments demonstrate that employing LoRA with pre-trained ViT backbones yields the best performance for downstream tasks applied to NIR images.

Related papers

SAGA: Semantic-Aware Gray color Augmentation for Visible-to-Thermal Domain Adaptation across Multi-View Drone and Ground-Based Vision Systems [1.891522135443594]
Domain-adaptive thermal object detection plays a key role in facilitating visible (RGB)-to-thermal (IR) adaptation. In inherent limitations of IR images, such as the lack of color and texture cues, pose challenges for RGB-trained models. We propose Semantic-Aware Gray color Augmentation (SAGA), a novel strategy for mitigating color bias and bridging the domain gap.
arXiv Detail & Related papers (2025-04-22T09:22:11Z)
Multi-Domain Biometric Recognition using Body Embeddings [51.36007967653781]
We show that body embeddings perform better than face embeddings in medium-wave infrared (MWIR) and long-wave infrared (LWIR) domains. We leverage a vision transformer architecture to establish benchmark results on the IJB-MDF dataset. We also show that finetuning a body model, pretrained exclusively on VIS data, with a simple combination of cross-entropy and triplet losses achieves state-of-the-art mAP scores.
arXiv Detail & Related papers (2025-03-13T22:38:18Z)
Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate TL detection. Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
arXiv Detail & Related papers (2025-01-25T06:21:06Z)
Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution [54.293362972473595]
Image super-resolution (SR) aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts. Current approaches to address SR tasks are either dedicated to extracting RGB image features or assuming similar degradation patterns. We propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.
arXiv Detail & Related papers (2024-11-19T14:24:03Z)
Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement [71.13353154514418]
Low-light image enhancement, particularly in cross-domain tasks such as mapping from the raw domain to the sRGB domain, remains a significant challenge. We present a novel Mamba scanning mechanism, called RAWMamba, to effectively handle raw images with different CFAs. We also present a Retinex Decomposition Module (RDM) grounded in Retinex prior, which decouples illumination from reflectance to facilitate more effective denoising and automatic non-linear exposure correction.
arXiv Detail & Related papers (2024-09-11T06:12:03Z)
Towards RGB-NIR Cross-modality Image Registration and Beyond [21.475871648254564]
This paper focuses on the area of RGB(visible)-NIR(near-infrared) cross-modality image registration. We first present the RGB-NIR Image Registration (RGB-NIR-IRegis) benchmark, which, for the first time, enables fair and comprehensive evaluations. We then design several metrics to reveal the toxic impact of inconsistent local features between visible and infrared images on the model performance.
arXiv Detail & Related papers (2024-05-30T10:25:50Z)
Multi-scale Progressive Feature Embedding for Accurate NIR-to-RGB Spectral Domain Translation [6.580484964018551]
We introduce a domain translation module that translates NIR source images into the grayscale target domain. By incorporating a progressive training strategy, the statistical and semantic knowledge from both task domains are efficiently aligned. Experiments show that our MPFNet outperforms state-of-the-art counterparts by 2.55 dB in the NIR-to-RGB spectral domain translation task.
arXiv Detail & Related papers (2023-12-26T13:07:45Z)
Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection [22.60228799622782]
Key bottleneck in object detection in IR images is lack of sufficient labeled training data. We seek to leverage cues from the RGB modality to scale object detectors to the IR modality, while preserving model performance in the RGB modality. We first pretrain these factor matrices on the RGB modality, for which plenty of training data are assumed to exist and then augment only a few trainable parameters for training on the IR modality to avoid over-fitting.
arXiv Detail & Related papers (2023-09-28T16:55:52Z)
Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection [95.84616822805664]
We introduce CNNs-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement. In order to alleviate the block effect and detail destruction problems brought by the Transformer naturally, we design a CNN-induced refinement (CNNR) unit for content refinement and supplementation.
arXiv Detail & Related papers (2023-08-17T11:57:49Z)
Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds [155.388487263872]
We propose a new infrared small-dim target detection method with the transformer. We adopt the self-attention mechanism of the transformer to learn the interaction information of image features in a larger range. We also design a feature enhancement module to learn more features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z)
Infrared Image Super-Resolution via Heterogeneous Convolutional WGAN [4.6667021835430145]
We present a framework that employs heterogeneous kernel-based super-resolution Wasserstein GAN (HetSRWGAN) for IR image super-resolution. HetSRWGAN achieves consistently better performance in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2021-09-02T14:01:05Z)
TBNet:Two-Stream Boundary-aware Network for Generic Image Manipulation Localization [49.521622399483846]
We propose a novel end-to-end two-stream boundary-aware network (abbreviated as TBNet) for generic image manipulation localization. The proposed TBNet can significantly outperform state-of-the-art generic image manipulation localization methods in terms of both MCC and F1.
arXiv Detail & Related papers (2021-08-10T08:22:05Z)
Generation of the NIR spectral Band for Satellite Images with Convolutional Neural Networks [0.0]
Deep neural networks allow generating artificial spectral information, such as for the image colorization problem. We study the generative adversarial network (GAN) approach in the task of the NIR band generation using just RGB channels of high-resolution satellite imagery.
arXiv Detail & Related papers (2021-06-13T15:14:57Z)
MobileSal: Extremely Efficient RGB-D Salient Object Detection [62.04876251927581]
This paper introduces a novel network, methodname, which focuses on efficient RGB-D salient object detection (SOD) We propose an implicit depth restoration (IDR) technique to strengthen the feature representation capability of mobile networks for RGB-D SOD. With IDR and CPR incorporated, methodnameperforms favorably against sArt methods on seven challenging RGB-D SOD datasets.
arXiv Detail & Related papers (2020-12-24T04:36:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.