RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
- URL: http://arxiv.org/abs/2411.17984v2
- Date: Fri, 07 Mar 2025 13:24:25 GMT
- Title: RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
- Authors: Huiyang Hu, Peijin Wang, Hanbo Bi, Boyuan Tong, Zhaozhi Wang, Wenhui Diao, Hao Chang, Yingchao Feng, Ziqi Zhang, Yaowei Wang, Qixiang Ye, Kun Fu, Xian Sun
- Abstract summary: We introduce RS-vHeat, an efficient multi-modal remote sensing foundation model. Specifically, RS-vHeat applies the Heat Conduction Operator (HCO) with a complexity of $O(N^{1.5})$ and a global receptive field. Compared to attention-based remote sensing foundation models, we reduce memory usage by 84% and FLOPs by 24%, and improve throughput by 2.7 times.
- Score: 59.37279559684668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote sensing foundation models largely break away from the traditional paradigm of designing task-specific models, offering greater scalability across multiple tasks. However, they face challenges such as low computational efficiency and limited interpretability, especially when dealing with large-scale remote sensing images. To overcome these, we draw inspiration from heat conduction, a physical process modeling local heat diffusion. Building on this idea, we are the first to explore the potential of using the parallel computing model of heat conduction to simulate the local region correlations in high-resolution remote sensing images, and introduce RS-vHeat, an efficient multi-modal remote sensing foundation model. Specifically, RS-vHeat 1) applies the Heat Conduction Operator (HCO) with a complexity of $O(N^{1.5})$ and a global receptive field, reducing computational overhead while capturing remote sensing object structure information to guide heat diffusion; 2) learns the frequency distribution representations of various scenes through a self-supervised strategy based on frequency domain hierarchical masking and multi-domain reconstruction; 3) significantly improves efficiency and performance over state-of-the-art techniques across 4 tasks and 10 datasets. Compared to attention-based remote sensing foundation models, we reduce memory usage by 84% and FLOPs by 24%, and improve throughput by 2.7 times. The code will be made publicly available.
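To make the heat-conduction idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the operator solves the 2D heat equation with insulated borders by moving to cosine-frequency space, damping each mode, and transforming back. The function names (`dct_matrix`, `heat_conduction`) and the parameters `k` (diffusivity) and `t` (diffusion time) are hypothetical. The transform is written as plain separable matrix products, which for a square grid of N cells costs on the order of N^1.5 multiplications; that is one plausible reading of the stated complexity, not a confirmed detail of RS-vHeat.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n): C @ x transforms, C^T @ X inverts."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Naive dense matrix product of two lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def heat_conduction(u0, k=1.0, t=1.0):
    """Diffuse the 2D grid u0 for time t with diffusivity k (illustrative only)."""
    h, w = len(u0), len(u0[0])
    ch, cw = dct_matrix(h), dct_matrix(w)
    # Forward 2D DCT: C_h @ u0 @ C_w^T
    spec = matmul(matmul(ch, u0), transpose(cw))
    # Each cosine mode (p, q) decays as exp(-k * (wp^2 + wq^2) * t).
    for p in range(h):
        for q in range(w):
            wp, wq = math.pi * p / h, math.pi * q / w
            spec[p][q] *= math.exp(-k * (wp * wp + wq * wq) * t)
    # Inverse 2D DCT: C_h^T @ spec @ C_w
    return matmul(matmul(transpose(ch), spec), cw)

# A single "heat source" in the top-left cell of a 4 x 4 grid.
grid = [[0.0] * 4 for _ in range(4)]
grid[0][0] = 1.0
diffused = heat_conduction(grid, k=1.0, t=0.5)
```

Because the zero-frequency mode is left untouched, the total heat in the grid is preserved, and every output cell depends on every input cell, which illustrates the global receptive field the abstract mentions. A production version would likely use a fast DCT routine rather than explicit matrices.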
Related papers
- Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant.
Our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations.
To fuse the degradation awareness and the contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z)
- LDGNet: A Lightweight Difference Guiding Network for Remote Sensing Change Detection [6.554696547472252]
We propose a Lightweight Difference Guiding Network (LDGNet) to guide optical remote sensing change detection.
First, to enhance the feature representation capability of the lightweight backbone network, we propose the Difference Guiding Module (DGM).
Second, we propose the Difference-Aware Dynamic Fusion (DADF) module with Visual State Space Model (VSSM) for lightweight long-range dependency modeling.
arXiv Detail & Related papers (2025-04-07T13:33:54Z)
- DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging [47.22313650077835]
We introduce thermal degradation simulation integrated into the training process via a mini-max optimization.
The simulation is dynamic to maximize objective functions, thus capturing a broad spectrum of degraded data distributions.
This approach enables training with limited data, thereby improving model performance.
arXiv Detail & Related papers (2025-03-02T14:15:44Z)
- BAFNet: Bilateral Attention Fusion Network for Lightweight Semantic Segmentation of Urban Remote Sensing Images [6.153725909241752]
We propose a lightweight semantic segmentation network called bilateral attention fusion network (BAFNet) to efficiently segment high-resolution urban remote sensing images.
BAFNet not only outperforms advanced lightweight models in accuracy but also demonstrates comparable performance to non-lightweight state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2024-09-16T13:25:42Z)
- vHeat: Building Vision Models upon Heat Conduction [63.00030330898876]
vHeat is a novel vision backbone model that simultaneously achieves both high computational efficiency and global receptive field.
The essential idea is to conceptualize image patches as heat sources and model the calculation of their correlations as the diffusion of thermal energy.
arXiv Detail & Related papers (2024-05-26T12:58:04Z)
- Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement [49.15531684596958]
We propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement.
The first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details.
We have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement.
arXiv Detail & Related papers (2024-04-26T13:21:31Z)
- Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z)
- LATIS: Lambda Abstraction-based Thermal Image Super-resolution [10.375865762847347]
Single image super-resolution (SISR) is an effective technique to improve the quality of low-resolution thermal images.
The Lambda abstraction-based thermal image super-resolution (LATIS) is a novel lightweight architecture for SISR of thermal images.
arXiv Detail & Related papers (2023-11-18T02:55:04Z)
- Inference from Real-World Sparse Measurements [21.194357028394226]
Real-world problems often involve complex and unstructured sets of measurements, which occur when sensors are sparsely placed in either space or time.
Deep learning architectures capable of processing sets of measurements with positions varying from set to set and extracting readouts anywhere are methodologically difficult.
We propose an attention-based model focused on applicability and practical robustness, with two key design contributions.
arXiv Detail & Related papers (2022-10-20T13:42:20Z)
- Fourier Space Losses for Efficient Perceptual Image Super-Resolution [131.50099891772598]
We show that it is possible to improve the performance of a recently introduced efficient generator architecture solely with the application of our proposed loss functions.
We show that our losses' direct emphasis on the frequencies in Fourier-space significantly boosts the perceptual image quality.
The trained generator achieves results comparable to, and is 2.4x and 48x faster than, the state-of-the-art perceptual SR methods RankSRGAN and SRFlow, respectively.
arXiv Detail & Related papers (2021-06-01T20:34:52Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.