RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
- URL: http://arxiv.org/abs/2411.17984v1
- Date: Wed, 27 Nov 2024 01:43:38 GMT
- Title: RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
- Authors: Huiyang Hu, Peijin Wang, Hanbo Bi, Boyuan Tong, Zhaozhi Wang, Wenhui Diao, Hao Chang, Yingchao Feng, Ziqi Zhang, Qixiang Ye, Kun Fu, Xian Sun
- Abstract summary: We introduce RS-vHeat, an efficient multi-modal remote sensing foundation model.
Specifically, RS-vHeat applies the Heat Conduction Operator (HCO) with a complexity of $O(N^{1.5})$ and a global receptive field.
Compared to attention-based remote sensing foundation models, RS-vHeat reduces memory consumption by 84%, decreases FLOPs by 24%, and improves throughput by 2.7 times.
- Score: 41.70039494644282
- Abstract: Remote sensing foundation models largely break away from the traditional paradigm of designing task-specific models, offering greater scalability across multiple tasks. However, they face challenges such as low computational efficiency and limited interpretability, especially when dealing with high-resolution remote sensing images. To overcome these challenges, we draw inspiration from heat conduction, a physical process modeling local heat diffusion. Building on this idea, we are the first to explore the potential of using the parallel computing model of heat conduction to simulate the local region correlations in high-resolution remote sensing images, and introduce RS-vHeat, an efficient multi-modal remote sensing foundation model. Specifically, RS-vHeat 1) applies the Heat Conduction Operator (HCO) with a complexity of $O(N^{1.5})$ and a global receptive field, reducing computational overhead while capturing remote sensing object structure information to guide heat diffusion; 2) learns the frequency distribution representations of various scenes through a self-supervised strategy based on frequency domain hierarchical masking and multi-domain reconstruction; 3) significantly improves efficiency and performance over state-of-the-art techniques across 4 tasks and 10 datasets. Compared to attention-based remote sensing foundation models, RS-vHeat reduces memory consumption by 84%, decreases FLOPs by 24%, and improves throughput by 2.7 times.
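The abstract does not spell out how a heat conduction operator is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the textbook closed-form solution of the heat equation on an insulated square grid: transform the grid with a 2D discrete cosine transform, damp each mode by `exp(-k * omega^2 * t)`, and transform back. The function names (`dct_matrix`, `heat_conduct`) and the parameters `k` and `t` are hypothetical. With the transform written as dense matrix products on a `sqrt(N) x sqrt(N)` grid, each product costs `sqrt(N)^3 = N^{1.5}` multiplications, which is one way an $O(N^{1.5})$ cost can arise; every output cell also depends on every input cell, illustrating a global receptive field.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n; its transpose is its inverse."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


def transpose(m):
    return [list(r) for r in zip(*m)]


def matmul(a, b):
    """Dense matrix product; n^3 multiplications for n x n inputs."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


def heat_conduct(u0, k=1.0, t=1.0):
    """Diffuse a square grid u0 for time t with diffusivity k.

    Assumption: insulated (zero-flux) borders and continuous spatial
    frequencies omega_p = pi * p / n, so each cosine mode decays
    independently. This is an illustrative stand-in, not the paper's HCO.
    """
    n = len(u0)
    c = dct_matrix(n)
    ct = transpose(c)
    # Forward 2D DCT: C @ u0 @ C^T
    spectrum = matmul(matmul(c, u0), ct)
    # Damp each frequency mode; the p = q = 0 mode (total heat) is untouched.
    omega = [math.pi * p / n for p in range(n)]
    damped = [[spectrum[p][q] * math.exp(-k * (omega[p] ** 2 + omega[q] ** 2) * t)
               for q in range(n)]
              for p in range(n)]
    # Inverse 2D DCT: C^T @ damped @ C
    return matmul(matmul(ct, damped), c)


if __name__ == "__main__":
    # A single hot "patch" in the corner of a 4 x 4 grid.
    grid = [[0.0] * 4 for _ in range(4)]
    grid[0][0] = 1.0
    for row in heat_conduct(grid, k=1.0, t=1.0):
        print(" ".join(f"{v:.4f}" for v in row))
```

In this sketch the total heat is conserved because the zero-frequency mode is never damped, while the hot cell cools and heat reaches every other cell, including the far corner.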
Related papers
- ProbeSDF: Light Field Probes for Neural Surface Reconstruction [4.0130618054041385]
SDF-based differential rendering frameworks have achieved state-of-the-art multiview 3D shape reconstruction.
We re-examine this family of approaches by minimally reformulating its core appearance model.
We show this performance to be consistently achieved on real data over two widely different and popular application fields.
arXiv Detail & Related papers (2024-12-13T12:18:26Z)
- vHeat: Building Vision Models upon Heat Conduction [63.00030330898876]
vHeat is a novel vision backbone model that simultaneously achieves both high computational efficiency and global receptive field.
The essential idea is to conceptualize image patches as heat sources and model the calculation of their correlations as the diffusion of thermal energy.
arXiv Detail & Related papers (2024-05-26T12:58:04Z)
- Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement [49.15531684596958]
We propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement.
The first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details.
We have constructed two low-light remote sensing datasets to address the current lack of datasets for low-light remote sensing image enhancement.
arXiv Detail & Related papers (2024-04-26T13:21:31Z)
- Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z)
- LATIS: Lambda Abstraction-based Thermal Image Super-resolution [10.375865762847347]
Single image super-resolution (SISR) is an effective technique to improve the quality of low-resolution thermal images.
Lambda abstraction-based thermal image super-resolution (LATIS) is a novel lightweight architecture for SISR of thermal images.
arXiv Detail & Related papers (2023-11-18T02:55:04Z)
- Fourier Space Losses for Efficient Perceptual Image Super-Resolution [131.50099891772598]
We show that it is possible to improve the performance of a recently introduced efficient generator architecture solely with the application of our proposed loss functions.
We show that our losses' direct emphasis on the frequencies in Fourier-space significantly boosts the perceptual image quality.
The trained generator achieves results comparable to, and is 2.4x and 48x faster than, the state-of-the-art perceptual SR methods RankSRGAN and SRFlow, respectively.
arXiv Detail & Related papers (2021-06-01T20:34:52Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations, while the lower-frequency part is assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.