Do Satellite Tasks Need Special Pretraining?
- URL: http://arxiv.org/abs/2510.17014v1
- Date: Sun, 19 Oct 2025 21:32:01 GMT
- Title: Do Satellite Tasks Need Special Pretraining?
- Authors: Ani Vanyan, Alvard Barseghyan, Hakob Tamazyan, Tigran Galstyan, Vahan Huroyan, Naira Hovakimyan, Hrant Khachatrian,
- Abstract summary: Recently multiple teams trained foundation models specialized for remote sensing applications.<n>We systematically challenge the idea that specific foundation models are more useful than general-purpose vision foundation models.<n>We show that none of those pretrained models bring consistent improvements upon general-purpose baselines at the ViT-B scale.
- Score: 7.711325934411266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models have advanced machine learning across various modalities, including images. Recently multiple teams trained foundation models specialized for remote sensing applications. This line of research is motivated by the distinct characteristics of remote sensing imagery, specific applications and types of robustness useful for satellite image analysis. In this work we systematically challenge the idea that specific foundation models are more useful than general-purpose vision foundation models, at least in the small scale. First, we design a simple benchmark that measures generalization of remote sensing models towards images with lower resolution for two downstream tasks. Second, we train iBOT, a self-supervised vision encoder, on MillionAID, an ImageNet-scale satellite imagery dataset, with several modifications specific to remote sensing. We show that none of those pretrained models bring consistent improvements upon general-purpose baselines at the ViT-B scale.
Related papers
- Leveraging Large-Scale Pretrained Spatial-Spectral Priors for General Zero-Shot Pansharpening [7.37033839561317]
Existing deep learning methods for remote sensing image fusion often suffer from poor generalization when applied to unseen datasets.<n>We propose a novel pretraining strategy that leverages large-scale simulated datasets to learn robust spatial-spectral priors.<n>The pretrained models achieve superior results in zero-shot scenarios and show remarkable adaptation capability with minimal real data in one-shot settings.
arXiv Detail & Related papers (2025-12-02T10:56:26Z) - Few-Shot Vision-Language Reasoning for Satellite Imagery via Verifiable Rewards [7.14978158285611]
We present the first few-shot reinforcement learning with verifiable reward (RLVR) framework for satellite imagery.<n>We employ policy-gradient optimization with as few as one curated example to align model outputs for satellite reasoning tasks.<n> scaling to 128 examples matches or exceeds models trained on thousands of annotated samples.
arXiv Detail & Related papers (2025-07-29T12:23:19Z) - SatVision-TOA: A Geospatial Foundation Model for Coarse-Resolution All-Sky Remote Sensing Imagery [8.096413986108601]
We introduce SatVision-TOA, a novel foundation model pre-trained on 14-band MODIS L1B Top-Of-Atmosphere (TOA) radiance imagery.
The SatVision-TOA model is pre-trained using a Masked-Image-Modeling (MIM) framework and the SwinV2 architecture.
Results show that SatVision-TOA achieves superior performance over baseline methods on downstream tasks.
arXiv Detail & Related papers (2024-11-26T00:08:00Z) - Improving satellite imagery segmentation using multiple Sentinel-2 revisits [0.0]
We explore the best way to use revisits in the framework of fine-tuning pre-trained remote sensing models.
We find that fusing representations from multiple revisits in the model latent space is superior to other methods of using revisits.
A SWIN Transformer-based architecture performs better than U-nets and ViT-based models.
arXiv Detail & Related papers (2024-09-25T21:13:33Z) - A Causally Informed Pretraining Approach for Multimodal Foundation Models: Applications in Remote Sensing [16.824262496666893]
Self-supervised learning has emerged as a powerful paradigm for pretraining foundation models using large-scale data.<n>We propose Causally Informed Variable-Step Forecasting (CI-VSF), a novel pretraining task that models forecasting as a conditional generation task.<n>We demonstrate that pretraining in such a fashion leads to enhanced performance when finetuned on both prediction and forecasting.
arXiv Detail & Related papers (2024-07-29T02:49:55Z) - RS-Mamba for Large Remote Sensing Image Dense Prediction [58.12667617617306]
We propose the Remote Sensing Mamba (RSM) for dense prediction tasks in large VHR remote sensing images.
RSM is specifically designed to capture the global context of remote sensing images with linear complexity.
Our model achieves better efficiency and accuracy than transformer-based models on large remote sensing images.
arXiv Detail & Related papers (2024-04-03T12:06:01Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - Rethinking Transformers Pre-training for Multi-Spectral Satellite
Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amount of unlabelled data.
In this paper, we re-visit transformers pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z) - Exploring Robust Features for Few-Shot Object Detection in Satellite
Imagery [17.156864650143678]
We develop a few-shot object detector based on a traditional two-stage architecture.
A large-scale pre-trained model is used to build class-reference embeddings or prototypes.
We perform evaluations on two remote sensing datasets containing challenging and rare objects.
arXiv Detail & Related papers (2024-03-08T15:20:27Z) - DiffusionSat: A Generative Foundation Model for Satellite Imagery [63.2807119794691]
We present DiffusionSat, to date the largest generative foundation model trained on a collection of publicly available large, high-resolution remote sensing datasets.
Our method produces realistic samples and can be used to solve multiple generative tasks including temporal generation, superresolution given multi-spectral inputs and in-painting.
arXiv Detail & Related papers (2023-12-06T16:53:17Z) - Advancing Plain Vision Transformer Towards Remote Sensing Foundation
Model [97.9548609175831]
We resort to plain vision transformers with about 100 million parameters and make the first attempt to propose large vision models customized for remote sensing tasks.
Specifically, to handle the large image size and objects of various orientations in RS images, we propose a new rotated varied-size window attention.
Experiments on detection tasks demonstrate the superiority of our model over all state-of-the-art models, achieving 81.16% mAP on the DOTA-V1.0 dataset.
arXiv Detail & Related papers (2022-08-08T09:08:40Z) - Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.