A Genealogy of Foundation Models in Remote Sensing
- URL: http://arxiv.org/abs/2504.17177v2
- Date: Fri, 31 Oct 2025 17:22:45 GMT
- Title: A Genealogy of Foundation Models in Remote Sensing
- Authors: Kevin Lane, Morteza Karimzadeh
- Abstract summary: Foundation models have garnered increasing attention for representation learning in remote sensing. This paper examines these approaches, along with their roots in the computer vision field. We discuss the quality of the learned representations and methods to alleviate the need for massive compute resources.
- Score: 0.4468952886990849
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models have garnered increasing attention for representation learning in remote sensing. Many such foundation models adopt approaches that have demonstrated success in computer vision with minimal domain-specific modification. However, the development and application of foundation models in this field are still burgeoning, as there are a variety of competing approaches for how to most effectively leverage remotely sensed data. This paper examines these approaches, along with their roots in the computer vision field. This is done to characterize potential advantages and pitfalls, while outlining future directions to further improve remote sensing-specific foundation models. We discuss the quality of the learned representations and methods to alleviate the need for massive compute resources. We first examine single-sensor remote foundation models to introduce concepts and provide context, and then place emphasis on incorporating the multi-sensor aspect of Earth observations into foundation models. In particular, we explore the extent to which existing approaches leverage multiple sensors in training foundation models in relation to multi-modal foundation models. Finally, we identify opportunities for further harnessing the vast amounts of unlabeled, seasonal, and multi-sensor remote sensing observations.
Related papers
- Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges [53.47232506143113]
Multi-modal foundation models have transformed the technology for autonomous driving. We provide a comprehensive examination of such methods through a unifying taxonomy. We assess these approaches with respect to the openness of their source code and datasets.
arXiv Detail & Related papers (2025-10-31T18:05:02Z)
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies. Traditional methods relying on hardware sensors like LiDAR are often constrained by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios. Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either low-capacity model architectures or reliance on domain-specific, small-scale datasets.
arXiv Detail & Related papers (2025-07-15T17:59:59Z)
- Towards Scalable and Generalizable Earth Observation Data Mining via Foundation Model Composition [0.0]
We investigate whether foundation models pretrained on remote sensing and general vision datasets can be effectively combined to improve performance. The results show that feature-level ensembling of smaller pretrained models can match or exceed the performance of much larger models. The study highlights the potential of applying knowledge distillation to transfer the strengths of ensembles into more compact models.
arXiv Detail & Related papers (2025-06-25T07:02:42Z)
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing [62.447497430479174]
Drawing to reason in space is a novel paradigm that enables LVLMs to reason through elementary drawing operations in the visual space. Our model, named VILASR, consistently outperforms existing methods across diverse spatial reasoning benchmarks.
arXiv Detail & Related papers (2025-06-11T17:41:50Z)
- Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing. Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest. This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
- A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning [5.299218284699214]
High-performance segmentation models are challenged by annotation scarcity and variability across sensors, illumination, and geography. This paper introduces a domain generalization approach to leveraging emerging geospatial foundation models by combining soft-alignment pseudo-labeling with source-to-target generative pre-training. Experiments with hyperspectral and multispectral remote sensing datasets confirm our method's effectiveness in enhancing adaptability and segmentation.
arXiv Detail & Related papers (2025-05-02T19:52:02Z)
- A Survey on Remote Sensing Foundation Models: From Vision to Multimodality [35.532200523631765]
Vision and multimodal foundation models for remote sensing have significantly improved the capabilities of intelligent geospatial data interpretation. The diversity in data types, the need for large-scale annotated datasets, and the complexity of multimodal fusion techniques pose significant obstacles to the effective deployment of these models. This paper provides a review of the state-of-the-art in vision and multimodal foundation models for remote sensing, focusing on their architecture, training methods, datasets and application scenarios.
arXiv Detail & Related papers (2025-03-28T01:57:35Z)
- Embracing Diversity: A Multi-Perspective Approach with Soft Labels [3.529000007777341]
We propose a new framework for designing perspective-aware models for the stance detection task, in which multiple annotators assign stances on a controversial topic.
Results show that the multi-perspective approach yields better classification performance (higher F1-scores).
arXiv Detail & Related papers (2025-03-01T13:33:38Z)
- A Survey of Model Architectures in Information Retrieval [59.61734783818073]
The period from 2019 to the present has represented one of the biggest paradigm shifts in information retrieval (IR) and natural language processing (NLP). We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent large language models (LLMs). We conclude with a forward-looking discussion of emerging challenges and future directions.
arXiv Detail & Related papers (2025-02-20T18:42:58Z)
- Low-Rank Adaptation for Foundation Models: A Comprehensive Review [56.341827242332194]
Low-Rank Adaptation (LoRA) has emerged as a highly promising approach for mitigating these challenges. This survey provides the first comprehensive review of LoRA techniques beyond large language models to general foundation models.
arXiv Detail & Related papers (2024-12-31T09:38:55Z)
- Foundation Models for Remote Sensing and Earth Observation: A Survey [101.77425018347557]
This survey systematically reviews the emerging field of Remote Sensing Foundation Models (RSFMs).
It begins with an outline of their motivation and background, followed by an introduction of their foundational concepts.
We benchmark these models against publicly available datasets, discuss existing challenges, and propose future research directions.
arXiv Detail & Related papers (2024-10-22T01:08:21Z)
- Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey [2.9373912230684565]
Change detection aims to analyze changes in surface areas over time and has broad applications in areas such as environmental monitoring, urban development, and land use analysis.
Deep learning, especially the development of foundation models, has provided more powerful solutions for feature extraction and data fusion.
This paper systematically reviews the latest advancements in the field of change detection, with a focus on the application of foundation models in remote sensing tasks.
arXiv Detail & Related papers (2024-10-10T11:16:05Z)
- Improving satellite imagery segmentation using multiple Sentinel-2 revisits [0.0]
We explore the best way to use revisits in the framework of fine-tuning pre-trained remote sensing models.
We find that fusing representations from multiple revisits in the model latent space is superior to other methods of using revisits.
A SWIN Transformer-based architecture performs better than U-nets and ViT-based models.
arXiv Detail & Related papers (2024-09-25T21:13:33Z)
- Vision Foundation Models in Remote Sensing: A Survey [6.036426846159163]
Foundation models are large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency. This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for continued development and application of foundation models in remote sensing.
arXiv Detail & Related papers (2024-08-06T22:39:34Z)
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
- Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z)
- A Survey for Foundation Models in Autonomous Driving [10.315409708116865]
Large language models contribute to planning and simulation in autonomous driving.
Vision foundation models are increasingly adapted for critical tasks such as 3D object detection and tracking.
Multi-modal foundation models, integrating diverse inputs, exhibit exceptional visual understanding and spatial reasoning.
arXiv Detail & Related papers (2024-02-02T02:44:59Z)
- Graph Foundation Models: Concepts, Opportunities and Challenges [66.37994863159861]
Foundation models have emerged as critical components in a variety of artificial intelligence applications. The capabilities of foundation models in generalization and adaptation motivate graph machine learning researchers to discuss the potential of developing a new graph learning paradigm. This article introduces the concept of Graph Foundation Models (GFMs), and offers an exhaustive explanation of their key characteristics and underlying technologies.
arXiv Detail & Related papers (2023-10-18T09:31:21Z)
- Toward Foundation Models for Earth Monitoring: Proposal for a Climate Change Benchmark [95.19070157520633]
Recent progress in self-supervision shows that pre-training large neural networks on vast amounts of unsupervised data can lead to impressive increases in generalisation for downstream tasks.
Such models, recently coined as foundation models, have been transformational to the field of natural language processing.
We propose to develop a new benchmark comprised of a variety of downstream tasks related to climate change.
arXiv Detail & Related papers (2021-12-01T15:38:19Z)
- A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning [95.27249880156256]
We develop and present a unified architecture of network community-finding methods.
We introduce a new taxonomy that divides the existing methods into two categories, namely probabilistic graphical model and deep learning.
We conclude with discussions of the challenges of the field and suggestions of possible directions for future research.
arXiv Detail & Related papers (2021-01-03T02:32:45Z)