Vision Foundation Models in Remote Sensing: A Survey
- URL: http://arxiv.org/abs/2408.03464v2
- Date: Tue, 11 Feb 2025 22:29:52 GMT
- Title: Vision Foundation Models in Remote Sensing: A Survey
- Authors: Siqi Lu, Junlin Guo, James R Zimmer-Dauphinee, Jordan M Nieusma, Xiao Wang, Parker VanValkenburgh, Steven A Wernke, Yuankai Huo
- Abstract summary: Foundation models are large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency.
This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for continued development and application of foundation models in remote sensing.
- Score: 6.036426846159163
- License:
- Abstract: Artificial Intelligence (AI) technologies have profoundly transformed the field of remote sensing, revolutionizing data collection, processing, and analysis. Traditionally reliant on manual interpretation and task-specific models, remote sensing research has been significantly enhanced by the advent of foundation models: large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency. This paper provides a comprehensive survey of foundation models in the remote sensing domain. We categorize these models based on their architectures, pre-training datasets, and methodologies. Through detailed performance comparisons, we highlight emerging trends and the significant advancements achieved by these foundation models. Additionally, we discuss technical challenges, practical implications, and future research directions, addressing the need for high-quality data, computational resources, and improved model generalization. Our research also finds that pre-training methods, particularly self-supervised learning techniques such as contrastive learning and masked autoencoders, markedly enhance the performance and robustness of foundation models. This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for continued development and application of foundation models in remote sensing.
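The abstract's point about self-supervised pre-training can be made concrete with a short sketch. Below is a minimal, illustrative InfoNCE-style contrastive objective for two augmented views of the same satellite image tile, assuming a generic PyTorch encoder; the encoder, augmentations, and temperature are placeholder assumptions, not the training recipe of any surveyed model.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE) loss between two batches of embeddings.

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    Positive pairs are (z1[i], z2[i]); every other pair in the batch is a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                  # (N, N) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetrized cross-entropy: each view must identify its counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage sketch: `encoder` is any backbone (ViT, ResNet, ...) and `aug` is a
# remote-sensing augmentation pipeline (random crops, flips, band jitter).
# z1, z2 = encoder(aug(batch)), encoder(aug(batch))
# loss = info_nce_loss(z1, z2)
```

Masked-autoencoder pre-training, the other family named in the abstract, swaps this objective for reconstruction of masked image patches; both serve only as pre-training signals before fine-tuning on downstream remote sensing tasks.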
Related papers
- A Survey of Model Architectures in Information Retrieval [64.75808744228067]
We focus on two key aspects: backbone models for feature extraction and end-to-end system architectures for relevance estimation.
We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent large language models (LLMs).
We conclude by discussing emerging challenges and future directions, including architectural optimizations for performance and scalability, handling of multimodal, multilingual data, and adaptation to novel application domains beyond traditional search paradigms.
arXiv Detail & Related papers (2025-02-20T18:42:58Z)
- Low-Rank Adaptation for Foundation Models: A Comprehensive Review [42.23155921954156]
Low-Rank Adaptation (LoRA) has emerged as a highly promising approach for mitigating the cost of adapting large foundation models to downstream tasks.
This survey provides the first comprehensive review of LoRA techniques beyond large language models, extending to general foundation models.
arXiv Detail & Related papers (2024-12-31T09:38:55Z)
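To make the LoRA idea above concrete, here is a minimal, illustrative PyTorch wrapper that freezes a pre-trained linear layer and trains only a low-rank update; the rank, scaling, and choice of layer are placeholder assumptions, not a configuration taken from the review.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pre-trained linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                 # freeze pre-trained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Original output plus the low-rank correction; only A and B get gradients.
        return self.base(x) + (x @ self.lora_a.t() @ self.lora_b.t()) * self.scale

# Usage sketch: wrap the attention projection layers of a frozen backbone with
# LoRALinear and fine-tune only the LoRA parameters on the downstream task.
```

Because B is initialized to zero, the wrapped layer starts out identical to the pre-trained one, and only the small A/B matrices need to be stored per downstream task.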
- Deep Learning and Machine Learning -- Object Detection and Semantic Segmentation: From Theory to Applications [17.571124565519263]
In-depth exploration of object detection and semantic segmentation is provided.
State-of-the-art advancements in machine learning and deep learning are reviewed.
Analysis of big data processing is presented.
arXiv Detail & Related papers (2024-10-21T02:10:49Z)
- Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations [52.11801730860999]
In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets.
We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or generative adversarial networks.
We also present the different types of applications in which deep generative models have been used, from grasp generation to trajectory generation or cost learning.
arXiv Detail & Related papers (2024-08-08T11:34:31Z)
- Survey and Taxonomy: The Role of Data-Centric AI in Transformer-Based Time Series Forecasting [36.31269406067809]
We argue that data-centric AI is essential for efficiently training AI models, particularly transformer-based time series forecasting (TSF) models.
We review previous research from a data-centric AI perspective and aim to lay the groundwork for the future development of transformer-based architectures and data-centric AI.
arXiv Detail & Related papers (2024-07-29T08:27:21Z)
- Advances in Diffusion Models for Image Data Augmentation: A Review of Methods, Models, Evaluation Metrics and Future Research Directions [6.2719115566879236]
Diffusion Models (DMs) have emerged as a powerful tool for image data augmentation.
DMs generate realistic and diverse images by learning the underlying data distribution.
Current challenges and future research directions in the field are discussed.
arXiv Detail & Related papers (2024-07-04T18:06:48Z)
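As a concrete illustration of diffusion-based augmentation, the sketch below generates synthetic training images from class-conditional text prompts with the Hugging Face diffusers library; the checkpoint name, prompts, and sample counts are illustrative assumptions, not recommendations from the review above.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pre-trained text-to-image diffusion model (illustrative checkpoint).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Class-conditional prompts for a hypothetical land-cover classification task.
prompts = {
    "forest": "an aerial photograph of dense forest canopy",
    "urban": "an aerial photograph of a dense urban area with buildings and roads",
}

synthetic_data = []
for label, prompt in prompts.items():
    images = pipe(prompt, num_images_per_prompt=4, num_inference_steps=30).images
    synthetic_data.extend((img, label) for img in images)
# The synthetic (image, label) pairs are then mixed into the real training set.
```

In practice the synthetic fraction, prompt design, and filtering of low-quality samples all matter; choices of this kind, along with evaluation metrics, are among the topics such reviews cover.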
- Training and Serving System of Foundation Models: A Comprehensive Survey [32.0115390377174]
This paper extensively explores the methods employed in training and serving foundation models from various perspectives.
It provides a detailed categorization of these state-of-the-art methods, including finer aspects such as network, computing, and storage.
arXiv Detail & Related papers (2024-01-05T05:27:15Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
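The summary above does not spell out how CLIP distillation works, but a common, parameter-free way to extract target knowledge from CLIP is to use its zero-shot predictions as soft pseudo-labels for unlabeled target images. The sketch below does this with the Hugging Face transformers CLIP implementation; the checkpoint and class names are illustrative assumptions, not the authors' protocol.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative target-domain class names; in universal domain adaptation some
# target classes may be unknown and require an extra rejection step.
class_names = ["airplane", "car", "ship", "bridge"]
text_prompts = [f"a photo of a {c}" for c in class_names]

@torch.no_grad()
def clip_soft_labels(pil_images):
    """Zero-shot CLIP predictions used as soft pseudo-labels for target images."""
    inputs = processor(text=text_prompts, images=pil_images,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image        # (num_images, num_classes)
    return logits.softmax(dim=-1)

# A separate target model can then be trained to match these soft labels
# (e.g., with a cross-entropy or KL-divergence loss), which is the distillation step.
```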
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to limited computational resources or efficiency considerations.
This poses a critical challenge for the real-world application of foundation models: the knowledge of a foundation model has to be transferred to a model small enough for the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
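The entry above is about moving a foundation model's knowledge into a smaller downstream model. Its specific reprogramming technique is not described here, so the sketch below instead shows the closely related, standard knowledge-distillation objective, assuming PyTorch; the temperature and weighting are placeholder values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-target distillation from a frozen foundation model (teacher)
    with the ordinary hard-label loss for the small downstream model (student)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale the soft loss for temperature T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage sketch: teacher_logits come from the frozen foundation model,
# student_logits from the small model being trained on the downstream data.
```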
- Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures that learn to operate from data and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
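A standard way to combine a principled mathematical model with a data-driven component, as described above, is to unroll an iterative solver and learn some of its parameters. Below is a minimal, illustrative unrolled gradient-descent network for a linear inverse problem y = Ax + noise, assuming PyTorch; the number of iterations, step sizes, and learned refinement blocks are placeholder choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class UnrolledGD(nn.Module):
    """Unrolled gradient descent for recovering x from y = Ax + noise.

    Each iteration applies the known physics (the data-fidelity gradient A^T(Ax - y))
    followed by a small learned correction, keeping the model-based structure while
    the regularization is learned from data.
    """

    def __init__(self, A: torch.Tensor, num_iters: int = 5):
        super().__init__()
        self.register_buffer("A", A)                       # known measurement matrix (m x n)
        self.step = nn.Parameter(torch.full((num_iters,), 0.1))
        self.refine = nn.ModuleList([
            nn.Sequential(nn.Linear(A.shape[1], A.shape[1]), nn.ReLU(),
                          nn.Linear(A.shape[1], A.shape[1]))
            for _ in range(num_iters)
        ])

    def forward(self, y):                                  # y: (batch, m)
        x = torch.zeros(y.shape[0], self.A.shape[1], device=y.device)
        for k, refine in enumerate(self.refine):
            grad = (x @ self.A.t() - y) @ self.A           # data-fidelity gradient
            x = x - self.step[k] * grad + refine(x)        # physics step + learned prior
        return x
```

The network is trained end to end on (y, x) pairs, so the data-driven part only has to learn what the physical model does not already explain.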
- Using satellite imagery to understand and promote sustainable development [87.72561825617062]
We synthesize the growing literature that uses satellite imagery to understand sustainable development outcomes.
We quantify the paucity of ground data on key human-related outcomes and the growing abundance and resolution of satellite imagery.
We review recent machine learning approaches to model-building in the context of scarce and noisy training data.
arXiv Detail & Related papers (2020-09-23T05:20:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.