Related papers: Scaling Remote Sensing Foundation Models: Data Domain Tradeoffs at the Peta-Scale

Scaling Remote Sensing Foundation Models: Data Domain Tradeoffs at the Peta-Scale

URL: http://arxiv.org/abs/2512.23903v1
Date: Mon, 29 Dec 2025 23:53:11 GMT
Title: Scaling Remote Sensing Foundation Models: Data Domain Tradeoffs at the Peta-Scale
Authors: Charith Wickrema, Eliza Mace, Hunter Brown, Heidys Cabrera, Nick Krall, Matthew O'Neill, Shivangi Sarkar, Lowell Weissman, Eric Hughes, Guido Zarrella,
Abstract summary: We explore the scaling behaviors of artificial intelligence to establish techniques for training foundation models on high-resolution EO datasets.<n>We observe that even at this scale, performance is consistent with a data limited regime.<n>These practical insights are intended to inform data-collection strategies, compute budgets, and optimization schedules.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We explore the scaling behaviors of artificial intelligence to establish practical techniques for training foundation models on high-resolution electro-optical (EO) datasets that exceed the current state-of-the-art scale by orders of magnitude. Modern multimodal machine learning (ML) applications, such as generative artificial intelligence (GenAI) systems for image captioning, search, and reasoning, depend on robust, domain-specialized encoders for non-text modalities. In natural-image domains where internet-scale data is plentiful, well-established scaling laws help optimize the joint scaling of model capacity, training compute, and dataset size. Unfortunately, these relationships are much less well-understood in high-value domains like remote sensing (RS). Using over a quadrillion pixels of commercial satellite EO data and the MITRE Federal AI Sandbox, we train progressively larger vision transformer (ViT) backbones, report success and failure modes observed at petascale, and analyze implications for bridging domain gaps across additional RS modalities. We observe that even at this scale, performance is consistent with a data limited regime rather than a model parameter-limited one. These practical insights are intended to inform data-collection strategies, compute budgets, and optimization schedules that advance the future development of frontier-scale RS foundation models.

Related papers

Future Optical Flow Prediction Improves Robot Control & Video Generation [100.87884718953099]
We introduce FOFPred, a novel optical flow forecasting model featuring a unified Vision-Language Model (VLM) and Diffusion architecture.<n>Our model is trained on web-scale human activity data-a highly scalable but unstructured source.<n> Evaluations across robotic manipulation and video generation under language-driven settings establish the cross-domain versatility of FOFPred.
arXiv Detail & Related papers (2026-01-15T18:49:48Z)
Co-Training Vision Language Models for Remote Sensing Multi-task Learning [68.15604397741753]
Vision language models (VLMs) have achieved promising results in RS image understanding, grounding, and ultra-high-resolution (UHR) image reasoning.<n>We present RSCoVLM, a simple yet flexible VLM baseline for RS MTL.<n>We propose a unified dynamic-resolution strategy to address the diverse image scales inherent in RS imagery.
arXiv Detail & Related papers (2025-11-26T10:55:07Z)
ScaleDL: Towards Scalable and Efficient Runtime Prediction for Distributed Deep Learning Workloads [14.876533021201539]
ScaleDL is a novel runtime prediction framework for deep neural networks (DNNs)<n>It combines nonlinear layer-wise modeling with graph neural network (GNN)-based cross-layer interaction mechanism.<n>Experiments show that ScaleDL enhances runtime prediction accuracy and generalizability, achieving 6 times lower MRE and 5 times lower RMSE compared to baseline models.
arXiv Detail & Related papers (2025-11-06T08:05:55Z)
On the Status of Foundation Models for SAR Imagery [10.480790915352255]
We investigate the viability of foundational AI/ML models for Synthetic Aperture Radar (SAR) object recognition tasks.<n>We show that Self-Supervised finetuning of publicly available SSL models with SAR data is a viable path forward.<n>Our experiments further analyze the performance trade-off of using different backbones with different downstream task-adaptation recipes.
arXiv Detail & Related papers (2025-09-26T00:46:17Z)
Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies [66.83950068218033]
Scaling Laws demonstrate that scaling model parameters and training data enhances learning performance.<n>Despite its potential to improve performance, the integration of scaling laws into deep reinforcement learning has not been fully realized.<n>This review addresses this gap by systematically analyzing scaling strategies in three dimensions: data, network, and training budget.
arXiv Detail & Related papers (2025-08-05T08:03:12Z)
Enhancing material behavior discovery using embedding-oriented Physically-Guided Neural Networks with Internal Variables [0.0]
Physically Guided Neural Networks with Internal Variables are SciML tools that use only observable data for training and unravel internal state relations.<n>Despite their potential, these models face challenges in scalability when applied to high-dimensional data such as fine-grid spatial fields or time-evolving systems.<n>We propose some enhancements to the PGNNIV framework that address these scalability limitations through reduced-order modeling techniques.
arXiv Detail & Related papers (2025-08-01T12:33:21Z)
Efficient Adaptation For Remote Sensing Visual Grounding [0.46425518005471045]
Adapting pre-trained models has become an effective strategy in artificial intelligence, offering a scalable and efficient alternative to training models from scratch.<n>This study highlights the potential of PEFT techniques to advance efficient and precise multi-modal analysis in remote sensing.
arXiv Detail & Related papers (2025-03-29T13:49:11Z)
Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality.<n>We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
arXiv Detail & Related papers (2024-11-30T10:56:30Z)
OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery [0.3926357402982764]
Scaling models to billions of parameters has been shown to yield unprecedented benefits including emergent abilities. We pair high-performance computing resources including Frontier supercomputer, America's first exascale system, and high-resolution optical RS data to pretrain billion-scale FMs.
arXiv Detail & Related papers (2024-10-25T20:55:12Z)
Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation.<n>We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations.<n>We build HRVMamba, a novel model for efficient high-resolution representation learning.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
Quanv4EO: Empowering Earth Observation by means of Quanvolutional Neural Networks [62.12107686529827]
This article highlights a significant shift towards leveraging quantum computing techniques in processing large volumes of remote sensing data. The proposed Quanv4EO model introduces a quanvolution method for preprocessing multi-dimensional EO data. Key findings suggest that the proposed model not only maintains high precision in image classification but also shows improvements of around 5% in EO use cases.
arXiv Detail & Related papers (2024-07-24T09:11:34Z)
Interpretable AI-based Large-scale 3D Pathloss Prediction Model for enabling Emerging Self-Driving Networks [3.710841042000923]
We propose a Machine Learning-based model that leverages novel key predictors for estimating pathloss. By quantitatively evaluating the ability of various ML algorithms in terms of predictive, generalization and computational performance, our results show that Light Gradient Boosting Machine (LightGBM) algorithm overall outperforms others.
arXiv Detail & Related papers (2022-01-30T19:50:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.