Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation
- URL: http://arxiv.org/abs/2509.10919v1
- Date: Sat, 13 Sep 2025 17:35:17 GMT
- Title: Lightweight Metadata-Aware Mixture-of-Experts Masked Autoencoder for Earth Observation
- Authors: Mohanad Albughdadi,
- Abstract summary: We propose a Metadata-aware Mixture-of-Experts Masked Autoencoder (MoE-MAE) with only 2.5M parameters.<n>The model combines sparse expert routing with geo-temporal conditioning, incorporating imagery alongside latitude/longitude and seasonal/daily cyclic encodings.<n>Despite its small size, the model competes with much larger architectures, demonstrating that metadata-aware pretraining improves transfer and label efficiency.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in Earth Observation have focused on large-scale foundation models. However, these models are computationally expensive, limiting their accessibility and reuse for downstream tasks. In this work, we investigate compact architectures as a practical pathway toward smaller general-purpose EO models. We propose a Metadata-aware Mixture-of-Experts Masked Autoencoder (MoE-MAE) with only 2.5M parameters. The model combines sparse expert routing with geo-temporal conditioning, incorporating imagery alongside latitude/longitude and seasonal/daily cyclic encodings. We pretrain the MoE-MAE on the BigEarthNet-Landsat dataset and evaluate embeddings from its frozen encoder using linear probes. Despite its small size, the model competes with much larger architectures, demonstrating that metadata-aware pretraining improves transfer and label efficiency. To further assess generalization, we evaluate on the EuroSAT-Landsat dataset, which lacks explicit metadata, and still observe competitive performance compared to models with hundreds of millions of parameters. These results suggest that compact, metadata-aware MoE-MAEs are an efficient and scalable step toward future EO foundation models.
Related papers
- RoHOI: Robustness Benchmark for Human-Object Interaction Detection [84.78366452133514]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support.<n>We introduce the first benchmark for HOI detection, evaluating model resilience under diverse challenges.<n>Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z) - Earth Observation Foundation Model PhilEO: Pretraining on the MajorTOM and FastTOM Datasets [22.527733721726587]
We present the scaling-up of our recently proposed EO Foundation Model, PhilEO Geo-Aware U-Net, on the unlabeled 23TB dataset MajorTOM.<n>We fine-tune the models on the PhilEO Bench for road density estimation, building density pixel-wise regression, and land cover semantic segmentation.
arXiv Detail & Related papers (2025-06-17T17:58:08Z) - RemoteSAM: Towards Segment Anything for Earth Observation [29.707796048411705]
We aim to develop a robust yet flexible visual foundation model for Earth observation.<n>It should possess strong capabilities in recognizing and localizing diverse visual targets.<n>We present RemoteSAM, a foundation model that establishes new SoTA on several earth observation perception benchmarks.
arXiv Detail & Related papers (2025-05-23T15:27:57Z) - DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data [61.62554324594797]
We propose DreamMask, which explores how to generate training data in the open-vocabulary setting, and how to train the model with both real and synthetic data.<n>In general, DreamMask significantly simplifies the collection of large-scale training data, serving as a plug-and-play enhancement for existing methods.<n>For instance, when trained on COCO and tested on ADE20K, the model equipped with DreamMask outperforms the previous state-of-the-art by a substantial margin of 2.1% mIoU.
arXiv Detail & Related papers (2025-01-03T19:00:00Z) - Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data [87.61900472933523]
This work presents Depth Anything, a highly practical solution for robust monocular depth estimation.
We scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data.
We evaluate its zero-shot capabilities extensively, including six public datasets and randomly captured photos.
arXiv Detail & Related papers (2024-01-19T18:59:52Z) - Decoupled DETR For Few-shot Object Detection [4.520231308678286]
We improve the FSOD model to address the severe issue of sample imbalance and weak feature propagation.
We build a unified decoder module that could dynamically fuse the decoder layers as the output feature.
Our results indicate that our proposed module could achieve stable improvements of 5% to 10% in both fine-tuning and meta-learning paradigms.
arXiv Detail & Related papers (2023-11-20T07:10:39Z) - ARFA: An Asymmetric Receptive Field Autoencoder Model for Spatiotemporal
Prediction [55.30913411696375]
We propose an Asymmetric Receptive Field Autoencoder (ARFA) model, which introduces corresponding sizes of receptive field modules.
In the encoder, we present large kernel module for globaltemporal feature extraction. In the decoder, we develop a small kernel module for localtemporal reconstruction.
We construct the RainBench, a large-scale radar echo dataset for precipitation prediction, to address the scarcity of meteorological data in the domain.
arXiv Detail & Related papers (2023-09-01T07:55:53Z) - Domain Adaptation via Minimax Entropy for Real/Bogus Classification of
Astronomical Alerts [39.58317527488534]
We study Domain Adaptation (DA) for real/bogus classification of astronomical alerts using four different datasets: HiTS, DES, ATLAS, and ZTF.
We study the domain shift between these datasets, and improve a naive deep learning classification model by using a fine tuning approach and semi-supervised deep DA via Minimax Entropy (MME)
We find that both the fine tuning and MME models improve significantly the base model with as few as one labeled item per class coming from the target dataset, but that the MME does not compromise its performance on the source dataset.
arXiv Detail & Related papers (2023-08-15T02:40:32Z) - FollowNet: A Comprehensive Benchmark for Car-Following Behavior Modeling [20.784555362703294]
We establish a public benchmark dataset for car-following behavior modeling.
The benchmark consists of more than 80K car-following events extracted from five public driving datasets.
Results show that the deep deterministic policy gradient (DDPG) based model performs competitively with a lower MSE for spacing.
arXiv Detail & Related papers (2023-05-25T08:59:26Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z) - DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on V-COCO and HICODET dataset.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.