EoS-FM: Can an Ensemble of Specialist Models act as a Generalist Feature Extractor?
- URL: http://arxiv.org/abs/2511.21523v2
- Date: Thu, 04 Dec 2025 15:22:57 GMT
- Title: EoS-FM: Can an Ensemble of Specialist Models act as a Generalist Feature Extractor?
- Authors: Pierre Adorni, Minh-Tan Pham, Stéphane May, Sébastien Lefèvre
- Abstract summary: We present an Ensemble-of-Specialists framework for building Remote Sensing Foundation Models (RSFMs). Our method decomposes the training process into lightweight, task-specific ConvNeXtV2 specialists that can be frozen and reused. Our framework sets a new direction for building scalable and efficient RSFMs.
- Score: 8.178030486012437
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in foundation models have shown great promise in domains such as natural language processing and computer vision, and similar efforts are now emerging in the Earth Observation community. These models aim to generalize across tasks with limited supervision, reducing the need for training separate models for each task. However, current strategies, which largely focus on scaling model size and dataset volume, require prohibitive computational and data resources, limiting accessibility to only a few large institutions. Moreover, this paradigm of ever-larger models stands in stark contrast with the principles of sustainable and environmentally responsible AI, as it leads to immense carbon footprints and resource inefficiency. In this work, we present a novel and efficient alternative: an Ensemble-of-Specialists framework for building Remote Sensing Foundation Models (RSFMs). Our method decomposes the training process into lightweight, task-specific ConvNeXtV2 specialists that can be frozen and reused. This modular approach offers strong advantages in efficiency, interpretability, and extensibility. Moreover, it naturally supports federated training, pruning, and continuous specialist integration, making it particularly well-suited for collaborative and resource-constrained settings. Our framework sets a new direction for building scalable and efficient RSFMs. All codes and pretrained models are available at https://github.com/pierreadorni/EoS-FM.
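The abstract is concrete enough to sketch the core mechanism: several small, task-specific backbones are trained once, frozen, and then queried together as one generalist feature extractor. Below is a minimal PyTorch sketch of that idea; the class name is ours, feature concatenation is an assumed fusion strategy, and the tiny stand-in backbones take the place of the ConvNeXtV2 specialists used in the paper (see the linked repository for the actual implementation).

```python
# Minimal sketch of an Ensemble-of-Specialists feature extractor.
# Hypothetical API, not the repo's actual interface.
import torch
import torch.nn as nn


class EnsembleOfSpecialists(nn.Module):
    def __init__(self, specialists: list[nn.Module]):
        super().__init__()
        self.specialists = nn.ModuleList(specialists)
        # Freeze every specialist: each is trained once, then reused as-is.
        for p in self.specialists.parameters():
            p.requires_grad = False

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each specialist maps (B, C, H, W) -> (B, D_i); the ensemble
        # output is the concatenation (B, sum_i D_i).
        return torch.cat([s(x) for s in self.specialists], dim=1)


# Tiny stand-in backbones; the paper uses ConvNeXtV2 specialists instead.
def tiny_backbone(dim: int = 64) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1), nn.GELU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )


ensemble = EnsembleOfSpecialists([tiny_backbone(), tiny_backbone()])
features = ensemble(torch.randn(2, 3, 224, 224))  # shape: (2, 128)
```

A lightweight probe or task head trained on these concatenated features would then play the role of the downstream decoder.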
Related papers
- Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models [78.73992315826035]
We introduce Youtu-LLM, a lightweight language model that harmonizes high computational efficiency with native agentic intelligence. Youtu-LLM is pre-trained from scratch to systematically cultivate reasoning and planning capabilities.
arXiv Detail & Related papers (2025-12-31T04:25:11Z)
- Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation [43.68215777330875]
We introduce a systematic post-training pipeline that efficiently enhances small model accuracy. The resulting instruction-tuned model achieves state-of-the-art performance. This work provides a practical and efficient solution for developing high-performance language models on Ascend edge devices.
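The summary names knowledge distillation as the engine of the post-training pipeline. For reference, here is the standard soft-label distillation objective (Hinton-style); this is the generic recipe, not necessarily the paper's exact loss, and the temperature and mixing weight are illustrative defaults.

```python
# Generic logit-distillation loss: the student matches the teacher's
# temperature-softened distribution, mixed with ordinary cross-entropy.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```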
arXiv Detail & Related papers (2025-09-30T16:40:55Z)
- Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks. NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets. We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
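The summary does not spell out how a single model spans a continuous range of compute budgets. One well-known way to get nested sub-networks is width slicing, as in slimmable networks; the sketch below is our illustration of that general idea, not the paper's actual NSN construction.

```python
# Width-sliced linear layer: smaller sub-networks are nested inside the
# full layer and selected at runtime (illustrative, not the NSN method).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlicedLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.full = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor, width: float = 1.0) -> torch.Tensor:
        k = max(1, int(self.full.out_features * width))
        # Use only the first k output units and their bias entries.
        return F.linear(x, self.full.weight[:k], self.full.bias[:k])


layer = SlicedLinear(128, 256)
x = torch.randn(4, 128)
y_small = layer(x, width=0.25)  # (4, 64): cheap nested sub-network
y_full = layer(x, width=1.0)    # (4, 256): full-budget model
```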
arXiv Detail & Related papers (2025-09-22T15:13:14Z)
- Large-Small Model Collaborative Framework for Federated Continual Learning [20.05022827987955]
Continual learning (CL) for Foundation Models (FMs) is an essential yet underexplored challenge. We propose the first collaborative framework in Federated Continual Learning (FCL), where lightweight local models act as a dynamic bridge. Two novel components are included: Small Model Continual Fine-tuning prevents small models from temporal forgetting, and One-by-One Distillation performs personalized fusion of heterogeneous local knowledge on the server.
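A rough sketch of what One-by-One Distillation could look like on the server: each client model is distilled into the global model sequentially over a shared batch. The function signature and the softened-KL matching are our assumptions, not the paper's specification.

```python
# Sequential per-client distillation into a server model (sketch only).
import torch
import torch.nn.functional as F


def one_by_one_distillation(server_model, client_models, shared_batch,
                            optimizer, T=2.0):
    for client in client_models:
        client.eval()
        with torch.no_grad():
            teacher_logits = client(shared_batch)  # one client at a time
        optimizer.zero_grad()
        student_logits = server_model(shared_batch)
        # Match the current client's softened outputs before moving on.
        loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        loss.backward()
        optimizer.step()
```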
arXiv Detail & Related papers (2025-08-13T04:49:50Z)
- Scaling Laws for Native Multimodal Models [53.490942903659565]
We revisit the architectural design of native multimodal models and conduct an extensive scaling laws study. Our investigation reveals no inherent advantage to late-fusion architectures over early-fusion ones. We show that incorporating Mixture of Experts (MoEs) allows models to learn modality-specific weights, significantly benefiting performance.
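The finding that MoE layers let one early-fusion model learn modality-specific weights is easiest to see in code. The toy layer below uses hard top-1 routing over expert MLPs; given a mixed-modality token stream, the router is free to dedicate an expert to each modality. This is an illustration only, not the architecture studied in the paper.

```python
# Toy top-1-routed mixture of expert MLPs (illustrative only).
import torch
import torch.nn as nn


class TinyMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 2, hidden: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * hidden), nn.GELU(),
                          nn.Linear(dim * hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)            # (tokens, experts)
        top = gates.argmax(dim=-1)                        # hard top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                # Scale by the gate value so the router receives gradient.
                out[mask] = expert(x[mask]) * gates[mask, i:i + 1]
        return out


moe = TinyMoE(dim=32)
tokens = torch.randn(10, 32)
mixed = moe(tokens)  # (10, 32)
```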
arXiv Detail & Related papers (2025-04-10T17:57:28Z)
- UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines [64.84631333071728]
We introduce UniSTD, a unified Transformer-based framework for spatio-temporal modeling. Our work demonstrates that a task-specific vision-text model can build a generalizable model for spatio-temporal learning. We also introduce a temporal module to incorporate temporal dynamics explicitly.
arXiv Detail & Related papers (2025-03-26T17:33:23Z)
- RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models [60.596005921295806]
Agglomerative models have emerged as a powerful approach to training vision foundation models. We identify critical challenges including resolution mode shifts, teacher imbalance, idiosyncratic teacher artifacts, and an excessive number of output tokens. We propose several novel solutions: multi-resolution training, mosaic augmentation, and improved balancing of teacher loss functions.
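In the agglomerative setting, one student backbone is distilled from several heterogeneous teachers at once, which is where teacher imbalance arises. The sketch below shows the basic setup: one projection head per teacher and a weighted sum of per-teacher matching losses. The Smooth L1 matching loss and the explicit weight vector are our assumptions, not the paper's exact formulation.

```python
# Sketch of multi-teacher (agglomerative) feature distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTeacherHeads(nn.Module):
    """One projection head per teacher so a shared student backbone can
    match several teachers' feature spaces at once."""
    def __init__(self, student_dim: int, teacher_dims: list[int]):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(student_dim, d) for d in teacher_dims
        )

    def forward(self, student_feats: torch.Tensor) -> list[torch.Tensor]:
        return [head(student_feats) for head in self.heads]


def agglomerative_loss(projections, teacher_feats, weights):
    # Weighted sum of per-teacher matching losses; rebalancing these
    # weights is one of the fixes the paper discusses.
    return sum(w * F.smooth_l1_loss(p, t)
               for w, p, t in zip(weights, projections, teacher_feats))
```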
arXiv Detail & Related papers (2024-12-10T17:06:41Z)
- Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow. We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models [31.121714473817793]
Foundation models have demonstrated a great ability to achieve general human-level intelligence far beyond traditional approaches.
A significant shortcoming of most foundation models lies in their performance in specialized-domain and task-specific applications.
We introduce LMFlow, which aims to simplify the domain- and task-aware finetuning of general foundation models.
arXiv Detail & Related papers (2023-06-21T17:58:25Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
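Model reprogramming in its classic form keeps the foundation model frozen and learns only a small input transformation plus an output label mapping for the downstream task. The sketch below shows that classic formulation as context for the title; it is not this paper's specific method, and all names and sizes are illustrative.

```python
# Classic input-reprogramming setup around a frozen foundation model.
import torch
import torch.nn as nn


class InputReprogrammer(nn.Module):
    def __init__(self, frozen_model: nn.Module, image_size: int = 224,
                 num_source_classes: int = 1000, num_target_classes: int = 10):
        super().__init__()
        self.frozen_model = frozen_model.eval()
        for p in self.frozen_model.parameters():
            p.requires_grad = False
        # Trainable universal perturbation added to every input image.
        self.delta = nn.Parameter(torch.zeros(1, 3, image_size, image_size))
        # Trainable linear map from source-label logits to target labels.
        self.label_map = nn.Linear(num_source_classes, num_target_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.frozen_model(x + self.delta)
        return self.label_map(logits)
```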
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
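The summary is unusually concrete about the mechanism: nearly everything stays frozen, and only a linear projection into the LM's embedding space and a single prepended soft token are trained. A sketch of that interface, with module names of our own choosing:

```python
# Sketch in the spirit of eP-ALM: trainable projection + one soft token
# feeding a frozen LM (module names are ours, not the paper's).
import torch
import torch.nn as nn


class PerceptualPrefix(nn.Module):
    def __init__(self, vision_dim: int, lm_embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, lm_embed_dim)        # trainable
        self.soft_token = nn.Parameter(torch.randn(1, 1, lm_embed_dim))

    def forward(self, vision_feat: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        # vision_feat: (B, vision_dim); text_embeds: (B, T, lm_embed_dim)
        visual_token = self.proj(vision_feat).unsqueeze(1)     # (B, 1, D)
        prefix = self.soft_token.expand(text_embeds.size(0), -1, -1)
        # The frozen LM then consumes [soft_token, visual_token, text...].
        return torch.cat([prefix, visual_token, text_embeds], dim=1)
```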
arXiv Detail & Related papers (2023-03-20T19:20:34Z)