FlexOlmo: Open Language Models for Flexible Data Use
- URL: http://arxiv.org/abs/2507.07024v3
- Date: Sat, 02 Aug 2025 21:10:14 GMT
- Title: FlexOlmo: Open Language Models for Flexible Data Use
- Authors: Weijia Shi, Akshita Bhagia, Kevin Farhat, Niklas Muennighoff, Pete Walsh, Jacob Morrison, Dustin Schwenk, Shayne Longpre, Jake Poznanski, Allyson Ettinger, Daogao Liu, Margaret Li, Dirk Groeneveld, Mike Lewis, Wen-tau Yih, Luca Soldaini, Kyle Lo, Noah A. Smith, Luke Zettlemoyer, Pang Wei Koh, Hannaneh Hajishirzi, Ali Farhadi, Sewon Min
- Abstract summary: We introduce FlexOlmo, a new class of language models (LMs) that supports distributed training without data sharing. FlexOlmo employs a mixture-of-experts architecture where each expert is trained independently on closed datasets. We show that a general expert trained on public data can be effectively combined with independently trained experts from other data owners.
- Score: 184.87790266932316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce FlexOlmo, a new class of language models (LMs) that supports (1) distributed training without data sharing, where different model parameters are independently trained on closed datasets, and (2) data-flexible inference, where these parameters along with their associated data can be flexibly included or excluded from model inferences with no further training. FlexOlmo employs a mixture-of-experts (MoE) architecture where each expert is trained independently on closed datasets and later integrated through a new domain-informed routing scheme without any joint training. FlexOlmo is trained on FlexMix, a corpus we curate comprising publicly available datasets alongside seven domain-specific sets, representing realistic approximations of closed sets. We evaluate models with up to 37 billion parameters (20 billion active) on 31 diverse downstream tasks. We show that a general expert trained on public data can be effectively combined with independently trained experts from other data owners, leading to an average 41% relative improvement while allowing users to opt out of certain data based on data licensing or permission requirements. Our approach also outperforms prior model merging methods by 10.1% on average and surpasses the standard MoE trained without data restrictions using the same training FLOPs. Altogether, this research presents a solution for both data owners and researchers in regulated industries with sensitive or protected data. FlexOlmo enables models to benefit from closed data while respecting data owners' preferences by keeping their data local and supporting fine-grained control of data access during inference.
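The mechanism the abstract describes, independently trained experts combined by a router that can drop any expert at inference, can be made concrete with a short sketch. The following is a minimal PyTorch-style illustration, not the authors' implementation: the class name, the router initialization from domain embeddings, and the top-k details are all our assumptions from the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OptOutMoELayer(nn.Module):
    """MoE feed-forward layer whose experts were trained independently."""

    def __init__(self, experts: list[nn.Module], domain_embeddings: torch.Tensor):
        super().__init__()
        self.experts = nn.ModuleList(experts)  # trained separately, never jointly
        # Domain-informed router: row e is initialized from an embedding of
        # expert e's training domain (assumption based on the abstract).
        self.router = nn.Parameter(domain_embeddings.clone())  # (n_experts, d_model)

    def forward(self, x: torch.Tensor, active: torch.Tensor, top_k: int = 2) -> torch.Tensor:
        # x: (batch, seq, d_model); active: (n_experts,) boolean opt-in mask
        logits = x @ self.router.T
        logits = logits.masked_fill(~active, float("-inf"))  # exclude opted-out experts
        k = min(top_k, int(active.sum()))
        weights, idx = logits.topk(k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(k):
            for e, expert in enumerate(self.experts):
                routed = idx[..., slot] == e  # tokens sent to expert e in this slot
                if routed.any():
                    out[routed] += weights[..., slot][routed].unsqueeze(-1) * expert(x[routed])
        return out
```

Because each router row is tied to one expert rather than learned jointly, excluding a data owner's expert reduces to masking its row, which is why no further training is needed when data is opted out.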
Related papers
- Not All Clients Are Equal: Personalized Federated Learning on Heterogeneous Multi-Modal Clients [52.14230635007546]
Foundation models have shown remarkable capabilities across diverse multi-modal tasks, but their centralized training raises privacy concerns and induces high transmission costs. To meet the growing demand for personalizing AI models to different user purposes, personalized federated learning (PFL) has emerged. PFL allows each client to leverage the knowledge of other clients for further adaptation to individual user preferences, again without the need to share data.
arXiv Detail & Related papers (2025-05-20T09:17:07Z)
- Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures [15.645254436094055]
Federated Learning (FL) enables collaborative fine-tuning of Large Language Models without accessing raw data. We propose FedAMoLE, a lightweight personalized FL framework that enables data-driven heterogeneous model architectures. Experiments show that FedAMoLE improves client-side performance by an average of 5.14% compared to existing approaches.
arXiv Detail & Related papers (2024-11-28T13:20:38Z)
- FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models [48.484485609995986]
Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM).
There are currently no realistic datasets and benchmarks for FedLLM.
We propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics.
arXiv Detail & Related papers (2024-06-07T11:19:30Z)
- FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning [4.02923738318937]
Uneven distribution of local data across different edge devices (clients) results in slow model training and accuracy reduction in federated learning.
This work introduces a novel non-IID type encountered in real-world datasets, namely cluster-skew.
We propose FedDRL, a novel FL model that employs deep reinforcement learning to adaptively determine each client's impact factor.
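At the server, the idea reduces to a weighted parameter average in which the weights are the learned impact factors. A hedged sketch follows; the deep-RL policy that produces the factors is abstracted away, since the snippet does not specify its state or reward design.

```python
from collections import OrderedDict
import torch

def aggregate(client_states: list[OrderedDict], impact: torch.Tensor) -> OrderedDict:
    """Average client state dicts under learned, non-uniform impact factors."""
    impact = impact / impact.sum()  # normalize to a convex combination
    global_state = OrderedDict()
    for name in client_states[0]:
        global_state[name] = sum(w * s[name] for w, s in zip(impact, client_states))
    return global_state

# Plain FedAvg would set impact proportional to local dataset sizes; FedDRL
# instead lets an RL agent choose `impact` each round from training signals.
```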
arXiv Detail & Related papers (2022-08-04T04:24:16Z)
- Federated Learning from Only Unlabeled Data with Class-Conditional-Sharing Clients [98.22390453672499]
Supervised federated learning (FL) enables multiple clients to share the trained model without sharing their labeled data.
We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients.
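The surrogate-label construction can be illustrated directly. In our reading of the abstract (not the authors' code), each client holds several unlabeled sets drawn with different class proportions, labels every example with the index of the set it came from, and then runs ordinary supervised training on that surrogate task.

```python
import torch
from torch.utils.data import TensorDataset

def to_surrogate_dataset(unlabeled_sets: list[torch.Tensor]) -> TensorDataset:
    """Turn a client's unlabeled sets into surrogate-labeled training data."""
    xs = torch.cat(unlabeled_sets, dim=0)
    ys = torch.cat([torch.full((len(s),), i, dtype=torch.long)
                    for i, s in enumerate(unlabeled_sets)])
    return TensorDataset(xs, ys)  # train any classifier with cross-entropy on this

# Recovering the true class posterior from the surrogate one then uses the
# known class priors of each set (a fixed correction applied after training).
```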
arXiv Detail & Related papers (2022-04-07T09:12:00Z)
- Federated Mixture of Experts [94.25278695272874]
FedMix is a framework that allows us to train an ensemble of specialized models.
We show that users with similar data characteristics select the same members and therefore share statistical strength.
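As a toy illustration of that selection effect (not FedMix's actual gating mechanism, and assuming each user holds a small labeled local sample), a user can weight ensemble members by how well each specialist fits its local data; users with similar data then end up with similar weights.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def user_mixture_weights(experts, local_x: torch.Tensor, local_y: torch.Tensor) -> torch.Tensor:
    """Score each specialized model on the user's local sample."""
    fits = torch.stack([
        -F.cross_entropy(expert(local_x), local_y)  # average log-likelihood
        for expert in experts
    ])
    return F.softmax(fits, dim=0)  # similar data => similar member selection
```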
arXiv Detail & Related papers (2021-07-14T14:15:24Z)
- Unifying Distillation with Personalization in Federated Learning [1.8262547855491458]
Federated learning (FL) is a decentralized privacy-preserving learning technique in which clients learn a joint collaborative model through a central aggregator without sharing their data.
In this setting, all clients learn a single common predictor (FedAvg), which does not generalize well on each client's local data due to the statistical data heterogeneity among clients.
In this paper, we address this problem with PersFL, a two-stage personalized learning algorithm.
In the first stage, PersFL finds the optimal teacher model of each client during the FL training phase. In the second stage, PersFL distills the useful knowledge from these teachers into each client's local model.
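For context, the FedAvg baseline mentioned above produces its single common predictor by size-weighted averaging of locally updated weights, the standard formulation being:

```latex
% FedAvg, round t: each client k runs local SGD from the shared weights w_t,
% and the server averages the results, weighting by local dataset size n_k.
w^{k}_{t+1} = \mathrm{LocalSGD}_k(w_t), \qquad
w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, w^{k}_{t+1}, \qquad n = \sum_{k=1}^{K} n_k
```

PersFL's premise is that this single global model underfits heterogeneous clients, which is why it distills from per-client teacher models instead.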
arXiv Detail & Related papers (2021-05-31T17:54:29Z)
- SCEI: A Smart-Contract Driven Edge Intelligence Framework for IoT Systems [15.796325306292134]
Federated learning (FL) enables collaborative training of a shared model on edge devices while maintaining data privacy.
Various personalized approaches have been proposed, but such approaches fail to handle underlying shifts in data distribution.
This paper presents a dynamically optimized personal deep learning scheme based on blockchain and federated learning.
arXiv Detail & Related papers (2021-03-12T02:57:05Z)
- FedH2L: Federated Learning with Model and Statistical Heterogeneity [75.61234545520611]
Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy.
We introduce FedH2L, which is agnostic to model architectures and robust to different data distributions across participants.
In contrast to approaches sharing parameters or gradients, FedH2L relies on mutual distillation, exchanging only posteriors on a shared seed set between participants in a decentralized manner.
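The exchange the snippet describes can be sketched as follows; the function name and loss weighting are our assumptions, and only class posteriors on a small shared seed set cross the network, never parameters or gradients.

```python
import torch
import torch.nn.functional as F

def distill_step(model, optimizer, seed_x: torch.Tensor,
                 peer_posteriors: torch.Tensor) -> float:
    """One local update pulling our predictions toward a peer's on the seed set."""
    optimizer.zero_grad()
    log_p = F.log_softmax(model(seed_x), dim=-1)
    loss = F.kl_div(log_p, peer_posteriors, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()

# Because only posteriors are exchanged, each participant can run a different
# architecture, which is what makes the scheme model-heterogeneous.
```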
arXiv Detail & Related papers (2021-01-27T10:10:18Z)
- FedSmart: An Auto Updating Federated Learning Optimization Mechanism [23.842595615337565]
Federated learning has made an important contribution to preserving data privacy.
Some existing methods of ensuring model robustness on non-IID data, such as the data-sharing strategy or pretraining, may lead to privacy leakage.
In this paper, a performance-based parameter return method for optimization is introduced, which we term FederatedSmart.
arXiv Detail & Related papers (2020-09-16T03:59:33Z)