FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in
Realistic Healthcare Settings
- URL: http://arxiv.org/abs/2210.04620v3
- Date: Fri, 5 May 2023 08:48:12 GMT
- Title: FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in
Realistic Healthcare Settings
- Authors: Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix
Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane
Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko, Santiago Silva,
Maria Teleńczuk, Shadi Albarqouni, Salman Avestimehr, Aurélien Bellet,
Aymeric Dieuleveut, Martin Jaggi, Sai Praneeth Karimireddy, Marco Lorenzi,
Giovanni Neglia, Marc Tommasi, Mathieu Andreux
- Abstract summary: Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models.
We propose a novel cross-silo dataset suite focused on healthcare, FLamby, to bridge the gap between theory and practice of cross-silo FL.
Our flexible and modular suite allows researchers to easily download datasets, reproduce results and re-use the different components for their research.
- Score: 51.09574369310246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL) is a novel approach enabling several clients holding
sensitive data to collaboratively train machine learning models, without
centralizing data. The cross-silo FL setting corresponds to the case of few
(2–50) reliable clients, each holding medium to large datasets, and is
typically found in applications such as healthcare, finance, or industry. While
previous works have proposed representative datasets for cross-device FL, few
realistic healthcare cross-silo FL datasets exist, thereby slowing algorithmic
research in this critical application. In this work, we propose a novel
cross-silo dataset suite focused on healthcare, FLamby (Federated Learning
AMple Benchmark of Your cross-silo strategies), to bridge the gap between
theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets
with natural splits, covering multiple tasks, modalities, and data volumes,
each accompanied with baseline training code. As an illustration, we
additionally benchmark standard FL algorithms on all datasets. Our flexible and
modular suite allows researchers to easily download datasets, reproduce results
and re-use the different components for their research. FLamby is available
at www.github.com/owkin/flamby.
Related papers
- Stalactite: Toolbox for Fast Prototyping of Vertical Federated Learning Systems [37.11550251825938]
We present Stalactite, an open-source framework for Vertical Federated Learning (VFL) systems.
VFL is a type of FL where data samples are divided by features across several data owners.
We demonstrate its use on real-world recommendation datasets.
arXiv Detail & Related papers (2024-09-23T21:29:03Z)
- Federated Active Learning Framework for Efficient Annotation Strategy in Skin-lesion Classification [1.8149633401257899]
Federated Learning (FL) enables multiple institutes to train models collaboratively without sharing private data.
Active learning (AL) has shown promising performance in reducing the number of data annotations in medical image analysis.
We propose a federated AL (FedAL) framework in which AL is executed periodically and interactively under FL.
arXiv Detail & Related papers (2024-06-17T08:16:28Z)
- FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models [48.484485609995986]
Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM).
There are currently no realistic datasets and benchmarks for FedLLM.
We propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics.
arXiv Detail & Related papers (2024-06-07T11:19:30Z)
- A Universal Metric of Dataset Similarity for Cross-silo Federated Learning [0.0]
Federated learning is increasingly used in domains such as healthcare to facilitate model training without data-sharing.
In this paper, we propose a novel metric for assessing dataset similarity.
We show that our metric has a robust and interpretable relationship with model performance and can be calculated in a privacy-preserving manner.
arXiv Detail & Related papers (2024-04-29T15:08:24Z)
- pfl-research: simulation framework for accelerating research in Private Federated Learning [6.421821657238535]
pfl-research is a fast, modular, and easy-to-use Python framework for simulating Federated Learning (FL).
It supports TensorFlow, PyTorch, and non-neural network models, and is tightly integrated with state-of-the-art algorithms.
We release a suite of benchmarks that evaluates an algorithm's overall performance on a diverse set of realistic scenarios.
arXiv Detail & Related papers (2024-04-09T16:23:01Z)
- Federated Learning and Meta Learning: Approaches, Applications, and Directions [94.68423258028285]
In this tutorial, we present a comprehensive review of FL, meta learning, and federated meta learning (FedMeta).
Unlike other tutorial papers, our objective is to explore how FL, meta learning, and FedMeta methodologies can be designed, optimized, and evolved, and their applications over wireless networks.
arXiv Detail & Related papers (2022-10-24T10:59:29Z)
- Online Data Selection for Federated Learning with Limited Storage [53.46789303416799]
Federated Learning (FL) has been proposed to achieve distributed machine learning among networked devices.
The impact of on-device storage on the performance of FL is still not explored.
In this work, we take the first step to consider the online data selection for FL with limited on-device storage.
arXiv Detail & Related papers (2022-09-01T03:27:33Z)
- FLAIR: Federated Learning Annotated Image Repository [40.87802770571535]
Cross-device federated learning is an emerging machine learning (ML) paradigm where a large population of devices collectively train an ML model while the data remains on the devices.
We introduce FLAIR, a challenging large-scale annotated image dataset for multi-label classification suitable for federated learning.
We implement multiple baselines in different learning setups for different tasks on this dataset.
arXiv Detail & Related papers (2022-07-18T18:27:04Z)
- Multi-Center Federated Learning [62.32725938999433]
Federated learning (FL) can protect data privacy in distributed learning.
It merely collects local gradients from users without access to their data.
We propose a novel multi-center aggregation mechanism.
arXiv Detail & Related papers (2021-08-19T12:20:31Z)
- FedML: A Research Library and Benchmark for Federated Machine Learning [55.09054608875831]
Federated learning (FL) is a rapidly growing research field in machine learning.
Existing FL libraries cannot adequately support diverse algorithmic development.
We introduce FedML, an open research library and benchmark to facilitate FL algorithm development and fair performance comparison.
arXiv Detail & Related papers (2020-07-27T13:02:08Z)