FedRS-Bench: Realistic Federated Learning Datasets and Benchmarks in Remote Sensing
- URL: http://arxiv.org/abs/2505.08325v1
- Date: Tue, 13 May 2025 08:04:03 GMT
- Title: FedRS-Bench: Realistic Federated Learning Datasets and Benchmarks in Remote Sensing
- Authors: Haodong Zhao, Peng Peng, Chiyu Chen, Linqing Huang, Gongshen Liu,
- Abstract summary: Federated learning (FL) offers a solution by enabling collaborative model training across decentralized remote sensing (RS) data sources without exposing raw data.<n>We propose a realistic federated RS dataset, termed FedRS, which is representative of realistic operational scenarios.<n>We implement 10 baseline FL algorithms and evaluation metrics to construct the comprehensive FedRS-Bench.
- Score: 13.10438674331326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote sensing (RS) images are usually produced at an unprecedented scale, yet they are geographically and institutionally distributed, making centralized model training challenging due to data-sharing restrictions and privacy concerns. Federated learning (FL) offers a solution by enabling collaborative model training across decentralized RS data sources without exposing raw data. However, there lacks a realistic federated dataset and benchmark in RS. Prior works typically rely on manually partitioned single dataset, which fail to capture the heterogeneity and scale of real-world RS data, and often use inconsistent experimental setups, hindering fair comparison. To address this gap, we propose a realistic federated RS dataset, termed FedRS. FedRS consists of eight datasets that cover various sensors and resolutions and builds 135 clients, which is representative of realistic operational scenarios. Data for each client come from the same source, exhibiting authentic federated properties such as skewed label distributions, imbalanced client data volumes, and domain heterogeneity across clients. These characteristics reflect practical challenges in federated RS and support evaluation of FL methods at scale. Based on FedRS, we implement 10 baseline FL algorithms and evaluation metrics to construct the comprehensive FedRS-Bench. The experimental results demonstrate that FL can consistently improve model performance over training on isolated data silos, while revealing performance trade-offs of different methods under varying client heterogeneity and availability conditions. We hope FedRS-Bench will accelerate research on large-scale, realistic FL in RS by providing a standardized, rich testbed and facilitating fair comparisons across future works. The source codes and dataset are available at https://fedrs-bench.github.io/.
Related papers
- A Framework for testing Federated Learning algorithms using an edge-like environment [0.0]
Federated Learning (FL) is a machine learning paradigm in which many clients cooperatively train a single centralized model while keeping their data private and decentralized.<n>It is non-trivial to accurately evaluate the contributions of local models in global centralized model aggregation.<n>This is an example of a major challenge in FL, commonly known as data imbalance or class imbalance.<n>In this work, a framework is proposed and implemented to assess FL algorithms in a more easy and scalable way.
arXiv Detail & Related papers (2024-07-17T19:52:53Z) - FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models [48.484485609995986]
Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM)
There are currently no realistic datasets and benchmarks for FedLLM.
We propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics.
arXiv Detail & Related papers (2024-06-07T11:19:30Z) - StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems [22.259297167311964]
Federated learning (FL) is a decentralized learning technique that enables devices to collaboratively build a shared Machine Leaning (ML) or Deep Learning (DL) model without revealing their raw data to a third party.
Due to its privacy-preserving nature, FL has sparked widespread attention for building Intrusion Detection Systems (IDS) within the realm of cybersecurity.
We propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically (non-iid) distributed features across local clients' data in FL.
arXiv Detail & Related papers (2024-05-20T14:41:59Z) - Towards Fairness in Provably Communication-Efficient Federated Recommender Systems [8.215115151660958]
In this study, we establish sample bounds that dictate the ideal number of clients required for improved communication efficiency.
In line with theoretical findings, we empirically demonstrate that RS-FairFRS reduces communication cost.
While random sampling improves communication efficiency, we propose a novel two-phase dual-fair update technique to achieve fairness without revealing protected attributes of active clients participating in training.
arXiv Detail & Related papers (2024-05-03T01:53:17Z) - An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on the effectiveness of utilizing knowledge from distributed datasets.
Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round.
We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z) - FLASH: Federated Learning Across Simultaneous Heterogeneities [54.80435317208111]
FLASH(Federated Learning Across Simultaneous Heterogeneities) is a lightweight and flexible client selection algorithm.
It outperforms state-of-the-art FL frameworks under extensive sources of Heterogeneities.
It achieves substantial and consistent improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-13T20:04:39Z) - Soft Random Sampling: A Theoretical and Empirical Analysis [59.719035355483875]
Soft random sampling (SRS) is a simple yet effective approach for efficient deep neural networks when dealing with massive data.
It selects a uniformly speed at random with replacement from each data set in each epoch.
It is shown to be a powerful and competitive strategy with significant and competitive performance on real-world industrial scale.
arXiv Detail & Related papers (2023-11-21T17:03:21Z) - Towards Instance-adaptive Inference for Federated Learning [80.38701896056828]
Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating local training.
In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework.
Our experiments show that our FedIns outperforms state-of-the-art FL algorithms, e.g., a 6.64% improvement against the top-performing method with less than 15% communication cost on Tiny-ImageNet.
arXiv Detail & Related papers (2023-08-11T09:58:47Z) - FedSkip: Combatting Statistical Heterogeneity with Federated Skip
Aggregation [95.85026305874824]
We introduce a data-driven approach called FedSkip to improve the client optima by periodically skipping federated averaging and scattering local models to the cross devices.
We conduct extensive experiments on a range of datasets to demonstrate that FedSkip achieves much higher accuracy, better aggregation efficiency and competing communication efficiency.
arXiv Detail & Related papers (2022-12-14T13:57:01Z) - FedScale: Benchmarking Model and System Performance of Federated
Learning [4.1617240682257925]
FedScale is a set of challenging and realistic benchmark datasets for federated learning (FL) research.
FedScale is open-source with permissive licenses and actively maintained.
arXiv Detail & Related papers (2021-05-24T15:55:27Z) - Federated Visual Classification with Real-World Data Distribution [9.564468846277366]
We characterize the effect real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm.
We introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits.
We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training.
arXiv Detail & Related papers (2020-03-18T07:55:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.