Related papers: StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems

URL: http://arxiv.org/abs/2405.13062v1
Date: Mon, 20 May 2024 14:41:59 GMT
Title: StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems
Authors: Pavlos S. Bouzinis, Panagiotis Radoglou-Grammatikis, Ioannis Makris, Thomas Lagkas, Vasileios Argyriou, Georgios Th. Papadopoulos, Panagiotis Sarigiannidis, George K. Karagiannidis
Abstract summary: Federated learning (FL) is a decentralized learning technique that enables devices to collaboratively build a shared Machine Learning (ML) or Deep Learning (DL) model without revealing their raw data to a third party. Due to its privacy-preserving nature, FL has attracted widespread attention for building Intrusion Detection Systems (IDS) in cybersecurity. We propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically distributed (non-iid) features across local clients' data in FL.
Score: 22.259297167311964
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Federated learning (FL) is a decentralized learning technique that enables participating devices to collaboratively build a shared Machine Learning (ML) or Deep Learning (DL) model without revealing their raw data to a third party. Due to its privacy-preserving nature, FL has attracted widespread attention for building Intrusion Detection Systems (IDS) in cybersecurity. However, data heterogeneity across participating domains and entities presents significant challenges for the reliable implementation of an FL-based IDS. In this paper, we propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically distributed (non-iid) features across local clients' data in FL. In particular, StatAvg allows the FL clients to share their individual data statistics with the server, which then aggregates this information to produce global statistics. The global statistics are shared with the clients and used for universal data normalisation. StatAvg can integrate with any FL aggregation strategy, because it runs before the actual FL training process. The proposed method is evaluated against baseline approaches using datasets for network and host Artificial Intelligence (AI)-powered IDS. The experimental results demonstrate the efficiency of StatAvg in mitigating non-iid feature distributions across the FL clients compared to the baseline methods.
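Illustrative sketch (not from the paper): the abstract describes a three-step flow in which clients compute local statistics, the server aggregates them into global statistics, and clients normalise with the global values. The Python below is one minimal way this could look. It assumes the shared statistics are per-feature sample counts, means and variances, and that the server pools them weighted by sample count; the abstract does not specify either choice. The function names `local_stats`, `aggregate_stats` and `normalise` are hypothetical.

```python
import numpy as np


def local_stats(x):
    """Client side: per-feature sample count, mean and population variance."""
    x = np.asarray(x, dtype=float)
    return x.shape[0], x.mean(axis=0), x.var(axis=0)


def aggregate_stats(stats):
    """Server side: pool client statistics into a global mean and variance.

    Assumption: clients are weighted by their sample counts, and the global
    variance follows the law of total variance (within-client variance plus
    between-client variance of the means).
    """
    counts = np.array([n for n, _, _ in stats], dtype=float)
    means = np.stack([m for _, m, _ in stats])
    variances = np.stack([v for _, _, v in stats])
    weights = (counts / counts.sum())[:, None]
    global_mean = (weights * means).sum(axis=0)
    global_var = (weights * (variances + (means - global_mean) ** 2)).sum(axis=0)
    return global_mean, global_var


def normalise(x, global_mean, global_var, eps=1e-8):
    """Client side: standardise local data with the shared global statistics."""
    x = np.asarray(x, dtype=float)
    return (x - global_mean) / np.sqrt(global_var + eps)


# Two hypothetical clients whose features have very different scales.
rng = np.random.default_rng(0)
client_a = rng.normal(loc=0.0, scale=1.0, size=(50, 3))
client_b = rng.normal(loc=5.0, scale=3.0, size=(80, 3))

g_mean, g_var = aggregate_stats([local_stats(client_a), local_stats(client_b)])
client_a_norm = normalise(client_a, g_mean, g_var)
client_b_norm = normalise(client_b, g_mean, g_var)
```

Under these assumptions the pooled mean and variance equal those of the concatenated data, so every client applies the same transformation without any raw samples leaving the client. Because this step finishes before training starts, it is independent of whichever FL aggregation strategy is used afterwards.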

Related papers

Disentangling data distribution for Federated Learning [20.524108508314107]
Federated Learning (FL) facilitates collaborative training of a global model whose performance is boosted by private data owned by distributed clients. Yet the wide applicability of FL is hindered by entanglement of data distributions across different clients. This paper demonstrates for the first time that by disentangling data distributions FL can in principle achieve efficiencies comparable to those of distributed systems.
arXiv Detail & Related papers (2024-10-16T13:10:04Z)
FLASH: Federated Learning Across Simultaneous Heterogeneities [54.80435317208111]
FLASH (Federated Learning Across Simultaneous Heterogeneities) is a lightweight and flexible client selection algorithm. It outperforms state-of-the-art FL frameworks under extensive sources of heterogeneity. It achieves substantial and consistent improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-13T20:04:39Z)
A Simple Data Augmentation for Feature Distribution Skewed Federated Learning [12.636154758643757]
Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection. In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications. We propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL.
arXiv Detail & Related papers (2023-06-14T05:46:52Z)
PS-FedGAN: An Efficient Federated Learning Framework Based on Partially Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation. This work proposes a novel FL framework that requires only partial GAN model sharing. Named PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z)
Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network. Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv), have already been proposed. As a side product of this work, we release the non-IID version of the datasets we used so as to facilitate further comparisons from the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
Semi-Supervised Federated Learning with non-IID Data: Algorithm and System Design [42.63120623012093]
Federated Learning (FL) allows edge devices (or clients) to keep data locally while simultaneously training a shared global model. The distribution of the clients' local training data is non-independent and identically distributed (non-IID). We present a robust semi-supervised FL system design, where the system aims to solve the problem of data availability and non-IID in FL.
arXiv Detail & Related papers (2021-10-26T03:41:48Z)
Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection [16.975086164684882]
Federated learning (FL) is a distributed machine learning paradigm that allows clients to collaboratively train a model over their own local data. We mathematically demonstrate the cause of performance degradation in FL and examine the performance of FL over various datasets. We propose a pluggable system-level client selection method named Dubhe, which allows clients to proactively participate in training, preserving their privacy with the assistance of HE.
arXiv Detail & Related papers (2021-09-08T13:00:46Z)
Federated Robustness Propagation: Sharing Adversarial Robustness in Federated Learning [98.05061014090913]
Federated learning (FL) emerges as a popular distributed learning schema that learns from a set of participating users without requiring raw data to be shared. While adversarial training (AT) provides a sound solution for centralized learning, extending its usage for FL users has imposed significant challenges. We show that existing FL techniques cannot effectively propagate adversarial robustness among non-iid users. We propose a simple yet effective propagation approach that transfers robustness through carefully designed batch-normalization statistics.
arXiv Detail & Related papers (2021-06-18T15:52:33Z)
A Principled Approach to Data Valuation for Federated Learning [73.19984041333599]
Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources. The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion. This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
arXiv Detail & Related papers (2020-09-14T04:37:54Z)
WAFFLe: Weight Anonymized Factorization for Federated Learning [88.44939168851721]
In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. We propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks.
arXiv Detail & Related papers (2020-08-13T04:26:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.