Related papers: Evaluation of Federated Learning in Phishing Email Detection

Evaluation of Federated Learning in Phishing Email Detection

URL: http://arxiv.org/abs/2007.13300v3
Date: Fri, 21 May 2021 06:17:50 GMT
Title: Evaluation of Federated Learning in Phishing Email Detection
Authors: Chandra Thapa, Jun Wen Tang, Alsharif Abuadbba, Yansong Gao, Seyit Camtepe, Surya Nepal, Mahathir Almashor, Yifeng Zheng
Abstract summary: This paper builds upon deep neural network models, particularly RNN and BERT, for phishing email detection. It analyzes the FL-entangled learning performance under various settings, including balanced and asymmetrical data distributions.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The use of Artificial Intelligence (AI) to detect phishing emails is primarily dependent on large-scale centralized datasets, which opens it up to a myriad of privacy, trust, and legal issues. Moreover, organizations are loath to share emails, given the risk of leaking commercially sensitive information. So, it is uncommon to obtain sufficient emails to train a global AI model efficiently. Accordingly, privacy-preserving distributed and collaborative machine learning, particularly Federated Learning (FL), is a desideratum. Although FL is already prevalent in the healthcare sector, questions remain regarding the effectiveness and efficacy of FL-based phishing detection within the context of multi-organization collaborations. To the best of our knowledge, the work herein is the first to investigate the use of FL in email anti-phishing. This paper builds upon deep neural network models, particularly RNN and BERT, for phishing email detection. It analyzes the FL-entangled learning performance under various settings, including balanced and asymmetrical data distributions. Our results show that FL achieves performance in phishing email detection comparable to centralized learning for balanced datasets and low organization counts. Moreover, we observe a variation in performance when increasing organizational counts. For a fixed total email dataset, the global RNN-based model suffers a 1.8% accuracy drop when increasing organizational counts from 2 to 10. In contrast, BERT accuracy rises by 0.6% when going from 2 to 5 organizations. However, if the overall email dataset is allowed to grow with the introduction of new organizations in the FL framework, organizational-level performance improves through faster convergence. Finally, FL's overall global model performance suffers from highly unstable outputs if the email dataset distribution is highly asymmetric.
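The abstract compares balanced and asymmetric splits of a fixed email corpus across 2 to 10 organizations, with a global model built from organization-level updates. It does not state the aggregation rule, so the sketch below assumes standard FedAvg (averaging parameters weighted by each organization's local dataset size). The helper names `partition_sizes` and `fedavg` are hypothetical, and model parameters are plain float lists standing in for RNN or BERT weights; this is an illustration of the setup, not the authors' implementation.

```python
from itertools import accumulate
from typing import List, Optional, Sequence


def partition_sizes(total: int, num_orgs: int,
                    weights: Optional[Sequence[float]] = None) -> List[int]:
    """Split a fixed pool of `total` emails across `num_orgs` organizations.

    Hypothetical helper: weights=None gives a balanced split, unequal
    weights give an asymmetric one.
    """
    if weights is None:
        weights = [1.0] * num_orgs
    if num_orgs <= 0 or len(weights) != num_orgs or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per organization")
    cumulative = list(accumulate(weights))
    scale = cumulative[-1]
    # End index of each organization's slice; the last is pinned to `total`
    # so the sizes always add up exactly to the fixed corpus size.
    ends = [round(total * c / scale) for c in cumulative[:-1]] + [total]
    starts = [0] + ends[:-1]
    return [end - start for start, end in zip(starts, ends)]


def fedavg(client_params: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Aggregate organization models into one global model.

    Assumption: FedAvg-style averaging weighted by local dataset size.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("one parameter vector per organization is required")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all parameter vectors must have the same length")
    total = sum(client_sizes)
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        for j, p in enumerate(params):
            # Each organization contributes in proportion to its share of the data.
            global_params[j] += (n / total) * p
    return global_params


if __name__ == "__main__":
    # Balanced vs. asymmetric splits of the same fixed corpus.
    print(partition_sizes(1000, 4))          # [250, 250, 250, 250]
    print(partition_sizes(1000, 2, [9, 1]))  # [900, 100]
    # Two toy organization models; the one holding more emails dominates.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

In the asymmetric `[9, 1]` split, the larger organization's update receives 90% of the weight in every aggregation round under this weighting.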

Related papers

MeAJOR Corpus: A Multi-Source Dataset for Phishing Email Detection [0.0]
This paper presents MeAJOR, a novel, multi-source phishing email dataset. It integrates 135894 samples representing a broad range of phishing tactics and legitimate emails. By integrating broad features from multiple categories, our dataset provides a reusable and consistent resource.
arXiv Detail & Related papers (2025-07-23T22:57:08Z)
Fisher Information-based Efficient Curriculum Federated Learning with Large Language Models [43.26028399395612]
We propose a Fisher Information-based Efficient Curriculum Federated Learning framework (FibecFed) with two novel methods. First, we propose a Fisher information-based method to adaptively sample data within each device to improve the effectiveness of the FL fine-tuning process. Second, we dynamically select the proper layers for global aggregation and sparse parameters for local updates with LoRA.
arXiv Detail & Related papers (2024-09-30T18:12:18Z)
Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning? [50.03434441234569]
Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing. While various algorithms have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention.
arXiv Detail & Related papers (2024-09-05T19:00:18Z)
A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation [5.011091042850546]
Adapting foundation models for medical image analysis requires fine-tuning them on a considerable amount of data. Collecting task-specific medical data for such fine-tuning at a central location raises many privacy concerns. Although Federated Learning (FL) provides an effective means for training on private decentralized data, communication costs in federating large foundation models can quickly become a significant bottleneck.
arXiv Detail & Related papers (2024-07-31T16:48:06Z)
An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on the effectiveness of utilizing knowledge from distributed datasets. Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round. We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z)
Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes [54.18186259484828]
In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds. We show strong evidence that variable-length coding is beneficial for compression in FL. We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z)
FedDBL: Communication and Data Efficient Federated Deep-Broad Learning for Histopathological Tissue Classification [65.7405397206767]
We propose Federated Deep-Broad Learning (FedDBL) to achieve superior classification performance with limited training samples and only one round of communication. FedDBL greatly outperforms its competitors with only one round of communication and limited training samples, and even achieves performance comparable to methods that use multiple communication rounds. Since no data or deep models are shared across clients, the privacy issue is well solved and model security is guaranteed, with no risk of model inversion attacks.
arXiv Detail & Related papers (2023-02-24T14:27:41Z)
Profiler: Profile-Based Model to Detect Phishing Emails [15.109679047753355]
We propose a multidimensional risk assessment of emails to reduce the feasibility of an attacker adapting their email and avoiding detection. We develop a risk assessment framework that includes three models which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type. Our Profiler can be used in conjunction with ML approaches, to reduce their misclassifications or as a labeller for large email data sets in the training stage.
arXiv Detail & Related papers (2022-08-18T10:01:55Z)
Distributed Contrastive Learning for Medical Image Segmentation [16.3860181959878]
Supervised deep learning needs a large amount of labeled data to achieve high performance. In medical imaging analysis, each site may only have a limited amount of data and labels, which makes learning ineffective. We propose two federated self-supervised learning frameworks for medical image segmentation with limited annotations.
arXiv Detail & Related papers (2022-08-07T20:47:05Z)
Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy. We propose FedReg, an algorithm that accelerates FL by alleviating knowledge forgetting in the local training stage. Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
Privacy-Preserving Phishing Email Detection Based on Federated Learning and LSTM [0.4588028371034407]
Phishing emails that appear legitimate lure people into clicking on attached malicious links or documents. We propose a decentralized phishing email detection method called the Federated Phish Bowl (FPB). FPB allows common knowledge representation and sharing among different clients to safeguard email security and privacy.
arXiv Detail & Related papers (2021-10-12T14:17:38Z)
WAFFLe: Weight Anonymized Factorization for Federated Learning [88.44939168851721]
In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices. We propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks.
arXiv Detail & Related papers (2020-08-13T04:26:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.