Evaluation of Federated Learning in Phishing Email Detection
- URL: http://arxiv.org/abs/2007.13300v3
- Date: Fri, 21 May 2021 06:17:50 GMT
- Title: Evaluation of Federated Learning in Phishing Email Detection
- Authors: Chandra Thapa, Jun Wen Tang, Alsharif Abuadbba, Yansong Gao, Seyit
Camtepe, Surya Nepal, Mahathir Almashor, Yifeng Zheng
- Abstract summary: This paper builds upon deep neural network models, particularly RNN and BERT, for phishing email detection.
It analyzes the FL-entangled learning performance under various settings, including balanced and asymmetrical data distribution.
- Score: 24.85352882358906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of Artificial Intelligence (AI) to detect phishing emails is
primarily dependent on large-scale centralized datasets, which opens it up to a
myriad of privacy, trust, and legal issues. Moreover, organizations are loath
to share emails, given the risk of leakage of commercially sensitive
information. So, it is uncommon to obtain sufficient emails to train a global
AI model efficiently. Accordingly, privacy-preserving distributed and
collaborative machine learning, particularly Federated Learning (FL), is a
desideratum. Although FL is already prevalent in the healthcare sector, questions remain
regarding the effectiveness and efficacy of FL-based phishing detection within
the context of multi-organization collaborations. To the best of our knowledge,
the work herein is the first to investigate the use of FL in email
anti-phishing. This paper builds upon deep neural network models, particularly
RNN and BERT, for phishing email detection. It analyzes the FL-entangled
learning performance under various settings, including balanced and
asymmetrical data distribution. Our results indicate that FL achieves
performance comparable to centralized learning in phishing email detection for
balanced datasets and low organization counts. Moreover, we observe a
variation in performance when increasing organizational counts. For a fixed
total email dataset, the global RNN-based model suffers a 1.8% accuracy drop
when increasing organizational counts from 2 to 10. In contrast, BERT accuracy
rises by 0.6% when going from 2 to 5 organizations. However, if the overall
email dataset is allowed to grow as new organizations join the FL framework,
organization-level performance improves through faster convergence. In
addition, the overall global model performance of FL suffers from highly
unstable outputs when the email dataset distribution is highly asymmetric.
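The abstract does not say how the organizations' models are combined into the global model. As a rough illustration only, the sketch below assumes a FedAvg-style aggregation, in which each organization's parameters are weighted by its local sample count. The function name `fed_avg`, the toy parameter dictionaries, and the sample counts are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    This is a FedAvg-style aggregation (an assumption; the abstract does
    not name the aggregation rule used in the paper).
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Toy parameters for two organizations (made-up values).
org_a = {"layer": [1.0, 0.0]}
org_b = {"layer": [0.0, 1.0]}

# Balanced split: both organizations contribute equally.
balanced = fed_avg([org_a, org_b], [500, 500])

# Asymmetric split: the larger organization dominates the global model.
skewed = fed_avg([org_a, org_b], [900, 100])
```

Under this assumption, an asymmetric split pulls the global parameters toward the organization holding most of the emails, which is one plausible way uneven data distribution can affect the global model.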
Related papers
- MeAJOR Corpus: A Multi-Source Dataset for Phishing Email Detection [0.0]
This paper presents MeAJOR, a novel, multi-source phishing email dataset. It integrates 135894 samples representing a broad number of phishing tactics and legitimate emails. By integrating broad features from multiple categories, our dataset provides a reusable and consistent resource.
arXiv Detail & Related papers (2025-07-23T22:57:08Z) - Fisher Information-based Efficient Curriculum Federated Learning with Large Language Models [43.26028399395612]
We propose a Fisher Information-based Efficient Curriculum Federated Learning framework (FibecFed) with two novel methods.
First, we propose a fisher information-based method to adaptively sample data within each device to improve the effectiveness of the FL fine-tuning process.
Second, we dynamically select the proper layers for global aggregation and sparse parameters for local update with LoRA.
arXiv Detail & Related papers (2024-09-30T18:12:18Z) - Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning? [50.03434441234569]
Federated Learning (FL) has gained significant popularity due to its effectiveness in training machine learning models across diverse sites without requiring direct data sharing.
While various algorithms have shown that FL with local updates is a communication-efficient distributed learning framework, the generalization performance of FL with local updates has received comparatively less attention.
arXiv Detail & Related papers (2024-09-05T19:00:18Z) - A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation [5.011091042850546]
Adapting foundation models for medical image analysis requires finetuning them on a considerable amount of data.
Collecting task-specific medical data for such finetuning at a central location raises many privacy concerns.
Although Federated learning (FL) provides an effective means for training on private decentralized data, communication costs in federating large foundation models can quickly become a significant bottleneck.
arXiv Detail & Related papers (2024-07-31T16:48:06Z) - An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on the effectiveness of utilizing knowledge from distributed datasets.
Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round.
We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z) - Fed-CVLC: Compressing Federated Learning Communications with
Variable-Length Codes [54.18186259484828]
In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds.
We show strong evidence that variable-length codes are beneficial for compression in FL.
We present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates.
arXiv Detail & Related papers (2024-02-06T07:25:21Z) - FedDBL: Communication and Data Efficient Federated Deep-Broad Learning
for Histopathological Tissue Classification [65.7405397206767]
We propose Federated Deep-Broad Learning (FedDBL) to achieve superior classification performance with limited training samples and only one-round communication.
FedDBL greatly outperforms the competitors with only one round of communication and limited training samples, and even achieves performance comparable to methods that use multiple communication rounds.
Since no data or deep models are shared across clients, the privacy issue is well solved and model security is guaranteed, with no risk of model inversion attacks.
arXiv Detail & Related papers (2023-02-24T14:27:41Z) - Profiler: Profile-Based Model to Detect Phishing Emails [15.109679047753355]
We propose a multidimensional risk assessment of emails to reduce the feasibility of an attacker adapting their email and avoiding detection.
We develop a risk assessment framework that includes three models which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type.
Our Profiler can be used in conjunction with ML approaches, to reduce their misclassifications or as a labeller for large email data sets in the training stage.
arXiv Detail & Related papers (2022-08-18T10:01:55Z) - Distributed Contrastive Learning for Medical Image Segmentation [16.3860181959878]
Supervised deep learning needs a large amount of labeled data to achieve high performance.
In medical imaging analysis, each site may only have a limited amount of data and labels, which makes learning ineffective.
We propose two federated self-supervised learning frameworks for medical image segmentation with limited annotations.
arXiv Detail & Related papers (2022-08-07T20:47:05Z) - Acceleration of Federated Learning with Alleviated Forgetting in Local
Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z) - Privacy-Preserving Phishing Email Detection Based on Federated Learning
and LSTM [0.4588028371034407]
Phishing emails that appear legitimate lure people into clicking on the attached malicious links or documents.
We propose a decentralized phishing email detection method called the Federated Phish Bowl (FPB).
FPB allows common knowledge representation and sharing among different clients to safeguard the email security and privacy.
arXiv Detail & Related papers (2021-10-12T14:17:38Z) - WAFFLe: Weight Anonymized Factorization for Federated Learning [88.44939168851721]
In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices.
We propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks.
arXiv Detail & Related papers (2020-08-13T04:26:31Z)