Privacy-Preserving Phishing Email Detection Based on Federated Learning
and LSTM
- URL: http://arxiv.org/abs/2110.06025v1
- Date: Tue, 12 Oct 2021 14:17:38 GMT
- Title: Privacy-Preserving Phishing Email Detection Based on Federated Learning
and LSTM
- Authors: Yuwei Sun, Ng Chong, and Hideya Ochiai
- Abstract summary: Phishing emails that appear legitimate lure people into clicking on the attached malicious links or documents.
We propose a decentralized phishing email detection method called the Federated Phish Bowl (FPB).
FPB allows common knowledge representation and sharing among different clients to safeguard the email security and privacy.
- Score: 0.4588028371034407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Phishing emails that appear legitimate lure people into clicking on the
attached malicious links or documents. Increasingly more sophisticated phishing
campaigns in recent years necessitate a more adaptive detection system other
than traditional signature-based methods. In this regard, natural language
processing (NLP) with deep neural networks (DNNs) is adopted for knowledge
acquisition from a large number of emails. However, such sensitive daily
communications containing personal information are difficult to collect on a
server for centralized learning in real life due to escalating privacy
concerns. To this end, we propose a decentralized phishing email detection
method called the Federated Phish Bowl (FPB) leveraging federated learning and
long short-term memory (LSTM). FPB allows common knowledge representation and
sharing among different clients through the aggregation of trained models to
safeguard the email security and privacy. A recent phishing email dataset was
collected from an intergovernmental organization to train the model. Moreover,
we evaluated the model performance based on various assumptions regarding the
total client number and the level of data heterogeneity. The comprehensive
experimental results suggest that FPB is robust to a continually increasing
client number and various data heterogeneity levels, retaining a detection
accuracy of 0.83 and protecting the privacy of sensitive email communications.
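The abstract says FPB shares knowledge "through the aggregation of trained models" but does not state the aggregation rule. As a minimal sketch, assuming a FedAvg-style average weighted by each client's local sample count, the server-side step could look like the following. The layer names, the flattened-list weight layout, and the client sizes are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: layer name -> flattened list of parameters.
Weights = Dict[str, List[float]]


def federated_average(client_weights: Sequence[Weights],
                      client_sizes: Sequence[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    This is an assumed FedAvg-style rule, not necessarily the one FPB uses.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients holding 100 and 300 local emails.
clients = [
    {"lstm.kernel": [1.0, 2.0], "dense.bias": [0.0]},
    {"lstm.kernel": [3.0, 6.0], "dense.bias": [4.0]},
]
global_weights = federated_average(clients, [100, 300])
print(global_weights)
```

Only model parameters reach the server in this sketch; the raw emails stay on each client, which is the privacy property the abstract emphasizes. Weighting by sample count is one common choice when client data sizes differ, and an unweighted mean would be an equally valid assumption here.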
Related papers
- User-Centric Phishing Detection: A RAG and LLM-Based Approach [1.0858333811448098]
This paper presents a personalized phishing detection framework that integrates large language models with retrieval-augmented generation (RAG). For each message, the system constructs user-specific context by retrieving a compact set of the user's historical legitimate emails.
arXiv Detail & Related papers (2026-01-29T04:42:18Z)
- MeAJOR Corpus: A Multi-Source Dataset for Phishing Email Detection [0.0]
This paper presents MeAJOR, a novel, multi-source phishing email dataset. It integrates 135894 samples representing a broad number of phishing tactics and legitimate emails. By integrating broad features from multiple categories, our dataset provides a reusable and consistent resource.
arXiv Detail & Related papers (2025-07-23T22:57:08Z)
- Federated Face Forgery Detection Learning with Personalized Representation [63.90408023506508]
Deep generator technology can produce high-quality fake videos that are indistinguishable, posing a serious social threat.
Traditional forgery detection methods rely on centralized training on the data.
The paper proposes a novel federated face forgery detection learning with personalized representation.
arXiv Detail & Related papers (2024-06-17T02:20:30Z)
- Evaluating the Efficacy of Large Language Models in Identifying Phishing Attempts [2.6012482282204004]
Phishing, a prevalent cybercrime tactic for decades, remains a significant threat in today's digital world.
This paper aims to analyze the effectiveness of 15 Large Language Models (LLMs) in detecting phishing attempts.
arXiv Detail & Related papers (2024-04-23T19:55:18Z)
- Profiler: Profile-Based Model to Detect Phishing Emails [15.109679047753355]
We propose a multidimensional risk assessment of emails to reduce the feasibility of an attacker adapting their email and avoiding detection.
We develop a risk assessment framework that includes three models which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type.
Our Profiler can be used in conjunction with ML approaches, to reduce their misclassifications or as a labeller for large email data sets in the training stage.
arXiv Detail & Related papers (2022-08-18T10:01:55Z)
- Email Summarization to Assist Users in Phishing Identification [1.433758865948252]
Cyber-phishing attacks are more precise, targeted, and tailored by training data to activate only in the presence of specific information or cues.
This work leverages transformer-based machine learning to analyze prospective psychological triggers.
We then amalgamate this information and present it to the user to allow them to (i) easily decide whether the email is "phishy" and (ii) self-learn advanced malicious patterns.
arXiv Detail & Related papers (2022-03-24T23:03:46Z)
- Anomaly Detection in Emails using Machine Learning and Header Information [0.0]
Anomalies in emails such as phishing and spam present major security risks.
Previous studies on email anomaly detection relied on a single type of anomaly and the analysis of the email body and subject content.
This study conducted feature extraction and selection on email header datasets and leveraged both multi and one-class anomaly detection approaches.
arXiv Detail & Related papers (2022-03-19T23:31:23Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy [67.4471689755097]
This paper empirically demonstrates that the clipped FedAvg can perform surprisingly well even with substantial data heterogeneity.
We provide the convergence analysis of a differentially private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients' updates.
arXiv Detail & Related papers (2021-06-25T14:47:19Z)
- WAFFLe: Weight Anonymized Factorization for Federated Learning [88.44939168851721]
In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices.
We propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks.
arXiv Detail & Related papers (2020-08-13T04:26:31Z)
- Evaluation of Federated Learning in Phishing Email Detection [24.85352882358906]
This paper builds upon deep neural network models, particularly RNN and BERT, for phishing email detection.
It analyzes the FL-entangled learning performance under various settings, including balanced and asymmetrical data distribution.
arXiv Detail & Related papers (2020-07-27T03:58:00Z)
- Learning with Weak Supervision for Email Intent Detection [56.71599262462638]
We propose to leverage user actions as a source of weak supervision to detect intents in emails.
We develop an end-to-end robust deep neural network model for email intent identification.
arXiv Detail & Related papers (2020-05-26T23:41:05Z)
- Privacy-preserving Traffic Flow Prediction: A Federated Learning Approach [61.64006416975458]
We propose a privacy-preserving machine learning technique named Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction.
FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism.
It is shown that FedGRU's prediction accuracy is 90.96% higher than the advanced deep learning models.
arXiv Detail & Related papers (2020-03-19T13:07:49Z)