Towards Text-based Phishing Detection
- URL: http://arxiv.org/abs/2111.01676v2
- Date: Wed, 3 Nov 2021 13:57:58 GMT
- Title: Towards Text-based Phishing Detection
- Authors: Gilchan Park and Julia M. Taylor
- Abstract summary: This paper reports on an experiment into text-based phishing detection using readily available resources and without the use of semantics.
The results obtained in recognizing phishing emails are considerably better than the previously reported work; but the rate of text falsely identified as phishing is slightly worse.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper reports on an experiment into text-based phishing detection using
readily available resources and without the use of semantics. The developed
algorithm is a modified version of previously published work that works with
the same tools. The results obtained in recognizing phishing emails are
considerably better than the previously reported work; but the rate of text
falsely identified as phishing is slightly worse. It is expected that adding
semantic component will reduce the false positive rate while preserving the
detection accuracy.
Related papers
- Phishing Codebook: A Structured Framework for the Characterization of Phishing Emails [17.173114048398954]
Phishing is one of the most prevalent and expensive types of cybercrime faced by organizations and individuals worldwide.
Most prior research has focused on various technical features and traditional representations of text to characterize phishing emails.
In this paper, we dissect the structure of phishing emails to gain a better understanding of the factors that influence human decision-making.
arXiv Detail & Related papers (2024-08-16T18:30:53Z) - Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z) - An Innovative Information Theory-based Approach to Tackle and Enhance The Transparency in Phishing Detection [23.962076093344166]
We propose an innovative deep learning-based approach for phishing attack localization.
Our method can not only predict the vulnerability of the email data but also automatically learn and figure out the most important and phishing-relevant information.
arXiv Detail & Related papers (2024-02-27T00:03:07Z) - On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z) - Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases insignificant changes in input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Profiler: Profile-Based Model to Detect Phishing Emails [15.109679047753355]
We propose a multidimensional risk assessment of emails to reduce the feasibility of an attacker adapting their email and avoiding detection.
We develop a risk assessment framework that includes three models which analyse an email's (1) threat level, (2) cognitive manipulation, and (3) email type.
Our Profiler can be used in conjunction with ML approaches, to reduce their misclassifications or as a labeller for large email data sets in the training stage.
arXiv Detail & Related papers (2022-08-18T10:01:55Z) - Tracing Text Provenance via Context-Aware Lexical Substitution [81.49359106648735]
We propose a natural language watermarking scheme based on context-aware lexical substitution.
Under both objective and subjective metrics, our watermarking scheme can well preserve the semantic integrity of original sentences.
arXiv Detail & Related papers (2021-12-15T04:27:33Z) - Adaptive Shrink-Mask for Text Detection [91.34459257409104]
Existing real-time text detectors reconstruct text contours by shrink-masks directly.
The dependence on predicted shrink-masks leads to unstable detection results.
Super-pixel Window (SPW) is designed to supervise the network.
arXiv Detail & Related papers (2021-11-18T07:38:57Z) - Phishing Detection through Email Embeddings [2.099922236065961]
The problem of detecting phishing emails through machine learning techniques has been discussed extensively in the literature.
In this paper, we crafted a set of phishing and legitimate emails with similar indicators in order to investigate whether these cues are captured or disregarded by email embeddings.
Our results show that using these indicators, email embeddings techniques is effective for classifying emails as phishing or legitimate.
arXiv Detail & Related papers (2020-12-28T21:16:41Z) - PhishZip: A New Compression-based Algorithm for Detecting Phishing
Websites [12.468922937529966]
PhishZip is a novel phishing detection approach using a compression algorithm to perform website classification.
We also propose the use of compression ratio as a novel machine learning feature which significantly improves machine learning based phishing detection.
arXiv Detail & Related papers (2020-07-22T00:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.