Smishing Dataset I: Phishing SMS Dataset from Smishtank.com
- URL: http://arxiv.org/abs/2402.18430v2
- Date: Sun, 28 Apr 2024 19:12:53 GMT
- Title: Smishing Dataset I: Phishing SMS Dataset from Smishtank.com
- Authors: Daniel Timko, Muhammad Lutfor Rahman
- Abstract summary: We present a community-sourced smishing dataset from smishtank.com.
As the contribution of our work, we provide a corpus of 1,090 smishing samples that have been publicly submitted through the site.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While smishing (SMS phishing) attacks have risen to become one of the most common types of social engineering attacks, there is a lack of relevant smishing datasets. One of the biggest challenges in the domain of smishing prevention is the availability of fresh smishing datasets. Additionally, as time passes, smishing campaigns are shut down and the crucial information related to the attack is lost. Given the changing nature of smishing attacks, both researchers and engineers need a consistent flow of new smishing examples to create effective defenses. In this paper, we present a community-sourced smishing dataset from smishtank.com. The site provides a wealth of information relevant to combating smishing attacks through the breakdown and analysis of smishing samples at the point of submission. As the contribution of our work, we provide a corpus of 1,090 smishing samples that have been publicly submitted through the site. Each message includes information on the sender, the message body, and any brands referenced in the message. Additionally, when a URL is found, we provide further information on the domain, VirusTotal results, and a characterization of the URL. Through open access to fresh smishing data, we empower academia and industry to create robust defenses against this evolving threat.
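The abstract describes each sample as a message record (sender, body, referenced brands) with optional URL enrichment (domain details, VirusTotal results, URL characterization). A minimal sketch of how such a record might be modeled is shown below; the field names and layout are assumptions for illustration only, not the actual schema of the smishtank.com corpus:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record layout for one smishing sample; the real
# column names in the smishtank.com dataset may differ.
@dataclass
class SmishingSample:
    sender: str                               # originating number or short code
    message_body: str                         # full SMS text as submitted
    brands: list[str] = field(default_factory=list)  # brands referenced in the message
    url: Optional[str] = None                 # URL extracted from the body, if any
    domain_info: Optional[dict] = None        # domain details for the URL
    virustotal: Optional[dict] = None         # VirusTotal scan results
    url_category: Optional[str] = None        # characterization of the URL

# Example (fabricated) sample illustrating the shape of a record:
sample = SmishingSample(
    sender="+15551234567",
    message_body="Your package is on hold. Verify at http://example.test/track",
    brands=["USPS"],
    url="http://example.test/track",
)
print(sample.brands, sample.url is not None)
```

Such a structure makes the URL-enrichment fields explicitly optional, matching the paper's note that the extra information is provided only "when a URL is found."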
Related papers
- WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models [66.34505141027624]
We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics.
WildTeaming reveals previously unidentified vulnerabilities of frontier LLMs, resulting in up to 4.6x more diverse and successful adversarial attacks.
arXiv Detail & Related papers (2024-06-26T17:31:22Z) - AbuseGPT: Abuse of Generative AI ChatBots to Create Smishing Campaigns [0.0]
We propose the AbuseGPT method to show how existing generative AI-based chatbots can be exploited by real-world attackers to create smishing texts.
We found strong empirical evidence that attackers can circumvent the ethical safeguards of existing generative AI-based chatbot services.
We also discuss future research directions and guidelines to protect against the abuse of generative AI-based services.
arXiv Detail & Related papers (2024-02-15T05:49:22Z) - Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks [55.603893267803265]
Large Language Models (LLMs) are susceptible to Jailbreaking attacks.
Jailbreaking attacks aim to extract harmful information by subtly modifying the attack query.
We focus on a new attack form, called Contextual Interaction Attack.
arXiv Detail & Related papers (2024-02-14T13:45:19Z) - A Quantitative Study of SMS Phishing Detection [0.0]
We conducted an online survey on smishing detection with 187 participants.
We presented them with 16 SMS screenshots and evaluated how different factors affect their decision making process in smishing detection.
We found that participants had more difficulty identifying real messages than fake ones, with an accuracy of 67.1% on fake messages and 43.6% on real messages.
arXiv Detail & Related papers (2023-11-12T17:56:42Z) - Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game [86.66627242073724]
This paper presents a dataset of over 126,000 prompt injection attacks and 46,000 prompt-based "defenses" against prompt injection.
To the best of our knowledge, this is currently the largest dataset of human-generated adversarial examples for instruction-following LLMs.
We also use the dataset to create a benchmark for resistance to two types of prompt injection, which we refer to as prompt extraction and prompt hijacking.
arXiv Detail & Related papers (2023-11-02T06:13:36Z) - Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks [73.53327403684676]
We propose an attack-and-defense framework for studying the task of deleting sensitive information directly from model weights.
We study direct edits to model weights because this approach should guarantee that particular deleted information is never extracted by future prompt attacks.
We show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.
arXiv Detail & Related papers (2023-09-29T17:12:43Z) - Targeted Attacks: Redefining Spear Phishing and Business Email Compromise [0.17175834535889653]
Some rare, severely damaging email threats - known as spear phishing or Business Email Compromise - have emerged.
We describe targeted-attack-detection techniques as well as social-engineering methods used by fraudsters.
We present text-based attacks - with textual content as malicious payload - and compare non-targeted and targeted variants.
arXiv Detail & Related papers (2023-09-25T14:21:59Z) - Commercial Anti-Smishing Tools and Their Comparative Effectiveness Against Modern Threats [0.0]
We developed a test bed for measuring the effectiveness of popular anti-smishing tools against fresh smishing attacks.
Most anti-phishing apps and bulk messaging services did not filter smishing messages beyond carrier-level blocking.
While carriers did not block any benign messages, they were only able to reach a 25-35% blocking rate for smishing messages.
arXiv Detail & Related papers (2023-09-14T06:08:22Z) - ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z) - Cybersecurity Misinformation Detection on Social Media: Case Studies on Phishing Reports and Zoom's Threats [1.2387676601792899]
We propose novel approaches for detecting misinformation about cybersecurity and privacy threats on social media.
We developed a framework for detecting inaccurate phishing claims on Twitter.
We also proposed another framework for detecting misinformation related to Zoom's security and privacy threats on multiple platforms.
arXiv Detail & Related papers (2021-10-23T20:45:24Z) - TextHide: Tackling Data Privacy in Language Understanding Tasks [54.11691303032022]
TextHide mitigates privacy risks without slowing down training or reducing accuracy.
It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data.
We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend attacks on shared gradients or representations.
arXiv Detail & Related papers (2020-10-12T22:22:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.