Uncovering Semantics and Topics Utilized by Threat Actors to Deliver Malicious Attachments and URLs
- URL: http://arxiv.org/abs/2407.08888v1
- Date: Thu, 11 Jul 2024 23:04:16 GMT
- Title: Uncovering Semantics and Topics Utilized by Threat Actors to Deliver Malicious Attachments and URLs
- Authors: Andrey Yakymovych, Abhishek Singh
- Abstract summary: This study employs BERTopic unsupervised topic modeling to identify common semantics and themes embedded in email.
We preprocess emails by extracting and sanitizing content and employ multilingual embedding models like BGE-M3 for dense representations.
Our research will evaluate and compare different clustering algorithms on topic quantity, coherence, and diversity metrics.
- Score: 2.052800997441997
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent threat reports highlight that email remains the top vector for delivering malware to endpoints. Despite these statistics, detection of malicious email attachments and URLs often neglects semantic cues, linguistic features, and contextual clues. Our study employs BERTopic unsupervised topic modeling to identify common semantics and themes embedded in emails used to deliver malicious attachments and call-to-action URLs. We preprocess emails by extracting and sanitizing content and employ multilingual embedding models like BGE-M3 for dense representations, which clustering algorithms (HDBSCAN and OPTICS) use to group emails by semantic similarity. Phi3-Mini-4K-Instruct facilitates semantic analysis and hLDA aids thematic analysis to understand threat actor patterns. Our research will evaluate and compare different clustering algorithms on topic quantity, coherence, and diversity metrics, concluding with insights into the semantics and topics commonly used by threat actors to deliver malicious attachments and URLs, a significant contribution to the field of threat detection.
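The abstract names topic quantity, coherence, and diversity as the evaluation metrics but does not define them here. The sketch below shows one common way to compute the latter two, assuming diversity is the fraction of unique words among each topic's top words and coherence is mean NPMI estimated from document-level co-occurrence. These definitions, the function names `topic_diversity` and `npmi_coherence`, and the toy tokenized emails are illustrative assumptions, not the authors' confirmed setup.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of all topics."""
    words = [w for topic in topics for w in topic[:top_k]]
    return len(set(words)) / len(words) if words else 0.0


def npmi_coherence(topics, documents, top_k=10):
    """Mean NPMI over word pairs in each topic, averaged across topics.

    Probabilities are estimated as the share of documents that contain
    a word (or both words of a pair).
    """
    doc_sets = [set(doc) for doc in documents]
    if not doc_sets:
        return 0.0
    n_docs = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    topic_scores = []
    for topic in topics:
        pair_scores = []
        for a, b in combinations(topic[:top_k], 2):
            p_ab = prob(a, b)
            if p_ab == 0.0:
                pair_scores.append(-1.0)  # never co-occur: minimum NPMI
            elif p_ab == 1.0:
                pair_scores.append(1.0)   # co-occur everywhere: maximum NPMI
            else:
                pmi = math.log(p_ab / (prob(a) * prob(b)))
                pair_scores.append(pmi / -math.log(p_ab))
        if pair_scores:
            topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores) if topic_scores else 0.0


# Hypothetical tokenized emails and topic word lists, for illustration only.
documents = [
    ["invoice", "payment", "attached", "overdue"],
    ["invoice", "payment", "attached"],
    ["password", "account", "verify", "login"],
    ["password", "account", "verify"],
]
topics = [
    ["invoice", "payment", "attached"],
    ["password", "account", "verify"],
]

print("topic count:", len(topics))
print("diversity:", topic_diversity(topics))
print("coherence:", npmi_coherence(topics, documents))
```

Diversity near 1 indicates that topics share few top words, while higher NPMI indicates that a topic's words tend to appear in the same emails. In a comparison of clustering algorithms, these scores would be computed once per algorithm on the topics it produces.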
Related papers
- Toward Mixture-of-Experts Enabled Trustworthy Semantic Communication for 6G Networks [82.3753728955968]
We introduce a novel Mixture-of-Experts (MoE)-based SemCom system.
This system comprises a gating network and multiple experts, each specializing in different security challenges.
The gating network adaptively selects suitable experts to counter heterogeneous attacks based on user-defined security requirements.
A case study in vehicular networks demonstrates the efficacy of the MoE-based SemCom system.
arXiv Detail & Related papers (2024-09-24T03:17:51Z) - DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and Classification [4.585051136007553]
We introduce DomURLs_BERT, a pre-trained BERT-based encoder for detecting and classifying suspicious/malicious domains and URLs.
The proposed encoder outperforms state-of-the-art character-based deep learning models and cybersecurity-focused BERT models across multiple tasks and datasets.
arXiv Detail & Related papers (2024-09-13T18:59:13Z) - AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports [3.6785107661544805]
We present AnnoCTR, a new CC-BY-SA-licensed dataset of cyber threat reports.
The reports have been annotated by a domain expert with named entities, temporal expressions, and cybersecurity-specific concepts.
In our few-shot scenario, we find that for identifying the MITRE ATT&CK concepts that are mentioned explicitly or implicitly in a text, concept descriptions from MITRE ATT&CK are an effective source for training data augmentation.
arXiv Detail & Related papers (2024-04-11T14:04:36Z) - Prompted Contextual Vectors for Spear-Phishing Detection [45.07804966535239]
Spear-phishing attacks present a significant security challenge.
We propose a detection approach based on a novel document vectorization method.
Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails.
arXiv Detail & Related papers (2024-02-13T09:12:55Z) - Blockchain-aided Secure Semantic Communication for AI-Generated Content in Metaverse [59.04428659123127]
We propose a blockchain-aided semantic communication framework for AIGC services in virtual transportation networks.
We illustrate a training-based semantic attack scheme to generate adversarial semantic data by various loss functions.
We also design a semantic defense scheme that uses the blockchain and zero-knowledge proofs to tell the difference between the semantic similarities of adversarial and authentic semantic data.
arXiv Detail & Related papers (2023-01-25T02:32:02Z) - Unraveling Threat Intelligence Through the Lens of Malicious URL Campaigns [21.185063151766798]
We analyse suspicious URLs from SIEM alerts via the perspective of malicious URL campaigns.
By first grouping URLs within 311M records gathered from VirusTotal into 2.6M suspicious clusters, we discovered 77.8K malicious campaigns.
We find 9.9M unique URLs attributable to 18.3K multi-URL campaigns, and that only 2.97% of campaigns were found by security vendors.
arXiv Detail & Related papers (2022-08-26T06:10:13Z) - An Adversarial Attack Analysis on Malicious Advertisement URL Detection Framework [22.259444589459513]
Malicious advertisement URLs pose a security risk since they are the source of cyber-attacks.
Existing malicious URL detection techniques are limited in their ability to handle unseen features and to generalize to test data.
In this study, we extract a novel set of lexical and web-scraped features and employ machine learning techniques to set up a system for detecting fraudulent advertisement URLs.
arXiv Detail & Related papers (2022-04-27T20:06:22Z) - NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media [93.51739200834837]
We propose a dataset where both image and text are unmanipulated but mismatched.
We introduce several strategies for automatic retrieval of suitable images for the given captions.
Our large-scale automatically generated NewsCLIPpings dataset requires models to jointly analyze both modalities.
arXiv Detail & Related papers (2021-04-13T01:53:26Z) - Adversarial Semantic Collisions [129.55896108684433]
We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models.
We develop gradient-based approaches for generating semantic collisions.
We show how to generate semantic collisions that evade perplexity-based filtering.
arXiv Detail & Related papers (2020-11-09T20:42:01Z) - Learning with Weak Supervision for Email Intent Detection [56.71599262462638]
We propose to leverage user actions as a source of weak supervision to detect intents in emails.
We develop an end-to-end robust deep neural network model for email intent identification.
arXiv Detail & Related papers (2020-05-26T23:41:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.