Related papers: EUREKHA: Enhancing User Representation for Key Hackers Identification in Underground Forums

URL: http://arxiv.org/abs/2411.05479v1
Date: Fri, 08 Nov 2024 11:09:45 GMT
Title: EUREKHA: Enhancing User Representation for Key Hackers Identification in Underground Forums
Authors: Abdoul Nasser Hassane Amadou, Anas Motii, Saida Elouardi, EL Houcine Bergou,
Abstract summary: Underground forums serve as hubs for cybercriminal activities, offering a space for anonymity and evasion of online oversight. Identifying the key instigators behind these operations is essential but remains a complex challenge. This paper presents a novel method called EUREKHA, designed to identify these key hackers by modeling each user as a textual sequence.
Score: 1.5192294544599656
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Underground forums serve as hubs for cybercriminal activities, offering a space for anonymity and evasion of conventional online oversight. In these hidden communities, malicious actors collaborate to exchange illicit knowledge, tools, and tactics, driving a range of cyber threats from hacking techniques to the sale of stolen data, malware, and zero-day exploits. Identifying the key instigators (i.e., key hackers) behind these operations is essential but remains a complex challenge. This paper presents a novel method called EUREKHA (Enhancing User Representation for Key Hacker Identification in Underground Forums), designed to identify these key hackers by modeling each user as a textual sequence. This sequence is processed through a large language model (LLM) for domain-specific adaptation, with LLMs acting as feature extractors. These extracted features are then fed into a Graph Neural Network (GNN) to model user structural relationships, significantly improving identification accuracy. Furthermore, we employ BERTopic (Bidirectional Encoder Representations from Transformers Topic Modeling) to extract personalized topics from user-generated content, enabling multiple textual representations per user and optimizing the selection of the most representative sequence. Our study demonstrates that fine-tuned LLMs outperform state-of-the-art methods in identifying key hackers. Additionally, when combined with GNNs, our model achieves significant improvements, resulting in approximately 6% and 10% increases in accuracy and F1-score, respectively, over existing methods. EUREKHA was tested on the Hack-Forums dataset, and we provide open-source access to our code.
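The following minimal sketch illustrates the LLM-to-GNN stage under stated assumptions. The abstract does not say which GNN architecture EUREKHA uses, so the sketch swaps in a standard two-layer graph convolutional network (GCN, Kipf & Welling) with symmetric normalization. The random matrix `x` stands in for the LLM-extracted features of each user's textual sequence, the edge list is a hypothetical user-interaction graph, and the weights are untrained. The function names `normalized_adjacency` and `gcn_forward` are illustrative and are not taken from the paper's code.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])      # self-loops keep each user's own features
    deg = a_hat.sum(axis=1)                 # degree including the self-loop (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale entry (i, j) by deg_i^-1/2 * deg_j^-1/2
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(x: np.ndarray, adj: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Two-layer GCN returning per-user logits for {regular user, key hacker}."""
    a_norm = normalized_adjacency(adj)
    h = np.maximum(a_norm @ x @ w1, 0.0)    # layer 1: aggregate neighbours, project, ReLU
    return a_norm @ h @ w2                  # layer 2: aggregate again, project to 2 classes


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical toy forum: 5 users, 8-dim features standing in for LLM embeddings
rng = np.random.default_rng(0)
num_users, feat_dim, hidden_dim = 5, 8, 4
x = rng.normal(size=(num_users, feat_dim))

# Hypothetical undirected interaction edges (e.g., one user replying to another)
edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
adj = np.zeros((num_users, num_users))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Untrained weights; in practice these would be learned from labeled users
w1 = rng.normal(scale=0.1, size=(feat_dim, hidden_dim))
w2 = rng.normal(scale=0.1, size=(hidden_dim, 2))

probs = softmax(gcn_forward(x, adj, w1, w2))
print(probs.shape)  # (5, 2)
```

Training would typically minimize a cross-entropy loss on users with known labels. Self-loops are added so that each user's own text-derived features are combined with, rather than replaced by, those of its neighbours.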

Related papers

Detecting Quishing Attacks with Machine Learning Techniques Through QR Code Analysis [2.8161155726745237]
The rise of QR-code-based phishing ("Quishing") poses a growing cybersecurity threat. Existing detection methods predominantly focus on URL analysis, which requires the extraction of the QR code payload. We propose the first framework for quishing detection that directly analyzes QR code structure and pixel patterns without extracting the embedded content.
arXiv (2025-05-06T11:47:13Z)
Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face the emerging challenge posed by the re-identification capabilities of large language models. We propose a framework composed of three key components: a privacy evaluator, a utility evaluator, and an optimization component.
arXiv (2024-07-16T14:28:56Z)
A Privacy-preserving key transmission protocol to distribute QRNG keys using zk-SNARKs [2.254434034390528]
Quantum Random Number Generators can provide high-quality keys for cryptographic algorithms. Existing Entropy-as-a-Service solutions require users to trust the central authority distributing the key material. We present a novel key transmission protocol that allows users to obtain cryptographic material generated by a QRNG in such a way that the server is unable to identify which user is receiving each key.
arXiv (2024-01-29T14:00:37Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv (2023-07-31T10:22:33Z)
Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks, such as malware, spam, and intrusions, has caused severe consequences for society. Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities. With the proliferation of graph mining techniques, many researchers have investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv (2023-04-02T08:43:03Z)
Detection, Explanation and Filtering of Cyber Attacks Combining Symbolic and Sub-Symbolic Methods [0.0]
We explore combining symbolic and sub-symbolic methods that incorporate domain knowledge in the area of cybersecurity. The proposed method is shown to produce intuitive explanations for alerts for a diverse range of scenarios. Not only do the explanations provide deeper insights into the alerts, but they also lead to a reduction of false positive alerts by 66%, and by 93% when the fidelity metric is included.
arXiv (2022-12-23T09:03:51Z)
MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER. MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv (2022-12-15T13:57:07Z)
Deep Learning Algorithm for Threat Detection in Hackers Forum (Deep Web) [0.0]
We propose a novel approach for detecting cyberthreats using a deep learning algorithm, Long Short-Term Memory (LSTM). Our model can be easily deployed by organizations to secure digital communications and to detect vulnerability exposure before a cyberattack.
arXiv (2022-02-03T07:49:44Z)
Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv (2021-09-07T21:24:36Z)
Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling [6.0897744845912865]
Cyber Threat Intelligence (CTI) has been represented as one of the proactive and robust mechanisms. Our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques.
arXiv (2021-08-16T02:30:29Z)
Predicting Organizational Cybersecurity Risk: A Deep Learning Approach [0.0]
Hackers use exploits found on hacker forums to carry out complex cyberattacks. We propose a hacker forum entity recognition framework (HackER) to identify exploits and the entities that the exploits target. HackER then uses a bidirectional long short-term memory model (BiLSTM) to create a predictive model of which companies will be targeted by exploits.
arXiv (2020-12-26T01:15:34Z)
Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document. Recent Sequence-to-Sequence (Seq2Seq) based generative frameworks are widely used for the KE task and have obtained competitive performance on various benchmarks. In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv (2020-10-24T08:11:23Z)
FAIRS -- Soft Focus Generator and Attention for Robust Object Segmentation from Extreme Points [70.65563691392987]
We present a new approach to generate object segmentation from user inputs in the form of extreme points and corrective clicks. We demonstrate our method's ability to generate high-quality training data as well as its scalability in incorporating extreme points, guiding clicks, and corrective clicks in a principled manner.
arXiv (2020-04-04T22:25:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.