A Crawler Architecture for Harvesting the Clear, Social, and Dark Web
for IoT-Related Cyber-Threat Intelligence
- URL: http://arxiv.org/abs/2109.06932v1
- Date: Tue, 14 Sep 2021 19:26:08 GMT
- Title: A Crawler Architecture for Harvesting the Clear, Social, and Dark Web
for IoT-Related Cyber-Threat Intelligence
- Authors: Paris Koloveas, Thanasis Chantzios, Christos Tryfonopoulos, Spiros
Skiadopoulos
- Abstract summary: The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information.
We present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web.
- Score: 1.1661238776379117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The clear, social, and dark web have lately been identified as rich sources
of valuable cyber-security information that, given the appropriate tools and
methods, may be identified, crawled and subsequently leveraged to actionable
cyber-threat intelligence. In this work, we focus on the information gathering
task, and present a novel crawling architecture for transparently harvesting
data from security websites in the clear web, security forums in the social
web, and hacker forums/marketplaces in the dark web. The proposed architecture
adopts a two-phase approach to data harvesting. Initially a machine
learning-based crawler is used to direct the harvesting towards websites of
interest, while in the second phase state-of-the-art statistical language
modelling techniques are used to represent the harvested information in a
latent low-dimensional feature space and rank it based on its potential
relevance to the task at hand. The proposed architecture is realised using
exclusively open-source tools, and a preliminary evaluation with crowdsourced
results demonstrates its effectiveness.
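The two-phase approach described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not name the classifier or the language model, so a keyword-overlap score stands in for the machine learning-based crawler, and cosine similarity over raw term counts stands in for ranking in a latent low-dimensional feature space. The in-memory `PAGES` graph, `TOPIC_TERMS`, and all function names are hypothetical.

```python
import heapq
import math
from collections import Counter

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "seed": ("security news portal", ["a", "b"]),
    "a": ("iot botnet exploit targets routers", ["c"]),
    "b": ("cooking recipes and travel tips", ["d"]),
    "c": ("iot firmware vulnerability exploit for sale", []),
    "d": ("more recipes", []),
}

# Hypothetical topic description used by both phases.
TOPIC_TERMS = ["iot", "exploit", "vulnerability", "botnet"]


def tokens(text):
    return text.lower().split()


def relevance(text):
    """Stand-in for the phase-one classifier: fraction of topic terms present."""
    words = set(tokens(text))
    return sum(term in words for term in TOPIC_TERMS) / len(TOPIC_TERMS)


def focused_crawl(seed, budget):
    """Phase one: best-first crawl that expands the most promising links first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    harvested = []
    while frontier and len(harvested) < budget:
        _, url = heapq.heappop(frontier)
        text, links = PAGES[url]
        harvested.append(url)
        score = relevance(text)
        for link in links:
            if link not in seen:
                seen.add(link)
                # An unseen link inherits the relevance of the page linking to it.
                heapq.heappush(frontier, (-score, link))
    return harvested


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(urls):
    """Phase two: order harvested pages by similarity to the topic vector."""
    topic = Counter(TOPIC_TERMS)
    scored = [(cosine(Counter(tokens(PAGES[u][0])), topic), u) for u in urls]
    return [u for _, u in sorted(scored, key=lambda p: (-p[0], p[1]))]


harvested = focused_crawl("seed", budget=4)
ranked = rank(harvested)
```

With a budget of four pages, the crawl reaches the second IoT-related page (`c`) before spending budget on the unrelated branch (`b`), and the ranking step then places both IoT-related pages first.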
Related papers
- Combining Threat Intelligence with IoT Scanning to Predict Cyber Attack [0.0]
I have proposed a novel methodology for collecting and analyzing the Dark Web information.
I want to contribute to the existing literature on cyber-security that could potentially guide in both policy-making and intelligence research.
arXiv Detail & Related papers (2024-11-26T23:00:51Z)
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
- "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs).
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
- TSTEM: A Cognitive Platform for Collecting Cyber Threat Intelligence in the Wild [0.06597195879147556]
The extraction of cyber threat intelligence (CTI) from open sources is a rapidly expanding defensive strategy.
Previous research has focused on improving individual components of the extraction process.
The community lacks open-source platforms for deploying streaming CTI data pipelines in the wild.
arXiv Detail & Related papers (2024-02-15T14:29:21Z)
- A Responsive Framework for Research Portals Data using Semantic Web Technology [0.6798775532273751]
The research aims to address this issue by designing a framework for the semantic organization of research portal data.
The framework focuses on the extraction of information from two specific research portals, namely Microsoft Academic and IEEE Xplore.
arXiv Detail & Related papers (2023-06-20T16:12:33Z)
- Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, caused severe consequences on society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z)
- Recognizing and Extracting Cybersecurtity-relevant Entities from Text [1.7499351967216343]
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks.
CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG).
arXiv Detail & Related papers (2022-08-02T18:44:06Z)
- Knowledge mining of unstructured information: application to cyber-domain [0.0]
We present and implement a novel knowledge graph and knowledge mining framework for extracting relevant information from free-form text about incidents in the cyber domain.
Our framework includes a machine learning based pipeline as well as crawling methods for generating graphs of entities, attackers and the related information.
We test our framework on publicly available cyber incident datasets to evaluate the accuracy of our knowledge mining methods as well as the usefulness of the framework in the use of cyber analysts.
arXiv Detail & Related papers (2021-09-08T18:01:56Z)
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z)
- A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It is able to preserve the user sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)
- Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process [63.75363908696257]
We review the methods that have been applied to network data with the purpose of developing an intrusion detector.
We discuss the techniques used for the capture, preparation and transformation of the data, as well as the data mining and evaluation methods.
As a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.
arXiv Detail & Related papers (2020-01-27T11:21:05Z)