A Crawler Architecture for Harvesting the Clear, Social, and Dark Web
for IoT-Related Cyber-Threat Intelligence
- URL: http://arxiv.org/abs/2109.06932v1
- Date: Tue, 14 Sep 2021 19:26:08 GMT
- Title: A Crawler Architecture for Harvesting the Clear, Social, and Dark Web
for IoT-Related Cyber-Threat Intelligence
- Authors: Paris Koloveas, Thanasis Chantzios, Christos Tryfonopoulos, Spiros
Skiadopoulos
- Abstract summary: The clear, social, and dark web have lately been identified as rich sources of valuable cyber-security information.
We present a novel crawling architecture for transparently harvesting data from security websites in the clear web, security forums in the social web, and hacker forums/marketplaces in the dark web.
- Score: 1.1661238776379117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The clear, social, and dark web have lately been identified as rich sources
of valuable cyber-security information that -given the appropriate tools and
methods-may be identified, crawled and subsequently leveraged to actionable
cyber-threat intelligence. In this work, we focus on the information gathering
task, and present a novel crawling architecture for transparently harvesting
data from security websites in the clear web, security forums in the social
web, and hacker forums/marketplaces in the dark web. The proposed architecture
adopts a two-phase approach to data harvesting. Initially a machine
learning-based crawler is used to direct the harvesting towards websites of
interest, while in the second phase state-of-the-art statistical language
modelling techniques are used to represent the harvested information in a
latent low-dimensional feature space and rank it based on its potential
relevance to the task at hand. The proposed architecture is realised using
exclusively open-source tools, and a preliminary evaluation with crowdsourced
results demonstrates its effectiveness.
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs)
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z) - Informed Meta-Learning [55.2480439325792]
Meta-learning and informed ML stand out as two approaches for incorporating prior knowledge into ML pipelines.
We formalise a hybrid paradigm, informed meta-learning, facilitating the incorporation of priors from unstructured knowledge representations.
We demonstrate the potential benefits of informed meta-learning in improving data efficiency, robustness to observational noise and task distribution shifts.
arXiv Detail & Related papers (2024-02-25T15:08:37Z) - TSTEM: A Cognitive Platform for Collecting Cyber Threat Intelligence in the Wild [0.06597195879147556]
The extraction of cyber threat intelligence (CTI) from open sources is a rapidly expanding defensive strategy.
Previous research has focused on improving individual components of the extraction process.
The community lacks open-source platforms for deploying streaming CTI data pipelines in the wild.
arXiv Detail & Related papers (2024-02-15T14:29:21Z) - A Responsive Framework for Research Portals Data using Semantic Web
Technology [0.6798775532273751]
The research aims to address this issue by designing a framework for the semantic organization of research portal data.
The framework focuses on the extraction of information from two specific research portals, namely Microsoft Academic and IEEE Xplore.
arXiv Detail & Related papers (2023-06-20T16:12:33Z) - Your Room is not Private: Gradient Inversion Attack on Reinforcement
Learning [47.96266341738642]
Privacy emerges as a pivotal concern within the realm of embodied AI, as the robot accesses substantial personal information.
This paper proposes an attack on the value-based algorithm and the gradient-based algorithm, utilizing gradient inversion to reconstruct states, actions, and supervision signals.
arXiv Detail & Related papers (2023-06-15T16:53:26Z) - Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, caused severe consequences on society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z) - Recognizing and Extracting Cybersecurtity-relevant Entities from Text [1.7499351967216343]
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks.
CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG)
arXiv Detail & Related papers (2022-08-02T18:44:06Z) - Knowledge mining of unstructured information: application to
cyber-domain [0.0]
We present and implement a novel knowledge graph and knowledge mining framework for extracting relevant information from free-form text about incidents in the cyber domain.
Our framework includes a machine learning based pipeline as well as crawling methods for generating graphs of entities, attackers and the related information.
We test our framework on publicly available cyber incident datasets to evaluate the accuracy of our knowledge mining methods as well as the usefulness of the framework in the use of cyber analysts.
arXiv Detail & Related papers (2021-09-08T18:01:56Z) - Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks,
and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z) - Intrusion detection in computer systems by using artificial neural
networks with Deep Learning approaches [0.0]
Intrusion detection into computer networks has become one of the most important issues in cybersecurity.
This paper focuses on the design and implementation of an intrusion detection system based on Deep Learning architectures.
arXiv Detail & Related papers (2020-12-15T19:12:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.