Recognizing and Extracting Cybersecurtity-relevant Entities from Text
- URL: http://arxiv.org/abs/2208.01693v1
- Date: Tue, 2 Aug 2022 18:44:06 GMT
- Title: Recognizing and Extracting Cybersecurtity-relevant Entities from Text
- Authors: Casey Hanks, Michael Maiden, Priyanka Ranade, Tim Finin, Anupam Joshi
- Abstract summary: Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks.
CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG)
- Score: 1.7499351967216343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cyber Threat Intelligence (CTI) is information describing threat vectors,
vulnerabilities, and attacks and is often used as training data for AI-based
cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a
strong need to develop community-accessible datasets to train existing AI-based
cybersecurity pipelines to efficiently and accurately extract meaningful
insights from CTI. We have created an initial unstructured CTI corpus from a
variety of open sources that we are using to train and test cybersecurity
entity models using the spaCy framework and exploring self-learning methods to
automatically recognize cybersecurity entities. We also describe methods to
apply cybersecurity domain entity linking with existing world knowledge from
Wikidata. Our future work will survey and test spaCy NLP tools and create
methods for continuous integration of new information extracted from text.
Related papers
- Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models [0.8192907805418583]
Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction.
This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs)
Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring.
Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering.
arXiv Detail & Related papers (2024-06-30T13:02:03Z) - The Security and Privacy of Mobile Edge Computing: An Artificial Intelligence Perspective [64.36680481458868]
Mobile Edge Computing (MEC) is a new computing paradigm that enables cloud computing and information technology (IT) services to be delivered at the network's edge.
This paper provides a survey of security and privacy in MEC from the perspective of Artificial Intelligence (AI)
We focus on new security and privacy issues, as well as potential solutions from the viewpoints of AI.
arXiv Detail & Related papers (2024-01-03T07:47:22Z) - NLP-Based Techniques for Cyber Threat Intelligence [13.958337678497163]
Survey paper provides a comprehensive overview of NLP-based techniques applied in the context of threat intelligence.
It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets.
It then undertakes a thorough examination of NLP-based techniques for CTI data crawling from Web sources, CTI data analysis, Relation Extraction from cybersecurity data, CTI sharing and collaboration, and security threats of CTI.
arXiv Detail & Related papers (2023-11-15T09:23:33Z) - Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, caused severe consequences on society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z) - Towards Automated Classification of Attackers' TTPs by combining NLP
with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z) - What are the attackers doing now? Automating cyber threat intelligence
extraction from text on pace with the changing threat landscape: A survey [1.1064955465386]
We systematically collect "CTI extraction from text"-related studies from the literature.
We identify the data sources, techniques, and CTI sharing formats utilized in the context of the proposed pipeline.
arXiv Detail & Related papers (2021-09-14T16:38:41Z) - Generating Cyber Threat Intelligence to Discover Potential Security
Threats Using Classification and Topic Modeling [6.0897744845912865]
Cyber Threat Intelligence (CTI) has been represented as one of the proactive and robust mechanisms.
Our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques.
arXiv Detail & Related papers (2021-08-16T02:30:29Z) - A System for Automated Open-Source Threat Intelligence Gathering and
Management [53.65687495231605]
SecurityKG is a system for automated OSCTI gathering and management.
It uses a combination of AI and NLP techniques to extract high-fidelity knowledge about threat behaviors.
arXiv Detail & Related papers (2021-01-19T18:31:35Z) - A System for Efficiently Hunting for Cyber Threats in Computer Systems
Using Threat Intelligence [78.23170229258162]
We build ThreatRaptor, a system that facilitates cyber threat hunting in computer systems using OSCTI.
ThreatRaptor provides (1) an unsupervised, light-weight, and accurate NLP pipeline that extracts structured threat behaviors from unstructured OSCTI text, (2) a concise and expressive domain-specific query language, TBQL, to hunt for malicious system activities, and (3) a query synthesis mechanism that automatically synthesizes a TBQL query from the extracted threat behaviors.
arXiv Detail & Related papers (2021-01-17T19:44:09Z) - Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks,
and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z) - Automated Retrieval of ATT&CK Tactics and Techniques for Cyber Threat
Reports [5.789368942487406]
We evaluate several classification approaches to automatically retrieve Tactics, Techniques and Procedures from unstructured text.
We present rcATT, a tool built on top of our findings and freely distributed to the security community to support cyber threat report automated analysis.
arXiv Detail & Related papers (2020-04-29T16:45:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.