CyNER: A Python Library for Cybersecurity Named Entity Recognition
- URL: http://arxiv.org/abs/2204.05754v1
- Date: Fri, 8 Apr 2022 16:49:32 GMT
- Title: CyNER: A Python Library for Cybersecurity Named Entity Recognition
- Authors: Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park, Nidhi Rastogi
- Abstract summary: CyNER is an open-source Python library for cybersecurity entity recognition.
We provide models trained on a diverse corpus that users can readily use.
The library is made publicly available.
- Score: 3.871148938060281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open Cyber threat intelligence (OpenCTI) information is available in an
unstructured format from heterogeneous sources on the Internet. We present
CyNER, an open-source Python library for cybersecurity named entity recognition
(NER). CyNER combines transformer-based models for extracting
cybersecurity-related entities, heuristics for extracting different indicators
of compromise, and publicly available NER models for generic entity types. We
provide models trained on a diverse corpus that users can readily use. Events
are described as classes in previous research, namely MALOnt2.0 (Christian et
al., 2021) and MALOnt (Rastogi et al., 2020), and together they extract a wide
range of malware attack details from a threat intelligence corpus. The user can
combine predictions from multiple approaches to suit their needs. The library
is made publicly available.
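To make the idea of combining predictions concrete, here is a minimal sketch in plain Python. It does not use the CyNER API: the `Entity` type, the regex patterns, the source names, and the `merge_entities` priority rule are all illustrative assumptions. The sketch extracts a few indicators of compromise with regular expressions, then merges them with spans from other extractors, keeping the higher-priority span whenever two spans overlap.

```python
import re
from typing import Iterable, List, NamedTuple, Sequence


class Entity(NamedTuple):
    """A labeled character span; hypothetical type, not part of CyNER."""
    start: int
    end: int
    label: str
    source: str


# Illustrative patterns for a few indicator types (assumed, not CyNER's own).
IOC_PATTERNS = {
    "IPv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "SHA256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "MD5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def heuristic_entities(text: str) -> List[Entity]:
    """Find indicator spans with the regular expressions above."""
    found = []
    for label, pattern in IOC_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(Entity(match.start(), match.end(), label, "heuristic"))
    return found


def merge_entities(
    *groups: Iterable[Entity],
    priority: Sequence[str] = ("heuristic", "transformer", "generic"),
) -> List[Entity]:
    """Merge spans from several extractors.

    When two spans overlap, the one whose source comes first in
    `priority` wins; ties go to the longer span.
    """
    rank = {name: i for i, name in enumerate(priority)}
    candidates = sorted(
        (e for group in groups for e in group),
        key=lambda e: (rank.get(e.source, len(rank)), -(e.end - e.start), e.start),
    )
    kept: List[Entity] = []
    for e in candidates:
        # Keep a span only if it does not overlap anything already kept.
        if all(e.end <= k.start or e.start >= k.end for k in kept):
            kept.append(e)
    return sorted(kept, key=lambda e: e.start)


if __name__ == "__main__":
    sample = "Emotet contacted 192.168.1.10 and exploited CVE-2021-44228."
    # Stand-in for a model's output; a real model would produce these spans.
    model_spans = [Entity(0, 6, "Malware", "transformer")]
    for ent in merge_entities(heuristic_entities(sample), model_spans):
        print(sample[ent.start:ent.end], ent.label, ent.source)
```

Reordering `priority` changes which extractor wins on overlapping spans, which is one simple way a user could tailor the combined output to their needs.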
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - pyvene: A Library for Understanding and Improving PyTorch Models via
Interventions [79.72930339711478]
pyvene is an open-source library that supports customizable interventions on a range of different PyTorch modules.
We show how pyvene provides a unified framework for performing interventions on neural models and sharing the intervened-upon models with others.
arXiv Detail & Related papers (2024-03-12T16:46:54Z) - Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can be an indicator for the presence of a backdoor despite the models being of different architectures.
This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z) - NERetrieve: Dataset for Next Generation Named Entity Recognition and
Retrieval [49.827932299460514]
We argue that capabilities provided by large language models are not the end of NER research, but rather an exciting beginning.
We present three variants of the NER task, together with a dataset to support them.
We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types.
arXiv Detail & Related papers (2023-10-22T12:23:00Z) - ThreatCrawl: A BERT-based Focused Crawler for the Cybersecurity Domain [0.0]
A new focused crawler is proposed called ThreatCrawl.
It uses BiBERT-based models to classify documents and adapt its crawling path dynamically.
It yields harvest rates of up to 52%, which are, to the best of our knowledge, better than the current state of the art.
arXiv Detail & Related papers (2023-04-24T09:53:33Z) - PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z) - Human-Centric Multimodal Machine Learning: Recent Advances and Testbed
on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z) - Categorical composable cryptography: extended version [1.1970409518725493]
We formalize the simulation paradigm of cryptography in terms of category theory.
We show that protocols secure against abstract attacks form a symmetric monoidal category.
Our model is able to incorporate computational security, set-up assumptions and various attack models.
arXiv Detail & Related papers (2022-08-28T15:07:00Z)
- Recognizing and Extracting Cybersecurtity-relevant Entities from Text [1.7499351967216343]
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks.
CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG).
arXiv Detail & Related papers (2022-08-02T18:44:06Z)
- Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling [6.0897744845912865]
Cyber Threat Intelligence (CTI) has been represented as one of the proactive and robust mechanisms.
Our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques.
arXiv Detail & Related papers (2021-08-16T02:30:29Z)
- Deep Learning Approach for Intelligent Named Entity Recognition of Cyber Security [5.180648702293017]
Named Entity Recognition (NER) is an initial step towards converting this unstructured data into structured data.
A Deep Learning (DL) based approach embedded with Conditional Random Fields (CRFs) is proposed in this paper.
The combination of Bidirectional Gated Recurrent Unit (Bi-GRU), Convolutional Neural Network (CNN), and CRF performed better compared to various other DL frameworks on a publicly available benchmark dataset.
arXiv Detail & Related papers (2020-03-31T00:36:19Z)