Related papers: CyNER: A Python Library for Cybersecurity Named Entity Recognition

CyNER: A Python Library for Cybersecurity Named Entity Recognition

URL: http://arxiv.org/abs/2204.05754v1
Date: Fri, 8 Apr 2022 16:49:32 GMT
Title: CyNER: A Python Library for Cybersecurity Named Entity Recognition
Authors: Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park, Nidhi Rastogi
Abstract summary: CyNER is an open-source python library for cybersecurity entity recognition. We provide models trained on a diverse corpus that users can readily use. The library is made publicly available.
Score: 3.871148938060281
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open Cyber threat intelligence (OpenCTI) information is available in an unstructured format from heterogeneous sources on the Internet. We present CyNER, an open-source python library for cybersecurity named entity recognition (NER). CyNER combines transformer-based models for extracting cybersecurity-related entities, heuristics for extracting different indicators of compromise, and publicly available NER models for generic entity types. We provide models trained on a diverse corpus that users can readily use. Events are described as classes in previous research - MALOnt2.0 (Christian et al., 2021) and MALOnt (Rastogi et al., 2020) and together extract a wide range of malware attack details from a threat intelligence corpus. The user can combine predictions from multiple different approaches to suit their needs. The library is made publicly available.

Related papers

CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats. Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction. We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions [79.72930339711478]
$textbfpyvene$ is an open-source library that supports customizable interventions on a range of different PyTorch modules. We show how $textbfpyvene$ provides a unified framework for performing interventions on neural models and sharing the intervened upon models with others.
arXiv Detail & Related papers (2024-03-12T16:46:54Z)
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors. We show that this score, can be an indicator for the presence of a backdoor despite models being of different architectures. This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval [49.827932299460514]
We argue that capabilities provided by large language models are not the end of NER research, but rather an exciting beginning. We present three variants of the NER task, together with a dataset to support them. We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types.
arXiv Detail & Related papers (2023-10-22T12:23:00Z)
ThreatCrawl: A BERT-based Focused Crawler for the Cybersecurity Domain [0.0]
A new focused crawler is proposed called ThreatCrawl. It uses BiBERT-based models to classify documents and adapt its crawling path dynamically. It yields harvest rates of up to 52%, which are, to the best of our knowledge, better than the current state of the art.
arXiv Detail & Related papers (2023-04-24T09:53:33Z)
PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user. We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach. Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes. We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
Categorical composable cryptography: extended version [1.1970409518725493]
We formalize the simulation paradigm of cryptography in terms of category theory. We show that protocols secure against abstract attacks form a symmetric monoidal category. Our model is able to incorporate computational security, set-up assumptions and various attack models.
arXiv Detail & Related papers (2022-08-28T15:07:00Z)
Recognizing and Extracting Cybersecurtity-relevant Entities from Text [1.7499351967216343]
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks. CTI is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG)
arXiv Detail & Related papers (2022-08-02T18:44:06Z)
Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling [6.0897744845912865]
Cyber Threat Intelligence (CTI) has been represented as one of the proactive and robust mechanisms. Our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques.
arXiv Detail & Related papers (2021-08-16T02:30:29Z)
Deep Learning Approach for Intelligent Named Entity Recognition of Cyber Security [5.180648702293017]
Named Entity Recognition (NER) is an initial step towards converting this unstructured data into structured data. A Deep Learning (DL) based approach embedded with Conditional Random Fields (CRFs) is proposed in this paper. The combination of Bidirectional Gated Recurrent Unit (Bi-GRU), Convolutional Neural Network (CNN), and CRF performed better compared to various other DL frameworks on a publicly available benchmark dataset.
arXiv Detail & Related papers (2020-03-31T00:36:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.