Related papers: LSTM Recurrent Neural Networks for Cybersecurity Named Entity Recognition

URL: http://arxiv.org/abs/2409.10521v1
Date: Fri, 30 Aug 2024 08:35:48 GMT
Title: LSTM Recurrent Neural Networks for Cybersecurity Named Entity Recognition
Authors: Houssem Gasmi, Jannik Laval, Abdelaziz Bouras
Abstract summary: The model demonstrated in this paper is domain independent and does not rely on any features specific to the entities in the cybersecurity domain. The results we obtained showed that this method outperforms the state-of-the-art methods given an annotated corpus of a decent size.
Score: 1.411911111800469
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The automated and timely conversion of cybersecurity information from unstructured online sources, such as blogs and articles, to more formal representations has become a necessity for many applications in the domain. Named Entity Recognition (NER) is one of the early phases towards this goal. It involves detecting the relevant domain entities, such as product, version, and attack name, in technical documents. Although generally considered a simple task in the information extraction field, it is quite challenging in some domains like cybersecurity because of the complex structure of its entities. State-of-the-art methods require time-consuming and labor-intensive feature engineering that describes the properties of the entities, their context, domain knowledge, and linguistic characteristics. The model demonstrated in this paper is domain independent and does not rely on any features specific to the entities in the cybersecurity domain, and hence does not require expert knowledge to perform feature engineering. The method relies on a type of recurrent neural network called Long Short-Term Memory (LSTM) combined with Conditional Random Fields (CRFs). The results we obtained show that this method outperforms the state-of-the-art methods given an annotated corpus of a decent size.
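
Illustration: the abstract pairs an LSTM with a CRF. A minimal sketch of the decoding step a linear-chain CRF layer typically performs (Viterbi search over per-token tag scores) might look like the following. This is not the authors' implementation. The tag names, the token sentence, and every score below are made-up assumptions, and the hand-written emission scores merely stand in for what an LSTM would produce.

```python
from typing import Dict, List


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain model."""
    if not emissions:
        return []
    # best[t][tag] = best score of any tag path ending in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t-1][tag] = best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag to recover the path.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical BIO tag set for a single entity type.
tags = ["O", "B-PRODUCT", "I-PRODUCT"]

# Hypothetical transition scores: everything is neutral except O -> I-PRODUCT,
# which is heavily penalised because an entity cannot start with an I- tag.
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I-PRODUCT"] = -10.0

# Made-up emission scores for the tokens "Adobe", "Reader", "crashes".
emissions = [
    {"O": 0.1, "B-PRODUCT": 2.0, "I-PRODUCT": 0.5},
    {"O": 1.0, "B-PRODUCT": 0.2, "I-PRODUCT": 1.5},
    {"O": 2.0, "B-PRODUCT": 0.1, "I-PRODUCT": 0.3},
]

decoded = viterbi_decode(emissions, transitions, tags)
print(decoded)
```

The transition table is what distinguishes this from picking the best tag per token independently: a penalised transition such as O followed by I-PRODUCT can be ruled out even when the per-token scores favour it.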

Related papers

TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text [11.417612899344697]
Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. Existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines. We propose TechniqueRAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned LLMs, and minimal text-technique pairs.
arXiv Detail & Related papers (2025-05-17T12:46:10Z)
Unsupervised Named Entity Disambiguation for Low Resource Domains [0.4297070083645049]
We present an unsupervised approach leveraging the concept of Group Steiner Trees (GST). GST can identify the most relevant candidates for entity disambiguation using the contextual similarities across candidate entities. We outperform the state-of-the-art unsupervised methods by more than 40% (on average) in terms of Precision@1 across various domain-specific datasets.
arXiv Detail & Related papers (2024-12-13T11:35:00Z)
Large language models as oracles for instantiating ontologies with domain-specific knowledge [0.0]
Endowing intelligent systems with semantic data commonly requires designing and instantiating ontologies with domain-specific knowledge. The resulting process is therefore time-consuming, error-prone, and often biased by the personal background of the ontology designer. We propose a novel domain-independent approach to automatically instantiate ontologies with domain-specific knowledge.
arXiv Detail & Related papers (2024-04-05T14:04:07Z)
Domain-Controlled Prompt Learning [49.45309818782329]
Existing prompt learning methods often lack domain-awareness or domain-transfer mechanisms. We propose Domain-Controlled Prompt Learning for specific domains. Our method achieves state-of-the-art performance on domain-specific image recognition datasets.
arXiv Detail & Related papers (2023-09-30T02:59:49Z)
Towards Generalization on Real Domain for Single Image Dehazing via Meta-Learning [41.99615673136883]
Internal information learned from synthesized images is usually sub-optimal in real domains. We present a domain generalization framework based on meta-learning to dig out representative internal properties of real hazy domains. Our proposed method generalizes better than the state-of-the-art competitors.
arXiv Detail & Related papers (2022-11-14T07:04:00Z)
Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF [53.55504611255664]
We propose a novel method, referred to as ASAC, to solve the dilemma caused by the nested phenomenon. The proposed method contains two key modules: the adaptive shared (AS) part and the attentive conditional random field (ACRF) module. Our model could learn better entity representations by capturing the implicit distinctions and relationships between different categories of entities.
arXiv Detail & Related papers (2022-11-09T09:23:56Z)
Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain [52.783709712318405]
Unsupervised domain adaptation (UDA), which transfers knowledge from a label-rich source domain to a related but unlabeled target domain, has attracted considerable attention. We propose a novel style-aware feature fusion method (SAFF) to bridge the large domain gap and transfer knowledge while alleviating the loss of class-discriminative information.
arXiv Detail & Related papers (2022-09-05T10:06:03Z)
Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence [7.321994923276344]
We propose a semantic augmentation method which incorporates different linguistic features to enrich the representation of input tokens. In particular, we encode and aggregate the constituent, morphological, and part-of-speech features of each input token to improve the robustness of the method. We have conducted experiments on the cybersecurity datasets DNRTI and MalwareTextDB, and the results demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-07-01T06:55:12Z)
AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications. We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training. Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
Knowledge Graph Anchored Information-Extraction for Domain-Specific Insights [1.6308268213252761]
We use a task-based approach for fulfilling specific information needs within a new domain. A pipeline built from state-of-the-art NLP technologies is used to automatically extract an instance-level semantic structure.
arXiv Detail & Related papers (2021-04-18T19:28:10Z)
Multi-Agent Reinforcement Learning with Temporal Logic Specifications [65.79056365594654]
We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment. We develop the first multi-agent reinforcement learning technique for temporal logic specifications. We provide correctness and convergence guarantees for our main algorithm.
arXiv Detail & Related papers (2021-02-01T01:13:03Z)
Domain-Transferable Method for Named Entity Recognition Task [0.6040938686276304]
This paper describes a method to learn a domain-specific NER model for an arbitrary set of named entities. We assume that the supervision can be obtained with no human effort, and neural models can learn from each other.
arXiv Detail & Related papers (2020-11-24T15:45:52Z)
Zero-Resource Cross-Domain Named Entity Recognition [68.83177074227598]
Existing models for cross-domain named entity recognition rely on large unlabeled corpora or labeled NER training data in target domains. We propose a cross-domain NER model that does not use any external resources.
arXiv Detail & Related papers (2020-02-14T09:04:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.