Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence
- URL: http://arxiv.org/abs/2207.00232v1
- Date: Fri, 1 Jul 2022 06:55:12 GMT
- Title: Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence
- Authors: Peipei Liu, Hong Li, Zuoguang Wang, Jie Liu, Yimo Ren, Hongsong Zhu
- Abstract summary: We propose a semantic augmentation method which incorporates different linguistic features to enrich the representation of input tokens.
In particular, we encode and aggregate the constituent feature, morphological feature and part of speech feature for each input token to improve the robustness of the method.
We have conducted experiments on the cybersecurity datasets DNRTI and MalwareTextDB, and the results demonstrate the effectiveness of the proposed method.
- Score: 7.321994923276344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting cybersecurity entities such as attackers and vulnerabilities from
unstructured network texts is an important part of security analysis. However,
the sparsity of intelligence data, which results from the frequent variation and
the randomness of cybersecurity entity names, makes it difficult for current
methods to extract security-related concepts and entities well.
To this end, we propose a semantic augmentation method which incorporates
different linguistic features to enrich the representation of input tokens in
order to detect and classify cybersecurity names in unstructured text. In
particular, we encode and aggregate the constituent feature, morphological
feature and part-of-speech feature for each input token to improve the
robustness of the method. Moreover, a token obtains augmented semantic
information from two sources. The first is its K most similar words in a
cybersecurity domain corpus, where an attentive module weighs the differences
among these words. The second is contextual clues drawn from a large-scale
general-domain corpus. We have conducted experiments on the cybersecurity
datasets DNRTI and MalwareTextDB, and the results demonstrate the effectiveness
of the proposed method.
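The abstract describes two mechanisms. The first concatenates several per-token feature embeddings (word, part of speech, morphology, constituent). The second augments each token with an attention-weighted mix of its K most similar domain words.

The plain-Python sketch below illustrates both, under loudly stated assumptions:

- `embed` is a hypothetical stand-in for learned or pretrained embedding tables. It returns deterministic pseudo-random vectors, so the "similar" words it retrieves are arbitrary.
- `word_shape` is just one possible morphological feature.
- POS and constituent labels are assumed to come from an external tagger and parser.
- The attention is a plain dot-product softmax.

It is not the authors' implementation.

```python
import math
import random

DIM = 4  # hypothetical embedding size shared by every feature type

def embed(key: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for a learned embedding table: a deterministic
    pseudo-random vector seeded by the key string."""
    rng = random.Random(key)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def word_shape(token: str) -> str:
    """One possible morphological feature: 'CVE-2017-0199' -> 'XXX-dddd-dddd'."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)

def token_representation(token: str, pos: str, constituent: str) -> list[float]:
    """Concatenate word, part-of-speech, morphological and constituent embeddings.
    POS and constituent labels are assumed to come from an external tagger/parser."""
    return (embed("word:" + token.lower()) + embed("pos:" + pos)
            + embed("shape:" + word_shape(token)) + embed("const:" + constituent))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def augment(token: str, domain_vocab: list[str], k: int = 3):
    """Attention-weighted sum of the embeddings of the token's K most similar
    words in a (hypothetical) cybersecurity domain vocabulary."""
    query = embed("word:" + token.lower())
    candidates = [w for w in domain_vocab if w.lower() != token.lower()]
    # Rank candidate words by cosine similarity to the token's word embedding.
    candidates.sort(key=lambda w: cosine(query, embed("word:" + w.lower())), reverse=True)
    neighbors = candidates[:k]
    if not neighbors:
        return [0.0] * DIM, [], []
    vectors = [embed("word:" + w.lower()) for w in neighbors]
    weights = softmax([dot(query, v) for v in vectors])  # dot-product attention
    mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(DIM)]
    return mixed, neighbors, weights

if __name__ == "__main__":
    vocab = ["APT28", "APT29", "Lazarus", "Emotet", "TrickBot", "CVE-2017-0199"]
    sentence = [("APT28", "NNP", "NP"), ("used", "VBD", "VP"), ("Emotet", "NNP", "NP")]
    reps = [token_representation(t, p, c) + augment(t, vocab)[0] for t, p, c in sentence]
    print(len(reps), len(reps[0]))  # 3 20
```

Concatenation keeps each feature in its own subspace, which makes the example easy to follow. The paper may aggregate features differently, for example with learned projections or gates. In a trained model, the embeddings and attention parameters would be learned end to end rather than hashed.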
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
Date: 2024-10-28T14:18:32Z
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
Date: 2024-09-05T09:10:38Z
- LSTM Recurrent Neural Networks for Cybersecurity Named Entity Recognition [1.411911111800469]
The model demonstrated in this paper is domain independent and does not rely on any features specific to the entities in the cybersecurity domain.
The results show that this method outperforms state-of-the-art methods given a reasonably sized annotated corpus.
Date: 2024-08-30T08:35:48Z
- Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages [0.0]
This research study provides a novel strategy for improving security measures and denoising multiple languages.
The undecimated wavelet transform is used as a feature extraction tool to identify prominent language patterns.
The proposed system may successfully capture significant information while preserving the temporal and geographical links within the data.
Date: 2023-07-06T04:10:40Z
- Blockchain-aided Secure Semantic Communication for AI-Generated Content in Metaverse [59.04428659123127]
We propose a blockchain-aided semantic communication framework for AIGC services in virtual transportation networks.
We illustrate a training-based semantic attack scheme that generates adversarial semantic data using various loss functions.
We also design a semantic defense scheme that uses the blockchain and zero-knowledge proofs to tell the difference between the semantic similarities of adversarial and authentic semantic data.
Date: 2023-01-25T02:32:02Z
- Neuro-Symbolic Artificial Intelligence (AI) for Intent based Semantic Communication [85.06664206117088]
6G networks must consider the semantics and effectiveness (at the end user) of data transmission.
NeSy AI is proposed as a pillar for learning the causal structure behind the observed data.
GFlowNet is leveraged for the first time in a wireless system to learn the probabilistic structure which generates the data.
Date: 2022-05-22T07:11:57Z
- A Deep Learning Approach for Ontology Enrichment from Unstructured Text [2.932750332087746]
Existing information on vulnerabilities, attacks, controls, and advisories available on the web provides an opportunity to represent knowledge and perform security analytics.
Ontology enrichment algorithms based on natural language processing and ML models have issues with the contextual extraction of concepts in words, phrases, and sentences.
Bidirectional LSTMs trained on a large DB dataset and a 2.8 GB Wikipedia corpus, along with Universal Sentence, are deployed to enrich ISO-based information security.
Date: 2021-12-16T01:32:21Z
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
Date: 2021-09-07T21:24:36Z
- OntoEnricher: A Deep Learning Approach for Ontology Enrichment from Unstructured Text [2.707154152696381]
Existing information on vulnerabilities, controls, and advisories available on the web provides an opportunity to represent knowledge and perform analytics to mitigate some of the concerns.
This necessitates dynamic and automated enrichment of information security.
Existing ontology enrichment algorithms based on natural language processing and ML models have issues with the contextual extraction of concepts in words, phrases and sentences.
Date: 2021-02-08T09:43:05Z
- Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
Date: 2020-10-29T10:06:46Z
- Deep Learning Approach for Intelligent Named Entity Recognition of Cyber Security [5.180648702293017]
Named Entity Recognition (NER) is an initial step towards converting unstructured data into structured data.
A Deep Learning (DL) based approach embedded with Conditional Random Fields (CRFs) is proposed in this paper.
The combination of Bidirectional Gated Recurrent Unit (Bi-GRU), Convolutional Neural Network (CNN), and CRF performed better compared to various other DL frameworks on a publicly available benchmark dataset.
Date: 2020-03-31T00:36:19Z
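The "Deep Learning Approach for Intelligent Named Entity Recognition of Cyber Security" entry above places a CRF on top of Bi-GRU and CNN encoders. A CRF output layer in an NER tagger is commonly decoded with the Viterbi algorithm. Viterbi picks the tag sequence that maximizes the summed emission and transition scores, instead of choosing each tag independently.

The sketch below is a minimal, framework-free illustration under stated assumptions:

- The emission scores, transition scores and BIO tag set are hypothetical hand-written values, not outputs of that paper's model.
- CRF training (the forward algorithm for the partition function) is omitted.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence and its score.

    emissions:   one dict per token mapping tag -> emission score (in a real
                 tagger these come from the encoder; here they are hand-written)
    transitions: dict mapping (previous_tag, current_tag) -> score; missing
                 pairs default to 0.0
    """
    scores = {t: emissions[0][t] for t in tags}  # best path score ending in each tag
    backpointers = []
    for emission in emissions[1:]:
        new_scores, pointers = {}, {}
        for cur in tags:
            # Best previous tag given the path scores so far plus the transition score.
            best_prev = max(tags, key=lambda p: scores[p] + transitions.get((p, cur), 0.0))
            new_scores[cur] = (scores[best_prev]
                               + transitions.get((best_prev, cur), 0.0) + emission[cur])
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final tag.
    last = max(tags, key=lambda t: scores[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]

if __name__ == "__main__":
    tags = ["O", "B-ORG", "I-ORG"]
    transitions = {("O", "I-ORG"): -10.0}  # hypothetical: penalize I-ORG right after O
    emissions = [
        {"O": 1.0, "B-ORG": 0.0, "I-ORG": 0.0},
        {"O": 0.0, "B-ORG": 0.5, "I-ORG": 1.0},
    ]
    # A greedy per-token argmax would give ['O', 'I-ORG'], an invalid BIO sequence.
    print(viterbi(emissions, transitions, tags))  # (['O', 'B-ORG'], 1.5)
```

The usage example shows why the transition scores matter. The penalty on `("O", "I-ORG")` steers decoding away from a label sequence that the per-token scores alone would prefer.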