Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence
- URL: http://arxiv.org/abs/2207.00232v1
- Date: Fri, 1 Jul 2022 06:55:12 GMT
- Title: Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence
- Authors: Peipei Liu, Hong Li, Zuoguang Wang, Jie Liu, Yimo Ren, Hongsong Zhu
- Abstract summary: We propose a semantic augmentation method which incorporates different linguistic features to enrich the representation of input tokens.
In particular, we encode and aggregate the constituent feature, morphological feature and part of speech feature for each input token to improve the robustness of the method.
We have conducted experiments on the cybersecurity datasets DNRTI and MalwareTextDB, and the results demonstrate the effectiveness of the proposed method.
- Score: 7.321994923276344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting cybersecurity entities such as attackers and vulnerabilities from
unstructured network texts is an important part of security analysis. However,
the frequent variation and randomness of cybersecurity entity names make
intelligence data sparse, which makes it difficult for current methods to
extract security-related concepts and entities well. To address this, we
propose a semantic augmentation method that incorporates different linguistic
features to enrich the representation of input tokens, in order to detect and
classify cybersecurity names in unstructured text. In particular, we encode
and aggregate the constituent feature, morphological feature and part-of-speech
feature of each input token to improve the robustness of the method. In
addition, a token obtains augmented semantic information from two sources: its
K most similar words in a cybersecurity domain corpus, where an attention
module weighs the differences among these words, and contextual clues from a
large-scale general-domain corpus. We have conducted experiments on the
cybersecurity datasets DNRTI and MalwareTextDB, and the results demonstrate
the effectiveness of the proposed method.
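The abstract combines per-token linguistic features with an attention-weighted summary of the token's K most similar domain words. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The toy embeddings, the feature tables (including reading the constituent feature as a phrase label and the morphological feature as a word shape), the raw dot-product attention score, and the concatenation used as the aggregation step are all hypothetical choices made for illustration.

```python
import math

# Hypothetical toy embeddings standing in for vectors trained on a
# cybersecurity domain corpus (values invented for illustration only).
EMBEDDINGS = {
    "APT28": [0.9, 0.1, 0.0],
    "APT29": [0.8, 0.2, 0.1],
    "Lazarus": [0.7, 0.3, 0.0],
    "phishing": [0.0, 0.9, 0.4],
    "CVE-2017-0199": [0.1, 0.2, 0.9],
}

# Hypothetical lookup tables for the three linguistic features. Treating the
# constituent feature as a phrase label and the morphological feature as a
# word shape is an assumption, not the paper's definition.
FEATURE_TABLES = {
    "constituent": {"NP": [1.0, 0.0], "VP": [0.0, 1.0]},
    "morphology": {"XXd": [1.0, 0.0], "xx": [0.0, 1.0]},
    "pos": {"NNP": [1.0, 0.0], "NN": [0.0, 1.0]},
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_similar(token, embeddings, k):
    """Return the k other vocabulary words most cosine-similar to token."""
    query = embeddings[token]
    scored = [(w, cosine(query, v)) for w, v in embeddings.items() if w != token]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


def attentive_augmentation(token, embeddings, k):
    """Weigh the k neighbours by softmax(dot(query, neighbour)) and sum them."""
    query = embeddings[token]
    neighbours = top_k_similar(token, embeddings, k)
    weights = softmax([dot(query, embeddings[w]) for w in neighbours])
    augmented = [
        sum(wt * embeddings[w][i] for wt, w in zip(weights, neighbours))
        for i in range(len(query))
    ]
    return neighbours, weights, augmented


def token_representation(token, features, k=2):
    """Concatenate the word vector, feature vectors and augmented vector."""
    rep = list(EMBEDDINGS[token])
    for name, value in features.items():
        rep.extend(FEATURE_TABLES[name][value])
    rep.extend(attentive_augmentation(token, EMBEDDINGS, k)[2])
    return rep


if __name__ == "__main__":
    neighbours, weights, _ = attentive_augmentation("APT28", EMBEDDINGS, k=2)
    print(neighbours)  # ['APT29', 'Lazarus']
    rep = token_representation(
        "APT28", {"constituent": "NP", "morphology": "XXd", "pos": "NNP"}
    )
    print(len(rep))  # 12
```

In a trained model the attention scores would typically come from learned parameters rather than a raw dot product, and the abstract's third source of information, contextual clues from a large-scale general-domain corpus, is left out of this sketch.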
Related papers
- Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages [0.0]
This research study provides a novel strategy for improving security measures and denoising text in multiple languages.
The undecimated wavelet transform is used as a feature extraction tool to identify prominent language patterns.
The proposed system may successfully capture significant information while preserving the temporal and geographical links within the data.
(2023-07-06T04:10:40Z)
- Blockchain-aided Secure Semantic Communication for AI-Generated Content in Metaverse [59.04428659123127]
We propose a blockchain-aided semantic communication framework for AIGC services in virtual transportation networks.
We illustrate a training-based semantic attack scheme to generate adversarial semantic data by various loss functions.
We also design a semantic defense scheme that uses the blockchain and zero-knowledge proofs to tell the difference between the semantic similarities of adversarial and authentic semantic data.
(2023-01-25T02:32:02Z)
- SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsers by exploring the intrinsic uncertainties in neural-network-based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
(2022-09-14T06:27:51Z)
- Neuro-Symbolic Artificial Intelligence (AI) for Intent based Semantic Communication [85.06664206117088]
6G networks must consider the semantics and effectiveness of data transmission at the end user.
NeSy AI is proposed as a pillar for learning the causal structure behind the observed data.
GFlowNet is leveraged for the first time in a wireless system to learn the probabilistic structure which generates the data.
(2022-05-22T07:11:57Z)
- A Deep Learning Approach for Ontology Enrichment from Unstructured Text [2.932750332087746]
Existing information on vulnerabilities, attacks, controls, and advisories available on the web provides an opportunity to represent knowledge and perform security analytics.
Ontology enrichment algorithms based on natural language processing and ML models have issues with contextual extraction of concepts in words, phrases, and sentences.
Bidirectional LSTMs trained on a large DB dataset and a 2.8 GB Wikipedia corpus, along with the Universal Sentence Encoder, are deployed to enrich an ISO-based information security ontology.
(2021-12-16T01:32:21Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
(2021-09-07T21:24:36Z)
- OntoEnricher: A Deep Learning Approach for Ontology Enrichment from Unstructured Text [2.707154152696381]
Existing information on vulnerabilities, controls, and advisories available on the web provides an opportunity to represent knowledge and perform analytics to mitigate some of the concerns.
This necessitates dynamic and automated enrichment of information security ontologies.
Existing ontology enrichment algorithms based on natural language processing and ML models have issues with the contextual extraction of concepts in words, phrases and sentences.
(2021-02-08T09:43:05Z)
- Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
(2020-10-29T10:06:46Z)
- Deep Learning Approach for Intelligent Named Entity Recognition of Cyber Security [5.180648702293017]
Named Entity Recognition (NER) is an initial step towards converting unstructured cybersecurity data into structured data.
A Deep Learning (DL) based approach embedded with Conditional Random Fields (CRFs) is proposed in this paper.
The combination of Bidirectional Gated Recurrent Unit (Bi-GRU), Convolutional Neural Network (CNN), and CRF performed better compared to various other DL frameworks on a publicly available benchmark dataset.
(2020-03-31T00:36:19Z)
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
A global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
(2020-03-27T09:19:25Z)
- TNT-KID: Transformer-based Neural Tagger for Keyword Identification [7.91883337742071]
We present a novel algorithm for keyword identification called Transformer-based Neural Tagger for Keyword IDentification (TNT-KID).
By adapting the transformer architecture for a specific task at hand and leveraging language model pretraining on a domain specific corpus, the model is capable of overcoming deficiencies of both supervised and unsupervised state-of-the-art approaches to keyword extraction.
(2020-03-20T09:55:10Z)