Related papers: Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence

URL: http://arxiv.org/abs/2207.00232v1
Date: Fri, 1 Jul 2022 06:55:12 GMT
Title: Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence
Authors: Peipei Liu, Hong Li, Zuoguang Wang, Jie Liu, Yimo Ren, Hongsong Zhu
Abstract summary: We propose a semantic augmentation method which incorporates different linguistic features to enrich the representation of input tokens. In particular, we encode and aggregate the constituent feature, morphological feature and part of speech feature for each input token to improve the robustness of the method. We have conducted experiments on the cybersecurity datasets DNRTI and MalwareTextDB, and the results demonstrate the effectiveness of the proposed method.
Score: 7.321994923276344
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Extracting cybersecurity entities such as attackers and vulnerabilities from unstructured network texts is an important part of security analysis. However, the sparsity of intelligence data, resulting from the higher-frequency variations and the randomness of cybersecurity entity names, makes it difficult for current methods to perform well in extracting security-related concepts and entities. To this end, we propose a semantic augmentation method which incorporates different linguistic features to enrich the representation of input tokens, in order to detect and classify cybersecurity names in unstructured text. In particular, we encode and aggregate the constituent feature, morphological feature and part-of-speech feature for each input token to improve the robustness of the method. Moreover, a token gets augmented semantic information from its K most similar words in a cybersecurity domain corpus, where an attentive module is leveraged to weigh the differences among those words, and from contextual clues based on a large-scale general-domain corpus. We have conducted experiments on the cybersecurity datasets DNRTI and MalwareTextDB, and the results demonstrate the effectiveness of the proposed method.
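The top-K augmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary `TOY_VOCAB`, the use of cosine similarity to pick neighbours, the dot-product attention scores, and the final concatenation are all assumptions chosen to keep the example self-contained.

```python
import math

# Hypothetical 3-dimensional embeddings standing in for a cybersecurity-domain corpus.
TOY_VOCAB = {
    "trojan": [0.9, 0.1, 0.0],
    "malware": [0.8, 0.2, 0.1],
    "backdoor": [0.7, 0.3, 0.0],
    "invoice": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(token, vocab, k):
    """Return the k words most cosine-similar to `token`, excluding the token itself."""
    scored = [(cosine(vocab[token], vec), word)
              for word, vec in vocab.items() if word != token]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def augment(token, vocab, k=2):
    """Concatenate a token vector with an attention-weighted sum of its k neighbours."""
    query = vocab[token]
    neighbours = top_k_similar(token, vocab, k)
    # Assumed scoring function: plain dot product between token and neighbour.
    scores = [sum(q * n for q, n in zip(query, vocab[w])) for w in neighbours]
    weights = softmax(scores)
    context = [sum(wt * vocab[w][i] for wt, w in zip(weights, neighbours))
               for i in range(len(query))]
    return neighbours, weights, query + context


if __name__ == "__main__":
    neighbours, weights, vector = augment("trojan", TOY_VOCAB, k=2)
    print(neighbours, weights, vector)
```

In a trained model the attention scores would come from learned parameters rather than a raw dot product, and the augmented vector would be fed to a downstream tagger; the sketch only shows how K neighbours are selected and weighted.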

Related papers

Cryptanalysis via Machine Learning Based Information Theoretic Metrics [58.96805474751668]
We propose two novel applications of machine learning (ML) algorithms to perform cryptanalysis on any cryptosystem. These algorithms can be readily applied in an audit setting to evaluate the robustness of a cryptosystem. We show that our classification model correctly identifies the encryption schemes that are not IND-CPA secure, such as DES, RSA, and AES ECB, with high accuracy.
arXiv Detail & Related papers (2025-01-25T04:53:36Z)
CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats. Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction. We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z)
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts. We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
LSTM Recurrent Neural Networks for Cybersecurity Named Entity Recognition [1.411911111800469]
The model demonstrated in this paper is domain independent and does not rely on any features specific to the entities in the cybersecurity domain. The results we obtained showed that this method outperforms state-of-the-art methods given an annotated corpus of a decent size.
arXiv Detail & Related papers (2024-08-30T08:35:48Z)
Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages [0.0]
This research study provides a novel strategy for improving security measures and denoising multiple languages. The undecimated wavelet transform is used as a feature extraction tool to identify prominent language patterns. The proposed system may successfully capture significant information while preserving the temporal and geographical links within the data.
arXiv Detail & Related papers (2023-07-06T04:10:40Z)
Blockchain-aided Secure Semantic Communication for AI-Generated Content in Metaverse [59.04428659123127]
We propose a blockchain-aided semantic communication framework for AIGC services in virtual transportation networks. We illustrate a training-based semantic attack scheme to generate adversarial semantic data by various loss functions. We also design a semantic defense scheme that uses the blockchain and zero-knowledge proofs to tell the difference between the semantic similarities of adversarial and authentic semantic data.
arXiv Detail & Related papers (2023-01-25T02:32:02Z)
Neuro-Symbolic Artificial Intelligence (AI) for Intent based Semantic Communication [85.06664206117088]
6G networks must consider semantics and effectiveness (at end-user) of the data transmission. NeSy AI is proposed as a pillar for learning causal structure behind the observed data. GFlowNet is leveraged for the first time in a wireless system to learn the probabilistic structure which generates the data.
arXiv Detail & Related papers (2022-05-22T07:11:57Z)
A Deep Learning Approach for Ontology Enrichment from Unstructured Text [2.932750332087746]
Existing information on vulnerabilities, attacks, controls, and advisories available on the web provides an opportunity to represent and perform security analytics. Ontology enrichment algorithms based on natural language processing and ML models have issues with contextual extraction of concepts in words, phrases, and sentences. Bidirectional LSTMs trained on a large DB dataset and Wikipedia corpus of 2.8 GB along with Universal Sentence are deployed to enrich ISO-based information security.
arXiv Detail & Related papers (2021-12-16T01:32:21Z)
Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
OntoEnricher: A Deep Learning Approach for Ontology Enrichment from Unstructured Text [2.707154152696381]
Existing information on vulnerabilities, controls, and advisories available on the web provides an opportunity to represent knowledge and perform analytics to mitigate some of the concerns. This necessitates dynamic and automated enrichment of information security. Existing ontology enrichment algorithms based on natural language processing and ML models have issues with the contextual extraction of concepts in words, phrases and sentences.
arXiv Detail & Related papers (2021-02-08T09:43:05Z)
Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts. We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z)
Deep Learning Approach for Intelligent Named Entity Recognition of Cyber Security [5.180648702293017]
Named Entity Recognition (NER) is an initial step towards converting this unstructured data into structured data. A Deep Learning (DL) based approach embedded with Conditional Random Fields (CRFs) is proposed in this paper. The combination of Bidirectional Gated Recurrent Unit (Bi-GRU), Convolutional Neural Network (CNN), and CRF performed better compared to various other DL frameworks on a publicly available benchmark dataset.
arXiv Detail & Related papers (2020-03-31T00:36:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.