Exploring the Limits of Transfer Learning with Unified Model in the
Cybersecurity Domain
- URL: http://arxiv.org/abs/2302.10346v1
- Date: Mon, 20 Feb 2023 22:21:26 GMT
- Title: Exploring the Limits of Transfer Learning with Unified Model in the
Cybersecurity Domain
- Authors: Kuntal Kumar Pal, Kazuaki Kashihara, Ujjwala Anantheswaran, Kirby C.
Kuznia, Siddhesh Jagtap and Chitta Baral
- Abstract summary: We introduce a generative multi-task model, Unified Text-to-Text Cybersecurity (UTS).
UTS is trained on malware reports, phishing site URLs, programming code constructs, social media data, blogs, news articles, and public forum posts.
We show that UTS improves performance on some cybersecurity datasets.
- Score: 17.225973170682604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As cybersecurity vulnerabilities in software systems increase, so do
the ways to exploit them. In addition, malware threats, irregular network
interactions, and discussions about exploits in public forums are on the rise.
Automated approaches are necessary to identify these threats faster, to detect
potentially relevant entities in text, and to stay aware of software
vulnerabilities. Applying natural language processing (NLP) techniques in the
cybersecurity domain can help achieve this.
However, there are challenges such as the diverse nature of texts involved in
the cybersecurity domain, the unavailability of large-scale publicly available
datasets, and the significant cost of hiring subject matter experts for
annotations. One of the solutions is building multi-task models that can be
trained jointly with limited data. In this work, we introduce a generative
multi-task model, Unified Text-to-Text Cybersecurity (UTS), trained on malware
reports, phishing site URLs, programming code constructs, social media data,
blogs, news articles, and public forum posts. We show that UTS improves
performance on some cybersecurity datasets. We also show that, with only a few
examples, UTS can be adapted to novel, unseen tasks and to the nature of their
data.
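
To make the unified text-to-text, multi-task setup concrete, below is a minimal sketch of joint fine-tuning over heterogeneous cybersecurity tasks, assuming a T5-style backbone loaded through HuggingFace transformers. The task prefixes, toy examples, model size, and hyperparameters are illustrative placeholders, not the paper's actual UTS architecture or training data.

```python
# Minimal sketch: cast several cybersecurity tasks as text-to-text generation and
# fine-tune one model jointly on all of them. Task names and examples below are
# hypothetical placeholders, not the UTS training corpus.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import T5ForConditionalGeneration, T5TokenizerFast


class MultiTaskTextDataset(Dataset):
    """Each item is (task_prefix, source_text, target_text); the prefix tells the
    shared model which task the example belongs to."""

    def __init__(self, examples, tokenizer, max_len=256):
        self.examples, self.tokenizer, self.max_len = examples, tokenizer, max_len

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        task, source, target = self.examples[idx]
        enc = self.tokenizer(f"{task}: {source}", truncation=True,
                             max_length=self.max_len, padding="max_length",
                             return_tensors="pt")
        lab = self.tokenizer(target, truncation=True, max_length=64,
                             padding="max_length", return_tensors="pt")
        labels = lab.input_ids.squeeze(0)
        labels[labels == self.tokenizer.pad_token_id] = -100  # ignore padding in the loss
        return {"input_ids": enc.input_ids.squeeze(0),
                "attention_mask": enc.attention_mask.squeeze(0),
                "labels": labels}


# Toy mixed-task examples (classification, detection, extraction) -- placeholders.
examples = [
    ("classify malware report", "The sample drops a keylogger and beacons out.", "keylogger"),
    ("detect phishing url", "http://paypa1-login.example.com/verify", "phishing"),
    ("extract entities", "CVE-2021-44228 affects Apache Log4j 2.x", "CVE-2021-44228; Apache Log4j"),
]

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
loader = DataLoader(MultiTaskTextDataset(examples, tokenizer), batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for batch in loader:  # one joint pass; all tasks share the same weights and loss
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Few-shot adaptation to an unseen task would reuse the same loop with a handful
# of new (prefix, source, target) examples; inference is plain generation.
model.eval()
query = tokenizer("detect phishing url: http://secure-login.example.net", return_tensors="pt")
out = model.generate(**query, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Casting every task as text generation is what allows one set of weights to be shared across tasks and adapted from only a few examples, which is the property the abstract highlights.
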
Related papers
- AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset [1.9573380763700712]
We provide the first dataset on cyber-attack attribution.
Our dataset offers a rich set of annotations with contextual details, including some that span phrases and sentences.
We conducted extensive experiments and applied NLP techniques to demonstrate the dataset's effectiveness for attack attribution.
arXiv Detail & Related papers (2024-08-09T16:10:35Z)
- Large Language Models for Cyber Security: A Systematic Literature Review [14.924782327303765]
We conduct a comprehensive review of the literature on the application of Large Language Models in cybersecurity (LLM4Security).
We observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection.
We identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training.
arXiv Detail & Related papers (2024-05-08T02:09:17Z)
- SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence [27.550484938124193]
This paper introduces a framework to benchmark, elicit, and improve cybersecurity incident analysis and response abilities.
We create a high-quality bilingual instruction corpus by crawling raw text from cybersecurity websites.
The instruction dataset SEvenLLM-Instruct is used to train cybersecurity LLMs with a multi-task learning objective.
arXiv Detail & Related papers (2024-05-06T13:17:43Z)
- Generative AI for Secure Physical Layer Communications: A Survey [80.0638227807621]
Generative Artificial Intelligence (GAI) stands at the forefront of AI innovation, demonstrating rapid advancement and unparalleled proficiency in generating diverse content.
In this paper, we offer an extensive survey on the various applications of GAI in enhancing security within the physical layer of communication networks.
We delve into the roles of GAI in addressing challenges of physical layer security, focusing on communication confidentiality, authentication, availability, resilience, and integrity.
arXiv Detail & Related papers (2024-02-21T06:22:41Z)
- ExTRUST: Reducing Exploit Stockpiles with a Privacy-Preserving Depletion System for Inter-State Relationships [4.349142920611964]
This paper proposes a privacy-preserving approach that allows multiple state parties to privately compare their stock of vulnerabilities and exploits.
We call our system ExTRUST and show that it is scalable and can withstand several attack scenarios.
arXiv Detail & Related papers (2023-06-01T12:02:17Z)
- Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks such as malware, spam, and intrusions has caused severe consequences for society.
Traditional Machine Learning (ML) based methods are extensively used to detect cyber threats, but they can hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z)
- Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations, we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z)
- Generating Cyber Threat Intelligence to Discover Potential Security Threats Using Classification and Topic Modeling [6.0897744845912865]
Cyber Threat Intelligence (CTI) has been presented as one of the proactive and robust mechanisms for discovering potential security threats.
Our goal is to identify and explore relevant CTI from hacker forums by using different supervised and unsupervised learning techniques.
arXiv Detail & Related papers (2021-08-16T02:30:29Z)
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop a unified taxonomy of these attacks.
arXiv Detail & Related papers (2020-12-18T22:38:47Z)
- Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
- Adversarial Machine Learning Attacks and Defense Methods in the Cyber Security Domain [58.30296637276011]
This paper summarizes the latest research on adversarial attacks against security solutions based on machine learning techniques.
It is the first to discuss the unique challenges of implementing end-to-end adversarial attacks in the cyber security domain.
arXiv Detail & Related papers (2020-07-05T18:22:40Z)