OMNISEC: LLM-Driven Provenance-based Intrusion Detection via Retrieval-Augmented Behavior Prompting
- URL: http://arxiv.org/abs/2503.03108v4
- Date: Tue, 22 Jul 2025 07:40:20 GMT
- Title: OMNISEC: LLM-Driven Provenance-based Intrusion Detection via Retrieval-Augmented Behavior Prompting
- Authors: Wenrui Cheng, Tiantian Zhu, Shunan Jing, Jian-Ping Mei, Mingjun Ma, Jiaobo Jin, Zhengqiu Weng
- Abstract summary: Provenance-based Intrusion Detection Systems (PIDSes) have been widely used for endpoint threat analysis. Because attack techniques evolve, rules cannot dynamically model all the characteristics of attackers. Anomaly-based detection systems face a massive false-positive problem because they cannot distinguish changes in normal behavior from real attack behavior.
- Score: 4.71781133841068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Provenance-based Intrusion Detection Systems (PIDSes) have been widely used for endpoint threat analysis. These studies can be broadly categorized into rule-based and learning-based detection systems. Because attack techniques evolve, rules cannot dynamically model all the characteristics of attackers, so rule-based systems often suffer false negatives. Learning-based detection systems are further divided into supervised learning and anomaly detection. The scarcity of attack samples limits the usability and effectiveness of supervised learning-based detection systems in practice. Anomaly-based detection systems face a massive false-positive problem because they cannot distinguish changes in normal behavior from real attack behavior. The alert volume of a detection system directly determines the manual workload of the security analysts who triage its output. To reduce manual analysis time, we propose OMNISEC, which applies large language models (LLMs) to anomaly-based intrusion detection via retrieval-augmented behavior prompting. OMNISEC identifies abnormal nodes and their corresponding abnormal events by constructing suspicious nodes and rare paths. By combining two external knowledge bases, OMNISEC uses Retrieval-Augmented Generation (RAG) to let the LLM determine whether an abnormal behavior is a real attack. Finally, OMNISEC reconstructs the attack graph and restores the complete behavior chain of the attacker's intrusion. Experimental results show that OMNISEC outperforms state-of-the-art methods on public benchmark datasets.
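As a rough sketch of the retrieval-augmented behavior prompting idea the abstract describes: embed a suspicious event, retrieve the nearest reference behaviors from external knowledge bases, and assemble a prompt for the LLM to judge. All names, the bag-of-words similarity, and the knowledge-base entries below are hypothetical illustrations, not OMNISEC's actual pipeline.

```python
# Hypothetical sketch of retrieval-augmented behavior prompting.
# A real system would use a learned embedding model and richer knowledge bases.
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding" (illustrative stand-in for a real encoder).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, knowledge_base, k=2):
    # Return the k knowledge-base entries most similar to the query.
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(event, benign_kb, attack_kb):
    # Combine retrieved benign and attack references into one LLM prompt.
    context = retrieve(event, benign_kb) + retrieve(event, attack_kb)
    lines = "\n".join(f"- {c}" for c in context)
    return (f"Suspicious event: {event}\n"
            f"Retrieved reference behaviors:\n{lines}\n"
            "Is this event a real attack? Answer yes or no.")

benign_kb = ["apt-get downloads packages over http",
             "cron runs scheduled backup scripts"]
attack_kb = ["powershell downloads payload then spawns reverse shell",
             "curl fetches script piped directly into bash"]

prompt = build_prompt("bash process runs curl piped into bash", benign_kb, attack_kb)
print(prompt)
```

The retrieved context grounds the LLM's yes/no judgment in known benign and known malicious behavior, which is the role the paper assigns to its two external knowledge bases.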
Related papers
- Cryptanalysis via Machine Learning Based Information Theoretic Metrics [58.96805474751668]
We propose two novel applications of machine learning (ML) algorithms to perform cryptanalysis on any cryptosystem. These algorithms can be readily applied in an audit setting to evaluate the robustness of a cryptosystem. We show that our classification model correctly identifies encryption schemes that are not IND-CPA secure, such as DES, RSA, and AES ECB, with high accuracy.
arXiv Detail & Related papers (2025-01-25T04:53:36Z) - Optimized IoT Intrusion Detection using Machine Learning Technique [0.0]
Intrusion detection systems (IDSs) are essential for defending against a variety of attacks. The functional and physical diversity of IoT IDS systems causes significant issues. For peculiarity-based IDS, this study proposes and implements a novel component selection and extraction strategy.
arXiv Detail & Related papers (2024-12-03T21:23:54Z) - Neuro-mimetic Task-free Unsupervised Online Learning with Continual Self-Organizing Maps [56.827895559823126]
Self-organizing map (SOM) is a neural model often used in clustering and dimensionality reduction.
We propose a generalization of the SOM, the continual SOM, which is capable of online unsupervised learning under a low memory budget.
Our results, on benchmarks including MNIST, Kuzushiji-MNIST, and Fashion-MNIST, show almost a twofold increase in accuracy.
arXiv Detail & Related papers (2024-02-19T19:11:22Z) - InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration [58.61492157691623]
Methods for integrating knowledge have been developed, which augment LLMs with domain-specific knowledge graphs through external modules. Our research focuses on a novel problem: efficiently integrating unknown knowledge into LLMs without unnecessary overlap of known knowledge. A risk of introducing new knowledge is the potential forgetting of existing knowledge.
arXiv Detail & Related papers (2024-02-18T03:36:26Z) - Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z) - A near-autonomous and incremental intrusion detection system through active learning of known and unknown attacks [2.686686221415684]
Intrusion detection is a traditional practice of security experts, however, there are several issues which still need to be tackled.
We present an architecture for a hybrid Intrusion Detection System (IDS) for an adaptive and incremental detection of both known and unknown attacks.
arXiv Detail & Related papers (2023-10-26T14:37:54Z) - Performance evaluation of Machine learning algorithms for Intrusion Detection System [0.40964539027092917]
This paper focuses on intrusion detection systems (IDSs) analysis using Machine Learning (ML) techniques.
We analyze the KDD CUP-'99' intrusion detection dataset used for training and validating ML models.
arXiv Detail & Related papers (2023-10-01T06:35:37Z) - HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) [0.09208007322096533]
We present HuntGPT, a specialized intrusion detection dashboard applying a Random Forest classifier.
The paper delves into the system's architecture, components, and technical accuracy, assessed through Certified Information Security Manager (CISM) Practice Exams.
The results demonstrate that conversational agents, supported by LLM and integrated with XAI, provide robust, explainable, and actionable AI solutions in intrusion detection.
arXiv Detail & Related papers (2023-09-27T20:58:13Z) - Robust Recommender System: A Survey and Future Directions [58.87305602959857]
We first present a taxonomy to organize current techniques for withstanding malicious attacks and natural noise.
We then explore state-of-the-art methods in each category, including fraudster detection, adversarial training, certifiable robust training for defending against malicious attacks.
We discuss robustness across varying recommendation scenarios and its interplay with other properties like accuracy, interpretability, privacy, and fairness.
arXiv Detail & Related papers (2023-09-05T08:58:46Z) - Intrusion Detection: A Deep Learning Approach [0.0]
The paper proposes a novel architecture for intrusion detection that combines a Convolutional Neural Network (CNN) module, a Long Short-Term Memory (LSTM) module, and a Support Vector Machine (SVM) classification function.
The analysis is followed by a comparison of both conventional machine learning techniques and deep learning methodologies, which highlights areas that could be further explored.
arXiv Detail & Related papers (2023-06-13T07:58:40Z) - Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - Towards an Awareness of Time Series Anomaly Detection Models' Adversarial Vulnerability [21.98595908296989]
We demonstrate that the performance of state-of-the-art anomaly detection methods is degraded substantially by adding only small adversarial perturbations to the sensor data.
We use different scoring metrics, such as prediction errors, anomaly scores, and classification scores, over several public and private datasets.
We demonstrate, for the first time, the vulnerabilities of anomaly detection systems against adversarial attacks.
arXiv Detail & Related papers (2022-08-24T01:55:50Z) - Semantic Novelty Detection via Relational Reasoning [17.660958043781154]
We propose a novel representation learning paradigm based on relational reasoning.
Our experiments show that this knowledge is directly transferable to a wide range of scenarios.
It can be exploited as a plug-and-play module to convert closed-set recognition models into reliable open-set ones.
arXiv Detail & Related papers (2022-07-18T15:49:27Z) - Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z) - Deep Learning and Traffic Classification: Lessons learned from a commercial-grade dataset with hundreds of encrypted and zero-day applications [72.02908263225919]
We share our experience on a commercial-grade DL traffic classification engine.
We identify known applications from encrypted traffic, as well as unknown zero-day applications.
We propose a novel technique, tailored for DL models, that is significantly more accurate and light-weight than the state of the art.
arXiv Detail & Related papers (2021-04-07T15:21:22Z) - No Need to Know Physics: Resilience of Process-based Model-free Anomaly Detection for Industrial Control Systems [95.54151664013011]
We present a novel framework to generate adversarial spoofing signals that violate physical properties of the system.
We analyze four anomaly detectors published at top security conferences.
arXiv Detail & Related papers (2020-12-07T11:02:44Z) - A Novel Anomaly Detection Algorithm for Hybrid Production Systems based on Deep Learning and Timed Automata [73.38551379469533]
DAD:DeepAnomalyDetection is a new approach for automatic model learning and anomaly detection in hybrid production systems.
It combines deep learning and timed automata for creating behavioral model from observations.
The algorithm has been applied to a few data sets, including two from real systems, and has shown promising results.
arXiv Detail & Related papers (2020-10-29T08:27:43Z) - Experimental Review of Neural-based approaches for Network Intrusion Management [8.727349339883094]
We provide an experimental-based review of neural-based methods applied to intrusion detection issues.
We offer a complete view of the most prominent neural-based techniques relevant to intrusion detection, including deep learning approaches and weightless neural networks.
Our evaluation quantifies the value of neural networks, particularly when state-of-the-art datasets are used to train the models.
arXiv Detail & Related papers (2020-09-18T18:32:24Z) - Machine Learning Applications in Misuse and Anomaly Detection [0.0]
Machine learning and data mining algorithms play important roles in designing intrusion detection systems.
Based on their approaches toward the detection of attacks in a network, intrusion detection systems can be broadly categorized into two types.
In the misuse detection systems, an attack in a system is detected whenever the sequence of activities in the network matches with a known attack signature.
In the anomaly detection approach, on the other hand, anomalous states in a system are identified based on a significant difference in the state transitions of the system from its normal states.
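The two paradigms described above can be contrasted in a toy sketch: misuse detection matches activity against known attack signatures, while anomaly detection flags traces whose state transitions deviate from a normal profile. The signatures, event names, and threshold below are invented for illustration.

```python
# Toy contrast between misuse and anomaly detection (illustrative only;
# real IDSs use far richer signatures and behavioral models).
ATTACK_SIGNATURES = [("open_passwd", "spawn_shell")]  # hypothetical known-bad sequence

def misuse_detect(events):
    # Flag if any known attack signature appears as a contiguous subsequence.
    for sig in ATTACK_SIGNATURES:
        n = len(sig)
        if any(tuple(events[i:i + n]) == sig for i in range(len(events) - n + 1)):
            return True
    return False

def anomaly_detect(events, normal_transitions, threshold=0.5):
    # Flag if too many observed state transitions were never seen in normal profiles.
    pairs = list(zip(events, events[1:]))
    if not pairs:
        return False
    unseen = sum(1 for p in pairs if p not in normal_transitions)
    return unseen / len(pairs) > threshold

normal = {("login", "read_mail"), ("read_mail", "logout")}
trace = ["login", "open_passwd", "spawn_shell"]
print(misuse_detect(trace))           # True: matches a known signature
print(anomaly_detect(trace, normal))  # True: transitions deviate from the normal profile
```

Note the complementary failure modes: misuse detection misses attacks with no known signature (false negatives), while anomaly detection flags any unusual-but-benign transition (false positives), which is the tension the OMNISEC abstract targets.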
arXiv Detail & Related papers (2020-09-10T19:52:00Z) - Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases.
Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
arXiv Detail & Related papers (2020-04-15T15:58:05Z) - Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.