Related papers: SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots

URL: http://arxiv.org/abs/2510.21459v1
Date: Fri, 24 Oct 2025 13:41:52 GMT
Title: SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots
Authors: Adetayo Adebimpe, Helmut Neukirchen, Thomas Welsh
Abstract summary: Honeypots are decoy systems used for gathering valuable threat intelligence or diverting attackers away from production systems. We propose the System-Based Attention Shell Honeypot framework, which manages data-protection issues through the use of lightweight local LLMs.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Honeypots are decoy systems used for gathering valuable threat intelligence or diverting attackers away from production systems. Maximising attacker engagement is essential to their utility. However, research has highlighted that context-awareness, such as the ability to respond to new attack types, systems and attacker agents, is necessary to increase engagement. Large Language Models (LLMs) have been shown to be one approach to increasing context-awareness, but they suffer from several challenges, including the accuracy and timeliness of their responses, high operational costs, and data-protection issues due to cloud deployment. We propose the System-Based Attention Shell Honeypot (SBASH) framework, which manages data-protection issues through the use of lightweight local LLMs. We investigate the use of Retrieval Augmented Generation (RAG)-supported LLMs and non-RAG LLMs for Linux shell commands and evaluate them using several metrics, such as response time differences, realism as judged by human testers, and similarity to a real system calculated with Levenshtein distance, SBert, and BertScore. We show that RAG improves accuracy for untuned models, while models that have been tuned via a system prompt telling the LLM to respond like a Linux system achieve, without RAG, accuracy similar to that of untuned models with RAG, at a slightly lower latency.
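
The string-similarity metric named in the abstract can be illustrated with Levenshtein (edit) distance between the honeypot's response and the real system's output for the same command. The sketch below is a minimal Python example that assumes a normalization of 1 - distance / max(length), so that 1.0 means identical output; the paper's exact normalization is not stated here, and its SBert and BertScore metrics are not reproduced. The function names and the sample `id` outputs are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    # Make b the shorter string so the two DP rows stay as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def similarity(honeypot_output: str, real_output: str) -> float:
    """Assumed normalization: map edit distance to [0, 1], where 1.0 is identical."""
    longest = max(len(honeypot_output), len(real_output))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(honeypot_output, real_output) / longest


if __name__ == "__main__":
    real = "uid=0(root) gid=0(root) groups=0(root)"
    fake = "uid=0(root) gid=0(root) groups=0(root),1(bin)"
    print(f"{similarity(fake, real):.3f}")  # prints 0.844
```

Character-level distance penalizes every extra or missing byte, which is why it is usually paired with embedding-based scores that tolerate harmless rewording.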

Related papers

Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset. We observe significant behavioural variation across models, including refusal responses and complete, silent non-responsiveness triggered by internal safety mechanisms.
arXiv (2026-02-24T12:32:11Z)
Rescuing the Unpoisoned: Efficient Defense against Knowledge Corruption Attacks on RAG Systems
Large language models (LLMs) are reshaping numerous facets of our daily lives, leading to widespread adoption as web-based services. Retrieval-Augmented Generation (RAG) has emerged as a promising direction by generating responses grounded in external knowledge sources. Recent studies demonstrate that RAG is vulnerable to attacks such as knowledge corruption, in which misleading information is injected. In this work, we introduce RAGDefender, a resource-efficient defense mechanism against knowledge corruption.
arXiv (2025-11-03T06:39:58Z)
ParaVul: A Parallel Large Language Model and Retrieval-Augmented Framework for Smart Contract Vulnerability Detection
ParaVul is a retrieval-augmented framework to improve the reliability and accuracy of smart contract vulnerability detection. We develop Sparse Low-Rank Adaptation (SLoRA) for LLM fine-tuning. We construct a vulnerability contract dataset and develop a hybrid Retrieval-Augmented Generation (RAG) system.
arXiv (2025-10-20T03:23:41Z)
REFRAG: Rethinking RAG based Decoding
REFRAG is an efficient decoding framework that compresses, senses, and expands to improve latency in RAG applications. We provide rigorous validation of REFRAG across diverse long-context tasks, including RAG, multi-turn conversations, and long document summarization.
arXiv (2025-09-01T03:31:44Z)
Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models
Phishing attacks are becoming increasingly sophisticated, underscoring the need for detection systems that strike a balance between high accuracy and computational efficiency. This paper presents a comparative evaluation of traditional Machine Learning (ML), Deep Learning (DL), and quantized small-parameter Large Language Models (LLMs) for phishing detection. We show that while LLMs currently underperform compared to ML and DL methods in terms of raw accuracy, they exhibit strong potential for identifying subtle, context-based phishing cues.
arXiv (2025-07-10T04:01:52Z)
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
This work investigates iterative approximate evaluation for arbitrary prompts. It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework. MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
arXiv (2025-07-07T03:20:52Z)
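
Online prediction of prompt difficulty can be illustrated with a Beta-Bernoulli posterior over each prompt's rollout success rate. This is a generic sketch, not MoPPS's published algorithm: the `PromptBelief` class, the uniform Beta(1, 1) prior, and the rule of preferring prompts whose sampled success rate is near 0.5 are all assumptions made for the example.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class PromptBelief:
    """Hypothetical helper: Beta(alpha, beta) posterior over one prompt's success rate."""
    alpha: float = 1.0  # prior pseudo-count of successful rollouts (uniform prior assumed)
    beta: float = 1.0   # prior pseudo-count of failed rollouts

    def update(self, successes: int, failures: int) -> None:
        # Conjugate Beta-Bernoulli update after observing a batch of rollouts.
        self.alpha += successes
        self.beta += failures

    def mean(self) -> float:
        # Posterior expected success rate; a low value indicates a hard prompt.
        return self.alpha / (self.alpha + self.beta)


def select_prompts(beliefs: dict[str, PromptBelief], k: int,
                   target: float = 0.5, rng: random.Random | None = None) -> list[str]:
    """Pick k prompts whose sampled success rate is closest to `target` (assumed rule).

    Drawing from each posterior (Thompson-style) instead of using the mean
    keeps exploring prompts that have few observations so far.
    """
    rng = rng or random.Random()
    sampled = {pid: rng.betavariate(b.alpha, b.beta) for pid, b in beliefs.items()}
    return sorted(sampled, key=lambda pid: abs(sampled[pid] - target))[:k]


if __name__ == "__main__":
    beliefs = {name: PromptBelief() for name in ("easy", "medium", "hard")}
    beliefs["easy"].update(successes=15, failures=1)
    beliefs["medium"].update(successes=8, failures=8)
    beliefs["hard"].update(successes=0, failures=16)
    print(select_prompts(beliefs, k=1, rng=random.Random(0)))
```

Because the posterior is updated from rollouts that training already performs, the difficulty estimate needs no extra model calls, which is the kind of saving the summary refers to.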
System Prompt Extraction Attacks and Defenses in Large Language Models
The system prompt in Large Language Models (LLMs) plays a pivotal role in guiding model behavior and response generation. Recent studies have shown that LLM system prompts are highly susceptible to extraction attacks through meticulously designed queries. Despite the growing threat, there is a lack of systematic studies of system prompt extraction attacks and defenses.
arXiv (2025-05-27T21:36:27Z)
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage. We show that scaling the agentic reasoning system at test time substantially enhances robustness without compromising model utility. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv (2025-04-29T17:36:05Z)
Adversarial Reasoning at Jailbreaking Time
Large language models (LLMs) are becoming more capable and widespread. Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks. In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful responses from aligned LLMs.
arXiv (2025-02-03T18:59:01Z)
Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting
Large Language Models (LLMs) have recently demonstrated significant potential in time series forecasting. However, their robustness and reliability in real-world applications remain under-explored. We introduce a targeted adversarial attack framework for LLM-based time series forecasting.
arXiv (2024-12-11T04:53:15Z)
Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs
Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. We propose an approach integrating Retrieval Augmented Generation (RAG) and Self-Ranking into the LLM pipeline. RAG enhances the robustness of the output by incorporating external knowledge sources, while the Self-Ranking technique, inspired by the concept of Self-Consistency, generates multiple reasoning paths and creates ranks to select the most robust detector.
arXiv (2024-11-27T10:48:37Z)
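
A Self-Consistency-style ranking step can be sketched as majority voting across candidate detectors. In this hypothetical example each candidate is a Python callable, all candidates run on a shared set of probe inputs, and candidates are ranked by how often they agree with the majority verdict; the paper's actual ranking criterion may differ, and `rank_by_consensus` is an illustrative name, not its API.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical interface: a detector returns True when it flags the input as an attack.
Detector = Callable[[str], bool]


def rank_by_consensus(detectors: Sequence[Detector], probes: Sequence[str]) -> List[int]:
    """Order detector indices from most to least consistent with the majority vote."""
    if not detectors:
        return []
    scores = [0] * len(detectors)
    for probe in probes:
        verdicts = [detect(probe) for detect in detectors]
        # Majority verdict on this probe; on a tie, Counter keeps first-seen order.
        majority, _ = Counter(verdicts).most_common(1)[0]
        for i, verdict in enumerate(verdicts):
            scores[i] += verdict == majority
    # sorted() is stable, so equally scored detectors keep their generation order.
    return sorted(range(len(detectors)), key=lambda i: -scores[i])


if __name__ == "__main__":
    candidates = [
        lambda s: "<script" in s.lower(),
        lambda s: "<script" in s,         # misses upper-case tags
        lambda s: "script" in s.lower(),  # also fires on "javascript:" URLs
    ]
    probes = ["<SCRIPT>alert(1)</SCRIPT>", "hello", "<script>x</script>", "javascript:void(0)"]
    print(rank_by_consensus(candidates, probes))  # prints [0, 1, 2]
```

Agreement with the majority is only a proxy for robustness: if most candidates share the same blind spot, the vote rewards that blind spot, which is one reason such pipelines also ground generation in external knowledge via RAG.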
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs for downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv (2024-09-30T10:48:20Z)
Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding
We introduce Lossless Acceleration via Speculative Decoding for LLM-based Recommender Systems (LASER). LASER features a Customized Retrieval Pool to enhance retrieval efficiency and Relaxed Verification to improve the acceptance rate of draft tokens. LASER achieves a 3-5x speedup on public datasets and saves about 67% of computational resources during the online A/B test.
arXiv (2024-08-11T02:31:13Z)
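
Speculative decoding hinges on a verification step in which the large target model checks tokens proposed by a cheap draft model. The sketch below shows standard strict greedy verification, which reproduces the target model's greedy output exactly; LASER's Relaxed Verification is described as loosening this check to raise the acceptance rate, and its exact rule is not given here. Token IDs are plain integers and `verify_draft` is a hypothetical helper.

```python
from typing import List, Sequence


def verify_draft(draft: Sequence[int], target_argmax: Sequence[int]) -> List[int]:
    """Strict greedy verification for one speculative-decoding step.

    draft:          k tokens proposed by the draft model.
    target_argmax:  k + 1 tokens; target_argmax[i] is the target model's greedy
                    choice given the prefix plus draft[:i], all obtained from a
                    single parallel forward pass of the target model.
    Returns the tokens actually emitted in this step.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have exactly one more entry than draft")
    accepted: List[int] = []
    for proposed, expected in zip(draft, target_argmax):
        if proposed != expected:
            # First disagreement: emit the target model's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(proposed)
    # Every draft token matched, so the target's extra prediction comes for free.
    accepted.append(target_argmax[-1])
    return accepted


if __name__ == "__main__":
    # Draft proposes 3 tokens; the target disagrees at the second position.
    print(verify_draft([5, 7, 9], [5, 8, 9, 11]))  # prints [5, 8]
```

Each step emits at least one token, so the output is never slower than plain greedy decoding in token count; the speedup comes from how many draft tokens are accepted per target forward pass.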
ThinkNote: Enhancing Knowledge Integration and Utilization of Large Language Models via Constructivist Cognition Modeling
Large Language Models (LLMs) have demonstrated strong performance across a wide range of NLP tasks. They often exhibit suboptimal behaviors and inconsistencies when exposed to unfamiliar external information. We propose ThinkNote, a novel framework that enhances the external knowledge utilization of LLMs.
arXiv (2024-02-21T06:04:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.