Quantifying Return on Security Controls in LLM Systems
- URL: http://arxiv.org/abs/2512.15081v1
- Date: Wed, 17 Dec 2025 04:58:09 GMT
- Title: Quantifying Return on Security Controls in LLM Systems
- Authors: Richard Helder Moulton, Austin O'Brien, John D. Hastings
- Abstract summary: This paper introduces a decision-oriented framework to quantify residual risk. It converts adversarial probe outcomes into financial risk estimates and return-on-control metrics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although large language models (LLMs) are increasingly used in security-critical workflows, practitioners lack quantitative guidance on which safeguards are worth deploying. This paper introduces a decision-oriented framework and reproducible methodology that together quantify residual risk, convert adversarial probe outcomes into financial risk estimates and return-on-control (RoC) metrics, and enable monetary comparison of layered defenses for LLM-based systems. A retrieval-augmented generation (RAG) service is instantiated using the DeepSeek-R1 model over a corpus containing synthetic personally identifiable information (PII), and subjected to automated attacks with Garak across five vulnerability classes: PII leakage, latent context injection, prompt injection, adversarial attack generation, and divergence. For each (vulnerability, control) pair, attack success probabilities are estimated via Laplace's Rule of Succession and combined with loss triangle distributions, calibrated from public breach-cost data, in 10,000-run Monte Carlo simulations to produce loss exceedance curves and expected losses. Three widely used mitigations are then compared to a baseline RAG configuration: attribute-based access control (ABAC), named entity recognition (NER) redaction using Microsoft Presidio, and NeMo Guardrails. The baseline system exhibits very high attack success rates (>= 0.98 for PII, latent injection, and prompt injection), yielding a total simulated expected loss of $313k per attack scenario. ABAC collapses success probabilities for PII and prompt-related attacks to near zero and reduces the total expected loss by ~94%, achieving an RoC of 9.83. NER redaction likewise eliminates PII leakage and attains an RoC of 5.97, while NeMo Guardrails provides only marginal benefit (RoC of 0.05).
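The abstract's risk-quantification pipeline (Laplace-smoothed success probabilities, triangular loss draws, Monte Carlo expected loss, and RoC) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trial counts, loss-triangle parameters, and control cost below are assumed figures, not numbers from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_success_prob(successes: int, trials: int) -> float:
    """Laplace's Rule of Succession: (s + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)

def simulate_expected_loss(p_success, loss_low, loss_mode, loss_high, runs=10_000):
    """Monte Carlo expected loss per attack scenario.

    Each run: the attack succeeds with probability p_success; on success,
    a loss is drawn from a triangular distribution (parameters here are
    illustrative stand-ins for breach-cost calibration).
    """
    succeeded = rng.random(runs) < p_success
    losses = rng.triangular(loss_low, loss_mode, loss_high, runs)
    return float(np.mean(succeeded * losses))

# Illustrative probe outcomes: baseline vs. a strong control.
p_baseline = laplace_success_prob(98, 100)      # ~0.97
p_with_control = laplace_success_prob(1, 100)   # near zero

baseline_loss = simulate_expected_loss(p_baseline, 50_000, 150_000, 400_000)
control_loss = simulate_expected_loss(p_with_control, 50_000, 150_000, 400_000)

control_cost = 20_000  # hypothetical annualized cost of the control
roc = (baseline_loss - control_loss) / control_cost  # return on control
print(f"baseline EL ${baseline_loss:,.0f}, with control ${control_loss:,.0f}, RoC {roc:.2f}")
```

The same loop, repeated per (vulnerability, control) pair and sorted, yields the loss exceedance curves the paper describes.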
Related papers
- ThreatFormer-IDS: Robust Transformer Intrusion Detection with Zero-Day Generalization and Explainable Attribution
Intrusion detection in IoT and industrial networks requires models that can detect rare attacks at low false-positive rates while remaining reliable under evolving traffic and limited labels. We propose ThreatFormer-IDS, a Transformer-based sequence modeling framework that converts flow records into time-ordered windows and learns contextual representations for robust intrusion screening. On the ToN IoT benchmark with chronological evaluation, ThreatFormer-IDS achieves AUC-ROC 0.994, AUC-PR 0.956, and Recall@1%FPR 0.910, outperforming strong tree-based and sequence baselines.
arXiv Detail & Related papers (2026-02-26T23:20:42Z) - BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning
Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence. We propose BadCLIP++, a unified framework that tackles both challenges. For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions. For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment.
arXiv Detail & Related papers (2026-01-30T06:54:35Z) - Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling
Large Language Models (LLMs) are typically evaluated for safety under single-shot or low-budget adversarial prompting. We propose a scaling-aware Best-of-N estimator of risk, SABER, for modeling jailbreak vulnerability under Best-of-N sampling.
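Why Best-of-N sampling inflates risk can be illustrated with the simplest possible model: if attempts were independent with per-attempt success probability p, the aggregate success probability over N draws would be 1 - (1 - p)^N. This i.i.d. assumption is mine for illustration; SABER's actual estimator may refine it.

```python
def best_of_n_risk(p_single: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds.

    Illustrative i.i.d. model only; real Best-of-N jailbreak estimation
    (e.g., SABER) may relax the independence assumption.
    """
    return 1.0 - (1.0 - p_single) ** n

# A 1% per-attempt jailbreak rate becomes near-certain success at scale.
print(best_of_n_risk(0.01, 1))
print(best_of_n_risk(0.01, 500))
```

This is why single-shot safety numbers can badly understate risk against an attacker with a large sampling budget.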
arXiv Detail & Related papers (2026-01-22T05:59:42Z) - Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems
Retrieval-augmented generation (RAG) systems integrate document retrieval with large language models. RAG introduces a new privacy risk: adversaries can issue carefully crafted queries to gradually exfiltrate sensitive content. We introduce RAGCRAWLER, which builds a knowledge graph to represent revealed information and plans queries in semantic space that target unretrieved regions.
arXiv Detail & Related papers (2026-01-18T14:21:25Z) - Stablecoin Design with Adversarial-Robust Multi-Agent Systems via Trust-Weighted Signal Aggregation
We present MVF-Composer, a trust-weighted Mean-Variance Frontier reserve controller incorporating a novel Stress Harness for risk-state estimation. Our key insight is deploying multi-agent simulations as adversarial stress-testers, exposing reserve vulnerabilities before they manifest on-chain. Across 1,200 randomized scenarios with injected Black-Swan events, MVF-Composer reduces peak peg deviation by 57% and mean recovery time by 3.1x relative to SAS baselines.
arXiv Detail & Related papers (2025-12-03T19:41:15Z) - On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral
We identify Lazy Likelihood Displacement (LLD) as the core mechanism driving GRPO collapse in Search-R1. LLD emerges early and triggers a self-reinforcing LLD Death Spiral, where declining likelihood leads to low-confidence responses. We propose LLDS, a lightweight likelihood-preserving regularization for GRPO that activates only when a trajectory's likelihood decreases.
arXiv Detail & Related papers (2025-11-03T06:39:58Z) - Rescuing the Unpoisoned: Efficient Defense against Knowledge Corruption Attacks on RAG Systems
Large language models (LLMs) are reshaping numerous facets of our daily lives, leading to widespread adoption as web-based services. Retrieval-Augmented Generation (RAG) has emerged as a promising direction by grounding responses in external knowledge sources. Recent studies demonstrate the vulnerability of RAG to knowledge corruption attacks that inject misleading information. In this work, we introduce RAGDefender, a resource-efficient defense mechanism against knowledge corruption.
arXiv Detail & Related papers (2025-10-10T18:42:32Z) - Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
Because LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm. Existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. We introduce AuraGen, a controllable engine that synthesizes benign trajectories, injects category-labeled risks of controlled difficulty, and filters outputs via an automated reward model.
arXiv Detail & Related papers (2025-09-17T17:36:08Z) - Risk-Calibrated Bayesian Streaming Intrusion Detection with SRE-Aligned Decisions
We present a risk-calibrated approach to streaming intrusion detection that couples Bayesian Online Changepoint Detection with decision thresholds aligned to Site Reliability Engineering (SRE) error budgets. We detail the hazard model, conjugate updates, and an O(1)-per-event implementation. A concrete SRE example shows how a 99.9% availability SLO (a 43.2-minute-per-month error budget) yields a probability threshold near 0.91 when missed incidents are 10x more costly than false alarms.
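The quoted figures can be reproduced with simple arithmetic. The cost-ratio reading below is my hedged reconstruction, not the paper's derivation: under a basic Bayes decision rule, suppress an alert only while the posterior probability of normal operation exceeds c_miss / (c_miss + c_fa), which gives 10/11 ≈ 0.91 when misses are 10x costlier.

```python
def decision_threshold(cost_miss: float, cost_false_alarm: float) -> float:
    """Cost-ratio threshold from a simple Bayes decision rule.

    Assumption (not the paper's derivation): treat "no alert" as safe only
    while P(normal) exceeds c_miss / (c_miss + c_fa); equivalently, alert
    once P(incident) exceeds c_fa / (c_miss + c_fa).
    """
    return cost_miss / (cost_miss + cost_false_alarm)

# Missed incidents 10x costlier than false alarms -> threshold near 0.91.
print(round(decision_threshold(10.0, 1.0), 2))  # 0.91

# 99.9% availability over a 30-day month leaves a 43.2-minute error budget.
budget_min = 30 * 24 * 60 * (1 - 0.999)
print(round(budget_min, 1))  # 43.2
```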
arXiv Detail & Related papers (2025-09-13T23:54:54Z) - A Biosecurity Agent for Lifecycle LLM Biosecurity Alignment
This study presents a Biosecurity Agent comprising four coordinated modes across the model lifecycle. For dataset sanitization (Mode 1), evaluation is conducted on CORD-19, the COVID-19 Open Research Dataset of coronavirus-related articles. For preference alignment (Mode 2), DPO with LoRA adapters internalizes refusals and safe completions, reducing the end-to-end attack success rate (ASR) from 59.7% to 3.0%. At inference (Mode 3), run-time guardrails across L1-L3 show the expected security-usability trade-off.
arXiv Detail & Related papers (2025-05-19T16:51:12Z) - Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks
Large Language Models (LLMs) are increasingly employed as evaluators (LLM-as-a-Judge) for assessing the quality of machine-generated text. This paper investigates the vulnerability of LLM-as-a-Judge architectures to prompt-injection attacks.
arXiv Detail & Related papers (2025-02-03T18:59:16Z) - Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Evaluations of large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks. Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system. We propose evaluating LLMs with model tampering attacks, which allow modifications to latent activations or weights.
arXiv Detail & Related papers (2023-06-08T07:15:04Z) - G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks through Attributed Client Graph Clustering
Federated Learning (FL) offers collaborative model training without data sharing.
FL is vulnerable to backdoor attacks, where poisoned model weights lead to compromised system integrity.
We present G$^2$uardFL, a protective framework that reinterprets the identification of malicious clients as an attributed graph clustering problem.
arXiv Detail & Related papers (2023-06-08T07:15:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.