Risk Assessment and Security Analysis of Large Language Models
- URL: http://arxiv.org/abs/2508.17329v1
- Date: Sun, 24 Aug 2025 12:34:34 GMT
- Title: Risk Assessment and Security Analysis of Large Language Models
- Authors: Xiaoyan Zhang, Dongyang Lyu, Xiaoqi Li,
- Abstract summary: This paper focuses on the security problems of large language models (LLMs) in critical application scenarios.<n>We describe the design of a system for dynamic risk assessment and a hierarchical defence system.<n>The experimental results show that the system is capable of identifying concealed attacks.
- Score: 3.571310580820494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models (LLMs) expose systemic security challenges in high risk applications, including privacy leaks, bias amplification, and malicious abuse, there is an urgent need for a dynamic risk assessment and collaborative defence framework that covers their entire life cycle. This paper focuses on the security problems of large language models (LLMs) in critical application scenarios, such as the possibility of disclosure of user data, the deliberate input of harmful instructions, or the models bias. To solve these problems, we describe the design of a system for dynamic risk assessment and a hierarchical defence system that allows different levels of protection to cooperate. This paper presents a risk assessment system capable of evaluating both static and dynamic indicators simultaneously. It uses entropy weighting to calculate essential data, such as the frequency of sensitive words, whether the API call is typical, the realtime risk entropy value is significant, and the degree of context deviation. The experimental results show that the system is capable of identifying concealed attacks, such as role escape, and can perform rapid risk evaluation. The paper uses a hybrid model called BERT-CRF (Bidirectional Encoder Representation from Transformers) at the input layer to identify and filter malicious commands. The model layer uses dynamic adversarial training and differential privacy noise injection technology together. The output layer also has a neural watermarking system that can track the source of the content. In practice, the quality of this method, especially important in terms of customer service in the financial industry.
Related papers
- Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation [50.87199039334856]
Retrieval-Augmented Generation (RAG) has become a cornerstone of knowledge-intensive applications.<n>Recent studies show that knowledge-extraction attacks can recover sensitive knowledge-base content through maliciously crafted queries.<n>We introduce the first systematic benchmark for knowledge-extraction attacks on RAG systems.
arXiv Detail & Related papers (2026-02-10T01:27:46Z) - Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs [65.6660735371212]
We present textbftextscJustAsk, a framework that autonomously discovers effective extraction strategies through interaction alone.<n>It formulates extraction as an online exploration problem, using Upper Confidence Bound--based strategy selection and a hierarchical skill space spanning atomic probes and high-level orchestration.<n>Our results expose system prompts as a critical yet largely unprotected attack surface in modern agent systems.
arXiv Detail & Related papers (2026-01-29T03:53:25Z) - The Shadow Self: Intrinsic Value Misalignment in Large Language Model Agents [37.75212140218036]
We formalize the Loss-of-Control risk and identify the previously underexamined Intrinsic Value Misalignment (Intrinsic VM)<n>We then introduce IMPRESS, a scenario-driven framework for systematically assessing this risk.<n>We evaluate Intrinsic VM on 21 state-of-the-art LLM agents and find that it is a common and broadly observed safety risk across models.
arXiv Detail & Related papers (2026-01-24T07:09:50Z) - Toward Automated Security Risk Detection in Large Software Using Call Graph Analysis [0.30586855806896035]
This paper investigates the automation of software threat modeling through the clustering of call graphs using density-based and community detection algorithms.<n>The proposed method was evaluated through a case study of the Splunk Forwarder Operator (SFO), wherein selected clustering metrics were applied to the software's call graph to assess pertinent code-density security weaknesses.
arXiv Detail & Related papers (2025-10-30T15:43:59Z) - RAG Security and Privacy: Formalizing the Threat Model and Attack Surface [4.823988025629304]
Retrieval-Augmented Generation (RAG) is an emerging approach in natural language processing that combines large language models (LLMs) with external document retrieval to produce more accurate and grounded responses.<n>Existing research has demonstrated that RAGs can leak sensitive information through training data memorization or adversarial prompts, and RAG systems inherit many of these vulnerabilities.<n>Despite these risks, there is currently no formal framework that defines the threat landscape for RAG systems.
arXiv Detail & Related papers (2025-09-24T17:11:35Z) - Alleviating Attack Data Scarcity: SCANIA's Experience Towards Enhancing In-Vehicle Cyber Security Measures [0.1631115063641726]
This paper presents a context-aware attack data generator that generates attack inputs and corresponding in-vehicle network log.<n>It utilizes parameterized attack models augmented with CAN message decoding and attack intensity adjustments to configure the attack scenarios.<n>We develop and perform an empirical evaluation of two deep neural network IDS models using the generated data.
arXiv Detail & Related papers (2025-07-03T13:31:33Z) - Efficient Cybersecurity Assessment Using SVM and Fuzzy Evidential Reasoning for Resilient Infrastructure [0.0]
This paper proposes an assessment model for security issues using fuzzy evidential reasoning (ER) approaches.<n>To overcome with such complications, this paper proposes an assessment model for security issues using fuzzy evidential reasoning (ER) approaches.
arXiv Detail & Related papers (2025-06-28T16:08:34Z) - DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents [52.92354372596197]
Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities.<n>This interaction also introduces the risk of prompt injection attacks, where malicious inputs from external sources can mislead the agent's behavior.<n>We propose a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control and data-level constraints.
arXiv Detail & Related papers (2025-06-13T05:01:09Z) - Real-Time Detection of Insider Threats Using Behavioral Analytics and Deep Evidential Clustering [0.0]
We propose a novel framework for real-time detection of insider threats using behavioral analytics combined with deep evidential clustering.<n>Our system captures and analyzes user activities, applies context-rich behavioral features, and classifies potential threats.<n>We evaluate our framework on benchmark insider threat datasets such as CERT and TWOS, achieving an average detection accuracy of 94.7% and a 38% reduction in false positives compared to traditional clustering methods.
arXiv Detail & Related papers (2025-05-21T11:21:33Z) - AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.<n>The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z) - Intermediate Outputs Are More Sensitive Than You Think [3.20091188522012]
This paper introduces a novel approach to measuring privacy risks in deep computer vision models based on the Degrees of Freedom (DoF) and sensitivity of intermediate outputs.<n>We propose a framework that leverages DoF to evaluate the amount of information retained in each layer and combines this with the rank of the Jacobian matrix to assess sensitivity to input variations.
arXiv Detail & Related papers (2024-12-01T06:40:28Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.<n>Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.<n>However, the deployment of these agents in physical environments presents significant safety challenges.<n>This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs)
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.