Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models
- URL: http://arxiv.org/abs/2503.15560v1
- Date: Tue, 18 Mar 2025 22:30:17 GMT
- Title: Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models
- Authors: Prashant Kulkarni, Assaf Namer
- Abstract summary: Large Language Models (LLMs) are increasingly vulnerable to sophisticated multi-turn manipulation attacks. This paper introduces the Temporal Context Awareness framework, a novel defense mechanism designed to address this challenge. Preliminary evaluations on simulated adversarial scenarios demonstrate the framework's potential to identify subtle manipulation patterns.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) are increasingly vulnerable to sophisticated multi-turn manipulation attacks, where adversaries strategically build context through seemingly benign conversational turns to circumvent safety measures and elicit harmful or unauthorized responses. These attacks exploit the temporal nature of dialogue to evade single-turn detection methods, representing a critical security vulnerability with significant implications for real-world deployments. This paper introduces the Temporal Context Awareness (TCA) framework, a novel defense mechanism designed to address this challenge by continuously analyzing semantic drift, cross-turn intention consistency, and evolving conversational patterns. The TCA framework integrates dynamic context embedding analysis, cross-turn consistency verification, and progressive risk scoring to detect and mitigate manipulation attempts effectively. Preliminary evaluations on simulated adversarial scenarios demonstrate the framework's potential to identify subtle manipulation patterns often missed by traditional detection techniques, offering a much-needed layer of security for conversational AI systems. In addition to outlining the design of TCA, we analyze diverse attack vectors and their progression across multi-turn conversations, providing valuable insights into adversarial tactics and their impact on LLM vulnerabilities. Our findings underscore the pressing need for robust, context-aware defenses in conversational AI systems and highlight the TCA framework as a promising direction for securing LLMs while preserving their utility in legitimate applications. We make our implementation available to support further research in this emerging area of AI security.
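The abstract describes TCA as combining dynamic context embedding analysis, cross-turn consistency verification, and progressive risk scoring. The following is a minimal sketch of how such a multi-turn monitor could be wired together; it is not the authors' released implementation. The embed() stub, the risk weights, and the flagging threshold are illustrative assumptions.

```python
# Minimal TCA-style monitor sketch (illustrative; not the paper's implementation).
# Assumptions: embed() is a stand-in for a real sentence encoder; weights and
# thresholds are arbitrary placeholders chosen for demonstration only.
from dataclasses import dataclass, field
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hash-seeded random unit vector in place of a real encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

@dataclass
class TemporalContextMonitor:
    drift_weight: float = 0.6        # illustrative weight, not from the paper
    consistency_weight: float = 0.4  # illustrative weight, not from the paper
    flag_threshold: float = 0.7      # illustrative threshold
    history: list = field(default_factory=list)  # per-turn embeddings
    risk: float = 0.0

    def score_turn(self, user_turn: str) -> float:
        e = embed(user_turn)
        if self.history:
            # Dynamic context embedding analysis: drift of this turn from the running context centroid.
            centroid = np.mean(self.history, axis=0)
            drift = 1.0 - cosine(e, centroid)
            # Cross-turn consistency verification: departure from the stated initial intent (turn 0).
            inconsistency = 1.0 - cosine(e, self.history[0])
            turn_risk = self.drift_weight * drift + self.consistency_weight * inconsistency
            # Progressive risk scoring: accumulate evidence across turns instead of judging turns in isolation.
            self.risk = max(self.risk, 0.5 * self.risk + 0.5 * turn_risk)
        self.history.append(e)
        return self.risk

    def should_flag(self) -> bool:
        return self.risk >= self.flag_threshold

# Usage: feed each user turn to the monitor and flag once accumulated risk crosses the threshold.
monitor = TemporalContextMonitor()
for turn in ["Tell me about chemistry.",
             "What household products react strongly?",
             "How would someone combine them?"]:
    r = monitor.score_turn(turn)
    print(f"risk={r:.2f} flag={monitor.should_flag()}")
```

In practice the stub encoder would be replaced with a real sentence-embedding model, and the weights and threshold would be calibrated against simulated adversarial scenarios such as those the paper evaluates on.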
Related papers
- MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks [85.3303135160762]
MIRAGE is a novel framework that exploits narrative-driven context and role immersion to circumvent safety mechanisms in Multimodal Large Language Models.
It achieves state-of-the-art performance, improving attack success rates by up to 17.5% over the best baselines.
We demonstrate that role immersion and structured semantic reconstruction can activate inherent model biases, facilitating the model's spontaneous violation of ethical safeguards.
arXiv Detail & Related papers (2025-03-24T20:38:42Z) - Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense [90.71884758066042]
Large vision-language models (LVLMs) introduce a unique vulnerability: susceptibility to malicious attacks via visual inputs.
We propose ESIII (Embedding Security Instructions Into Images), a novel methodology for transforming the visual space from a source of vulnerability into an active defense mechanism.
arXiv Detail & Related papers (2025-03-14T17:39:45Z) - Robust Intrusion Detection System with Explainable Artificial Intelligence [0.0]
Adversarial input can exploit machine learning (ML) models through standard interfaces. Conventional defenses such as adversarial training are computationally costly and often fail to provide real-time detection. We suggest a novel strategy for detecting and mitigating adversarial attacks using eXplainable Artificial Intelligence (XAI).
arXiv Detail & Related papers (2025-03-07T10:31:59Z) - Jailbreaking is (Mostly) Simpler Than You Think [2.7174461714624805]
We introduce the Context Compliance Attack (CCA), a novel, optimization-free method for bypassing AI safety mechanisms. CCA exploits a fundamental architectural vulnerability inherent in many deployed AI systems.
arXiv Detail & Related papers (2025-03-07T09:28:19Z) - AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks [13.082370325093242]
We introduce AttackSeqBench, a benchmark to evaluate Large Language Models' (LLMs) capability to understand and reason about attack sequences in Cyber Threat Intelligence (CTI) reports. Our benchmark encompasses three distinct Question Answering (QA) tasks, each focusing on a different granularity of adversarial behavior. We conduct extensive experiments and analysis with both fast-thinking and slow-thinking LLMs, highlighting their strengths and limitations in analyzing sequential patterns in cyber attacks.
arXiv Detail & Related papers (2025-03-05T04:25:21Z) - Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation (RACE) is a novel multi-turn jailbreak framework. It reformulates harmful queries into benign reasoning tasks. We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z) - Jailbreaking and Mitigation of Vulnerabilities in Large Language Models [4.564507064383306]
Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation.
Despite these advancements, LLMs have shown considerable vulnerabilities, particularly to prompt injection and jailbreaking attacks.
This review analyzes the state of research on these vulnerabilities and presents available defense strategies.
arXiv Detail & Related papers (2024-10-20T00:00:56Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks [5.860289498416911]
Large Language Models (LLMs) are swiftly advancing in architecture and capability.
As they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows.
This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs.
arXiv Detail & Related papers (2023-10-16T21:37:24Z) - Visual Adversarial Examples Jailbreak Aligned Large Language Models [66.53468356460365]
We show that the continuous and high-dimensional nature of the visual input makes it a weak link against adversarial attacks.
We exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision.
Our study underscores the escalating adversarial risks associated with the pursuit of multimodality.
arXiv Detail & Related papers (2023-06-22T22:13:03Z)