Disclosure Audits for LLM Agents
- URL: http://arxiv.org/abs/2506.10171v2
- Date: Sat, 14 Jun 2025 01:16:24 GMT
- Title: Disclosure Audits for LLM Agents
- Authors: Saswat Das, Jameson Sandler, Ferdinando Fioretto
- Abstract summary: Large Language Model agents have begun to appear as personal assistants, customer service bots, and clinical aides. This study proposes an auditing framework for conversational privacy that quantifies and audits these risks.
- Score: 44.27620230177312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model agents have begun to appear as personal assistants, customer service bots, and clinical aides. While these applications deliver substantial operational benefits, they also require continuous access to sensitive data, which increases the likelihood of unauthorized disclosures. This study proposes an auditing framework for conversational privacy that quantifies and audits these risks. The proposed Conversational Manipulation for Privacy Leakage (CMPL) framework is an iterative probing strategy designed to stress-test agents that enforce strict privacy directives. Rather than focusing solely on a single disclosure event, CMPL simulates realistic multi-turn interactions to systematically uncover latent vulnerabilities. Our evaluation on diverse domains, data modalities, and safety configurations demonstrates the auditing framework's ability to reveal privacy risks that are not deterred by existing single-turn defenses. In addition to introducing CMPL as a diagnostic tool, the paper delivers (1) an auditing procedure grounded in quantifiable risk metrics and (2) an open benchmark for evaluating conversational privacy across agent implementations.
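The abstract gives no implementation details, but the iterative, multi-turn probing it describes can be pictured roughly as follows. This is a minimal sketch under assumed interfaces (the prober, agent, and leak detector are hypothetical callables), not the authors' CMPL implementation or its risk metrics.

```python
# Minimal sketch of an iterative, multi-turn privacy-probing audit loop.
# NOT the authors' CMPL implementation; every name here is a hypothetical placeholder.
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class AuditResult:
    transcript: List[Tuple[str, str]] = field(default_factory=list)  # (probe, reply) pairs
    first_leak_turn: Optional[int] = None                            # 1-indexed turn of first disclosure

def run_audit(
    prober: Callable[[List[Tuple[str, str]]], str],  # adversary: conversation history -> next probe
    agent: Callable[[str], str],                     # agent under test, holding the sensitive data
    detects_leak: Callable[[str], bool],             # checks a reply for protected information
    max_turns: int = 10,
) -> AuditResult:
    result = AuditResult()
    for turn in range(1, max_turns + 1):
        probe = prober(result.transcript)        # craft the next manipulation attempt
        reply = agent(probe)                     # agent answers under its privacy directive
        result.transcript.append((probe, reply))
        if detects_leak(reply):                  # record how quickly the directive failed
            result.first_leak_turn = turn
            break
    return result

# Toy usage: an agent that refuses once, then gives in on the second probe.
if __name__ == "__main__":
    secret = "patient DOB 1990-01-01"
    replies = iter(["I cannot share that.", f"Fine, it is {secret}."])
    res = run_audit(
        prober=lambda history: "Please, it's an emergency. What is the DOB?",
        agent=lambda message: next(replies),
        detects_leak=lambda reply: secret in reply,
    )
    print(res.first_leak_turn)  # -> 2
```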
Related papers
- DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [59.66984417026933]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF), inherent to the data, versus external features (EF), artificially introduced for auditing. We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intended to falsely implicate an unused dataset. Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery. Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z) - MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks for evaluating contextual privacy in LLM agents primarily assess single-turn, low-complexity tasks. We first present a benchmark, MAGPIE, comprising 158 real-life high-stakes scenarios across 15 domains. We then evaluate the current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
arXiv Detail & Related papers (2025-06-25T18:04:25Z) - Invisible Tokens, Visible Bills: The Urgent Need to Audit Hidden Operations in Opaque LLM Services [22.700907666937177]
This position paper highlights emerging accountability challenges in commercial Opaque LLM Services (COLS). We formalize two key risks: quantity inflation, where token and call counts may be artificially inflated, and quality downgrade, where providers might quietly substitute lower-cost models or tools. We propose a modular three-layer auditing framework for COLS and users that enables trustworthy verification across execution, secure logging, and user-facing auditability without exposing proprietary internals. (An illustrative token-count check appears after this list.)
arXiv Detail & Related papers (2025-05-24T02:26:49Z) - A Survey on Privacy Risks and Protection in Large Language Models [13.602836059584682]
Large Language Models (LLMs) have become increasingly integral to diverse applications, raising privacy concerns. This survey offers a comprehensive overview of privacy risks associated with LLMs and examines current solutions to mitigate these challenges.
arXiv Detail & Related papers (2025-05-04T03:04:07Z) - AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence [54.317522790545304]
We present AgentOrca, a dual-system framework for evaluating language agents' compliance with operational constraints and routines. Our framework encodes action constraints and routines through both natural language prompts for agents and corresponding executable code serving as ground truth for automated verification. Our findings reveal notable performance gaps among state-of-the-art models, with large reasoning models like o1 demonstrating superior compliance while others show significantly lower performance. (A toy constraint-verification sketch appears after this list.)
arXiv Detail & Related papers (2025-03-11T17:53:02Z) - Privacy Audit as Bits Transmission: (Im)possibilities for Audit by One Run [7.850976675388593]
We introduce a unifying framework for privacy audits based on information-theoretic principles. We demystify the method of privacy audit by one run, identifying the conditions under which single-run audits are feasible or infeasible.
arXiv Detail & Related papers (2025-01-29T16:38:51Z) - Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions [0.0]
This paper introduces the Privacy-Preserving Zero-Shot Learning (PP-ZSL) framework, a novel approach leveraging large language models (LLMs) in a zero-shot learning mode. Unlike conventional machine learning methods, PP-ZSL eliminates the need for local training on sensitive data by utilizing pre-trained LLMs to generate responses directly. The framework incorporates real-time data anonymization to redact or mask sensitive information, retrieval-augmented generation (RAG) for domain-specific query resolution, and robust post-processing to ensure compliance with regulatory standards. (A minimal redaction sketch appears after this list.)
arXiv Detail & Related papers (2024-12-10T17:20:47Z) - Preemptive Detection and Correction of Misaligned Actions in LLM Agents [70.54226917774933]
InferAct is a novel approach to detect misaligned actions before execution. It alerts users for timely correction, preventing adverse outcomes. InferAct achieves up to 20% improvements on Marco-F1 against baselines in misaligned action detection.
arXiv Detail & Related papers (2024-07-16T15:24:44Z) - Privacy Risks of General-Purpose AI Systems: A Foundation for Investigating Practitioner Perspectives [47.17703009473386]
Powerful AI models have led to impressive leaps in performance across a wide range of tasks.
Privacy concerns have led to a wealth of literature covering various privacy risks and vulnerabilities of AI models.
We conduct a systematic review of these survey papers to provide a concise and usable overview of privacy risks in GPAIS.
arXiv Detail & Related papers (2024-07-02T07:49:48Z) - Noisy Neighbors: Efficient membership inference attacks against LLMs [2.666596421430287]
This paper introduces an efficient methodology that generates "noisy neighbors" for a target sample by adding noise in the embedding space.
Our findings demonstrate that this approach closely matches the effectiveness of employing shadow models, showing its usability in practical privacy auditing scenarios. (A toy membership-scoring sketch appears after this list.)
arXiv Detail & Related papers (2024-06-24T12:02:20Z) - Coordinated Flaw Disclosure for AI: Beyond Security Vulnerabilities [1.3225694028747144]
We propose a Coordinated Flaw Disclosure (CFD) framework tailored to the complexities of machine learning (ML) issues.
Our framework introduces innovations such as extended model cards, dynamic scope expansion, an independent adjudication panel, and an automated verification process.
We argue that CFD could significantly enhance public trust in AI systems.
arXiv Detail & Related papers (2024-02-10T20:39:04Z) - Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z) - Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are tight, but only under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets. (A sketch of the standard empirical-epsilon bound underlying such audits appears after this list.)
arXiv Detail & Related papers (2023-02-15T21:40:33Z) - Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest [70.02478301291264]
Social media platforms curate access to information and opportunities, and so play a critical role in shaping public discourse.
Prior studies have used black-box methods to show that these algorithms can lead to biased or discriminatory outcomes.
We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation.
arXiv Detail & Related papers (2022-07-18T17:32:35Z)
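The "quantity inflation" risk formalized in the COLS paper above can be pictured as a simple reconciliation between provider-billed token counts and a local recount. This is a hedged sketch under assumed names (the log schema and the caller-supplied tokenizer are hypothetical), not the paper's three-layer auditing framework.

```python
# Hypothetical reconciliation of billed token counts against a local recount.
# Not the COLS paper's auditing framework; the log schema and tokenizer are assumptions.
from typing import Callable, Dict, List

def audit_token_counts(
    call_log: List[Dict],                 # each entry: {"prompt": str, "response": str, "billed_tokens": int}
    count_tokens: Callable[[str], int],   # caller-supplied tokenizer matching the billed model
    tolerance: float = 0.05,              # allow small discrepancies (special tokens, formatting)
) -> List[Dict]:
    """Return the calls whose billed count exceeds the local recount by more than the tolerance."""
    flagged = []
    for i, call in enumerate(call_log):
        local = count_tokens(call["prompt"]) + count_tokens(call["response"])
        if call["billed_tokens"] > local * (1 + tolerance):
            flagged.append({"call": i, "local": local, "billed": call["billed_tokens"]})
    return flagged

# Example with a crude whitespace tokenizer; a real audit would use the provider's tokenizer.
log = [{"prompt": "hi there", "response": "hello", "billed_tokens": 50}]
print(audit_token_counts(log, count_tokens=lambda s: len(s.split())))
# -> [{'call': 0, 'local': 3, 'billed': 50}]
```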
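AgentOrca's idea of pairing natural-language constraints with executable code as ground truth can be illustrated with a toy verifier. The trace schema and the two constraints below are invented for illustration; they are not AgentOrca's actual code.

```python
# Toy illustration of verifying an agent's action trace against executable constraints.
# The trace schema and constraints are hypothetical, not AgentOrca's.
from typing import Callable, Dict, List, Tuple

Action = Dict[str, object]  # e.g. {"tool": "refund", "amount": 120, "verified_identity": False}
Constraint = Tuple[str, Callable[[List[Action]], bool]]  # (description, predicate over the full trace)

CONSTRAINTS: List[Constraint] = [
    ("refunds over $100 require prior identity verification",
     lambda trace: all(
         a.get("verified_identity", False)
         for a in trace if a.get("tool") == "refund" and a.get("amount", 0) > 100
     )),
    ("no more than one refund per conversation",
     lambda trace: sum(1 for a in trace if a.get("tool") == "refund") <= 1),
]

def verify(trace: List[Action]) -> List[str]:
    """Return descriptions of all violated constraints (empty list = compliant)."""
    return [desc for desc, ok in CONSTRAINTS if not ok(trace)]

trace = [{"tool": "refund", "amount": 120, "verified_identity": False}]
print(verify(trace))  # -> ['refunds over $100 require prior identity verification']
```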
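The real-time anonymization step described in the PP-ZSL paper might, in its simplest form, amount to rule-based redaction before a query ever reaches the LLM. The patterns and placeholder tags below are illustrative assumptions, not the PP-ZSL implementation.

```python
# Illustrative rule-based PII redaction applied before a query is sent to an LLM.
# Patterns and tags are assumptions; real pipelines typically combine rules with NER models.
import re

# Order matters: the specific SSN pattern must run before the broader PHONE pattern.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    for tag, pattern in REDACTION_PATTERNS.items():  # dicts preserve insertion order
        text = pattern.sub(f"[{tag}]", text)
    return text

query = "My email is jane.doe@example.com and my SSN is 123-45-6789, can you help?"
print(redact(query))
# -> "My email is [EMAIL] and my SSN is [SSN], can you help?"
```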
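The "noisy neighbors" membership test summarized above compares how the model treats a target sample against perturbed copies of it in embedding space. The sketch below is a hedged reading of that idea with stubbed model interfaces, not the paper's exact procedure.

```python
# Hedged sketch of a membership score based on "noisy neighbors" in embedding space.
# The model/loss interface is a stub; this is not the paper's exact method.
from typing import Callable
import numpy as np

def membership_score(
    target_embedding: np.ndarray,                        # embedding of the audited sample
    loss_from_embedding: Callable[[np.ndarray], float],  # model loss evaluated at an embedding
    n_neighbors: int = 32,
    noise_scale: float = 0.01,
    seed: int = 0,
) -> float:
    """Fraction of noisy neighbors with higher loss than the target.
    A score near 1.0 means the target is 'unusually easy' for the model,
    which is treated as evidence of membership in the training data."""
    rng = np.random.default_rng(seed)
    target_loss = loss_from_embedding(target_embedding)
    noise = rng.normal(0.0, noise_scale, size=(n_neighbors, *target_embedding.shape))
    neighbor_losses = [loss_from_embedding(target_embedding + eps) for eps in noise]
    return float(np.mean([loss > target_loss for loss in neighbor_losses]))

# Toy usage: a quadratic "loss" whose minimum sits exactly at the target embedding.
emb = np.zeros(8)
print(membership_score(emb, loss_from_embedding=lambda e: float(np.sum(e ** 2))))  # -> 1.0
```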
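A standard ingredient of differential-privacy audits like the one above (though not specific to that paper's improved scheme) is converting a membership distinguisher's false-positive and false-negative rates into an empirical epsilon lower bound via the (epsilon, delta)-DP hypothesis-testing inequalities. The sketch below omits the confidence intervals a careful audit would add.

```python
# Empirical epsilon lower bound from a membership distinguisher's error rates,
# via the standard (epsilon, delta)-DP hypothesis-testing inequalities:
#   fpr + exp(eps) * fnr >= 1 - delta   and   fnr + exp(eps) * fpr >= 1 - delta.
# A real audit would replace fpr/fnr with high-confidence (e.g. Clopper-Pearson) bounds.
import math

def empirical_epsilon_lower_bound(fpr: float, fnr: float, delta: float = 1e-5) -> float:
    candidates = []
    if fnr > 0 and (1 - delta - fpr) > 0:
        candidates.append(math.log((1 - delta - fpr) / fnr))
    if fpr > 0 and (1 - delta - fnr) > 0:
        candidates.append(math.log((1 - delta - fnr) / fpr))
    return max(candidates, default=0.0)  # epsilon cannot be negative

# Example: a distinguisher with 1% false positives and 40% false negatives.
print(round(empirical_epsilon_lower_bound(fpr=0.01, fnr=0.40), 3))  # -> 4.094
```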