Prevalence of Security and Privacy Risk-Inducing Usage of AI-based Conversational Agents
- URL: http://arxiv.org/abs/2510.27275v1
- Date: Fri, 31 Oct 2025 08:35:42 GMT
- Title: Prevalence of Security and Privacy Risk-Inducing Usage of AI-based Conversational Agents
- Authors: Kathrin Grosse, Nico Ebert,
- Abstract summary: We surveyed a representative sample of 3,270 UK adults in 2024 using Prolific. A third of these use CA services such as ChatGPT or Gemini at least once a week. A fourth have tried jailbreaking (often for understandable reasons such as curiosity, fun, or information seeking).
- Score: 6.196034023307935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent improvements in large language models (LLMs) have led to everyday usage of AI-based Conversational Agents (CAs). At the same time, LLMs are vulnerable to an array of threats, including jailbreaks and, for example, remote code execution when fed specific inputs. As a result, users may unintentionally introduce risks, for example, by uploading malicious files or disclosing sensitive information. However, the extent to which such user behaviors occur and thus potentially facilitate exploits remains largely unclear. To shed light on this issue, we surveyed a representative sample of 3,270 UK adults in 2024 using Prolific. A third of these use CA services such as ChatGPT or Gemini at least once a week. Of these "regular users", up to a third exhibited behaviors that may enable attacks, and a fourth have tried jailbreaking (often for understandable reasons such as curiosity, fun, or information seeking). Half state that they sanitize data, and most participants report not sharing sensitive data; however, few share very sensitive data such as passwords. The majority are unaware that their data can be used to train models and that they can opt out. Our findings suggest that current academic threat models manifest in the wild, and that mitigations or guidelines for the secure usage of CAs should be developed. In areas critical to security and privacy, CAs must be equipped with effective AI guardrails to prevent, for example, revealing sensitive information to curious employees. Vendors need to increase efforts to prevent the entry of sensitive data, and to create transparency with regard to data usage policies and settings.
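The abstract's call for vendor-side guardrails that prevent the entry of sensitive data can be illustrated with a minimal sketch: a pre-submission check that flags obviously sensitive strings before a prompt is forwarded to the CA backend. The patterns and names below (SENSITIVE_PATTERNS, check_prompt, guarded_submit, send_fn) are illustrative assumptions, not anything proposed in the paper; a production guardrail would need far richer detection plus the opt-out and data-usage transparency the authors call for.

```python
import re

# Illustrative, assumed patterns; a real guardrail would need far richer
# detectors (secret scanners, NER, policy engines) than these regexes.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "password": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}

def check_prompt(prompt: str) -> list[str]:
    """Return the sensitive-data categories that appear to occur in a prompt."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)]

def guarded_submit(prompt: str, send_fn):
    """Refuse to forward a prompt that appears to contain sensitive data.

    send_fn stands in for the vendor's CA API call (hypothetical).
    """
    findings = check_prompt(prompt)
    if findings:
        raise ValueError(f"prompt appears to contain sensitive data: {findings}")
    return send_fn(prompt)
```

Pattern matching alone cannot catch free-form sensitive disclosures, which is why the paper also stresses user-facing transparency about data usage policies and settings.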
Related papers
- A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios [1.1864532555108382]
It is unclear to what extent current large language models can provide useful information for complex criminal activity. We evaluate whether models provide actionable assistance beyond information typically available on the web, as assessed by domain experts. We found that (1) current large language models provide minimal actionable information for fraud and cybercrime without the use of advanced jailbreaking techniques.
arXiv Detail & Related papers (2026-02-25T12:01:38Z) - Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain [82.98626829232899]
Fine-tuning AI agents on data from their own interactions introduces a critical security vulnerability within the AI supply chain. We show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors.
arXiv Detail & Related papers (2025-10-03T12:47:21Z) - Evaluating Language Model Reasoning about Confidential Information [95.64687778185703]
We study whether language models exhibit contextual robustness, or the capability to adhere to context-dependent safety specifications. We develop a benchmark (PasswordEval) that measures whether language models can correctly determine when a user request is authorized. We find that current open- and closed-source models struggle with this seemingly simple task and that, perhaps surprisingly, reasoning capabilities do not generally improve performance.
arXiv Detail & Related papers (2025-08-27T15:39:46Z) - Guarding Your Conversations: Privacy Gatekeepers for Secure Interactions with Cloud-Based AI Models [0.34998703934432673]
We propose the concept of an "LLM gatekeeper", a lightweight, locally run model that filters out sensitive information from user queries before they are sent to the potentially untrustworthy, though highly capable, cloud-based LLM. Through experiments with human subjects, we demonstrate that this dual-model approach introduces minimal overhead while significantly enhancing user privacy, without compromising the quality of LLM responses. (A minimal illustrative sketch of this dual-model idea follows after this list.)
arXiv Detail & Related papers (2025-08-22T19:49:03Z) - OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [60.78202583483591]
We introduce OS-Harm, a new benchmark for measuring the safety of computer use agents. OS-Harm is built on top of the OSWorld environment and aims to test models across three categories of harm: deliberate user misuse, prompt injection attacks, and model misbehavior. We evaluate computer use agents based on a range of frontier models and provide insights into their safety.
arXiv Detail & Related papers (2025-06-17T17:59:31Z) - Defeating Prompt Injections by Design [79.00910871948787]
CaMeL is a robust defense that creates a protective system layer around the Large Language Model. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query. To further improve security, CaMeL uses a notion of a capability to prevent the exfiltration of private data over unauthorized data flows.
arXiv Detail & Related papers (2025-03-24T15:54:10Z) - AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents [66.29263282311258]
We introduce AgentDAM, a new benchmark that measures whether AI web-navigation agents follow the privacy principle of "data minimization". Our benchmark simulates realistic web interaction scenarios end-to-end and is adaptable to all existing web navigation agents.
arXiv Detail & Related papers (2025-03-12T19:30:31Z) - Privacy Aware Memory Forensics [3.382960674045592]
Recent surveys indicate that 60% of data breaches are primarily caused by malicious insider threats.
In this research, we present a novel solution to detect data leakages by insiders in an organization.
Our approach captures the RAM of the insider's device and analyses it for sensitive information leaks from a host system.
arXiv Detail & Related papers (2024-06-13T11:18:49Z) - NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [56.46355425175232]
We suggest sanitizing sensitive text using two common strategies used by humans. We curate the first corpus, coined NAP2, through both crowdsourcing and the use of large language models. Compared to prior works on anonymization, the human-inspired approaches result in more natural rewrites.
arXiv Detail & Related papers (2024-06-06T05:07:44Z) - FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering [2.2194815687410627]
We show how a malicious client can leak the privacy-sensitive data of some other users in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 71% private data reconstruction.
arXiv Detail & Related papers (2023-10-24T19:50:01Z) - Backdoor Attack against Speaker Verification [86.43395230456339]
We show that it is possible to inject a hidden backdoor into speaker verification models by poisoning the training data.
We also demonstrate that existing backdoor attacks cannot be directly adopted in attacking speaker verification.
arXiv Detail & Related papers (2020-10-22T11:10:08Z)
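The "LLM gatekeeper" concept summarized above (Guarding Your Conversations) lends itself to a minimal sketch: a local filter redacts sensitive spans from the query before it reaches the cloud model. This is an assumed, simplified stand-in for the paper's dual-model pipeline; the paper's gatekeeper is itself a locally run model, whereas local_redact, cloud_llm, and gatekept_query below are hypothetical names using simple regex redaction purely for illustration.

```python
import re

# Hypothetical stand-in for the paper's lightweight local gatekeeper model:
# a simple regex redactor; the actual gatekeeper is a locally run LLM.
def local_redact(query: str) -> str:
    query = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", query)
    query = re.sub(r"(?i)\bpassword\s*[:=]\s*\S+", "password: [REDACTED]", query)
    return query

def cloud_llm(query: str) -> str:
    # Placeholder for the untrusted but highly capable cloud-based LLM call.
    return f"(cloud answer for: {query})"

def gatekept_query(user_query: str) -> str:
    """Filter sensitive information locally, then forward the sanitized query."""
    return cloud_llm(local_redact(user_query))

if __name__ == "__main__":
    print(gatekept_query("My password: hunter2, why does login fail for bob@example.com?"))
```

In the paper's setting, redaction quality and response quality are evaluated with human subjects; the regex filter here is only a placeholder for that local component.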
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.