Understanding Privacy Norms Around LLM-Based Chatbots: A Contextual Integrity Perspective
- URL: http://arxiv.org/abs/2508.06760v1
- Date: Sat, 09 Aug 2025 00:22:46 GMT
- Title: Understanding Privacy Norms Around LLM-Based Chatbots: A Contextual Integrity Perspective
- Authors: Sarah Tran, Hongfan Lu, Isaac Slaughter, Bernease Herman, Aayushi Dangol, Yue Fu, Lufei Chen, Biniyam Gebreyohannes, Bill Howe, Alexis Hiniker, Nicholas Weber, Robert Wolfe
- Abstract summary: We conduct a survey experiment with 300 US ChatGPT users to understand emerging privacy norms for sharing ChatGPT data. Our findings reveal a stark disconnect between user concerns and behavior. Participants uniformly rejected sharing personal data for improved services, even in exchange for premium features worth $200.
- Score: 14.179623604712065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM-driven chatbots like ChatGPT have created large volumes of conversational data, but little is known about how user privacy expectations are evolving with this technology. We conduct a survey experiment with 300 US ChatGPT users to understand emerging privacy norms for sharing chatbot data. Our findings reveal a stark disconnect between user concerns and behavior: 82% of respondents rated chatbot conversations as sensitive or highly sensitive - more than email or social media posts - but nearly half reported discussing health topics and over one-third discussed personal finances with ChatGPT. Participants expressed strong privacy concerns (t(299) = 8.5, p < .01) and doubted their conversations would remain private (t(299) = -6.9, p < .01). Despite this, respondents uniformly rejected sharing personal data (search history, emails, device access) for improved services, even in exchange for premium features worth $200. To identify which factors influence appropriate chatbot data sharing, we presented participants with factorial vignettes manipulating seven contextual factors. Linear mixed models revealed that only transmission factors, such as informed consent, data anonymization, or the removal of personally identifiable information, significantly affected perceptions of appropriateness and concern for data access. Surprisingly, contextual factors including the recipient of the data (hospital vs. tech company), purpose (research vs. advertising), type of content, and geographic location did not show significant effects. Our results suggest that users apply consistent baseline privacy expectations to chatbot data, prioritizing procedural safeguards over recipient trustworthiness. This has important implications for emerging agentic AI systems that assume user willingness to integrate personal data across platforms.
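The factorial-vignette analysis the abstract describes can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the authors' analysis: the factor names, effect sizes, and simulated data are made up for demonstration. It fits a linear mixed model with a random intercept per participant, the standard way to handle repeated vignette ratings from the same respondent.

```python
# Illustrative sketch: a linear mixed model over factorial-vignette ratings,
# with a random intercept per participant. All names and values are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_participants, n_vignettes = 300, 8
n = n_participants * n_vignettes

# Simulated long-format data: each participant rates several vignettes that
# vary binary contextual factors (factor names are assumptions).
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_participants), n_vignettes),
    "consent":     rng.integers(0, 2, n),
    "anonymized":  rng.integers(0, 2, n),
    "recipient":   rng.integers(0, 2, n),  # 0 = hospital, 1 = tech company
    "purpose":     rng.integers(0, 2, n),  # 0 = research, 1 = advertising
})

# Ratings driven mainly by transmission factors plus participant-level noise,
# mimicking the pattern the abstract reports.
intercepts = rng.normal(0, 1, n_participants)[df["participant"]]
df["rating"] = (3 + 0.8 * df["consent"] + 0.6 * df["anonymized"]
                + intercepts + rng.normal(0, 1, n))

# Fixed effects for the vignette factors; random intercept per participant.
model = smf.mixedlm("rating ~ consent + anonymized + recipient + purpose",
                    df, groups=df["participant"])
print(model.fit().summary())
```

With this setup, only the `consent` and `anonymized` coefficients should come out significant, which is the shape of the result the paper reports for transmission factors versus recipient and purpose.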
Related papers
- Understanding Users' Privacy Reasoning and Behaviors During Chatbot Use to Support Meaningful Agency in Privacy [0.1390311627586184]
We examined students' in-the-moment disclosure and protection behaviors, as well as the reasoning underlying these behaviors. Participants used a simulated ChatGPT interface with and without a privacy notice panel that intercepts message submissions. We analyzed how the panel fostered privacy awareness, encouraged protective actions, and supported context-specific reasoning about what information to protect and how.
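As a rough illustration of a panel that intercepts message submissions, the sketch below scans an outgoing message for likely personal disclosures and asks the user to confirm before sending. The detection patterns and prompt flow are illustrative assumptions, not the study's actual interface.

```python
# Minimal sketch of a privacy notice that intercepts a message before it is
# sent and flags likely personal disclosures. Patterns are illustrative only.
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def intercept(message: str) -> bool:
    """Return True if the message should be submitted."""
    hits = [name for name, pat in PATTERNS.items() if pat.search(message)]
    if not hits:
        return True
    print(f"Privacy notice: message appears to contain {', '.join(hits)}.")
    return input("Send anyway? [y/N] ").strip().lower() == "y"

if intercept("My email is jane@example.com and my SSN is 123-45-6789"):
    print("message submitted")
else:
    print("message withheld")
```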
arXiv Detail & Related papers (2026-01-26T04:13:45Z) - MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks to evaluate contextual privacy in LLM agents primarily assess single-turn, low-complexity tasks. We first present a benchmark, MAGPIE, comprising 158 real-life high-stakes scenarios across 15 domains. We then evaluate current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
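A minimal sketch of how such a contextual-privacy evaluation might score an agent reply is shown below. The scenario fields and the substring-based leakage check are assumptions for illustration, not MAGPIE's actual format or metric.

```python
# Illustrative leakage check in the spirit of contextual-privacy benchmarks:
# flag a violation if any private span appears verbatim in the agent's reply.
scenarios = [
    {"task": "Negotiate a salary with HR on the employee's behalf.",
     "private": ["current salary is $85,000", "pending job offer"],
     "agent_reply": "My client expects market rate; "
                    "their current salary is $85,000."},
]

def leaked(scenario: dict) -> list[str]:
    reply = scenario["agent_reply"].lower()
    return [p for p in scenario["private"] if p.lower() in reply]

for s in scenarios:
    hits = leaked(s)
    print(("VIOLATION" if hits else "ok"), "leaked:", hits)
```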
arXiv Detail & Related papers (2025-06-25T18:04:25Z) - Investigating User Perspectives on Differentially Private Text Privatization [81.59631769859004]
This work investigates how factors of scenario, data sensitivity, mechanism type, and reason for data collection impact user preferences for text privatization. We learn that while all these factors play a role in influencing privacy decisions, users are highly sensitive to the utility and coherence of the private output texts.
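For context, the sketch below shows one common family of text-privatization mechanisms studied in this line of work: word-level metric differential privacy, which perturbs a word's embedding with noise calibrated to epsilon and decodes to the nearest vocabulary word. The vocabulary and embeddings are random stand-ins; this is not the paper's mechanism.

```python
# Toy word-level metric-DP privatization: add multivariate-Laplace-style
# noise to an embedding, then decode to the nearest vocabulary word.
# VOCAB and EMB are random stand-ins, not a real embedding model.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["doctor", "nurse", "patient", "hospital", "clinic", "illness",
         "flu", "cold", "money", "salary", "rent", "debt"]
EMB = rng.normal(size=(len(VOCAB), 16))

def privatize_word(word: str, epsilon: float) -> str:
    v = EMB[VOCAB.index(word)]
    d = v.shape[0]
    # d-dimensional Laplace noise: uniform direction, Gamma(d, 1/eps) norm.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    noisy = v + direction * rng.gamma(shape=d, scale=1.0 / epsilon)
    # Decode by nearest neighbor in embedding space.
    return VOCAB[int(np.argmin(np.linalg.norm(EMB - noisy, axis=1)))]

for eps in (0.5, 5.0, 50.0):
    out = [privatize_word(w, eps) for w in ["doctor", "salary", "flu"]]
    print(f"epsilon={eps:>5}: {out}")  # higher epsilon -> closer to original
```

The coherence problem the abstract highlights is visible here: at small epsilon the decoded words drift to unrelated vocabulary, producing exactly the incoherent output users object to.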
arXiv Detail & Related papers (2025-03-12T12:33:20Z) - Smoke Screens and Scapegoats: The Reality of General Data Protection Regulation Compliance -- Privacy and Ethics in the Case of Replika AI [1.325665193924634]
This paper takes a critical approach to examining the intricacies of privacy and GDPR-compliance issues within AI companion services.
We analyze articles from public media about the company and its practices to gain insight into the trustworthiness of information provided in the policy.
The results reveal that, despite privacy notices, data collection practices may harvest personal data without users' full awareness.
arXiv Detail & Related papers (2024-11-07T07:36:19Z) - Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild [40.57348900292574]
Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy.
We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models.
arXiv Detail & Related papers (2024-07-16T07:05:31Z) - NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [56.46355425175232]
We suggest sanitizing sensitive text using two common strategies used by humans. We curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models. Compared to the prior works on anonymization, the human-inspired approaches result in more natural rewrites.
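The two strategies are not named in this summary, so the sketch below illustrates two plausible ones, outright deletion versus abstraction to a more generic phrase, purely as an assumption-labeled example of why abstraction tends to read more naturally.

```python
# Hedged illustration of two sanitization strategies ("delete" vs.
# "abstract" are assumptions here, not necessarily the paper's strategies).
SENSITIVE_SPANS = {"St. Mary's Hospital": "a local hospital",
                   "Jane Doe": "a friend"}

def sanitize_delete(text: str) -> str:
    # Remove the sensitive span outright.
    for span in SENSITIVE_SPANS:
        text = text.replace(span, "[REDACTED]")
    return text

def sanitize_abstract(text: str) -> str:
    # Replace the sensitive span with a more generic phrase.
    for span, generic in SENSITIVE_SPANS.items():
        text = text.replace(span, generic)
    return text

msg = "I took Jane Doe to St. Mary's Hospital yesterday."
print(sanitize_delete(msg))    # blunt, less natural
print(sanitize_abstract(msg))  # more natural rewrite
```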
arXiv Detail & Related papers (2024-06-06T05:07:44Z) - ChatGPT for Us: Preserving Data Privacy in ChatGPT via Dialogue Text Ambiguation to Expand Mental Health Care Delivery [52.73936514734762]
ChatGPT has gained popularity for its ability to generate human-like dialogue.
Data-sensitive domains face challenges in using ChatGPT due to privacy and data-ownership concerns.
We propose a text ambiguation framework that preserves user privacy.
arXiv Detail & Related papers (2023-05-19T02:09:52Z) - FedBot: Enhancing Privacy in Chatbots with Federated Learning [0.0]
Federated Learning (FL) aims to protect data privacy through distributed learning methods that keep the data in its location.
The proof of concept (POC) combines Deep Bidirectional Transformer models and federated learning algorithms to protect customer data privacy during collaborative model training.
The system is specifically designed to improve its performance and accuracy over time by leveraging its ability to learn from previous interactions.
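As a generic illustration of the federated-learning pattern this entry describes (not the FedBot system itself), the sketch below runs federated averaging over three clients whose synthetic data never leaves the client; only model weights are shared and averaged by the server.

```python
# Minimal federated averaging (FedAvg) sketch with synthetic data.
# Each client trains locally; the server only sees parameter vectors.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, steps=20):
    # Plain SGD on a linear-regression objective, run on-device.
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Three clients with private local datasets (synthetic stand-ins).
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(3):
    X = rng.normal(size=(40, 3))
    clients.append((X, X @ true_w + rng.normal(0, 0.1, len(X))))

global_w = np.zeros(3)
for _ in range(10):
    # Server broadcasts global weights; clients return trained weights,
    # which the server averages (raw data is never transmitted).
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)

print("learned:", np.round(global_w, 2), "target:", true_w)
```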
arXiv Detail & Related papers (2023-04-04T23:13:52Z) - Privacy Explanations - A Means to End-User Trust [64.7066037969487]
We looked into how explainability might help to tackle the problem of end-user trust in software systems.
We created privacy explanations that aim to clarify to end users why and for what purposes specific data is required.
Our findings reveal that privacy explanations can be an important step towards increasing trust in software systems.
arXiv Detail & Related papers (2022-10-18T09:30:37Z) - Privacy Concerns in Chatbot Interactions: When to Trust and When to Worry [3.867363075280544]
We surveyed a representative sample of 491 British citizens.
Our results show that users' concerns focus on deleting personal information and on the inappropriate use of their data.
We also found that individuals were concerned about losing control over their data after a conversation with a conversational agent.
arXiv Detail & Related papers (2021-07-08T16:31:58Z)