Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
- URL: http://arxiv.org/abs/2407.11438v2
- Date: Sat, 20 Jul 2024 00:47:32 GMT
- Title: Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
- Authors: Niloofar Mireshghallah, Maria Antoniak, Yash More, Yejin Choi, Golnoosh Farnadi
- Abstract summary: Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy.
We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy and facilitate privacy research for large language models (LLMs). We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models, investigating the leakage of personally identifiable and sensitive information. To understand the contexts in which users disclose to chatbots, we develop a taxonomy of tasks and sensitive topics, based on qualitative and quantitative analysis of naturally occurring conversations. We discuss these potential privacy harms and observe that: (1) personally identifiable information (PII) appears in unexpected contexts such as in translation or code editing (48% and 16% of the time, respectively) and (2) PII detection alone is insufficient to capture the sensitive topics that are common in human-chatbot interactions, such as detailed sexual preferences or specific drug use habits. We believe that these high disclosure rates are of significant importance for researchers and data curators, and we call for the design of appropriate nudging mechanisms to help users moderate their interactions.
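To ground observation (2), the sketch below contrasts pattern-based PII detection with a toy sensitive-topic check. It is a minimal, hypothetical illustration in Python; the regex patterns, keyword lexicon, and example messages are assumptions made for illustration, not the paper's actual annotation pipeline.
```python
import re

# Hypothetical illustration: a regex-style PII detector flags identifiers
# (emails, phone numbers) but is blind to sensitive topics such as drug use.
# Patterns and messages below are assumptions, not the authors' pipeline.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b"),
}
SENSITIVE_KEYWORDS = {"drug", "dosage", "sexual", "diagnosis"}  # toy topic lexicon

def scan(message: str) -> dict:
    """Return PII matches and sensitive-topic keyword hits for one message."""
    pii = {name: pat.findall(message) for name, pat in PII_PATTERNS.items()}
    topics = sorted(w for w in SENSITIVE_KEYWORDS if w in message.lower())
    return {"pii": {k: v for k, v in pii.items() if v}, "topics": topics}

messages = [
    "Please translate this email to French: contact me at jane.doe@example.com",
    "What dosage of this drug is safe to combine with alcohol?",
]
for m in messages:
    print(scan(m))
# The first message trips the PII detector; the second contains no PII at all,
# yet discloses a sensitive drug-use habit -- the gap the paper highlights.
```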
Related papers
- Banal Deception Human-AI Ecosystems: A Study of People's Perceptions of LLM-generated Deceptive Behaviour [11.285775969393566]
Large language models (LLMs) can provide users with false, inaccurate, or misleading information.
We investigate people's perceptions of ChatGPT-generated deceptive behaviour.
arXiv Detail & Related papers (2024-06-12T16:36:06Z) - NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We suggest sanitizing sensitive text with two strategies commonly used by humans.
We curate the first such corpus, coined NAP^2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z) - Embedding Privacy in Computational Social Science and Artificial Intelligence Research [2.048226951354646]
Preserving privacy has emerged as a critical factor in research.
The increasing use of advanced computational models stands to exacerbate privacy concerns.
This article contributes to the field by discussing the role of privacy and the issues that researchers working in CSS, AI, data science and related domains are likely to face.
arXiv Detail & Related papers (2024-04-17T16:07:53Z) - User Privacy Harms and Risks in Conversational AI: A Proposed Framework [1.8416014644193066]
This study identifies 9 privacy harms and 9 privacy risks in text-based interactions.
The aim is to offer developers, policymakers, and researchers a tool for responsible and secure implementation of conversational AI.
arXiv Detail & Related papers (2024-02-15T05:21:58Z) - Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable models, such as GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - Are Chatbots Ready for Privacy-Sensitive Applications? An Investigation into Input Regurgitation and Prompt-Induced Sanitization [4.01610127647615]
ChatGPT retains personally identifiable information (PII) verbatim in 57.4% of cases.
We probe ChatGPT's perception of privacy-related policies and mechanisms by directly instructing it to provide compliant outputs.
arXiv Detail & Related papers (2023-05-24T10:48:05Z) - Knowledge-Grounded Conversational Data Augmentation with Generative Conversational Networks [76.11480953550013]
We take a step towards automatically generating conversational data using Generative Conversational Networks.
We evaluate our approach on conversations with and without knowledge on the Topical Chat dataset.
arXiv Detail & Related papers (2022-07-22T22:37:14Z) - Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater user responsiveness and task completion.
The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element.
Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
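As a rough illustration of the architecture described above, here is a minimal sketch of one common way to condition a sequence-to-sequence model on a social-language level: a politeness control token prepended to the source sequence. The vocabulary, dimensions, and control-token scheme are assumptions for illustration, not the paper's exact model.
```python
import torch
import torch.nn as nn

# Sketch of "seq2seq + social language control": a control token prepended to
# the source lets the decoder condition on the desired social register.
# Sizes and token ids are illustrative assumptions.
VOCAB, EMB, HID = 1000, 64, 128
POLITE, NEUTRAL = 0, 1  # reserved control-token ids

class SocialSeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt, style):
        # style: (batch,) control-token ids, prepended to the source sequence
        src = torch.cat([style.unsqueeze(1), src], dim=1)
        _, h = self.encoder(self.emb(src))        # encode style + source
        dec_out, _ = self.decoder(self.emb(tgt), h)
        return self.out(dec_out)                  # (batch, tgt_len, vocab) logits

model = SocialSeq2Seq()
src = torch.randint(2, VOCAB, (4, 10))  # toy batch; ids >= 2 avoid control ids
tgt = torch.randint(2, VOCAB, (4, 8))
style = torch.full((4,), POLITE)
print(model(src, tgt, style).shape)     # torch.Size([4, 8, 1000])
```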
arXiv Detail & Related papers (2020-12-29T08:22:48Z) - Privacy-accuracy trade-offs in noisy digital exposure notifications [3.04585143845864]
There is interest in using the power of mobile phones to automate the contact-tracing process.
The rough idea is simple: use Bluetooth or other data-exchange technologies to record contacts between users, enable users to report positive diagnoses, and alert users who have been exposed to sick users.
Although designing practical protocols is of crucial importance, it is essential to realize that notifying users about exposure events may itself leak confidential information.
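For intuition, here is a minimal, hypothetical sketch of the decentralized exposure-notification flow described above: phones exchange random ephemeral tokens, diagnosed users publish the tokens they broadcast, and other phones match the published list locally. This illustrates the data flow only; class and method names are assumptions, and real protocols (rotating keys, added noise for the privacy-accuracy trade-off the paper studies) are far more involved.
```python
import secrets

class Phone:
    """Toy model of one device in a decentralized exposure-notification scheme."""

    def __init__(self):
        self.sent: set[bytes] = set()    # tokens this phone broadcast
        self.heard: set[bytes] = set()   # tokens overheard from nearby phones

    def broadcast(self) -> bytes:
        token = secrets.token_bytes(16)  # fresh ephemeral identifier
        self.sent.add(token)
        return token

    def record_contact(self, token: bytes) -> None:
        self.heard.add(token)

    def exposed(self, published: set[bytes]) -> bool:
        # Exposure = overlap between published sick-user tokens and my contacts.
        return bool(self.heard & published)

alice, bob, carol = Phone(), Phone(), Phone()
bob.record_contact(alice.broadcast())   # Alice and Bob meet
carol.record_contact(bob.broadcast())   # Bob and Carol meet

published = set(alice.sent)             # Alice reports a positive diagnosis
print(bob.exposed(published))           # True: Bob was near Alice
print(carol.exposed(published))         # False: Carol never met Alice
```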
arXiv Detail & Related papers (2020-11-08T15:00:38Z) - You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P^2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)