Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
- URL: http://arxiv.org/abs/2407.11438v2
- Date: Sat, 20 Jul 2024 00:47:32 GMT
- Title: Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
- Authors: Niloofar Mireshghallah, Maria Antoniak, Yash More, Yejin Choi, Golnoosh Farnadi
- Abstract summary: Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy.
We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models.
- Score: 40.57348900292574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy and facilitate privacy research for large language models (LLMs). We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models, investigating the leakage of personally identifiable and sensitive information. To understand the contexts in which users disclose to chatbots, we develop a taxonomy of tasks and sensitive topics, based on qualitative and quantitative analysis of naturally occurring conversations. We discuss these potential privacy harms and observe that: (1) personally identifiable information (PII) appears in unexpected contexts such as in translation or code editing (48% and 16% of the time, respectively) and (2) PII detection alone is insufficient to capture the sensitive topics that are common in human-chatbot interactions, such as detailed sexual preferences or specific drug use habits. We believe that these high disclosure rates are of significant importance for researchers and data curators, and we call for the design of appropriate nudging mechanisms to help users moderate their interactions.
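The abstract's contrast between surface-level PII detection and the broader sensitive topics in the taxonomy can be made concrete with a small sketch. The Python snippet below is illustrative only and is not the authors' analysis pipeline: the regex patterns and keyword lists are hypothetical stand-ins (real detectors and the paper's taxonomy-based annotation are far richer). It shows how a regex-style PII pass flags an email address in, say, a translation request, while topical disclosures such as drug-use habits go unnoticed unless a separate topic layer is added.

```python
import re

# Illustrative regex patterns for a few PII categories (assumption: real
# detectors, e.g. NER-based tools, cover many more types and contexts).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b(?:\+?\d{1,3}[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical keyword lists standing in for a sensitive-topic layer; the
# paper builds its taxonomy from qualitative and quantitative analysis of
# real conversations, not keyword matching.
SENSITIVE_KEYWORDS = {
    "health": ["diagnosis", "medication", "therapy"],
    "sexual_preferences": ["fetish", "kink"],
    "drug_use": ["dosage", "microdose"],
}

def scan_turn(text: str) -> dict:
    """Flag regex PII matches and sensitive-topic keywords in one user turn."""
    lowered = text.lower()
    pii_hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    topic_hits = {
        topic: [word for word in words if word in lowered]
        for topic, words in SENSITIVE_KEYWORDS.items()
    }
    return {
        "pii": {k: v for k, v in pii_hits.items() if v},
        "sensitive_topics": {k: v for k, v in topic_hits.items() if v},
    }

if __name__ == "__main__":
    turn = ("Can you translate this email for me? My address is jane.doe@example.com "
            "and I recently changed my medication dosage.")
    print(scan_turn(turn))
```

In practice, the second layer would be a classifier or annotation scheme grounded in the paper's task and sensitive-topic taxonomy rather than keyword matching; the point of the sketch is only that the two layers flag different kinds of disclosure.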
Related papers
- Smoke Screens and Scapegoats: The Reality of General Data Protection Regulation Compliance -- Privacy and Ethics in the Case of Replika AI [1.325665193924634]
This paper critically examines the intricacies of privacy and ethics issues within AI companion services.
We analyze articles from public media about the company and its practices to gain insight into the trustworthiness of information provided in the policy.
The results reveal that, despite privacy notices, data collection practices might harvest personal data without users' full awareness.
arXiv Detail & Related papers (2024-11-07T07:36:19Z)
- Identifying Privacy Personas [27.301741710016223]
Privacy personas capture the differences between user segments with respect to their knowledge, behavioural patterns, level of self-efficacy, and perception of the importance of privacy protection.
While various privacy personas have been derived in the literature, they group together people who differ from each other in terms of important attributes.
We propose eight personas that we derive by combining qualitative and quantitative analysis of the responses to an interactive educational questionnaire.
arXiv Detail & Related papers (2024-10-17T20:49:46Z)
- MisinfoEval: Generative AI in the Era of "Alternative Facts" [50.069577397751175]
We introduce a framework for generating and evaluating large language model (LLM) based misinformation interventions.
We present (1) an experiment with a simulated social media environment to measure the effectiveness of misinformation interventions, and (2) a second experiment with personalized explanations tailored to the demographics and beliefs of users.
Our findings confirm that LLM-based interventions are highly effective at correcting user behavior.
arXiv Detail & Related papers (2024-10-13T18:16:50Z)
- On the Reliability of Large Language Models to Misinformed and Demographically-Informed Prompts [20.84000437261526]
We investigate how Large Language Model (LLM)-backed chatbots handle misinformed prompts and questions that include demographic information.
Quantitative analysis using True/False questions reveals that these chatbots can be relied on to give the right answers to these close-ended questions.
Qualitative insights, gathered from domain experts, show that there are still concerns regarding privacy and ethical implications.
arXiv Detail & Related papers (2024-10-06T07:40:11Z)
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We suggest sanitizing sensitive text using two strategies commonly used by humans.
We curate the first corpus, coined NAP2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
- Embedding Privacy in Computational Social Science and Artificial Intelligence Research [2.048226951354646]
Preserving privacy has emerged as a critical factor in research.
The increasing use of advanced computational models stands to exacerbate privacy concerns.
This article contributes to the field by discussing the role of privacy and the issues that researchers working in CSS, AI, data science and related domains are likely to face.
arXiv Detail & Related papers (2024-04-17T16:07:53Z)
- User Privacy Harms and Risks in Conversational AI: A Proposed Framework [1.8416014644193066]
This study identifies 9 privacy harms and 9 privacy risks in text-based interactions.
The aim is to offer developers, policymakers, and researchers a tool for responsible and secure implementation of conversational AI.
arXiv Detail & Related papers (2024-02-15T05:21:58Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models, GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that mutual understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)