Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
- URL: http://arxiv.org/abs/2407.11438v2
- Date: Sat, 20 Jul 2024 00:47:32 GMT
- Title: Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
- Authors: Niloofar Mireshghallah, Maria Antoniak, Yash More, Yejin Choi, Golnoosh Farnadi
- Abstract summary: Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy.
We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy and facilitate privacy research for large language models (LLMs). We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models, investigating the leakage of personally identifiable and sensitive information. To understand the contexts in which users disclose to chatbots, we develop a taxonomy of tasks and sensitive topics, based on qualitative and quantitative analysis of naturally occurring conversations. We discuss these potential privacy harms and observe that: (1) personally identifiable information (PII) appears in unexpected contexts such as in translation or code editing (48% and 16% of the time, respectively) and (2) PII detection alone is insufficient to capture the sensitive topics that are common in human-chatbot interactions, such as detailed sexual preferences or specific drug use habits. We believe that these high disclosure rates are of significant importance for researchers and data curators, and we call for the design of appropriate nudging mechanisms to help users moderate their interactions.
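To ground observation (2), the sketch below contrasts pattern-based PII detection with a toy sensitive-topic check. It is a minimal, hypothetical illustration in Python; the regex patterns, keyword lexicon, and example messages are assumptions made for illustration, not the paper's actual annotation pipeline.
```python
import re

# Hypothetical illustration: a regex-style PII detector flags identifiers
# (emails, phone numbers) but is blind to sensitive topics such as drug use.
# Patterns and messages below are assumptions, not the authors' pipeline.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b"),
}
SENSITIVE_KEYWORDS = {"drug", "dosage", "sexual", "diagnosis"}  # toy topic lexicon

def scan(message: str) -> dict:
    """Return PII matches and sensitive-topic keyword hits for one message."""
    pii = {name: pat.findall(message) for name, pat in PII_PATTERNS.items()}
    topics = sorted(w for w in SENSITIVE_KEYWORDS if w in message.lower())
    return {"pii": {k: v for k, v in pii.items() if v}, "topics": topics}

messages = [
    "Please translate this email to French: contact me at jane.doe@example.com",
    "What dosage of this drug is safe to combine with alcohol?",
]
for m in messages:
    print(scan(m))
# The first message trips the PII detector; the second contains no PII at all,
# yet discloses a sensitive drug-use habit -- the gap the paper highlights.
```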
Related papers
- Banal Deception Human-AI Ecosystems: A Study of People's Perceptions of LLM-generated Deceptive Behaviour [11.285775969393566]
Large language models (LLMs) can provide users with false, inaccurate, or misleading information.
We investigate people's perceptions of ChatGPT-generated deceptive behaviour.
arXiv Detail & Related papers (2024-06-12T16:36:06Z) - NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We suggest sanitizing sensitive text with two strategies commonly used by humans.
We curate the first such corpus, coined NAP^2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z) - Embedding Privacy in Computational Social Science and Artificial Intelligence Research [2.048226951354646]
Preserving privacy has emerged as a critical factor in research.
The increasing use of advanced computational models stands to exacerbate privacy concerns.
This article contributes to the field by discussing the role of privacy and the issues that researchers working in CSS, AI, data science and related domains are likely to face.
arXiv Detail & Related papers (2024-04-17T16:07:53Z) - User Privacy Harms and Risks in Conversational AI: A Proposed Framework [1.8416014644193066]
This study identifies 9 privacy harms and 9 privacy risks in text-based interactions.
The aim is to offer developers, policymakers, and researchers a tool for responsible and secure implementation of conversational AI.
arXiv Detail & Related papers (2024-02-15T05:21:58Z) - Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable models, such as GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - Are Chatbots Ready for Privacy-Sensitive Applications? An Investigation into Input Regurgitation and Prompt-Induced Sanitization [4.01610127647615]
ChatGPT retains personally identifiable information (PII) verbatim in 57.4% of cases.
We probe ChatGPT's perception of privacy-related policies and mechanisms by directly instructing it to provide compliant outputs.
arXiv Detail & Related papers (2023-05-24T10:48:05Z) - Knowledge-Grounded Conversational Data Augmentation with Generative Conversational Networks [76.11480953550013]
We take a step towards automatically generating conversational data using Generative Conversational Networks.
We evaluate our approach on conversations with and without knowledge on the Topical Chat dataset.
arXiv Detail & Related papers (2022-07-22T22:37:14Z) - Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater user responsiveness and task completion.
The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element.
Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
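As a rough illustration of the architecture described above, here is a minimal sketch of one common way to condition a sequence-to-sequence model on a social-language level: a politeness control token prepended to the source sequence. The vocabulary, dimensions, and control-token scheme are assumptions for illustration, not the paper's exact model.
```python
import torch
import torch.nn as nn

# Sketch of "seq2seq + social language control": a control token prepended to
# the source lets the decoder condition on the desired social register.
# Sizes and token ids are illustrative assumptions.
VOCAB, EMB, HID = 1000, 64, 128
POLITE, NEUTRAL = 0, 1  # reserved control-token ids

class SocialSeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt, style):
        # style: (batch,) control-token ids, prepended to the source sequence
        src = torch.cat([style.unsqueeze(1), src], dim=1)
        _, h = self.encoder(self.emb(src))        # encode style + source
        dec_out, _ = self.decoder(self.emb(tgt), h)
        return self.out(dec_out)                  # (batch, tgt_len, vocab) logits

model = SocialSeq2Seq()
src = torch.randint(2, VOCAB, (4, 10))  # toy batch; ids >= 2 avoid control ids
tgt = torch.randint(2, VOCAB, (4, 8))
style = torch.full((4,), POLITE)
print(model(src, tgt, style).shape)     # torch.Size([4, 8, 1000])
```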
arXiv Detail & Related papers (2020-12-29T08:22:48Z) - Privacy-accuracy trade-offs in noisy digital exposure notifications [3.04585143845864]
There is interest in using the power of mobile phones to automate the contact-tracing process.
The rough idea is simple: use Bluetooth or other data-exchange technologies to record contacts between users, enable users to report positive diagnoses, and alert users who have been exposed to sick users.
Although designing practical protocols is of crucial importance, it is essential to realize that notifying users about exposure events may itself leak confidential information.
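For intuition, here is a minimal, hypothetical sketch of the decentralized exposure-notification flow described above: phones exchange random ephemeral tokens, diagnosed users publish the tokens they broadcast, and other phones match the published list locally. This illustrates the data flow only; class and method names are assumptions, and real protocols (rotating keys, added noise for the privacy-accuracy trade-off the paper studies) are far more involved.
```python
import secrets

class Phone:
    """Toy model of one device in a decentralized exposure-notification scheme."""

    def __init__(self):
        self.sent: set[bytes] = set()    # tokens this phone broadcast
        self.heard: set[bytes] = set()   # tokens overheard from nearby phones

    def broadcast(self) -> bytes:
        token = secrets.token_bytes(16)  # fresh ephemeral identifier
        self.sent.add(token)
        return token

    def record_contact(self, token: bytes) -> None:
        self.heard.add(token)

    def exposed(self, published: set[bytes]) -> bool:
        # Exposure = overlap between published sick-user tokens and my contacts.
        return bool(self.heard & published)

alice, bob, carol = Phone(), Phone(), Phone()
bob.record_contact(alice.broadcast())   # Alice and Bob meet
carol.record_contact(bob.broadcast())   # Bob and Carol meet

published = set(alice.sent)             # Alice reports a positive diagnosis
print(bob.exposed(published))           # True: Bob was near Alice
print(carol.exposed(published))         # False: Carol never met Alice
```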
arXiv Detail & Related papers (2020-11-08T15:00:38Z) - You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P^2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)