Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference
- URL: http://arxiv.org/abs/2509.12152v1
- Date: Mon, 15 Sep 2025 17:17:26 GMT
- Title: Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference
- Authors: Synthia Wang, Sai Teja Peddinti, Nina Taft, Nick Feamster
- Abstract summary: Large Language Models (LLMs) can infer personal attributes from seemingly innocuous text, raising privacy risks beyond memorized data leakage. We conducted a survey with 240 U.S. participants who judged text snippets for inference risks, reported concern levels, and attempted rewrites to block inference. Results show that participants struggled to anticipate inference, performing only slightly better than chance.
- Score: 8.063685458567202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) such as ChatGPT can infer personal attributes from seemingly innocuous text, raising privacy risks beyond memorized data leakage. While prior work has demonstrated these risks, little is known about how users estimate and respond to them. We conducted a survey with 240 U.S. participants who judged text snippets for inference risks, reported concern levels, and attempted rewrites to block inference. We compared their rewrites with those generated by ChatGPT and Rescriber, a state-of-the-art sanitization tool. Results show that participants struggled to anticipate inference, performing only slightly better than chance. User rewrites were effective in just 28% of cases - better than Rescriber but worse than ChatGPT. We examined participants' rewriting strategies and observed that while paraphrasing was the most common strategy, it was also the least effective; abstraction and adding ambiguity were more successful. Our work highlights the importance of inference-aware design in LLM interactions.
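The core measurement in this kind of study is whether an LLM can still infer a personal attribute from a snippet after it has been rewritten. The sketch below is a minimal illustration of that probe, not the authors' protocol: the attribute, prompt wording, example snippets, and model name are all assumptions.

```python
# Minimal sketch (assumed setup, not the authors' protocol): probe whether an
# LLM can still infer a personal attribute from a snippet after a rewrite.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ATTRIBUTE = "the author's occupation"  # illustrative attribute


def infer_attribute(snippet: str) -> str:
    """Ask the model for its best guess of ATTRIBUTE given free text."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": (
                f"Guess {ATTRIBUTE} of the person who wrote the text below. "
                "Reply with a single short guess.\n\n"
                f"Text: {snippet}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip()


# Hypothetical snippets: the rewrite replaces telling details with abstraction.
original = "Pulled another double at the ER last night and still made morning rounds."
rewrite = "Had another long night at work and still made it in early today."

print("original ->", infer_attribute(original))  # likely a confident guess (e.g., nurse/doctor)
print("rewrite  ->", infer_attribute(rewrite))   # abstraction makes the guess far less certain
```

The same probe, run on original snippets, user rewrites, and tool rewrites, is one way to compare how often each rewriting strategy actually blocks the inference.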
Related papers
- SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space [11.534994345027362]
Multimodal large language models (MLLMs) have shown impressive capabilities in vision-language tasks such as reasoning segmentation. We introduce a novel adversarial paraphrasing task: generating grammatically correct paraphrases that preserve the original query meaning while degrading segmentation performance. We introduce SPARTA, a black-box, sentence-level optimization method that operates in the low-dimensional semantic latent space of a text autoencoder.
arXiv Detail & Related papers (2025-10-28T14:09:05Z)
- The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization [53.51921540246166]
We show that Large Language Models (LLMs) can exploit the contextual vulnerability of DP-sanitized texts. Experiments uncover a double-edged sword effect of LLM reconstructions on privacy and utility. We propose recommendations for using data reconstruction as a post-processing step.
arXiv Detail & Related papers (2025-08-26T12:22:45Z)
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks. We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance (a toy illustration of this threat appears after this list). Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
- Imposter.AI: Adversarial Attacks with Hidden Intentions towards Aligned Large Language Models [13.225041704917905]
This study unveils an attack mechanism that capitalizes on human conversation strategies to extract harmful information from large language models.
Unlike conventional methods that target explicit malicious responses, our approach delves deeper into the nature of the information provided in responses.
arXiv Detail & Related papers (2024-07-22T06:04:29Z)
- Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training [55.321010757641524]
A primary concern regarding training large language models (LLMs) is whether they abuse copyrighted online text. We propose an alternative insert-and-detect methodology, advocating that web users and content platforms employ unique identifiers for reliable and independent membership inference.
arXiv Detail & Related papers (2024-03-23T06:36:32Z)
- Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews [51.453135368388686]
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM).
Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM use at the corpus level (a toy sketch of such a mixture estimate appears after this list).
arXiv Detail & Related papers (2024-03-11T21:51:39Z)
- Intention Analysis Makes LLMs A Good Jailbreak Defender [79.4014719271075]
We present a simple yet highly effective defense strategy, i.e., Intention Analysis (IA). IA works by triggering LLMs' inherent self-correction and improvement ability through a two-stage process (a minimal prompt-level sketch appears after this list). Experiments on varying jailbreak benchmarks show that IA consistently and significantly reduces the harmfulness of responses.
arXiv Detail & Related papers (2024-01-12T13:15:05Z)
- Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models [63.91178922306669]
We introduce Silent Guardian (SG), a text protection mechanism against large language models (LLMs).
By carefully modifying the text to be protected into a truncation protection example (TPE), SG induces LLMs to sample the end token first, directly terminating the interaction.
We show that SG can effectively protect the target text under various configurations and achieves an almost 100% protection success rate in some cases.
arXiv Detail & Related papers (2023-12-15T10:30:36Z)
- Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation [5.043563227694139]
Large language models (large LMs) are susceptible to producing text that contains hallucinated content.
We present a comprehensive investigation into self-contradiction for various instruction-tuned LMs.
We propose a novel prompting-based framework designed to effectively detect and mitigate self-contradictions.
arXiv Detail & Related papers (2023-05-25T08:43:46Z)
- TextHide: Tackling Data Privacy in Language Understanding Tasks [54.11691303032022]
TextHide mitigates privacy risks without slowing down training or reducing accuracy.
It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data.
We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend against attacks on shared gradients or representations.
arXiv Detail & Related papers (2020-10-12T22:22:15Z)
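To make the content-injection threat from "Illusions of Relevance" concrete, here is a toy demonstration of keyword stuffing against a public neural reranker. The checkpoint, query, and passages are my own illustrative choices, not the paper's evaluation setup.

```python
# Toy illustration (not the paper's setup) of threat (2): appending key query
# terms to an off-topic passage can inflate a neural reranker's relevance score.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed public checkpoint

query = "what are the side effects of ibuprofen"
on_topic = "Common side effects of ibuprofen include stomach pain, heartburn, and nausea."
off_topic = "Our store offers discount sneakers with free shipping on all orders."
# Content injection: stuff the query terms into the off-topic passage.
injected = off_topic + " side effects of ibuprofen side effects of ibuprofen"

scores = reranker.predict([(query, on_topic), (query, off_topic), (query, injected)])
for label, score in zip(["on-topic", "off-topic", "injected"], scores):
    print(f"{label:9s} {score:.3f}")
# Expected pattern: the injected passage scores well above the clean off-topic
# one, illustrating how keyword stuffing can deceive relevance models.
```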
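The corpus-level estimate in "Monitoring AI-Modified Content at Scale" can be pictured as fitting the weight of a two-component mixture over human and AI reference distributions. The sketch below is a toy grid-search MLE under that mixture view; the synthetic per-document log-likelihoods stand in for the paper's reference texts and are not its data.

```python
# Toy sketch: estimate alpha in P(x) = alpha * P_ai(x) + (1 - alpha) * P_human(x)
# by maximizing the corpus log-likelihood over a grid of candidate alphas.
import numpy as np


def estimate_alpha(log_p_ai: np.ndarray, log_p_human: np.ndarray) -> float:
    """Grid-search MLE for the AI-modified fraction alpha.

    log_p_ai / log_p_human: per-document log-likelihoods under each reference.
    """
    alphas = np.linspace(0.0, 1.0, 1001)
    # Log of the mixture likelihood for each (alpha, document) pair.
    mix = np.logaddexp(
        np.log(np.clip(alphas, 1e-12, 1.0))[:, None] + log_p_ai[None, :],
        np.log(np.clip(1.0 - alphas, 1e-12, 1.0))[:, None] + log_p_human[None, :],
    )
    return float(alphas[mix.sum(axis=1).argmax()])


# Synthetic corpus: 1000 documents, 30% of which look more like the AI reference.
rng = np.random.default_rng(0)
is_ai = rng.random(1000) < 0.3
log_p_ai = np.where(is_ai, -40.0, -55.0) + rng.normal(0, 2, 1000)
log_p_human = np.where(is_ai, -55.0, -40.0) + rng.normal(0, 2, 1000)
print(estimate_alpha(log_p_ai, log_p_human))  # should land near 0.3
```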
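The two-stage process behind Intention Analysis can be sketched as two chained chat turns: one asking the model to state the intention of a request, and one asking it to answer with that analysis in context. The prompt wording and model name below are illustrative assumptions, not the authors' exact templates.

```python
# Minimal sketch of a two-stage, prompt-only defense in the spirit of
# Intention Analysis (IA); prompts are paraphrased, not the paper's templates.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative model choice


def chat(messages: list[dict]) -> str:
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content


def ia_respond(user_query: str) -> str:
    # Stage 1: ask the model to analyze the essential intention of the request.
    history = [{
        "role": "user",
        "content": (
            "Identify the essential intention behind the following request, "
            "paying attention to safety, ethics, and legality. Do not answer "
            "it yet.\n\nRequest: " + user_query
        ),
    }]
    analysis = chat(history)
    history.append({"role": "assistant", "content": analysis})
    # Stage 2: answer the original request with the intention analysis in context.
    history.append({
        "role": "user",
        "content": (
            "Now respond to the original request, keeping your intention "
            "analysis in mind and declining any harmful part of it."
        ),
    })
    return chat(history)


print(ia_respond("How can I make my home Wi-Fi more secure?"))
```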
This list is automatically generated from the titles and abstracts of the papers on this site.