A Longitudinal Measurement of Privacy Policy Evolution for Large Language Models
- URL: http://arxiv.org/abs/2511.21758v1
- Date: Mon, 24 Nov 2025 12:40:15 GMT
- Title: A Longitudinal Measurement of Privacy Policy Evolution for Large Language Models
- Authors: Zhen Tao, Shidong Pan, Zhenchang Xing, Emily Black, Talia Gillis, Chunyang Chen, et al.
- Abstract summary: This paper presents the first longitudinal empirical study of privacy policies for mainstream large language model (LLM) providers worldwide. We curate a chronological dataset of 74 historical privacy policies and 115 supplemental privacy documents from 11 LLM providers across 5 countries up to August 2025. We extract over 3,000 sentence-level edits between consecutive policy versions and propose a taxonomy tailored to LLM privacy policies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) services have been rapidly integrated into people's daily lives as chatbots and agentic systems. They are fed by rich streams of collected data, raising privacy concerns around excessive collection of sensitive personal information. Privacy policies are the fundamental mechanism for informing users about data practices in the modern information privacy paradigm. Although traditional web and mobile policies are well studied, the privacy policies of LLM providers, their LLM-specific content, and their evolution over time remain largely underexplored. In this paper, we present the first longitudinal empirical study of privacy policies for mainstream LLM providers worldwide. We curate a chronological dataset of 74 historical privacy policies and 115 supplemental privacy documents from 11 LLM providers across 5 countries up to August 2025, and extract over 3,000 sentence-level edits between consecutive policy versions. We compare LLM privacy policies to those of other software formats, propose a taxonomy tailored to LLM privacy policies, annotate policy edits, and align them with a timeline of key LLM ecosystem events. Results show that LLM privacy policies are substantially longer than their counterparts, demand college-level reading ability, and remain highly vague. Our taxonomy analysis reveals patterns in how providers disclose LLM-specific practices and highlights regional disparities in coverage. Policy edits are concentrated in first-party data collection and international/specific-audience sections, and product releases and regulatory actions are their primary drivers, shedding light on the status quo and the evolution of LLM privacy policies.
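The paper's exact edit-extraction pipeline is not described in the abstract. As an illustration only, sentence-level edits between two consecutive policy versions can be sketched with Python's standard-library difflib; the naive sentence splitter and the added/removed/modified labels below are assumptions, not the authors' method:

```python
import difflib
import re

def split_sentences(text):
    # Naive splitter on sentence-ending punctuation; the paper's actual
    # segmentation method is not specified in the abstract.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_edits(old_policy, new_policy):
    """Label sentence-level edits between two consecutive policy versions."""
    old_s, new_s = split_sentences(old_policy), split_sentences(new_policy)
    sm = difflib.SequenceMatcher(a=old_s, b=new_s)
    edits = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "delete":
            edits.extend(("removed", s) for s in old_s[i1:i2])
        elif tag == "insert":
            edits.extend(("added", s) for s in new_s[j1:j2])
        elif tag == "replace":
            # A rewritten span is recorded from the new version's side.
            edits.extend(("modified", s) for s in new_s[j1:j2])
    return edits
```

Running this over each consecutive pair of archived versions yields the kind of sentence-level edit stream the study annotates, though the authors' taxonomy-based labeling is of course richer than these three tags.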
Related papers
- When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing [61.80513991207956]
This work focuses on the challenge of how to restore surrogate-driven protected data in diverse MLLM scenarios. We first bridge this research gap by contributing the SPPE (Surrogate Privacy Protected Editable) dataset. We introduce a unified approach that reliably reconstructs private content while preserving the fidelity of MLLM-generated edits.
arXiv Detail & Related papers (2025-12-08T04:59:03Z) - MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks to evaluate contextual privacy in LLM agents primarily assess single-turn, low-complexity tasks. We first present a benchmark, MAGPIE, comprising 158 real-life high-stakes scenarios across 15 domains. We then evaluate the current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
arXiv Detail & Related papers (2025-06-25T18:04:25Z) - Differentially Private Steering for Large Language Model Alignment [55.30573701583768]
We present the first study of aligning Large Language Models with private datasets. Our work proposes the Private Steering for LLM Alignment (PSA) algorithm to edit activations with differential privacy guarantees. Our results show that PSA achieves DP guarantees for LLM alignment with minimal loss in performance.
arXiv Detail & Related papers (2025-01-30T17:58:36Z) - PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z) - LLM-PBE: Assessing Data Privacy in Large Language Models [111.58198436835036]
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis.
Despite the critical nature of this issue, no existing literature offers a comprehensive assessment of data privacy risks in LLMs.
Our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs.
arXiv Detail & Related papers (2024-08-23T01:37:29Z) - The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies [58.94148083602662]
Large Language Model (LLM) agents have evolved to perform complex tasks. The widespread applications of LLM agents demonstrate their significant commercial value. However, they also expose security and privacy vulnerabilities. This survey aims to provide a comprehensive overview of the newly emerged privacy and security issues faced by LLM agents.
arXiv Detail & Related papers (2024-07-28T00:26:24Z) - On Protecting the Data Privacy of Large Language Models (LLMs): A Survey [35.48984524483533]
Large language models (LLMs) are complex artificial intelligence systems capable of understanding, generating and translating human language.
LLMs process and generate large amounts of data, which may threaten data privacy.
arXiv Detail & Related papers (2024-03-08T08:47:48Z) - Privacy Policies Across the Ages: Content and Readability of Privacy Policies 1996--2021 [1.5229257192293197]
We analyze the 25-year history of privacy policies using methods from transparency research, machine learning, and natural language processing.
We collect a large-scale longitudinal corpus of privacy policies from 1996 to 2021.
Our results show that policies are getting longer and harder to read, especially after new regulations take effect.
arXiv Detail & Related papers (2022-01-21T15:13:02Z) - Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset [6.060757543617328]
We develop a crawler that discovers, downloads, and extracts archived privacy policies from the Internet Archive's Wayback Machine.
We curated a dataset of 1,071,488 English language privacy policies, spanning over two decades and over 130,000 distinct websites.
Our data indicate that self-regulation for first-party websites has stagnated, while self-regulation for third parties has increased but is dominated by online advertising trade associations.
arXiv Detail & Related papers (2020-08-20T19:00:37Z)
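The crawler behind that million-document dataset is described only at a high level here. For illustration, archived privacy-policy snapshots can be enumerated through the Wayback Machine's public CDX API; the query builder below is a minimal sketch using documented CDX parameters, not a reconstruction of the authors' pipeline:

```python
from urllib.parse import urlencode

# Public CDX search endpoint of the Internet Archive's Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(policy_url, start_year=1996, end_year=2021):
    """Build a CDX API query listing archived captures of a policy URL."""
    params = {
        "url": policy_url,
        "from": str(start_year),
        "to": str(end_year),
        "output": "json",
        "filter": "statuscode:200",  # keep only successful captures
        "collapse": "digest",        # drop consecutive identical captures
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```

Each row of the JSON response includes a capture timestamp, which can be combined with the original URL as `https://web.archive.org/web/{timestamp}/{url}` to download that snapshot for text extraction.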