Related papers: More Data Types More Problems: A Temporal Analysis of Complexity, Stability, and Sensitivity in Privacy Policies

Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch [101.49955223689268]
We present Privasis, the first million-scale fully synthetic dataset entirely built from scratch. Compared to existing datasets, Privasis offers orders-of-magnitude larger scale with quality. We leverage Privasis to construct a parallel corpus for text sanitization with our pipeline that decomposes texts and applies targeted sanitization.
arXiv Detail & Related papers (2026-02-03T06:54:46Z)
How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage. Differentially Private Synthetic data refers to synthetic data that preserves the overall trends of source data.
arXiv Detail & Related papers (2025-12-02T21:14:39Z)
MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks to evaluate contextual privacy in LLM-agents primarily assess single-turn, low-complexity tasks. We first present MAGPIE, a benchmark comprising 158 real-life high-stakes scenarios across 15 domains. We then evaluate the current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
arXiv Detail & Related papers (2025-06-25T18:04:25Z)
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it. We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z)
Libertas: Privacy-Preserving Collective Computation for Decentralised Personal Data Stores [18.91869691495181]
We introduce a modular architecture, Libertas, to integrate MPC with PDS like Solid. We introduce a paradigm shift from an 'omniscient' view to an individual-based, user-centric view of trust and security.
arXiv Detail & Related papers (2023-09-28T12:07:40Z)
A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy-sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
Security and Privacy on Generative Data in AIGC: A Survey [17.456578314457612]
We review the security and privacy on generative data in AIGC. We reveal the successful experiences of state-of-the-art countermeasures in terms of the foundational properties of privacy, controllability, authenticity, and compliance.
arXiv Detail & Related papers (2023-09-18T02:35:24Z)
Visualising Personal Data Flows: Insights from a Case Study of Booking.com [8.485751288361616]
This paper reports our work on taking Booking.com as a case study to visualise personal data flows extracted from their privacy policy. By showcasing how the company shares its consumers' personal data, we raise questions and extend discussions on the challenges and limitations of using privacy policies to inform online users about the true scale and the landscape of personal data flows.
arXiv Detail & Related papers (2023-04-19T12:17:46Z)
Certified Data Removal in Sum-Product Networks [78.27542864367821]
Deleting the collected data is often insufficient to guarantee data privacy. UnlearnSPN is an algorithm that removes the influence of single data points from a trained sum-product network.
arXiv Detail & Related papers (2022-10-04T08:22:37Z)
DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases. Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget. We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
Decision Making with Differential Privacy under a Fairness Lens [65.16089054531395]
The U.S. Census Bureau releases data sets and statistics about groups of individuals that are used as input to a number of critical decision processes. To conform to privacy and confidentiality requirements, these agencies are often required to release privacy-preserving versions of the data. This paper studies the release of differentially private data sets and analyzes their impact on some critical resource allocation tasks under a fairness perspective.
arXiv Detail & Related papers (2021-05-16T21:04:19Z)
Protecting Privacy and Transforming COVID-19 Case Surveillance Datasets for Public Use [0.4462475518267084]
CDC has collected person-level, de-identified data from jurisdictions and currently has over 8 million records. Data elements were included based on usefulness, public request, and privacy implications. Specific field values were suppressed to reduce the risk of reidentification and exposure of confidential information.
arXiv Detail & Related papers (2021-01-13T14:24:20Z)
Second layer data governance for permissioned blockchains: the privacy management challenge [58.720142291102135]
In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data is crucial to avoid mass infection and decrease the number of deaths. In this sense, permissioned blockchain technology emerges to empower users to exercise their rights, providing data ownership, transparency, and security through an immutable, unified, and distributed database ruled by smart contracts.
arXiv Detail & Related papers (2020-10-22T13:19:38Z)
Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset [6.060757543617328]
We develop a crawler that discovers, downloads, and extracts archived privacy policies from the Internet Archive's Wayback Machine. We curated a dataset of 1,071,488 English language privacy policies, spanning over two decades and over 130,000 distinct websites. Our data indicate that self-regulation for first-party websites has stagnated, while self-regulation for third parties has increased but is dominated by online advertising trade associations.
arXiv Detail & Related papers (2020-08-20T19:00:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.