Differentially Private Synthetic Data Release for Topics API Outputs
- URL: http://arxiv.org/abs/2506.23855v1
- Date: Mon, 30 Jun 2025 13:46:57 GMT
- Title: Differentially Private Synthetic Data Release for Topics API Outputs
- Authors: Travis Dick, Alessandro Epasto, Adel Javanmard, Josh Karlin, Andres Munoz Medina, Vahab Mirrokni, Sergei Vassilvitskii, Peilin Zhong
- Abstract summary: We focus on one Privacy-Preserving Ads API: the Topics API, part of Google Chrome's Privacy Sandbox. We generate a differentially-private dataset that closely matches the re-identification risk properties of the real Topics API data. We hope this will enable external researchers to analyze the API in-depth and replicate prior and future work on a realistic large-scale dataset.
- Score: 63.79476766779742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The analysis of the privacy properties of Privacy-Preserving Ads APIs is an area of research that has received strong interest from academics, industry, and regulators. Despite this interest, the empirical study of these methods is hindered by the lack of publicly available data. Reliable empirical analysis of the privacy properties of an API, in fact, requires access to a dataset consisting of realistic API outputs; however, privacy concerns prevent the general release of such data to the public. In this work, we develop a novel methodology to construct synthetic API outputs that are simultaneously realistic enough to enable accurate study and provide strong privacy protections. We focus on one Privacy-Preserving Ads API: the Topics API, part of Google Chrome's Privacy Sandbox. We developed a methodology to generate a differentially-private dataset that closely matches the re-identification risk properties of the real Topics API data. The use of differential privacy provides strong theoretical bounds on the leakage of private user information from this release. Our methodology is based on first computing a large number of differentially-private statistics describing how output API traces evolve over time. Then, we design a parameterized distribution over sequences of API traces and optimize its parameters so that they closely match the statistics obtained. Finally, we create the synthetic data by drawing from this distribution. Our work is complemented by an open-source release of the anonymized dataset obtained by this methodology. We hope this will enable external researchers to analyze the API in-depth and replicate prior and future work on a realistic large-scale dataset. We believe that this work will contribute to fostering transparency regarding the privacy properties of Privacy-Preserving Ads APIs.
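The abstract describes a three-step pipeline: release differentially private statistics of the API traces, fit a parameterized distribution over trace sequences to those statistics, and sample synthetic traces from the fitted model. The sketch below illustrates that pipeline under heavy simplifying assumptions (one topic per user per epoch, an i.i.d. topic model, and Laplace-noised per-epoch histograms standing in for the paper's richer temporal statistics and optimization); all names and parameters are illustrative, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy stand-in for real Topics API traces (assumption: one topic ID per user per epoch).
NUM_TOPICS, NUM_USERS, NUM_EPOCHS = 50, 10_000, 4
true_probs = rng.dirichlet(np.ones(NUM_TOPICS) * 0.3)
real_traces = rng.choice(NUM_TOPICS, size=(NUM_USERS, NUM_EPOCHS), p=true_probs)

# Step 1: differentially private statistics.
# Here: a per-epoch topic histogram released with the Laplace mechanism.
# Each user contributes one count per epoch, so the L1 sensitivity of each
# epoch's histogram is 1; splitting EPSILON across epochs composes to EPSILON total.
EPSILON = 1.0

def dp_histogram(column, eps):
    counts = np.bincount(column, minlength=NUM_TOPICS).astype(float)
    noisy = counts + rng.laplace(scale=1.0 / eps, size=NUM_TOPICS)
    return np.clip(noisy, 0.0, None)

noisy_hists = np.stack([dp_histogram(real_traces[:, t], EPSILON / NUM_EPOCHS)
                        for t in range(NUM_EPOCHS)])

# Step 2: fit a parameterized distribution to the noisy statistics.
# Assumption for this sketch: traces are modeled as i.i.d. draws from a single
# topic distribution, so "fitting" reduces to normalizing the averaged noisy histogram.
fitted_probs = noisy_hists.sum(axis=0)
fitted_probs /= fitted_probs.sum()

# Step 3: sample synthetic traces from the fitted model.
synthetic_traces = rng.choice(NUM_TOPICS, size=(NUM_USERS, NUM_EPOCHS), p=fitted_probs)

print("L1 gap between true and fitted topic distributions:",
      np.abs(true_probs - fitted_probs).sum())
```

In the actual methodology, the parameterized distribution is optimized to match many statistics of how traces evolve over time rather than a single averaged histogram; the sketch only conveys the overall structure. A companion sketch of a coarse re-identification-risk proxy on such traces appears after the related-papers list below.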
Related papers
- PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs [39.108700932535754]
The Private Evolution (PE) algorithm generates differentially private (DP) synthetic images using diffusion model APIs. In practice, the few-shot private data challenge is particularly prevalent in specialized domains like healthcare and industry. We propose a novel API-assisted algorithm, Private Contrastive Evolution (PCEvolve), which iteratively mines inherent inter-class contrastive relationships in few-shot private data.
arXiv Detail & Related papers (2025-06-04T13:33:06Z)
- On the Differential Privacy and Interactivity of Privacy Sandbox Reports [78.85958224681858]
The Privacy Sandbox initiative from Google includes APIs for enabling privacy-preserving advertising functionalities. We provide an abstract model for analyzing the privacy of these APIs and show that they satisfy a formal DP guarantee.
arXiv Detail & Related papers (2024-12-22T08:22:57Z)
- Private prediction for large-scale synthetic text generation [28.488459921169905]
We present an approach for generating differentially private synthetic text using large language models (LLMs).
In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees.
arXiv Detail & Related papers (2024-07-16T18:28:40Z)
- The Privacy-Utility Trade-off in the Topics API [0.34952465649465553]
We analyze the re-identification risks for individual Internet users and the utility provided to advertising companies by the Topics API.
We provide theoretical results dependent only on the API parameters that can be readily applied to evaluate the privacy and utility implications of future API updates.
arXiv Detail & Related papers (2024-06-21T17:01:23Z)
- A Public and Reproducible Assessment of the Topics API on Real Data [1.1510009152620668]
The Topics API for the web is Google's privacy-enhancing alternative to replace third-party cookies.
Results of prior work have led to an ongoing discussion about the capability of Topics to trade off both utility and privacy.
This paper shows on real data that Topics does not provide the same privacy guarantees to all users and that the information leakage worsens over time.
arXiv Detail & Related papers (2024-03-28T17:03:44Z)
- Summary Reports Optimization in the Privacy Sandbox Attribution Reporting API [51.00674811394867]
The Attribution Reporting API has been deployed by Google Chrome to support the basic advertising functionality of attribution reporting.
We present methods for optimizing the allocation of the contribution budget for summary reports from the API.
arXiv Detail & Related papers (2023-11-22T18:45:20Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy-sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it faces challenges when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
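The Topics API papers listed above, and the abstract's claim that the synthetic release matches the re-identification risk properties of the real data, both hinge on measuring how identifying a user's topic trace is. Below is a minimal, assumption-laden proxy for that risk, applied to the toy trace arrays from the earlier sketch: the fraction of users whose full trace is unique in the dataset. This is a coarse stand-in, not the re-identification metric used in the paper or in the works above.

```python
import numpy as np

def uniqueness_rate(traces):
    """Fraction of users whose full topic trace is unique in the dataset.

    Uniqueness is a common (if coarse) proxy for re-identification risk:
    a unique trace can act as a fingerprint across observations.
    """
    _, counts = np.unique(traces, axis=0, return_counts=True)
    return (counts == 1).sum() / traces.shape[0]

# Usage with the hypothetical arrays from the earlier sketch:
# print(uniqueness_rate(real_traces), uniqueness_rate(synthetic_traces))
```

Comparing this rate on the real and synthetic trace arrays gives a first-order check that the synthetic data preserves trace uniqueness, though the cited works rely on more refined cross-epoch linkage measures.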
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.