CriteoPrivateAd: A Real-World Bidding Dataset to Design Private Advertising Systems
- URL: http://arxiv.org/abs/2502.12103v2
- Date: Wed, 19 Feb 2025 12:35:59 GMT
- Title: CriteoPrivateAd: A Real-World Bidding Dataset to Design Private Advertising Systems
- Authors: Mehdi Sebbar, Corentin Odic, Mathieu Léchine, Aloïs Bissuel, Nicolas Chrysanthos, Anthony D'Amato, Alexandre Gilotte, Fabian Höring, Sarah Nogueira, Maxime Vono
- Abstract summary: This dataset is an anonymised version of Criteo production logs.
It provides sufficient data to learn bidding models commonly used in online advertising under many privacy constraints.
- Score: 36.80382261547636
- Abstract: In recent years, many proposals have emerged to address online advertising use cases without access to third-party cookies. All of these proposals leverage privacy-enhancing technologies such as aggregation or differential privacy. Yet no public, sufficiently rich ground truth is currently available to assess the relevance of these private advertising frameworks. We are releasing the largest bidding dataset, in terms of number of features, specifically built in alignment with the designs of major browser vendors' proposals such as Chrome Privacy Sandbox. This dataset, coined CriteoPrivateAd, is an anonymised version of Criteo production logs and provides sufficient data to learn bidding models commonly used in online advertising under many privacy constraints (delayed reports, display- and user-level differential privacy, user signal quantisation, and aggregated reports). We ensured that this dataset, while being anonymised, yields offline results close to the production performance of adtech companies including Criteo, making it a relevant ground truth for designing private advertising systems. The dataset is available on Hugging Face: https://huggingface.co/datasets/criteo/CriteoPrivateAd.
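For readers who want to explore the data, here is a minimal loading sketch using the Hugging Face `datasets` library; the exact configuration, split, and column names are assumptions and should be checked against the dataset card.

```python
# Minimal sketch: loading CriteoPrivateAd with the Hugging Face `datasets`
# library (pip install datasets). Split and column names are assumptions;
# check https://huggingface.co/datasets/criteo/CriteoPrivateAd for the
# actual schema.
from datasets import load_dataset

ds = load_dataset("criteo/CriteoPrivateAd")

# Inspect the available splits and their features.
print(ds)

# Peek at the first record of the first available split.
first_split = next(iter(ds.values()))
print(first_split[0])
```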
Related papers
- Activity Recognition on Avatar-Anonymized Datasets with Masked Differential Privacy [64.32494202656801]
Privacy-preserving computer vision is an important emerging problem in machine learning and artificial intelligence.
We present an anonymization pipeline that replaces sensitive human subjects in video datasets with synthetic avatars within context.
We also propose MaskDP to protect non-anonymized but privacy-sensitive background information.
arXiv Detail & Related papers (2024-10-22T15:22:53Z) - PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration [18.11846784025521]
PrivacyRestore is a plug-and-play method to protect the privacy of user inputs during inference.
We create three datasets, covering medical and legal domains, to evaluate the effectiveness of PrivacyRestore.
arXiv Detail & Related papers (2024-06-03T14:57:39Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
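To make the idea of a sanitised release concrete, the sketch below applies the classic Laplace mechanism to a count histogram; this is a generic textbook example, not the cited paper's method, and the counts and epsilon are made up.

```python
# Generic sketch of the Laplace mechanism for differentially private
# data publishing (illustrative; not the cited paper's mechanism).
import numpy as np

def laplace_histogram(counts, epsilon, rng=None):
    """Release an epsilon-DP version of a histogram of user counts."""
    rng = rng or np.random.default_rng()
    # If each user contributes to at most one bin, the L1 sensitivity
    # of the histogram is 1, so noise of scale 1/epsilon suffices.
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(counts))
    return np.asarray(counts, dtype=float) + noise

true_counts = [120, 45, 300]  # e.g. displays per campaign (made-up numbers)
print(laplace_histogram(true_counts, epsilon=1.0))
```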
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Towards a User Privacy-Aware Mobile Gaming App Installation Prediction Model [0.8602553195689513]
We investigate the process of predicting a mobile gaming app installation from the point of view of a demand-side platform.
We explore the trade-off between privacy preservation and model performance.
We conclude that privacy-aware models might still preserve significant capabilities.
arXiv Detail & Related papers (2023-02-07T09:14:59Z) - Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z) - DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z) - Lessons from the AdKDD'21 Privacy-Preserving ML Challenge [57.365745458033075]
A prominent proposal at W3C only allows sharing advertising signals through aggregated, differentially private reports of past displays.
To study this proposal extensively, an open Privacy-Preserving Machine Learning Challenge took place at AdKDD'21.
A key finding is that learning models on large, aggregated data in the presence of a small set of unaggregated data points can be surprisingly efficient and cheap.
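As a toy illustration of that finding (not the challenge-winning approach), the sketch below fits a logistic model by combining a squared-error term on per-bag aggregated click rates with a standard logistic loss on a small granular set; all data here are synthetic.

```python
# Toy sketch of learning from aggregated reports plus a small set of
# granular examples (synthetic data; not the AdKDD'21 winning solution).
import numpy as np

rng = np.random.default_rng(0)
d = 5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic "bags" of displays for which only an aggregated click rate
# is observed, plus a small individually labelled set.
true_w = rng.normal(size=d)
bags = [rng.normal(size=(200, d)) for _ in range(20)]
bag_rates = [sigmoid(X @ true_w).mean() for X in bags]  # aggregated reports
X_small = rng.normal(size=(50, d))
y_small = rng.binomial(1, sigmoid(X_small @ true_w))    # granular labels

w = np.zeros(d)
lr = 0.5
for _ in range(200):
    grad = np.zeros(d)
    # Aggregated term: push each bag's mean predicted rate towards
    # the reported aggregate (squared error).
    for X, r in zip(bags, bag_rates):
        p = sigmoid(X @ w)
        grad += 2.0 * (p.mean() - r) * ((p * (1 - p)) @ X) / len(X)
    # Granular term: standard logistic-loss gradient on the small set.
    p_small = sigmoid(X_small @ w)
    grad += (p_small - y_small) @ X_small / len(X_small)
    w -= lr * grad

print("recovered vs true weights:", np.round(w, 2), np.round(true_w, 2))
```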
arXiv Detail & Related papers (2022-01-31T11:09:59Z) - Utility-aware Privacy-preserving Data Releasing [7.462336024223669]
We propose a two-step perturbation-based privacy-preserving data releasing framework.
First, certain predefined privacy and utility problems are learned from the public domain data.
We then leverage the learned knowledge to precisely perturb the data owners' data into privatized data.
arXiv Detail & Related papers (2020-05-09T05:32:46Z) - PrivEdge: From Local to Distributed Private Training and Prediction [43.02041269239928]
PrivEdge is a technique for privacy-preserving Machine Learning (ML).
PrivEdge safeguards the privacy of users who provide their data for training, as well as users who use the prediction service.
We show that PrivEdge has high precision and recall in preserving privacy, as well as in distinguishing between private and non-private images.
arXiv Detail & Related papers (2020-04-12T09:26:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.