Lessons from the AdKDD'21 Privacy-Preserving ML Challenge
- URL: http://arxiv.org/abs/2201.13123v1
- Date: Mon, 31 Jan 2022 11:09:59 GMT
- Title: Lessons from the AdKDD'21 Privacy-Preserving ML Challenge
- Authors: Eustache Diemert, Romain Fabre, Alexandre Gilotte, Fei Jia, Basile Leparmentier, Jérémie Mary, Zhonghua Qu, Ugo Tanielian, Hui Yang
- Abstract summary: A prominent proposal at W3C only allows sharing advertising signals through aggregated, differentially private reports of past displays.
To study this proposal extensively, an open Privacy-Preserving Machine Learning Challenge took place at AdKDD'21.
A key finding is that learning models on large, aggregated data in the presence of a small set of unaggregated data points can be surprisingly efficient and cheap.
- Score: 57.365745458033075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing data sharing mechanisms providing performance and strong privacy
guarantees is a hot topic for the Online Advertising industry. Namely, a
prominent proposal discussed under the Improving Web Advertising Business Group
at W3C only allows sharing advertising signals through aggregated,
differentially private reports of past displays. To study this proposal
extensively, an open Privacy-Preserving Machine Learning Challenge took place
at AdKDD'21, a premier workshop on Advertising Science with data provided by
advertising company Criteo. In this paper, we describe the challenge tasks, the
structure of the available datasets, report the challenge results, and enable
its full reproducibility. A key finding is that learning models on large,
aggregated data in the presence of a small set of unaggregated data points can
be surprisingly efficient and cheap. We also run additional experiments to
observe the sensitivity of winning methods to different parameters such as
privacy budget or quantity of available privileged side information. We
conclude that the industry needs either alternate designs for private data
sharing or a breakthrough in learning with aggregated data only to keep ad
relevance at a reasonable level.
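The aggregated-reports setting studied here can be sketched minimally: per-feature display and click counts are released with Laplace noise (a standard differential-privacy mechanism), and a downstream learner estimates click-through rates from the noisy aggregates alone. The feature sizes, the even split of the privacy budget across the two count queries, and the clipped-ratio CTR estimator below are illustrative assumptions, not the challenge's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic display log: one categorical feature, binary click label.
n, n_values = 100_000, 20
feature = rng.integers(0, n_values, size=n)
ctr_by_value = rng.uniform(0.01, 0.10, size=n_values)
clicks = rng.random(n) < ctr_by_value[feature]

# Aggregated report: per-value display and click counts, made
# differentially private with the Laplace mechanism (sensitivity 1
# per count; budget epsilon split evenly across the two queries).
epsilon = 1.0
scale = 2.0 / epsilon  # Laplace scale for budget epsilon / 2 per count
displays_agg = np.bincount(feature, minlength=n_values) \
    + rng.laplace(0.0, scale, n_values)
clicks_agg = np.bincount(feature[clicks], minlength=n_values) \
    + rng.laplace(0.0, scale, n_values)

# A learner that only sees the noisy aggregates estimates per-value
# click-through rates from them (clipping guards against noise).
ctr_est = np.clip(clicks_agg, 0.0, None) / np.clip(displays_agg, 1.0, None)
```

With counts in the thousands per feature value, the Laplace noise barely perturbs the ratio, which illustrates why learning from aggregates can remain cheap and effective at moderate privacy budgets.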
Related papers
- Digital Advertising in a Post-Cookie World: Charting the Impact of Google's Topics API [0.38233569758620056]
Integrating Google's Topics API into the digital advertising ecosystem represents a significant shift toward privacy-conscious advertising practices.
This article analyses the implications of implementing Topics API on ad networks, focusing on competition dynamics and ad space accessibility.
arXiv Detail & Related papers (2024-09-21T09:04:16Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- DCFL: Non-IID awareness Data Condensation aided Federated Learning [0.8158530638728501]
Federated learning is a decentralized learning paradigm wherein a central server trains a global model iteratively by utilizing clients who possess a certain amount of private datasets.
The challenge lies in the fact that clients' private data may not be independently and identically distributed.
We propose DCFL, which divides clients into groups using the Centered Kernel Alignment (CKA) method, then applies non-IID-aware dataset condensation methods to complete the clients' data.
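As a hypothetical illustration of the grouping step, linear CKA between two clients' representation matrices can be computed as below; the function name and its use as a pairwise similarity for clustering clients are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def linear_cka(x, y):
    """Linear Centered Kernel Alignment between two matrices of shape
    (n_samples, n_features); returns a similarity score in [0, 1]."""
    x = x - x.mean(axis=0)  # center each feature column
    y = y - y.mean(axis=0)
    # Cross-covariance energy, normalized by each matrix's self-similarity.
    hsic = np.linalg.norm(y.T @ x, "fro") ** 2
    return hsic / (np.linalg.norm(x.T @ x, "fro")
                   * np.linalg.norm(y.T @ y, "fro"))
```

Linear CKA is invariant to orthogonal transformations of the features, which makes it a convenient similarity measure for comparing clients whose data lives in rotated or permuted feature spaces.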
arXiv Detail & Related papers (2023-12-21T13:04:24Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z)
- PRIVEE: A Visual Analytic Workflow for Proactive Privacy Risk Inspection of Open Data [3.2136309934080867]
Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized.
We develop a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods.
We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism.
arXiv Detail & Related papers (2022-08-12T19:57:09Z)
- Sotto Voce: Federated Speech Recognition with Differential Privacy Guarantees [0.761963751158349]
Speech data is expensive to collect, and incredibly sensitive to its sources.
Organizations often independently collect small datasets for their own use, but these are rarely sufficient for the demands of machine learning.
Organizations could pool these datasets together and jointly build a strong ASR system; sharing data in the clear, however, carries tremendous risk, both of intellectual property loss and of loss of privacy for the individuals in the dataset.
arXiv Detail & Related papers (2022-07-16T02:48:54Z)
- VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations.
Specifically, we develop a self-supervised task MatchedPair Detection (MPD) to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z)
- Utility-aware Privacy-preserving Data Releasing [7.462336024223669]
We propose a two-step perturbation-based privacy-preserving data releasing framework.
First, certain predefined privacy and utility problems are learned from the public domain data.
We then leverage the learned knowledge to precisely perturb the data owners' data into privatized data.
arXiv Detail & Related papers (2020-05-09T05:32:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.