Related papers: Scalable contribution bounding to achieve privacy

Scalable contribution bounding to achieve privacy

URL: http://arxiv.org/abs/2507.23432v1
Date: Thu, 31 Jul 2025 11:14:17 GMT
Title: Scalable contribution bounding to achieve privacy
Authors: Vincent Cohen-Addad, Alessandro Epasto, Jason Lee, Morteza Zadimoghaddam,
Abstract summary: In modern datasets, enforcing user-level privacy requires capping each user's total contribution.<n>Existing algorithms for this task are computationally intensive and do not scale to the massive datasets prevalent today.<n>Our approach models the complex ownership structure as a hypergraph, where users are vertices and records are hyperedges.<n>A record is added to the final dataset only if all its owners unanimously agree, thereby ensuring that no user's predefined contribution limit is violated.
Score: 62.6490768306231
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In modern datasets, where single records can have multiple owners, enforcing user-level differential privacy requires capping each user's total contribution. This "contribution bounding" becomes a significant combinatorial challenge. Existing sequential algorithms for this task are computationally intensive and do not scale to the massive datasets prevalent today. To address this scalability bottleneck, we propose a novel and efficient distributed algorithm. Our approach models the complex ownership structure as a hypergraph, where users are vertices and records are hyperedges. The algorithm proceeds in rounds, allowing users to propose records in parallel. A record is added to the final dataset only if all its owners unanimously agree, thereby ensuring that no user's predefined contribution limit is violated. This method aims to maximize the size of the resulting dataset for high utility while providing a practical, scalable solution for implementing user-level privacy in large, real-world systems.

Related papers

SKALD: Scalable K-Anonymisation for Large Datasets [4.1034194672472575]
SKALD is a novel algorithm for performing k-anonymisation on large datasets with limited RAM.<n>Our algorithm offers multi-fold performance improvement over standard k-anonymisation methods.
arXiv Detail & Related papers (2025-05-06T13:38:53Z)
Scalable Private Partition Selection via Adaptive Weighting [66.09199304818928]
In a private set union, users hold subsets of items from an unbounded universe.<n>The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy.<n>We propose an algorithm for this problem, MaximumDegree (MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight.
arXiv Detail & Related papers (2025-02-13T01:27:11Z)
User-Level Differential Privacy With Few Examples Per User [73.81862394073308]
We consider the example-scarce regime, where each user has only a few examples, and obtain the following results. For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm. We present a simple technique for adapting the exponential mechanism [McSherry, Talwar FOCS 2007] to the user-level setting.
arXiv Detail & Related papers (2023-09-21T21:51:55Z)
Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points. It cannot be assumed that all users sample from the same underlying distribution. We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each without violating neighborhood relationships. A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n2) to O(kn) with k n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
Differentially Private Heavy Hitter Detection using Federated Analytics [33.69819799254375]
We study practicals to improve the performance of prefix-tree based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy.
arXiv Detail & Related papers (2023-07-21T17:59:15Z)
Private and Communication-Efficient Algorithms for Entropy Estimation [21.195965074919602]
We give improved private and communication-efficient algorithms for estimating several popular measures of the entropy of a distribution. All of our algorithms have constant communication cost and satisfy local differential privacy.
arXiv Detail & Related papers (2023-05-12T20:35:10Z)
Algorithms for bounding contribution for histogram estimation under user-level privacy [37.406400590568644]
We study the problem of histogram estimation under user-level differential privacy. The goal is to preserve the privacy of all entries of any single user. We propose algorithms to choose the best user contribution bound for histogram estimation.
arXiv Detail & Related papers (2022-06-07T04:53:24Z)
Meta Clustering Learning for Large-scale Unsupervised Person Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL) MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training. Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.