Matrix Factorization for Practical Continual Mean Estimation Under User-Level Differential Privacy
- URL: http://arxiv.org/abs/2601.22320v1
- Date: Thu, 29 Jan 2026 21:02:58 GMT
- Title: Matrix Factorization for Practical Continual Mean Estimation Under User-Level Differential Privacy
- Authors: Nikita P. Kalinin, Ali Najar, Valentin Roth, Christoph H. Lampert,
- Abstract summary: We study continual mean estimation, where data arrive sequentially and the goal is to maintain accurate estimates of the running mean.<n>We address this problem under user-level differential privacy, which protects each user's entire dataset even when they contribute multiple data points.
- Score: 17.01170335906325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study continual mean estimation, where data vectors arrive sequentially and the goal is to maintain accurate estimates of the running mean. We address this problem under user-level differential privacy, which protects each user's entire dataset even when they contribute multiple data points. Previous work on this problem has focused on pure differential privacy. While important, this approach limits applicability, as it leads to overly noisy estimates. In contrast, we analyze the problem under approximate differential privacy, adopting recent advances in the Matrix Factorization mechanism. We introduce a novel mean estimation specific factorization, which is both efficient and accurate, achieving asymptotically lower mean-squared error bounds in continual mean estimation under user-level differential privacy.
Related papers
- Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.<n>Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.<n>We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z) - Private Estimation when Data and Privacy Demands are Correlated [5.755004576310333]
Differential Privacy is the current gold-standard for ensuring privacy for statistical queries.<n>We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data.<n>We establish theoretical performance guarantees for our proposed algorithms, under both PAC error and mean-squared error.
arXiv Detail & Related papers (2024-07-15T22:46:02Z) - On Improving the Composition Privacy Loss in Differential Privacy for Fixed Estimation Error [4.809236881780709]
We consider the private release of statistics of disjoint subsets of a dataset, where users could contribute more than one sample.<n>In particular, we focus on the $epsilon$-differentially private release of sample means and variances of sample values in disjoint subsets of a dataset.<n>Our main contribution is an iterative algorithm, based on suppressing user contributions, which seeks to reduce the overall privacy loss degradation.
arXiv Detail & Related papers (2024-05-10T06:24:35Z) - Initialization Matters: Privacy-Utility Analysis of Overparameterized
Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z) - Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points.
It cannot be assumed that all users sample from the same underlying distribution.
We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z) - Non-parametric Differentially Private Confidence Intervals for the
Median [3.205141100055992]
This paper proposes and evaluates several strategies to compute valid differentially private confidence intervals for the median.
We also illustrate that addressing both sources of uncertainty--the error from sampling and the error from protecting the output--should be preferred over simpler approaches that incorporate the uncertainty in a sequential fashion.
arXiv Detail & Related papers (2021-06-18T19:45:37Z) - Robust and Differentially Private Mean Estimation [40.323756738056616]
Differential privacy has emerged as a standard requirement in a variety of applications ranging from the U.S. Census to data collected in commercial devices.
An increasing number of such databases consist of data from multiple sources, not all of which can be trusted.
This leaves existing private analyses vulnerable to attacks by an adversary who injects corrupted data.
arXiv Detail & Related papers (2021-02-18T05:02:49Z) - CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z) - Privacy Preserving Recalibration under Domain Shift [119.21243107946555]
We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints.
We also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets.
arXiv Detail & Related papers (2020-08-21T18:43:37Z) - Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic
Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z) - Designing Differentially Private Estimators in High Dimensions [0.0]
We study differentially private mean estimation in a high-dimensional setting.
Recent work in high-dimensional robust statistics has identified computationally tractable mean estimation algorithms.
arXiv Detail & Related papers (2020-06-02T21:17:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.