A Huber Loss Minimization Approach to Mean Estimation under User-level Differential Privacy
- URL: http://arxiv.org/abs/2405.13453v2
- Date: Thu, 24 Oct 2024 05:26:18 GMT
- Authors: Puning Zhao, Lifeng Lai, Li Shen, Qingming Li, Jiafei Wu, Zhe Liu
- Abstract summary: Privacy protection of users' entire contribution of samples is important in distributed systems.
We propose a Huber loss minimization approach to mean estimation under user-level differential privacy.
We provide a theoretical analysis of our approach, which gives the noise strength needed for privacy protection, as well as the bound of mean squared error.
- Abstract: Privacy protection of users' entire contribution of samples is important in distributed systems. The most effective approach is the two-stage scheme, which finds a small interval first and then gets a refined estimate by clipping samples into the interval. However, the clipping operation induces bias, which is serious if the sample distribution is heavy-tailed. Besides, users with large local sample sizes can make the sensitivity much larger, thus the method is not suitable for imbalanced users. Motivated by these challenges, we propose a Huber loss minimization approach to mean estimation under user-level differential privacy. The connecting points of Huber loss can be adaptively adjusted to deal with imbalanced users. Moreover, it avoids the clipping operation, thus significantly reducing the bias compared with the two-stage approach. We provide a theoretical analysis of our approach, which gives the noise strength needed for privacy protection, as well as the bound of mean squared error. The result shows that the new method is much less sensitive to the imbalance of user-wise sample sizes and the tail of sample distributions. Finally, we perform numerical experiments to validate our theoretical analysis.
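The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' algorithm. It assumes scalar samples, a single fixed Huber threshold `T` for all users (the paper adjusts the connecting points adaptively), and a placeholder `noise_scale` instead of the noise strength derived in the paper, so it carries no privacy guarantee as written. The function names `huber_mean` and `noisy_huber_mean` are hypothetical.

```python
import numpy as np

def huber_mean(user_means, T, iters=200, tol=1e-12):
    """Minimise sum_i huber(c - y_i; T) over c by iteratively reweighted averaging.

    Points within T of the current estimate get weight 1; points further
    away get weight T / |residual|, which bounds their influence.
    """
    y = np.asarray(user_means, dtype=float)
    c = float(np.median(y))  # robust starting point
    for _ in range(iters):
        r = np.abs(y - c)
        w = np.where(r <= T, 1.0, T / np.maximum(r, 1e-12))
        c_new = float(np.sum(w * y) / np.sum(w))
        if abs(c_new - c) < tol:
            return c_new
        c = c_new
    return c

def noisy_huber_mean(user_samples, T, noise_scale, rng):
    """Average each user's samples, take the Huber minimiser, add Gaussian noise.

    noise_scale is a placeholder (assumption): calibrating it to a target
    privacy level requires the sensitivity analysis from the paper.
    """
    user_means = [np.mean(s) for s in user_samples]
    return huber_mean(user_means, T) + rng.normal(0.0, noise_scale)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    users = [[0.0], [0.1], [-0.1], [100.0]]  # one user is an extreme outlier
    print(noisy_huber_mean(users, T=1.0, noise_scale=0.05, rng=rng))
```

With user-wise means 0, 0.1, -0.1 and 100 and `T = 1.0`, the noise-free Huber minimiser is about 0.333, whereas the plain average is 25: the outlying user's influence is bounded without clipping any sample into an interval.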
Related papers
- Personalized Denoising Implicit Feedback for Robust Recommender System [60.719158008403376]
We show that for a given user, there is a clear distinction between normal and noisy interactions in the user's personal loss distribution.
We propose a resampling strategy to Denoise using the user's Personal Loss distribution, named PLD, which reduces the probability of noisy interactions being optimized.
arXiv Detail & Related papers (2025-02-01T07:13:06Z)
- Double Correction Framework for Denoising Recommendation [45.98207284259792]
In implicit feedback, noisy samples can affect precise user preference learning.
A popular solution is based on dropping noisy samples in the model training phase.
We propose a Double Correction Framework for Denoising Recommendation.
arXiv Detail & Related papers (2024-05-18T12:15:10Z)
- Confronting Discrimination in Classification: Smote Based on Marginalized Minorities in the Kernel Space for Imbalanced Data [0.0]
We propose a novel classification oversampling approach based on the decision boundary and sample proximity relationships.
We test the proposed method on a classic financial fraud dataset.
arXiv Detail & Related papers (2024-02-13T04:03:09Z)
- A Huber loss-based super learner with applications to healthcare expenditures [0.0]
We propose a super learner based on the Huber loss, a "robust" loss function that combines squared error loss with absolute loss to downweight the influence of outliers.
We show that the proposed method can be used both directly to optimize Huber risk, as well as in finite-sample settings.
arXiv Detail & Related papers (2022-05-13T19:57:50Z)
- Holistic Approach to Measure Sample-level Adversarial Vulnerability and its Utility in Building Trustworthy Systems [17.707594255626216]
Adversarial attack perturbs an image with an imperceptible noise, leading to incorrect model prediction.
We propose a holistic approach for quantifying adversarial vulnerability of a sample by combining different perspectives.
We demonstrate that by reliably estimating adversarial vulnerability at the sample level, it is possible to develop a trustworthy system.
arXiv Detail & Related papers (2022-05-05T12:36:17Z)
- On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks [23.502721524477444]
We present a synthetic example illustrating how this approach can lead to very poor but stable estimates.
We identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue.
We present an alternative formulation, termed $\beta$-NLL, in which each data point's contribution to the loss is weighted by the $\beta$-exponentiated variance estimate.
arXiv Detail & Related papers (2022-03-17T08:46:17Z)
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
arXiv Detail & Related papers (2020-07-08T11:35:47Z)
- Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)