Related papers: Improved subsample-and-aggregate via the private modified winsorized mean

Improved subsample-and-aggregate via the private modified winsorized mean

URL: http://arxiv.org/abs/2501.14095v1
Date: Thu, 23 Jan 2025 21:03:40 GMT
Title: Improved subsample-and-aggregate via the private modified winsorized mean
Authors: Kelly Ramsay, Dylan Spicker,
Abstract summary: We show that the modified winsorized mean is minimax optimal for several, large classes of distributions.<n>We consider the modified winsorized mean as the aggregator in subsample-and-aggregate.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We develop a univariate, differentially private mean estimator, called the private modified winsorized mean designed to be used as the aggregator in subsample-and-aggregate. We demonstrate, via real data analysis, that common differentially private multivariate mean estimators may not perform well as the aggregator, even with a dataset with 8000 observations, motivating our developments. We show that the modified winsorized mean is minimax optimal for several, large classes of distributions, even under adversarial contamination. We also demonstrate that, empirically, the modified winsorized mean performs well compared to other private mean estimates. We consider the modified winsorized mean as the aggregator in subsample-and-aggregate, deriving a finite sample deviations bound for a subsample-and-aggregate estimate generated with the new aggregator. This result yields two important insights: (i) the optimal choice of subsamples depends on the bias of the estimator computed on the subsamples, and (ii) the rate of convergence of the subsample-and-aggregate estimator depends on the robustness of the estimator computed on the subsamples.

Related papers

Improving Value-based Process Verifier via Structural Prior Injection [30.07647106495661]
We show that reasonable structural prior injection can improve the performance of value-based process verifiers for about 1$sim$2 points at little-to-no cost. We also show that under different structural prior, the verifiers' performances vary greatly despite having the same optimal solution.
arXiv Detail & Related papers (2025-02-21T07:57:59Z)
On Improving the Composition Privacy Loss in Differential Privacy for Fixed Estimation Error [4.809236881780709]
We consider the private release of statistics of disjoint subsets of a dataset, where users could contribute more than one sample. In particular, we focus on the $epsilon$-differentially private release of sample means and variances of sample values in disjoint subsets of a dataset. Our main contribution is an iterative algorithm, based on suppressing user contributions, which seeks to reduce the overall privacy loss degradation.
arXiv Detail & Related papers (2024-05-10T06:24:35Z)
On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under strongly log-concave data. Our class of functions used for score estimation is made of Lipschitz continuous functions avoiding any Lipschitzness assumption on the score function. This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z)
Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions. We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations. We develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples.
arXiv Detail & Related papers (2023-05-25T13:14:58Z)
Classification of Heavy-tailed Features in High Dimensions: a Superstatistical Approach [1.4469725791865984]
We characterise the learning of a mixture of two clouds of data points with generic centroids. We study the generalisation performance of the obtained estimator, we analyse the role of regularisation, and we analytically the separability transition.
arXiv Detail & Related papers (2023-04-06T07:53:05Z)
A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms. We develop a general framework from the perspective of Bregman minimization divergence. We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples. CARMS combines REINFORCE with copula based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling. We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z)
Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems. We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling. We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
Nonparametric Estimation of the Fisher Information and Its Applications [82.00720226775964]
This paper considers the problem of estimation of the Fisher information for location from a random sample of size $n$. An estimator proposed by Bhattacharya is revisited and improved convergence rates are derived. A new estimator, termed a clipped estimator, is proposed.
arXiv Detail & Related papers (2020-05-07T17:21:56Z)
SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series. We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.