Estimating the True Distribution of Data Collected with Randomized Response
- URL: http://arxiv.org/abs/2601.08603v1
- Date: Tue, 13 Jan 2026 14:47:45 GMT
- Title: Estimating the True Distribution of Data Collected with Randomized Response
- Authors: Carlos Antonio Pinzón, Ehab ElSalamouny, Lucas Massot, Alexis Miller, Héber Hwang Arcolezi, Catuscia Palamidessi,
- Abstract summary: Randomized Response (RR) is a protocol designed to collect and analyze categorical data with local differential privacy guarantees.<n>It has been used as a building block of mechanisms deployed by Big tech companies to collect app or web users' data.
- Score: 4.529341481916973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomized Response (RR) is a protocol designed to collect and analyze categorical data with local differential privacy guarantees. It has been used as a building block of mechanisms deployed by Big tech companies to collect app or web users' data. Each user reports an automatic random alteration of their true value to the analytics server, which then estimates the histogram of the true unseen values of all users using a debiasing rule to compensate for the added randomness. A known issue is that the standard debiasing rule can yield a vector with negative values (which can not be interpreted as a histogram), and there is no consensus on the best fix. An elegant but slow solution is the Iterative Bayesian Update algorithm (IBU), which converges to the Maximum Likelihood Estimate (MLE) as the number of iterations goes to infinity. This paper bypasses IBU by providing a simple formula for the exact MLE of RR and compares it with other estimation methods experimentally to help practitioners decide which one to use.
Related papers
- RPWithPrior: Label Differential Privacy in Regression [0.0]
In this paper, we focus on regression tasks under $$-label differential privacy guarantees.<n>We model both original and randomized responses as continuous random variables, avoiding discretization entirely.<n>We prove that our algorithm, RPWithPrior, guarantees $$-label differential privacy.
arXiv Detail & Related papers (2026-01-30T06:27:13Z) - One-Shot Federated Ridge Regression: Exact Recovery via Sufficient Statistic Aggregation [0.7106986689736825]
Federated ridge regression is a distributed equilibrium problem where each client computes local sufficient statistics and transmits them once.<n>We establish differential privacy guarantees where noise is injected once per client, eliminating the composition penalty that degrades privacy in multi-round protocols.<n>Experiments on synthetic heterogeneous regression demonstrate that one-shot fusion matches FedAvg accuracy while requiring up to $38times$ less communication.
arXiv Detail & Related papers (2026-01-13T04:47:22Z) - SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation [75.56845750400116]
Disaggregated evaluation -- estimation of performance of a machine learning model on different subpopulations -- is a core task when assessing performance and group-fairness of AI systems.
We develop SureMap that has high estimation accuracy for both multi-task and single-task disaggregated evaluations of blackbox models.
Our method combines maximum a posteriori (MAP) estimation using a well-chosen prior together with cross-validation-free tuning via Stein's unbiased risk estimate (SURE)
arXiv Detail & Related papers (2024-11-14T17:53:35Z) - Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts [8.700087812420687]
We provide guarantees for the model's performance on a different, unseen network "B"<n>We show how the principled vanilla DKW bound enables certification of the model's true performance on unseen clients within the same (source) network.
arXiv Detail & Related papers (2024-10-26T18:45:15Z) - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such intervals via the empirical estimation of quantiles in the distribution of outputs.<n>We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint.<n>We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - REAL: A Representative Error-Driven Approach for Active Learning [15.477921200056887]
$REAL$ is a novel approach to select data instances with $underlineR$epresentative $underlineE$rrors for $underlineA$ctive $underlineL$.
It identifies minority predictions as emphpseudo errors within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density.
arXiv Detail & Related papers (2023-07-03T12:39:26Z) - R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility
Across Random User Intents [14.455036827804541]
Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output.
We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE)
R-U-SURE is an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility.
arXiv Detail & Related papers (2023-03-01T18:46:40Z) - Will My Robot Achieve My Goals? Predicting the Probability that an MDP Policy Reaches a User-Specified Behavior Target [56.99669411766284]
As an autonomous system performs a task, it should maintain a calibrated estimate of the probability that it will achieve the user's goal.
This paper considers settings where the user's goal is specified as a target interval for a real-valued performance summary.
We compute the probability estimates by inverting conformal prediction.
arXiv Detail & Related papers (2022-11-29T18:41:20Z) - Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions.
We obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators.
arXiv Detail & Related papers (2021-06-15T06:59:53Z) - Evaluating State-of-the-Art Classification Models Against Bayes
Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z) - Universal Off-Policy Evaluation [64.02853483874334]
We take the first steps towards a universal off-policy estimator (UnO)
We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns.
arXiv Detail & Related papers (2021-04-26T18:54:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.