Lambda-randomization: multi-dimensional randomized response made easy
- URL: http://arxiv.org/abs/2603.05261v1
- Date: Thu, 05 Mar 2026 15:11:22 GMT
- Title: Lambda-randomization: multi-dimensional randomized response made easy
- Authors: Nicolas Ruiz
- Abstract summary: We develop a protocol called Lambda-randomization that entails low computational costs to retrieve estimates of multivariate distributions. We also present an empirical application to illustrate the proposed protocol.
- Score: 1.2691047660244335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomized response is a popular local anonymization approach that can deliver anonymized multi-dimensional data sets with rigorous privacy guarantees. At the same time, it can ensure validity for exploratory analysis and machine learning tasks because, under fairly general conditions, unbiased estimates of the underlying true distributions can be retrieved. However, as with many other anonymization techniques, one of the main pitfalls of this approach is the curse of dimensionality. When coping with data sets with many attributes, one quickly runs into unsustainable computational costs for estimating the true distributions, as well as a degradation in their accuracy. Relying on new theoretical insights developed in this paper, we propose an approach to multi-dimensional randomized response that avoids these traditional limitations. From simple yet intuitive parameterizations of the randomization matrices that we introduce, we develop a protocol called Lambda-randomization that entails low computational costs for retrieving estimates of multivariate distributions, and that makes use of only three simple elements: a set of parameters ranging between 0 and 1 (one per attribute of the data set), the identity matrix, and the all-ones vector. We also present an empirical application to illustrate the proposed protocol.
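The abstract names only the ingredients of Lambda-randomization (one parameter in $[0,1]$ per attribute, the identity matrix, and the all-ones vector) without spelling out the construction. The sketch below is therefore an assumption-laden illustration rather than the paper's protocol: it assumes each attribute's randomization matrix takes the standard randomized-response form $P = \lambda I + \frac{1-\lambda}{k}\,\mathbf{1}\mathbf{1}^\top$ for an attribute with $k$ categories, which uses exactly those three elements, and it recovers an unbiased marginal estimate by inverting $P$. All function names are illustrative.

```python
# Minimal sketch of per-attribute randomized response built only from a
# parameter lam in [0, 1], the identity matrix and the all-ones vector.
# This is an assumed parameterization, not necessarily the paper's protocol.
import numpy as np

rng = np.random.default_rng(0)

def randomization_matrix(lam, k):
    """Column-stochastic k x k matrix: keep the true category with extra
    weight lam, otherwise report a uniformly random category."""
    return lam * np.eye(k) + (1.0 - lam) / k * np.ones((k, k))

def perturb(values, lam, k):
    """Locally randomize one categorical attribute (values in {0, ..., k-1})."""
    P = randomization_matrix(lam, k)
    return np.array([rng.choice(k, p=P[:, v]) for v in values])

def estimate_marginal(perturbed, lam, k):
    """Unbiased estimate of the true marginal: the observed distribution is
    P @ true, so a single linear solve inverts the randomization."""
    observed = np.bincount(perturbed, minlength=k) / len(perturbed)
    return np.linalg.solve(randomization_matrix(lam, k), observed)

# Toy usage: one attribute with 3 categories and lambda = 0.6.
true_values = rng.choice(3, size=20_000, p=[0.5, 0.3, 0.2])
noisy_values = perturb(true_values, lam=0.6, k=3)
print(estimate_marginal(noisy_values, lam=0.6, k=3))  # roughly [0.5, 0.3, 0.2]
```

Under this assumed form, $P$ has a closed-form inverse (for $\lambda > 0$), so the estimation step stays cheap per attribute, which is consistent with the low computational cost the abstract claims; how the per-attribute matrices are combined across many attributes is not detailed in the abstract, so it is not illustrated here.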
Related papers
- High-Dimensional Differentially Private Quantile Regression: Distributed Estimation and Statistical Inference [0.26784722398800515]
We propose a differentially private quantile regression method for high-dimensional data in a distributed setting. We develop a differentially private estimation algorithm with iterative updates, ensuring near-optimal statistical accuracy and formal privacy guarantees.
arXiv Detail & Related papers (2025-08-07T09:47:44Z)
- Revisiting Randomization in Greedy Model Search [16.15551706774035]
We propose and analyze an ensemble of greedy forward selection estimators that are randomized by feature subsampling. We design a novel implementation based on dynamic programming that greatly improves its computational efficiency. Contrary to prevailing belief that randomized ensembling is analogous to shrinkage, we show that it can simultaneously reduce training error and degrees of freedom.
arXiv Detail & Related papers (2025-06-18T17:13:53Z)
- From Randomized Response to Randomized Index: Answering Subset Counting Queries with Local Differential Privacy [27.59934932590226]
Local Differential Privacy (LDP) is the predominant privacy model for safeguarding individual data privacy. We propose an alternative approach -- instead of perturbing values, we apply randomization to indexes of values. Inspired by the deniability of randomized indexes, we present CRIAD for answering subset counting queries on set-value data.
arXiv Detail & Related papers (2025-04-24T13:08:11Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions. We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Active Sampling of Multiple Sources for Sequential Estimation [92.37271004438406]
The objective is to design an active sampling algorithm for sequentially estimating parameters in order to form reliable estimates.
This paper adopts conditional estimation cost functions, leading to a sequential estimation approach that was recently shown to render the analysis tractable.
arXiv Detail & Related papers (2022-08-10T15:58:05Z)
- Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity regularized models and apply it on shared-memory and distributed-memory architecture respectively.
We prove that the proposed method achieves the linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely.
arXiv Detail & Related papers (2022-04-23T02:45:55Z)
- Random Forest Weighted Local Fréchet Regression with Random Objects [18.128663071848923]
We propose a novel random forest weighted local Fréchet regression paradigm. Our first method uses these weights as the local average to solve the conditional Fréchet mean. Our second method performs local linear Fréchet regression, both significantly improving existing Fréchet regression methods.
arXiv Detail & Related papers (2022-02-10T09:10:59Z)
- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks [151.03112356092575]
We show the principled way to measure the uncertainty of predictions for a classifier based on Nadaraya-Watson's nonparametric estimate of the conditional label distribution.
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Nonlinear Distribution Regression for Remote Sensing Applications [6.664736150040092]
In many remote sensing applications one wants to estimate variables or parameters of interest from observations.
Standard algorithms such as neural networks, random forests or Gaussian processes are readily available to relate the two.
This paper introduces a nonlinear (kernel-based) method for distribution regression that solves the previous problems without making any assumption on the statistics of the grouped data.
arXiv Detail & Related papers (2020-12-07T22:04:43Z)
- Decentralised Learning with Random Features and Distributed Gradient Descent [39.00450514924611]
We investigate the generalisation performance of Distributed Gradient Descent with Implicit Regularisation and Random Features in a homogeneous setting.
We establish high probability bounds on the predictive performance for each agent as a function of the step size, number of iterations, inverse spectral gap of the communication matrix and number of Random Features.
We present simulations that show how the number of Random Features, iterations and samples impact predictive performance.
arXiv Detail & Related papers (2020-07-01T09:55:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.