Query-Efficient Locally Private Hypothesis Selection via the Scheffe Graph
- URL: http://arxiv.org/abs/2509.16180v1
- Date: Fri, 19 Sep 2025 17:41:15 GMT
- Title: Query-Efficient Locally Private Hypothesis Selection via the Scheffe Graph
- Authors: Gautam Kamath, Alireza F. Pour, Matthew Regehr, David P. Woodruff,
- Abstract summary: We describe an algorithm that satisfies local differential privacy, performs $tildeO(k3/2)$ non-adaptive queries to individuals.<n>We also introduce a new object we dub the Scheff'e graph, which captures structure of the differences between distributions in $Q$.
- Score: 45.975805184376036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an algorithm with improved query-complexity for the problem of hypothesis selection under local differential privacy constraints. Given a set of $k$ probability distributions $Q$, we describe an algorithm that satisfies local differential privacy, performs $\tilde{O}(k^{3/2})$ non-adaptive queries to individuals who each have samples from a probability distribution $p$, and outputs a probability distribution from the set $Q$ which is nearly the closest to $p$. Previous algorithms required either $\Omega(k^2)$ queries or many rounds of interactive queries. Technically, we introduce a new object we dub the Scheff\'e graph, which captures structure of the differences between distributions in $Q$, and may be of more broad interest for hypothesis selection tasks.
Related papers
- Learning Multinomial Logits in $O(n \log n)$ time [56.23331174813387]
A Multinomial Logit (MNL) model is composed of a finite universe of items $[n]=1,..., n$, each assigned a positive weight.<n>A query specifies an admissible subset -- called a slate -- and the model chooses one item from that slate with probability proportional to its weight.<n>This query model is also known as the Plackett-Luce model or conditional sampling oracle in the literature.
arXiv Detail & Related papers (2026-01-07T22:07:44Z) - Bayesian Advantage of Re-Identification Attack in the Shuffle Model [7.289625261239043]
The shuffle model, which anonymizes data by randomly permuting messages, has been widely adopted in cryptography and differential privacy.<n>We present the first systematic study of the Bayesian advantage in re-identifying a user's message under the shuffle model.
arXiv Detail & Related papers (2025-11-05T06:03:56Z) - Nearly-Linear Time Private Hypothesis Selection with the Optimal Approximation Factor [7.069470347531414]
Estimating the density of a distribution from its samples is a fundamental problem in statistics.<n> Hypothesis selection addresses the setting where, in addition to a sample set, we are given $n$ candidate distributions.<n>We propose a differentially private algorithm in the central model that runs in nearly-linear time with respect to the number of hypotheses.
arXiv Detail & Related papers (2025-06-01T20:46:46Z) - On the query complexity of sampling from non-log-concave distributions [2.4253233571593547]
We study the problem of sampling from a $d$-dimensional distribution with density $p(x)propto e-f(x)$, which does not necessarily satisfy good isoperimetric conditions.<n>We show that for a wide range of parameters, sampling is strictly easier than optimization by a super-exponential factor in the dimension $d$.
arXiv Detail & Related papers (2025-02-10T06:54:16Z) - Sample-Optimal Locally Private Hypothesis Selection and the Provable
Benefits of Interactivity [8.100854060749212]
We study the problem of hypothesis selection under the constraint of local differential privacy.
We devise an $varepsilon$-locally-differentially-private ($varepsilon$-LDP) algorithm that uses $Thetaleft(fracklog kalpha2min varepsilon2,1 right)$ to guarantee that $d_TV(h,hatf)leq alpha + 9 min_fin mathcalF
arXiv Detail & Related papers (2023-12-09T19:22:10Z) - Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z) - Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Simple Binary Hypothesis Testing under Local Differential Privacy and
Communication Constraints [8.261182037130407]
We study simple binary hypothesis testing under both local differential privacy (LDP) and communication constraints.
We qualify our results as either minimax optimal or instance optimal.
arXiv Detail & Related papers (2023-01-09T18:36:49Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.