Efficient Approximate Recovery from Pooled Data Using Doubly Regular
Pooling Schemes
- URL: http://arxiv.org/abs/2303.00043v1
- Date: Tue, 28 Feb 2023 19:31:40 GMT
- Title: Efficient Approximate Recovery from Pooled Data Using Doubly Regular
Pooling Schemes
- Authors: Max Hahn-Klimroth, Dominik Kaaser, Malin Rau
- Abstract summary: We analyze an approximate reconstruction algorithm that estimates the hidden bits in a greedy fashion.
Our analysis is uniform in the degree of noise and the sparsity of $sigma$.
- Score: 1.7403133838762448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the pooled data problem we are given $n$ agents with hidden state bits,
either $0$ or $1$. The hidden states are unknown and can be seen as the
underlying ground truth $\sigma$. To uncover that ground truth, we are given a
querying method that queries multiple agents at a time. Each query reports the
sum of the states of the queried agents. Our goal is to learn the hidden state
bits using as few queries as possible.
So far, most literature deals with exact reconstruction of all hidden state
bits. We study a more relaxed variant in which we allow a small fraction of
agents to be classified incorrectly. This becomes particularly relevant in the
noisy variant of the pooled data problem where the queries' results are subject
to random noise. In this setting, we provide a doubly regular test design that
assigns agents to queries. For this design we analyze an approximate
reconstruction algorithm that estimates the hidden bits in a greedy fashion. We
give a rigorous analysis of the algorithm's performance, its error probability,
and its approximation quality. As a main technical novelty, our analysis is
uniform in the degree of noise and the sparsity of $\sigma$. Finally,
simulations back up our theoretical findings and provide strong empirical
evidence that our algorithm works well for realistic sample sizes.
Related papers
- Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
arXiv Detail & Related papers (2024-01-31T17:38:34Z) - Recovering Top-Two Answers and Confusion Probability in Multi-Choice
Crowdsourcing [10.508187462682308]
We consider crowdsourcing tasks with the goal of recovering not only the ground truth, but also the most confusing answer and the confusion probability.
We propose a model in which there are the top two plausible answers for each task, distinguished from the rest of the choices.
Under this model, we propose a two-stage inference algorithm to infer both the top two answers and the confusion probability.
arXiv Detail & Related papers (2022-12-29T09:46:39Z) - Distributed Reconstruction of Noisy Pooled Data [2.0559497209595823]
We consider two noise models for the pooled data problem.
We present and analyze for both error models a simple and efficient distributed algorithm.
Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability.
arXiv Detail & Related papers (2022-04-14T06:48:40Z) - ReLU Regression with Massart Noise [52.10842036932169]
We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data.
We focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model.
We develop an efficient algorithm that achieves exact parameter recovery in this model.
arXiv Detail & Related papers (2021-09-10T02:13:22Z) - Generalization in the Face of Adaptivity: A Bayesian Perspective [3.0202264016476623]
Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting.
It turns out that simple noise unbounded addition algorithms suffice to prevent this issue.
We show that the harm of adaptivity results from the covariance between the new query and a Bayes factor-based measure of how much information about the data sample was encoded in the responses given to past queries.
arXiv Detail & Related papers (2021-06-20T22:06:44Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Differentially Private Query Release Through Adaptive Projection [19.449593001368193]
We propose, implement, and evaluate a new algorithm for releasing answers to very large numbers of statistical queries like $k$-way marginals.
Our algorithm makes adaptive use of a continuous relaxation of the Projection Mechanism, which answers queries on the private dataset using simple perturbation.
We find that our method outperforms existing algorithms in many cases, especially when the privacy budget is small or the query class is large.
arXiv Detail & Related papers (2021-03-11T12:43:18Z) - The Sparse Vector Technique, Revisited [67.57692396665915]
We revisit one of the most basic and widely applicable techniques in the literature of differential privacy.
This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be.
We suggest an alternative, equally simple, algorithm that can continue testing queries as long as any single individual does not contribute to the answer of too many queries.
arXiv Detail & Related papers (2020-10-02T10:50:52Z) - GeoDA: a geometric framework for black-box adversarial attacks [79.52980486689287]
We propose a framework to generate adversarial examples in one of the most challenging black-box settings.
Our framework is based on the observation that the decision boundary of deep networks usually has a small mean curvature in the vicinity of data samples.
arXiv Detail & Related papers (2020-03-13T20:03:01Z) - Contextual Search in the Presence of Adversarial Corruptions [33.28268414842846]
We study contextual search, a generalization of binary search in higher dimensions.
We show that these algorithms attain near-optimal regret in the absence of adversarial corruptions.
Our techniques draw inspiration from learning theory, game theory, high-dimensional geometry, and convex analysis.
arXiv Detail & Related papers (2020-02-26T17:25:53Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator.
We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.