Private Query Release via the Johnson-Lindenstrauss Transform
- URL: http://arxiv.org/abs/2208.07410v1
- Date: Mon, 15 Aug 2022 19:19:16 GMT
- Title: Private Query Release via the Johnson-Lindenstrauss Transform
- Authors: Aleksandar Nikolov
- Abstract summary: We introduce a new method for releasing answers to statistical queries with differential privacy.
The key idea is to randomly project the query answers to a lower dimensional space.
We answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension.
- Score: 93.20051580730234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new method for releasing answers to statistical queries with
differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is
to randomly project the query answers to a lower dimensional space so that the
distance between any two vectors of feasible query answers is preserved up to
an additive error. Then we answer the projected queries using a simple
noise-adding mechanism, and lift the answers up to the original dimension.
Using this method, we give, for the first time, purely differentially private
mechanisms with optimal worst case sample complexity under average error for
answering a workload of $k$ queries over a universe of size $N$. As other
applications, we give the first purely private efficient mechanisms with
optimal sample complexity for computing the covariance of a bounded
high-dimensional distribution, and for answering 2-way marginal queries. We
also show that, up to the dependence on the error, a variant of our mechanism
is nearly optimal for every given query workload.
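As a rough illustration of the three steps described in the abstract (project, add noise, lift), here is a minimal numerical sketch. It assumes the workload is given as a matrix $W$ acting on a dataset histogram $x$, scales the Gaussian JL projection so that lifting is just multiplication by the transpose, and adds Laplace noise of an arbitrary scale; `m` and `noise_scale` are illustrative placeholders, not the paper's calibration, and no privacy accounting is done here.

```python
# Hedged sketch of JL-based query release: project the k query answers down
# to m dimensions, add noise there, and lift back with the transpose.
import numpy as np

def jl_private_answers(W, x, m, noise_scale, seed=None):
    """W: (k, N) linear query workload, x: (N,) dataset histogram.
    m and noise_scale are illustrative, not the paper's settings."""
    rng = np.random.default_rng(seed)
    k = W.shape[0]
    y = W @ x                                      # true answers, shape (k,)
    P = rng.standard_normal((m, k)) / np.sqrt(m)   # JL map with E[P.T @ P] = I
    noisy = P @ y + rng.laplace(scale=noise_scale, size=m)  # noise in low dimension
    return P.T @ noisy                             # lift back to k dimensions

# toy usage
rng = np.random.default_rng(0)
W = rng.integers(0, 2, size=(200, 50)).astype(float)  # e.g. counting queries
x = rng.integers(0, 5, size=50).astype(float)          # toy histogram
approx = jl_private_answers(W, x, m=50, noise_scale=1.0, seed=1)
print(np.mean(np.abs(approx - W @ x)))                 # average absolute error
```

The reason the projection helps is the one stated in the abstract: distances between feasible answer vectors are approximately preserved up to an additive error, so noise calibrated in the $m$-dimensional space translates into bounded average error after lifting back to all $k$ queries.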
Related papers
- Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
arXiv Detail & Related papers (2024-01-31T17:38:34Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Answering Ambiguous Questions via Iterative Prompting [84.3426020642704]
In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist.
One approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity.
We present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions.
arXiv Detail & Related papers (2023-07-08T04:32:17Z)
- Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs Answering [13.735277588793997]
We investigate how to enhance answer precision for frequently asked questions posed by distributed users of cloud-based Large Language Models (LLMs).
Our study focuses on the typical situation where users ask similar queries that involve identical mathematical reasoning steps and problem-solving procedures.
We propose to answer such distributed, synonymous questions using Self-Consistency (SC) and Chain-of-Thought (CoT) techniques.
arXiv Detail & Related papers (2023-04-27T01:48:03Z)
- Query2Particles: Knowledge Graph Reasoning with Particle Embeddings [49.64006979045662]
We propose a query embedding method to answer complex logical queries on knowledge graphs with missing edges.
The answer entities are selected according to the similarities between the entity embeddings and the query embedding.
A complex KG query answering method, Q2P, is proposed to retrieve diverse answers from different areas of the embedding space.
arXiv Detail & Related papers (2022-04-27T11:16:08Z)
- Generalization in the Face of Adaptivity: A Bayesian Perspective [3.0202264016476623]
Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting.
It turns out that simple noise-addition algorithms suffice to prevent this issue.
We show that the harm of adaptivity results from the covariance between the new query and a Bayes factor-based measure of how much information about the data sample was encoded in the responses given to past queries.
arXiv Detail & Related papers (2021-06-20T22:06:44Z)
- A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering [60.768146126094955]
Weakly supervised question answering usually has only the final answers as supervision signals.
There may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance.
We propose to explicitly exploit such semantic correlations by maximizing the mutual information between question-answer pairs and predicted solutions.
arXiv Detail & Related papers (2021-06-14T05:47:41Z)
- Differentially Private Query Release Through Adaptive Projection [19.449593001368193]
We propose, implement, and evaluate a new algorithm for releasing answers to very large numbers of statistical queries like $k$-way marginals.
Our algorithm makes adaptive use of a continuous relaxation of the Projection Mechanism, which answers queries on the private dataset using simple perturbation and then projects the noisy answers back onto a feasible set (a minimal sketch of this projection idea appears after this list).
We find that our method outperforms existing algorithms in many cases, especially when the privacy budget is small or the query class is large.
arXiv Detail & Related papers (2021-03-11T12:43:18Z)
- Compressive Privatization: Sparse Distribution Estimation under Local Differential Privacy [18.43218511751587]
We show that as long as the target distribution is sparse or approximately sparse, the number of samples needed could be significantly reduced.
Our mechanism does privatization and dimensionality reduction simultaneously, and the sample complexity will only depend on the reduced dimensionality.
arXiv Detail & Related papers (2020-12-03T17:14:23Z)
- Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction [46.38201136570501]
We present a model that aggregates and combines evidence from multiple passages to adaptively predict a single answer or a set of question-answer pairs for ambiguous questions.
Our model, named Refuel, achieves a new state-of-the-art performance on the AmbigQA dataset, and shows competitive performance on NQ-Open and TriviaQA.
arXiv Detail & Related papers (2020-11-26T05:48:55Z)
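The Adaptive Projection entry above refers to the Projection Mechanism, which (roughly) perturbs the full vector of query answers and then projects the noisy vector back onto the set of answer vectors consistent with some dataset. Below is a minimal, hypothetical sketch of that projection step using a plain Frank-Wolfe loop for counting queries; the function name, parameters, and noise scale are assumptions for illustration, and it does not reproduce the continuous relaxation or adaptive query selection of the cited algorithm.

```python
# Hedged sketch of the projection step: find the closest point to the noisy
# answers inside {W @ h : h >= 0, h.sum() == n}, i.e. answers consistent with
# some (fractional) dataset of size n. Frank-Wolfe is convenient here because
# linear minimization over this set just picks one universe element.
import numpy as np

def project_onto_feasible_answers(noisy, W, n, iters=500):
    k, N = W.shape
    z = n * W[:, 0]                       # start at an arbitrary vertex
    for t in range(1, iters + 1):
        grad = 2.0 * (z - noisy)          # gradient of ||z - noisy||^2
        j = int(np.argmin(grad @ W))      # vertex n * W[:, j] with smallest <grad, v>
        step = 2.0 / (t + 2.0)            # standard Frank-Wolfe step size
        z = (1.0 - step) * z + step * (n * W[:, j])
    return z

# toy usage with illustrative (uncalibrated) noise
rng = np.random.default_rng(0)
W = rng.integers(0, 2, size=(100, 30)).astype(float)
x = rng.integers(0, 4, size=30).astype(float)
noisy = W @ x + rng.laplace(scale=5.0, size=100)
projected = project_onto_feasible_answers(noisy, W, n=x.sum())
print(np.mean(np.abs(noisy - W @ x)), np.mean(np.abs(projected - W @ x)))
```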