Private Query Release via the Johnson-Lindenstrauss Transform
- URL: http://arxiv.org/abs/2208.07410v1
- Date: Mon, 15 Aug 2022 19:19:16 GMT
- Title: Private Query Release via the Johnson-Lindenstrauss Transform
- Authors: Aleksandar Nikolov
- Abstract summary: We introduce a new method for releasing answers to statistical queries with differential privacy.
The key idea is to randomly project the query answers to a lower dimensional space.
We answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension.
- Score: 93.20051580730234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new method for releasing answers to statistical queries with
differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is
to randomly project the query answers to a lower dimensional space so that the
distance between any two vectors of feasible query answers is preserved up to
an additive error. Then we answer the projected queries using a simple
noise-adding mechanism, and lift the answers up to the original dimension.
Using this method, we give, for the first time, purely differentially private
mechanisms with optimal worst case sample complexity under average error for
answering a workload of $k$ queries over a universe of size $N$. As other
applications, we give the first purely private efficient mechanisms with
optimal sample complexity for computing the covariance of a bounded
high-dimensional distribution, and for answering 2-way marginal queries. We
also show that, up to the dependence on the error, a variant of our mechanism
is nearly optimal for every given query workload.
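As a rough illustration of the three steps described in the abstract (project, add noise, lift), here is a minimal numerical sketch. It assumes the workload is given as a matrix $W$ acting on a dataset histogram $x$, scales the Gaussian JL projection so that lifting is just multiplication by the transpose, and adds Laplace noise of an arbitrary scale; `m` and `noise_scale` are illustrative placeholders, not the paper's calibration, and no privacy accounting is done here.

```python
# Hedged sketch of JL-based query release: project the k query answers down
# to m dimensions, add noise there, and lift back with the transpose.
import numpy as np

def jl_private_answers(W, x, m, noise_scale, seed=None):
    """W: (k, N) linear query workload, x: (N,) dataset histogram.
    m and noise_scale are illustrative, not the paper's settings."""
    rng = np.random.default_rng(seed)
    k = W.shape[0]
    y = W @ x                                      # true answers, shape (k,)
    P = rng.standard_normal((m, k)) / np.sqrt(m)   # JL map with E[P.T @ P] = I
    noisy = P @ y + rng.laplace(scale=noise_scale, size=m)  # noise in low dimension
    return P.T @ noisy                             # lift back to k dimensions

# toy usage
rng = np.random.default_rng(0)
W = rng.integers(0, 2, size=(200, 50)).astype(float)  # e.g. counting queries
x = rng.integers(0, 5, size=50).astype(float)          # toy histogram
approx = jl_private_answers(W, x, m=50, noise_scale=1.0, seed=1)
print(np.mean(np.abs(approx - W @ x)))                 # average absolute error
```

The reason the projection helps is the one stated in the abstract: distances between feasible answer vectors are approximately preserved up to an additive error, so noise calibrated in the $m$-dimensional space translates into bounded average error after lifting back to all $k$ queries.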
Related papers
- Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
arXiv Detail & Related papers (2024-01-31T17:38:34Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Answering Ambiguous Questions via Iterative Prompting [84.3426020642704]
In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist.
One approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity.
We present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions.
arXiv Detail & Related papers (2023-07-08T04:32:17Z)
- Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs Answering [13.735277588793997]
We investigate how to enhance answer precision for frequently asked questions posed by distributed users of cloud-based Large Language Models (LLMs).
Our study focuses on the typical situation where users ask similar queries that involve identical mathematical reasoning steps and problem-solving procedures.
We propose to answer such distributed, synonymous questions using Self-Consistency (SC) and Chain-of-Thought (CoT) techniques.
arXiv Detail & Related papers (2023-04-27T01:48:03Z)
- Query2Particles: Knowledge Graph Reasoning with Particle Embeddings [49.64006979045662]
We propose a query embedding method to answer complex logical queries on knowledge graphs with missing edges.
The answer entities are selected according to the similarities between the entity embeddings and the query embedding.
A complex KG query answering method, Q2P, is proposed to retrieve diverse answers from different areas of the embedding space.
arXiv Detail & Related papers (2022-04-27T11:16:08Z)
- Generalization in the Face of Adaptivity: A Bayesian Perspective [3.0202264016476623]
Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting.
It turns out that simple noise-addition algorithms suffice to prevent this issue.
We show that the harm of adaptivity results from the covariance between the new query and a Bayes factor-based measure of how much information about the data sample was encoded in the responses given to past queries.
arXiv Detail & Related papers (2021-06-20T22:06:44Z)
- A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering [60.768146126094955]
Weakly supervised question answering usually has only the final answers as supervision signals.
There may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance.
We propose to explicitly exploit such semantic correlations by maximizing the mutual information between question-answer pairs and predicted solutions.
arXiv Detail & Related papers (2021-06-14T05:47:41Z)
- Differentially Private Query Release Through Adaptive Projection [19.449593001368193]
We propose, implement, and evaluate a new algorithm for releasing answers to very large numbers of statistical queries like $k$-way marginals.
Our algorithm makes adaptive use of a continuous relaxation of the Projection Mechanism, which answers queries on the private dataset using simple perturbation and then projects the noisy answers back onto a feasible set (a minimal sketch of this projection idea appears after this list).
We find that our method outperforms existing algorithms in many cases, especially when the privacy budget is small or the query class is large.
arXiv Detail & Related papers (2021-03-11T12:43:18Z)
- Compressive Privatization: Sparse Distribution Estimation under Local Differential Privacy [18.43218511751587]
We show that as long as the target distribution is sparse or approximately sparse, the number of samples needed could be significantly reduced.
Our mechanism does privatization and dimensionality reduction simultaneously, and the sample complexity will only depend on the reduced dimensionality.
arXiv Detail & Related papers (2020-12-03T17:14:23Z)
- Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction [46.38201136570501]
We present a model that aggregates and combines evidence from multiple passages to adaptively predict a single answer or a set of question-answer pairs for ambiguous questions.
Our model, named Refuel, achieves a new state-of-the-art performance on the AmbigQA dataset, and shows competitive performance on NQ-Open and TriviaQA.
arXiv Detail & Related papers (2020-11-26T05:48:55Z)
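The Adaptive Projection entry above refers to the Projection Mechanism, which (roughly) perturbs the full vector of query answers and then projects the noisy vector back onto the set of answer vectors consistent with some dataset. Below is a minimal, hypothetical sketch of that projection step using a plain Frank-Wolfe loop for counting queries; the function name, parameters, and noise scale are assumptions for illustration, and it does not reproduce the continuous relaxation or adaptive query selection of the cited algorithm.

```python
# Hedged sketch of the projection step: find the closest point to the noisy
# answers inside {W @ h : h >= 0, h.sum() == n}, i.e. answers consistent with
# some (fractional) dataset of size n. Frank-Wolfe is convenient here because
# linear minimization over this set just picks one universe element.
import numpy as np

def project_onto_feasible_answers(noisy, W, n, iters=500):
    k, N = W.shape
    z = n * W[:, 0]                       # start at an arbitrary vertex
    for t in range(1, iters + 1):
        grad = 2.0 * (z - noisy)          # gradient of ||z - noisy||^2
        j = int(np.argmin(grad @ W))      # vertex n * W[:, j] with smallest <grad, v>
        step = 2.0 / (t + 2.0)            # standard Frank-Wolfe step size
        z = (1.0 - step) * z + step * (n * W[:, j])
    return z

# toy usage with illustrative (uncalibrated) noise
rng = np.random.default_rng(0)
W = rng.integers(0, 2, size=(100, 30)).astype(float)
x = rng.integers(0, 4, size=30).astype(float)
noisy = W @ x + rng.laplace(scale=5.0, size=100)
projected = project_onto_feasible_answers(noisy, W, n=x.sum())
print(np.mean(np.abs(noisy - W @ x)), np.mean(np.abs(projected - W @ x)))
```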