Scalable Private Search with Wally
- URL: http://arxiv.org/abs/2406.06761v5
- Date: Thu, 03 Oct 2024 19:40:54 GMT
- Title: Scalable Private Search with Wally
- Authors: Hilal Asi, Fabian Boemer, Nicholas Genise, Muhammad Haris Mughees, Tabitha Ogilvie, Rehan Rishi, Guy N. Rothblum, Kunal Talwar, Karl Tarbe, Ruiyu Zhu, Marco Zuliani,
- Abstract summary: Wally is a private search system that supports efficient semantic and keyword search queries against large databases.
In previous private search systems, for each client query, the server must perform at least one expensive cryptographic operation per database entry.
We achieve these results by requiring each client to add a few fake queries and send each query via an anonymous network to the server.
- Score: 27.387418704955994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents Wally, a private search system that supports efficient semantic and keyword search queries against large databases. When sufficiently many clients are making queries, Wally's performance is significantly better than previous systems. In previous private search systems, for each client query, the server must perform at least one expensive cryptographic operation per database entry. As a result, performance degraded proportionally with the number of entries in the database. In Wally, we get rid of this limitation. Specifically, for each query the server performs cryptographic operations only against a few database entries. We achieve these results by requiring each client to add a few fake queries and send each query via an anonymous network to the server at independently chosen random instants. Additionally, each client also uses somewhat homomorphic encryption (SHE) to hide whether a query is real or fake. Wally provides $(\epsilon, \delta)$-differential privacy guarantee, which is an accepted standard for strong privacy. The number of fake queries each client makes depends inversely on the number of clients making queries. Therefore, the fake queries' overhead vanishes as the number of clients increases, enabling scalability to millions of queries and large databases. Concretely, Wally can process eight million queries in just 117 mins. That is around four orders of magnitude less than the state of the art.
Related papers
- Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases [40.94869284214594]
Large language models (LLMs) are increasingly used as personal agents, accessing sensitive user data such as calendars, emails, and medical records.<n>Users currently face a trade-off: They can send private records to powerful but untrusted LLM providers, increasing their exposure risk.
arXiv Detail & Related papers (2025-06-19T07:13:30Z) - Are LLM-based methods good enough for detecting unfair terms of service? [67.49487557224415]
Large language models (LLMs) are good at parsing long text-based documents.
We build a dataset consisting of 12 questions applied individually to a set of privacy policies.
Some open-source models are able to provide a higher accuracy compared to some commercial models.
arXiv Detail & Related papers (2024-08-24T09:26:59Z) - Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu)
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z) - FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference [3.9340847245305732]
We propose a private embedding table query optimization framework, dubbed FastQuery.
FastQuery features a communication-aware embedding table quantization algorithm and a one-hot-aware dense packing algorithm.
Compared to prior-art HE-based frameworks, FastQuery achieves more than $4.3times$, $2.7times$, $1.3times$ latency reduction, respectively.
arXiv Detail & Related papers (2024-05-25T13:58:45Z) - The Surprising Effectiveness of Rankers Trained on Expanded Queries [4.874071145951159]
We improve the ranking performance of hard or difficult queries without compromising the performance of other queries.
We combine relevance scores from the specialized ranker and the base ranker, along with a query performance score estimated for each query.
In our experiments on the DL-Hard dataset, we find that a principled query performance based scoring method offers a significant improvement of up to 25% on the passage ranking task.
arXiv Detail & Related papers (2024-04-03T09:12:22Z) - Private Aggregate Queries to Untrusted Databases [3.6209090009720155]
Private information retrieval (PIR) is a privacy-preserving cryptographic tool.
Most PIR protocols require the client to know the exact row index of the intended database item.
We build a novel information-theoretic PIR framework that permits a user to fetch the aggregated result.
arXiv Detail & Related papers (2024-03-20T04:35:21Z) - Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z) - Improving Content Retrievability in Search with Controllable Query
Generation [5.450798147045502]
Machine-learned search engines have a high retrievability bias, where the majority of the queries return the same entities.
We propose CtrlQGen, a method that generates queries for a chosen underlying intent-narrow or broad.
Our results on datasets from the domains of music, podcasts, and books reveal that we can significantly decrease the retrievability bias of a dense retrieval model.
arXiv Detail & Related papers (2023-03-21T07:46:57Z) - Searching for Better Database Queries in the Outputs of Semantic Parsers [16.221439565760058]
In this paper, we consider the case when, at the test time, the system has access to an external criterion that evaluates the generated queries.
The criterion can vary from checking that a query executes without errors to verifying the query on a set of tests.
We apply our approach to the state-of-the-art semantics and report that it allows us to find many queries passing all the tests on different datasets.
arXiv Detail & Related papers (2022-10-13T17:20:45Z) - Private Query Release via the Johnson-Lindenstrauss Transform [93.20051580730234]
We introduce a new method for releasing answers to statistical queries with differential privacy.
The key idea is to randomly project the query answers to a lower dimensional space.
We answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension.
arXiv Detail & Related papers (2022-08-15T19:19:16Z) - Graph Enhanced BERT for Query Understanding [55.90334539898102]
query understanding plays a key role in exploring users' search intents and facilitating users to locate their most desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z) - Database Reasoning Over Text [11.074939080454412]
We show that state-of-the-art transformer models perform very well for small databases.
We propose a modular architecture to answer database-style queries over multiple spans from text.
Our architecture scales to databases containing thousands of facts whereas contemporary models are limited by how many facts can be encoded.
arXiv Detail & Related papers (2021-06-02T11:09:40Z) - The Sparse Vector Technique, Revisited [67.57692396665915]
We revisit one of the most basic and widely applicable techniques in the literature of differential privacy.
This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be.
We suggest an alternative, equally simple, algorithm that can continue testing queries as long as any single individual does not contribute to the answer of too many queries.
arXiv Detail & Related papers (2020-10-02T10:50:52Z) - Improving Query Safety at Pinterest [46.57632646205479]
PinSets is a system for query-set expansion.
It applies a simple yet powerful mechanism to search user sessions.
It expands a tiny seed set into thousands of related queries at nearly perfect precision.
arXiv Detail & Related papers (2020-06-20T07:35:22Z) - Open Domain Question Answering Using Web Tables [8.25461115955717]
We develop an open-domain QA approach using web tables that works for both factoid and non-factoid queries.
Our solution is used in production in a major commercial web search engine and serves direct answers for tens of millions of real user queries per month.
arXiv Detail & Related papers (2020-01-10T01:25:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.