ORQ: Complex Analytics on Private Data with Strong Security Guarantees
- URL: http://arxiv.org/abs/2509.10793v1
- Date: Sat, 13 Sep 2025 03:19:01 GMT
- Title: ORQ: Complex Analytics on Private Data with Strong Security Guarantees
- Authors: Eli Baum, Sam Buxbaum, Nitin Mathai, Muhammad Faisal, Vasiliki Kalavri, Mayank Varia, John Liagouris,
- Abstract summary: We present ORQ, a system that enables collaborative analysis of large private datasets using cryptographically secure multi-party computation (MPC)<n> ORQ protects data against semi-honest or malicious parties and can efficiently evaluate relational queries with multi-way joins and aggregations.<n>We evaluate ORQ in LAN and WAN deployments on a diverse set of workloads, including complex queries with multiple joins and custom aggregations.
- Score: 6.911971205862421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present ORQ, a system that enables collaborative analysis of large private datasets using cryptographically secure multi-party computation (MPC). ORQ protects data against semi-honest or malicious parties and can efficiently evaluate relational queries with multi-way joins and aggregations that have been considered notoriously expensive under MPC. To do so, ORQ eliminates the quadratic cost of secure joins by leveraging the fact that, in practice, the structure of many real queries allows us to join records and apply the aggregations "on the fly" while keeping the result size bounded. On the system side, ORQ contributes generic oblivious operators, a data-parallel vectorized query engine, a communication layer that amortizes MPC network costs, and a dataflow API for expressing relational analytics -- all built from the ground up. We evaluate ORQ in LAN and WAN deployments on a diverse set of workloads, including complex queries with multiple joins and custom aggregations. When compared to state-of-the-art solutions, ORQ significantly reduces MPC execution times and can process one order of magnitude larger datasets. For our most challenging workload, the full TPC-H benchmark, we report results entirely under MPC with Scale Factor 10 -- a scale that had previously been achieved only with information leakage or the use of trusted third parties.
Related papers
- OrLog: Resolving Complex Queries with LLMs and Probabilistic Reasoning [51.58235452818926]
We introduce OrLog, a neuro-symbolic retrieval framework that decouples predicate-level plausibility estimation from logical reasoning.<n>A large language model (LLM) provides plausibility scores for atomic predicates in one decoding-free forward pass, from which a probabilistic reasoning engine derives the posterior probability of query satisfaction.
arXiv Detail & Related papers (2026-01-30T15:31:58Z) - Accurate Table Question Answering with Accessible LLMs [78.91480799683346]
Given a table T in a database and a question Q in natural language, the table question answering (TQA) task aims to return an accurate answer to Q based on the content of T.<n>Recent state-of-the-art solutions leverage large language models (LLMs) to obtain high-quality answers.<n>This paper focuses on TQA with smaller, open-weight LLMs that can run on a desktop or laptop.
arXiv Detail & Related papers (2026-01-06T16:07:25Z) - MSCoRe: A Benchmark for Multi-Stage Collaborative Reasoning in LLM Agents [7.339769470891067]
MSCoRe is a novel benchmark comprising 126696 domain-specific QA instances spanning scenarios in automotive, pharmaceutical, electronics, and energy sectors.<n>The commercial models performed best across all tasks and scenarios, but a notable gap in ROUGE scores remains between simple and complex tasks.<n>MSCoRe provides a valuable new resource for the community to evaluate and improve multi-stage reasoning in LLM agents.
arXiv Detail & Related papers (2025-09-22T11:36:16Z) - MCP-Orchestrated Multi-Agent System for Automated Disinformation Detection [84.75972919995398]
This paper presents a multi-agent system that uses relation extraction to detect disinformation in news articles.<n>The proposed Agentic AI system combines four agents: (i) a machine learning agent (logistic regression), (ii) a Wikipedia knowledge check agent, and (iv) a web-scraped data analyzer.<n>Results demonstrate that the multi-agent ensemble achieves 95.3% accuracy with an F1 score of 0.964, significantly outperforming individual agents and traditional approaches.
arXiv Detail & Related papers (2025-08-13T19:14:48Z) - The benefits of query-based KGQA systems for complex and temporal questions in LLM era [55.20230501807337]
Large language models excel in question-answering (QA) yet still struggle with multi-hop reasoning and temporal questions.<n> Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers.<n>We explore multi-stage query-based framework for WikiData QA, proposing multi-stage approach that enhances performance on challenging multi-hop and temporal benchmarks.
arXiv Detail & Related papers (2025-07-16T06:41:03Z) - BQSched: A Non-intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning [7.738546538164454]
A key issue in minimizing the overall makespan of data pipelines is the efficient scheduling of concurrent queries.<n>To our knowledge, BQSched is the first non-intrusive batch query scheduler via reinforcement learning.<n>Extensive experiments show that BQSched can significantly improve the efficiency and stability of batch query scheduling.
arXiv Detail & Related papers (2025-04-27T07:49:01Z) - Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering [16.790216473975146]
Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning.<n>Previous approaches demonstrated notable performance by leveraging either closed-source large language models (LLMs) or fine-tuned open-weight LLMs.<n>We propose Multi-Agent Collaboration with Tool use (MACT), a framework that requires neither closed-source models nor fine-tuning.
arXiv Detail & Related papers (2024-12-28T13:13:33Z) - TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension [8.489816179329832]
We present TQA-Bench, a new multi-table QA benchmark designed to evaluate the capabilities of large language models (LLMs) in tackling complex QA tasks over relational data.<n>Our benchmark incorporates diverse relational database instances sourced from real-world public datasets.<n>We systematically evaluate a range of LLMs, both open-source and closed-source, spanning model scales from 7 billion to 70 billion parameters.
arXiv Detail & Related papers (2024-11-29T06:48:13Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA)
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - NQE: N-ary Query Embedding for Complex Query Answering over
Hyper-Relational Knowledge Graphs [1.415350927301928]
Complex query answering is an essential task for logical reasoning on knowledge graphs.
We propose a novel N-ary Query Embedding (NQE) model for CQA over hyper-relational knowledge graphs (HKGs)
NQE utilizes a dual-heterogeneous Transformer encoder and fuzzy logic theory to satisfy all n-ary FOL queries.
We generate a new CQA dataset WD50K-NFOL, including diverse n-ary FOL queries over WD50K.
arXiv Detail & Related papers (2022-11-24T08:26:18Z) - Generating Diverse and Consistent QA pairs from Contexts with
Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z) - Query Focused Multi-Document Summarization with Distant Supervision [88.39032981994535]
Existing work relies heavily on retrieval-style methods for estimating the relevance between queries and text segments.
We propose a coarse-to-fine modeling framework which introduces separate modules for estimating whether segments are relevant to the query.
We demonstrate that our framework outperforms strong comparison systems on standard QFS benchmarks.
arXiv Detail & Related papers (2020-04-06T22:35:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.