DP-starJ: A Differential Private Scheme towards Analytical Star-Join Queries
- URL: http://arxiv.org/abs/2310.04711v3
- Date: Fri, 17 Nov 2023 05:20:37 GMT
- Title: DP-starJ: A Differential Private Scheme towards Analytical Star-Join Queries
- Authors: Congcong Fu, Hui Li, Jian Lou, Jiangtao Cui,
- Abstract summary: Star-join query is the fundamental task in data warehouse and has wide applications in On-line Analytical Processing (OLAP) scenarios.
We propose DP-starJ, a novel Differentially Private framework for star-Join queries.
- Score: 10.022473569166532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Star-join query is the fundamental task in data warehouse and has wide applications in On-line Analytical Processing (OLAP) scenarios. Due to the large number of foreign key constraints and the asymmetric effect in the neighboring instance between the fact and dimension tables, even those latest DP efforts specifically designed for join, if directly applied to star-join query, will suffer from extremely large estimation errors and expensive computational cost. In this paper, we are thus motivated to propose DP-starJ, a novel Differentially Private framework for star-Join queries. DP-starJ consists of a series of strategies tailored to specific features of star-join, including 1) we unveil the different effect of fact and dimension tables on the neighboring database instances, and accordingly revisit the definitions tailored to different cases of star-join; 2) we propose Predicate Mechanism (PM), which utilizes predicate perturbation to inject noise into the join procedure instead of the results; 3) to further boost the robust performance, we propose a DP-compliant star-join algorithm for various types of star-join tasks based on PM. We provide both theoretical analysis and empirical study, which demonstrate the superiority of the proposed methods over the state-of-the-art solutions in terms of accuracy, efficiency, and scalability.
Related papers
- Differentially Private Retrieval-Augmented Generation [13.622078883013442]
Retrieval-augmented generation (RAG) is a widely used framework for reducing hallucinations in large language models (LLMs)<n>RAG poses serious privacy risks when the database contains sensitive corpora, such as medical records or legal documents.<n>We present DP-KSA, a novel privacy-preserving RAG algorithm that integrates DP using the propose-test-release paradigm.
arXiv Detail & Related papers (2026-02-16T00:52:57Z) - DynaAct: Large Language Model Reasoning with Dynamic Action Spaces [58.298135359318024]
We propose a novel framework named textscDynaAct for automatically constructing a compact action space.<n>Our approach significantly improves overall performance, while maintaining efficient inference without introducing substantial latency.
arXiv Detail & Related papers (2025-11-11T09:47:13Z) - KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning [30.471243464952625]
In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models to new and data-scarce tasks.<n>We study the problem of example selection in ICL from a principled, information theory-driven perspective.<n>We derive a principled surrogate objective that is approximately submodular, enabling the use of a greedy algorithm with an approximation guarantee.
arXiv Detail & Related papers (2025-09-19T06:50:03Z) - Access Paths for Efficient Ordering with Large Language Models [7.826046892571884]
We present the LLM ORDER BY operator as a logical abstraction and study its physical implementations within a unified evaluation framework.<n>We introduce three new designs: an agreement-based batch-size policy, a majority voting mechanism for pairwise sorting, and a two-way external merge sort adapted for LLMs.
arXiv Detail & Related papers (2025-08-30T01:44:36Z) - Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation [68.58373854950294]
We focus on causal reasoning and address the task of establishing causal relationships based on correlation information.
We introduce a prompting strategy for this problem that breaks the original task into fixed subquestions.
We evaluate our approach on an existing causal benchmark, Corr2Cause.
arXiv Detail & Related papers (2024-12-18T15:32:27Z) - Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models.
A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes.
We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z) - Best Linear Unbiased Estimate from Privatized Contingency Tables [6.17477133700348]
In differential privacy (DP) mechanisms, some quantities can be estimated in multiple ways by combining different privatized values.<n>The DP 2020 Decennial Census products published by the U.S. Census Bureau consist of such redundant noisy counts.<n>We show that the minimum variance processing is a linear projection.
arXiv Detail & Related papers (2024-09-06T16:27:34Z) - Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73$times$ to 1.96$times$, significantly surpassing standard speculative decoding.
arXiv Detail & Related papers (2024-07-23T06:21:24Z) - ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z) - LaSagnA: Language-based Segmentation Assistant for Complex Queries [39.620806493454616]
Large Language Models for Vision (vLLMs) generate detailed perceptual outcomes, including bounding boxes and masks.
In this study, we acknowledge that the main cause of these problems is the insufficient complexity of training queries.
We present three novel strategies to effectively handle the challenges arising from the direct integration of the proposed format.
arXiv Detail & Related papers (2024-04-12T14:40:45Z) - Synthesizing Tight Privacy and Accuracy Bounds via Weighted Model Counting [5.552645730505715]
Two core challenges are finding expressive, compact, and efficient encodings of distributions of DP algorithms.
We address the first challenge by developing a method for tight privacy and accuracy bound synthesis.
We develop a framework for leveraging inherent symmetries in DP algorithms.
arXiv Detail & Related papers (2024-02-26T19:29:46Z) - Query-Efficient Correlation Clustering with Noisy Oracle [17.11782578276788]
We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB)
We design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation and study their theoretical guarantees.
Our results are the first examples of clustering-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
arXiv Detail & Related papers (2024-02-02T13:31:24Z) - Rethinking Model Selection and Decoding for Keyphrase Generation with
Pre-trained Sequence-to-Sequence Models [76.52997424694767]
Keyphrase Generation (KPG) is a longstanding task in NLP with widespread applications.
Seq2seq pre-trained language models (PLMs) have ushered in a transformative era for KPG, yielding promising performance improvements.
This paper undertakes a systematic analysis of the influence of model selection and decoding strategies on PLM-based KPG.
arXiv Detail & Related papers (2023-10-10T07:34:45Z) - Non-stationary Reinforcement Learning under General Function
Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - Contrastive Proposal Extension with LSTM Network for Weakly Supervised
Object Detection [52.86681130880647]
Weakly supervised object detection (WSOD) has attracted more and more attention since it only uses image-level labels and can save huge annotation costs.
We propose a new method by comparing the initial proposals and the extension ones to optimize those initial proposals.
Experiments on PASCAL VOC 2007, VOC 2012 and MS-COCO datasets show that our method has achieved the state-of-the-art results.
arXiv Detail & Related papers (2021-10-14T16:31:57Z) - Exploiting Database Management Systems and Treewidth for Counting [22.315022989618594]
A canonical counting problem is #SAT, which asks to count the assignments of a Boolean formula.
Recent work shows that benchmarking instances for #SAT often have reasonably small treewidth.
We introduce a general framework to solve counting questions based on state-of-the-art database management systems.
arXiv Detail & Related papers (2020-01-13T12:45:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.