MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction
- URL: http://arxiv.org/abs/2509.23459v2
- Date: Tue, 30 Sep 2025 17:43:21 GMT
- Title: MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction
- Authors: Sepideh Abedini, Shubhankar Mohapatra, D. B. Emerson, Masoumeh Shafieinejad, Jesse C. Cresswell, Xi He,
- Abstract summary: Large language models (LLMs) have shown promising performance on tasks that require reasoning. State-of-the-art LLMs are also proprietary, costly, and resource-intensive, making local deployment impractical. We introduce MaskSQL, a text-to-SQL framework that utilizes abstraction as a privacy protection mechanism.
- Score: 9.405530537180129
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown promising performance on tasks that require reasoning, such as text-to-SQL, code generation, and debugging. However, regulatory frameworks with strict privacy requirements constrain their integration into sensitive systems. State-of-the-art LLMs are also proprietary, costly, and resource-intensive, making local deployment impractical. Consequently, utilizing such LLMs often requires sharing data with third-party providers, raising privacy concerns and risking noncompliance with regulations. Although fine-tuned small language models (SLMs) can outperform LLMs on certain tasks and be deployed locally to mitigate privacy concerns, they underperform on more complex tasks such as text-to-SQL translation. In this work, we introduce MaskSQL, a text-to-SQL framework that utilizes abstraction as a privacy protection mechanism to mask sensitive information in LLM prompts. Unlike redaction, which removes content entirely, or generalization, which broadens tokens, abstraction retains essential information while discarding unnecessary details, striking an effective privacy-utility balance for the text-to-SQL task. Moreover, by providing mechanisms to control the privacy-utility tradeoff, MaskSQL facilitates adoption across a broader range of use cases. Our experimental results show that MaskSQL outperforms leading SLM-based text-to-SQL models and achieves performance approaching state-of-the-art LLM-based models, while preserving privacy.
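The abstract describes abstraction as masking sensitive literals in the prompt while retaining enough structure for SQL generation. A minimal sketch of that idea, assuming a simple placeholder scheme and a string-substitution restore step (both illustrative assumptions, not MaskSQL's actual mechanism):

```python
# Illustrative sketch of abstraction-based masking for a text-to-SQL prompt.
# The placeholder format and restore step are assumptions for illustration,
# not the paper's actual implementation.

def mask_question(question, sensitive_values):
    """Replace sensitive literals with abstract placeholders, keeping a mapping."""
    mapping = {}
    masked = question
    for i, value in enumerate(sensitive_values):
        placeholder = f"<VAL_{i}>"
        mapping[placeholder] = value
        masked = masked.replace(value, placeholder)
    return masked, mapping

def unmask_sql(sql, mapping):
    """Substitute the original values back into the SQL returned by the LLM."""
    for placeholder, value in mapping.items():
        sql = sql.replace(placeholder, value)
    return sql

# The external LLM only ever sees the masked question.
masked, mapping = mask_question(
    "List orders placed by Alice Smith in Toronto",
    ["Alice Smith", "Toronto"],
)
# Hypothetical model output over the masked prompt:
generated_sql = "SELECT * FROM orders WHERE customer = '<VAL_0>' AND city = '<VAL_1>'"
restored_sql = unmask_sql(generated_sql, mapping)
```

Unlike redaction, the placeholders preserve the role each value plays in the question, which is what lets the model still produce a well-formed query.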
Related papers
- A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback [40.19592881059662]
Large Language Models (LLMs) have demonstrated superior performance for generating Text2SQL queries. Privacy and cost considerations prevent companies from using Text2SQL solutions based on external LLMs offered as a service. We propose MATS, a novel Text2SQL framework designed specifically for SLMs.
arXiv Detail & Related papers (2025-12-21T06:43:47Z) - When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing [61.80513991207956]
This work focuses on the challenge of how to restore surrogate-driven protected data in diverse MLLM scenarios. We first bridge this research gap by contributing the SPPE (Surrogate Privacy Protected Editable) dataset. We introduce a unified approach that reliably reconstructs private content while preserving the fidelity of MLLM-generated edits.
arXiv Detail & Related papers (2025-12-08T04:59:03Z) - SAFENLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces [12.135290721799421]
We propose SafeNlidb, a novel privacy-security alignment framework for NLIDB. The framework features an automated pipeline that generates hybrid chain-of-thought interaction data from scratch. Our method outperforms larger-scale LLMs and ideal-setting baselines.
arXiv Detail & Related papers (2025-11-10T07:05:59Z) - The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization [53.51921540246166]
We show that large language models (LLMs) can exploit the contextual vulnerability of DP-sanitized texts. Experiments uncover a double-edged sword effect of LLM reconstructions on privacy and utility. We propose recommendations for using data reconstruction as a post-processing step.
arXiv Detail & Related papers (2025-08-26T12:22:45Z) - Agentic Privacy-Preserving Machine Learning [5.695349155812586]
Privacy-preserving machine learning (PPML) is critical to ensure data privacy in AI. We propose a novel framework named Agentic-PPML to make PPML in LLMs practical.
arXiv Detail & Related papers (2025-07-30T08:20:45Z) - Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences [73.5779077857545]
We build a framework where a local model uses these instructions to rewrite queries, only hiding details deemed sensitive by the user, before sending them to an external model. Experiments with lightweight local LLMs show that, after fine-tuning, they markedly exceed the performance of much larger zero-shot models. At the same time, the system still faces challenges in fully adhering to user instructions, underscoring the need for models with a better understanding of user-defined privacy preferences.
arXiv Detail & Related papers (2025-07-07T18:22:55Z) - Enhancing LLMs with Smart Preprocessing for EHR Analysis [3.5839042822277585]
Large Language Models (LLMs) have demonstrated remarkable proficiency in natural language processing. This paper introduces a compact LLM framework optimized for local deployment in environments with stringent privacy requirements.
arXiv Detail & Related papers (2024-12-03T22:06:55Z) - Relational Database Augmented Large Language Model [59.38841050766026]
Large language models (LLMs) excel in many natural language processing (NLP) tasks.
They can only incorporate new knowledge through training or supervised fine-tuning processes.
This precise, up-to-date, and private information is typically stored in relational databases.
arXiv Detail & Related papers (2024-07-21T06:19:10Z) - FedCoT: Federated Chain-of-Thought Distillation for Large Language Models [24.624093188197126]
Large Language Models (LLMs) have emerged as a transformative force in artificial intelligence, demonstrating exceptional proficiency across various tasks. Small Language Models (SLMs) offer computational efficiency but often lag in performance. We propose FedCoT, a framework designed for the Chain-of-Thought (CoT) distillation of knowledge from LLMs to SLMs.
arXiv Detail & Related papers (2024-06-18T08:48:14Z) - Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data [53.70870879858533]
We introduce a Federated Domain-specific Knowledge Transfer framework.
It enables domain-specific knowledge transfer from LLMs to SLMs while preserving clients' data privacy.
The proposed FDKT framework consistently and greatly improves SLMs' task performance by around 5% with a privacy budget of less than 10.
arXiv Detail & Related papers (2024-05-23T06:14:35Z) - SecureLLM: Using Compositionality to Build Provably Secure Language Models for Private, Sensitive, and Secret Data [12.873248205613827]
Traditional security mechanisms isolate resources from users who should not access them.
We reflect the compositional nature of such security mechanisms back into the structure of LLMs to build a provably secure LLM.
SecureLLM blends access security with fine-tuning methods.
We contribute both a difficult new compositional natural-language-to-SQL translation task and a new perspective on LLM security that allows models to be deployed to secure environments today.
arXiv Detail & Related papers (2024-05-16T04:25:53Z) - UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models [12.45822383965784]
We introduce UnDIAL (Unlearning via Self-Distillation on Adjusted Logits), a novel and robust unlearning method.
Our approach leverages self-distillation to adjust logits and selectively reduce the influence of targeted tokens.
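The summary above describes reducing the influence of targeted tokens by adjusting logits before distillation. A minimal sketch of that idea, assuming a fixed-penalty adjustment rule (an illustrative assumption, not the exact UnDIAL formulation):

```python
import math

# Illustrative sketch of logit adjustment for unlearning: dampen the logits
# of tokens to be forgotten, then use the renormalized distribution as a
# self-distillation target. The fixed penalty is an assumption for
# illustration, not the paper's exact rule.

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adjusted_target(logits, forget_ids, penalty=5.0):
    """Lower the logits of targeted token ids, then renormalize."""
    adjusted = [x - penalty if i in forget_ids else x
                for i, x in enumerate(logits)]
    return softmax(adjusted)

teacher_logits = [2.0, 0.5, 3.0, 1.0]
# Token 2 is targeted for unlearning; its probability mass shrinks and is
# redistributed across the remaining tokens.
target = adjusted_target(teacher_logits, forget_ids={2})
```

Training the model against `target` instead of the raw teacher distribution softly steers it away from the targeted tokens rather than deleting them outright.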
arXiv Detail & Related papers (2024-02-15T16:21:14Z) - Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [76.76046657162306]
Large language models (LLMs) have emerged as a new paradigm for the Text-to-SQL task.
arXiv Detail & Related papers (2023-08-29T14:59:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.