Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems
- URL: http://arxiv.org/abs/2406.14545v1
- Date: Thu, 20 Jun 2024 17:54:33 GMT
- Title: Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems
- Authors: Đorđe Klisura, Anthony Rios,
- Abstract summary: Our research extracts the database schema underlying a text-to-generative model.
We have developed a zero-knowledge framework designed to probe various database elements without knowledge of the database itself.
- Score: 7.613758211231583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Relational databases are integral to modern information systems, serving as the foundation for storing, querying, and managing data efficiently and effectively. Advancements in large language modeling have led to the emergence of text-to-SQL technologies, significantly enhancing the querying and extracting of information from these databases and raising concerns about privacy and security. Our research extracts the database schema elements underlying a text-to-SQL model. Knowledge of the schema can make attacks such as SQL injection easier. By asking specially crafted questions, we have developed a zero-knowledge framework designed to probe various database schema elements without knowledge of the database itself. The text-to-SQL models then process these questions to produce an output that we use to uncover the structure of the database schema. We apply it to specialized text-to-SQL models fine-tuned on text-SQL pairs and generative language models used for SQL generation. Overall, we can reconstruct the table names with an F1 of nearly .75 for fine-tuned models and .96 for generative.
Related papers
- RSL-SQL: Robust Schema Linking in Text-to-SQL Generation [12.765849111313614]
We propose a novel framework called RSL- that combines bidirectional schema linking, contextual information augmentation, binary selection strategy, and multi-turn self-correction.
Experiments on the BIRD and Spider benchmarks demonstrate that our approach achieves state-of-the-art execution accuracy among open-source solutions.
arXiv Detail & Related papers (2024-10-31T16:22:26Z) - Synthesizing Text-to-SQL Data from Weak and Strong LLMs [68.69270834311259]
The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to- tasks.
We introduce a synthetic data approach that combines data produced by larger, more powerful models with error information data generated by smaller, not well-aligned models.
arXiv Detail & Related papers (2024-08-06T15:40:32Z) - TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring [11.78795632771211]
We introduce a novel benchmark designed to evaluate text-to- reliability as a model's ability to correctly handle any type of input question.
We evaluate existing methods using a novel penalty-based scoring metric with two modeling approaches.
arXiv Detail & Related papers (2024-03-23T16:12:52Z) - DBCopilot: Scaling Natural Language Querying to Massive Databases [47.009638761948466]
Existing methods face scalability challenges when dealing with massive, dynamically changing databases.
This paper introduces DBCopilot, a framework that employs a compact and flexible copilot model for routing across massive databases.
arXiv Detail & Related papers (2023-12-06T12:37:28Z) - SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs)
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z) - On the Security Vulnerabilities of Text-to-SQL Models [34.749129843281196]
We show that modules within six commercial applications can be manipulated to produce malicious code.
This is the first demonstration that NLP models can be exploited as attack vectors in the wild.
The aim of this work is to draw the community's attention to potential software security issues associated with NLP algorithms.
arXiv Detail & Related papers (2022-11-28T14:38:45Z) - Proton: Probing Schema Linking Information from Pre-trained Language
Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincar'e distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z) - UniSAr: A Unified Structure-Aware Autoregressive Language Model for
Text-to-SQL [48.21638676148253]
We present UniSAr (Unified Structure-Aware Autoregressive Language Model), which benefits from using an off-the-shelf language model.
Specifically, UniSAr extends existing autoregressive models to incorporate three non-invasive extensions to make them structure-aware.
arXiv Detail & Related papers (2022-03-15T11:02:55Z) - IGSQL: Database Schema Interaction Graph Based Neural Model for
Context-Dependent Text-to-SQL Generation [61.09660709356527]
We propose a database schema interaction graph encoder to utilize historicalal information of database schema items.
We evaluate our model on the benchmark SParC and Co datasets.
arXiv Detail & Related papers (2020-11-11T12:56:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.