Parser Knows Best: Testing DBMS with Coverage-Guided Grammar-Rule Traversal
- URL: http://arxiv.org/abs/2503.03893v2
- Date: Fri, 07 Mar 2025 22:43:46 GMT
- Title: Parser Knows Best: Testing DBMS with Coverage-Guided Grammar-Rule Traversal
- Authors: Yu Liang, Hong Hu,
- Abstract summary: We propose Fuzz, a novel fuzzing framework that automatically extracts grammar rules from built-in syntaxs' built-in definition files forsql generation.<n>Fuzz can generate diverse query statements to saturate the grammar features of the testeds, which grammar features could be missed by previous tools.<n>In our evaluation, Fuzz outperforms all state-of-the-art existing testing tools in terms of bug finding, grammar rule coverage and code coverage.
- Score: 6.300885279363564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Database Management System (DBMS) is the key component for data-intensive applications. Recently, researchers propose many tools to comprehensively test DBMS systems for finding various bugs. However, these tools only cover a small subset of diverse syntax elements defined in DBMS-specific SQL dialects, leaving a large number of features unexplored. In this paper, we propose ParserFuzz, a novel fuzzing framework that automatically extracts grammar rules from DBMSs' built-in syntax definition files for SQL query generation. Without any input corpus, ParserFuzz can generate diverse query statements to saturate the grammar features of the tested DBMSs, which grammar features could be missed by previous tools. Additionally, ParserFuzz utilizes code coverage as feedback to guide the query mutation, which combines different DBMS features extracted from the syntax rules to find more function and safety bugs. In our evaluation, ParserFuzz outperforms all state-of-the-art existing DBMS testing tools in terms of bug finding, grammar rule coverage and code coverage. ParserFuzz detects 81 previously unknown bugs in total across 5 popular DBMSs, where all bugs are confirmed and 34 have been fixed.
Related papers
- Scaling Automated Database System Testing [3.3302293148249125]
We present a vision and a platform to apply test oracles to any database that supports a subset of commonsql features.
In this work, we present both a vision and a platform, SQLancer++, to apply test oracles to any database that supports a subset of commonsql features.
arXiv Detail & Related papers (2025-03-27T12:10:36Z) - Constant Optimization Driven Database System Testing [6.246028398098516]
Logic bugs are bugs that can cause database management systems (DBMSs) to silently produce incorrect results for given queries.<n>We propose Constant-Optimization-Driven Database Testing (CODDTest) as a novel approach for detecting logic bugs in databases.
arXiv Detail & Related papers (2025-01-20T03:32:55Z) - Finding Logic Bugs in Spatial Database Engines via Affine Equivalent Inputs [6.291508085458252]
Spatial Database Management Systems (SDBMSs) aim to store, manipulate, and retrieve spatial data.
The presence of logic bugs in SDBMSs can lead to incorrect results.
Detecting logic bugs in SDBMSs is challenging due to the lack of ground truth for identifying incorrect results.
arXiv Detail & Related papers (2024-10-16T12:18:16Z) - SQLaser: Detecting DBMS Logic Bugs with Clause-Guided Fuzzing [17.421408394486072]
Database Management Systems (DBMSs) are vital components in modern data-driven systems.
Their complexity often leads to logic bugs, which can lead to incorrect query results, data exposure, unauthorized access, etc.
Existing detection employs two strategies: rule-based bug detection and coverage-guided fuzzing.
arXiv Detail & Related papers (2024-07-05T06:56:33Z) - AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries [56.82807063333088]
We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-open programs.
Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness)
In each case, the ambiguity persists even when the database context is provided.
This is achieved through a novel approach that involves controlled generation of databases from scratch.
arXiv Detail & Related papers (2024-06-27T10:43:04Z) - Testing Database Engines via Query Plan Guidance [6.789710498230718]
We propose the concept of Query Plan Guidance (QPG) for guiding automated testing towards "interesting" test cases.
We apply our method to three mature, widely-used, and diverse database systems-DBite, TiDB, and Cockroach-and found 53 unique, previously unknown bugs.
arXiv Detail & Related papers (2023-12-29T08:09:47Z) - Detecting DBMS Bugs with Context-Sensitive Instantiation and Multi-Plan Execution [11.18715154222032]
This paper aims to solve the two challenges, including how to generate semantically correctsql queries in a test case, and how to propose effective oracles to capture logic bugs.
We have implemented a prototype system called Kangaroo and applied three widely used and well-tested semantic codes.
The comparison between our system with the state-of-the-art systems shows that our system outperforms them in terms of the number of generated semantically valid queries, the explored code paths during testing, and the detected bugs.
arXiv Detail & Related papers (2023-12-08T10:15:56Z) - UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-domain systems.
It is composed of publicly available text-to-domain datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z) - Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge
Base and Database [86.03294330305097]
We propose a unified semantic element for question answering (QA) on both knowledge bases (KB) and databases (DB)
We introduce the primitive (relation and entity in KB, table name, column name and cell value in DB) as an essential element in our framework.
We leverage the generator to predict final logical forms by altering and composing topranked primitives with different operations.
arXiv Detail & Related papers (2022-11-09T19:33:27Z) - Towards Generalizable and Robust Text-to-SQL Parsing [77.18724939989647]
We propose a novel TKK framework consisting of Task decomposition, Knowledge acquisition, and Knowledge composition to learn text-to- parsing in stages.
We show that our framework is effective in all scenarios and state-of-the-art performance on the Spider, SParC, and Co. datasets.
arXiv Detail & Related papers (2022-10-23T09:21:27Z) - Improving Text-to-SQL Semantic Parsing with Fine-grained Query
Understanding [84.04706075621013]
We present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding.
Our framework consists of three modules: named entity recognizer (NER), neural entity linker (NEL) and neural entity linker (NSP)
arXiv Detail & Related papers (2022-09-28T21:00:30Z) - Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic
Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question.
BRIDGE attained state-of-the-art performance on popular cross-DB text-to- relational benchmarks.
Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.