Finding Logic Bugs in Spatial Database Engines via Affine Equivalent Inputs
- URL: http://arxiv.org/abs/2410.12496v2
- Date: Thu, 17 Oct 2024 20:23:09 GMT
- Title: Finding Logic Bugs in Spatial Database Engines via Affine Equivalent Inputs
- Authors: Wenjing Deng, Qiuyang Mang, Chengyu Zhang, Manuel Rigger
- Abstract summary: Spatial Database Management Systems (SDBMSs) aim to store, manipulate, and retrieve spatial data.
The presence of logic bugs in SDBMSs can lead to incorrect results.
Detecting logic bugs in SDBMSs is challenging due to the lack of ground truth for identifying incorrect results.
- Score: 6.291508085458252
- Abstract: Spatial Database Management Systems (SDBMSs) aim to store, manipulate, and retrieve spatial data. SDBMSs are employed in various modern applications, such as geographic information systems, computer-aided design tools, and location-based services. However, the presence of logic bugs in SDBMSs can lead to incorrect results, substantially undermining the reliability of these applications. Detecting logic bugs in SDBMSs is challenging due to the lack of ground truth for identifying incorrect results. In this paper, we propose an automated geometry-aware generator to generate high-quality SQL statements for SDBMSs and a novel concept named Affine Equivalent Inputs (AEI) to validate the results of SDBMSs. We implemented them as a tool named Spatter (Spatial DBMSs Tester) for finding logic bugs in four popular SDBMSs: PostGIS, DuckDB Spatial, MySQL, and SQL Server. Our testing campaign detected 34 previously unknown and unique bugs in these SDBMSs, of which 30 have been confirmed and 18 have already been fixed. Our testing efforts have been well received by the developers. Experimental results demonstrate that the geometry-aware generator significantly outperforms a naive random-shape generator in detecting unique bugs, and that AEI can identify 14 logic bugs in SDBMSs that were overlooked by previous methodologies.
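To make the AEI idea concrete, here is a minimal sketch in Python using shapely as a stand-in for an SDBMS; the `check_aei` helper and the transform matrix are illustrative assumptions, not Spatter's actual implementation. Invertible affine maps preserve topological predicates such as intersection, so a disagreement between the original and transformed inputs signals a logic bug.

```python
# A minimal sketch of Affine Equivalent Inputs (AEI), using shapely as a
# stand-in for an SDBMS; check_aei and the matrix below are illustrative,
# not Spatter's actual code.
from shapely.geometry import Polygon
from shapely.affinity import affine_transform

def check_aei(g1, g2, matrix):
    """Invertible affine maps preserve topological predicates such as
    'intersects', so the original and transformed inputs must agree;
    a mismatch indicates a logic bug in the engine under test."""
    t1 = affine_transform(g1, matrix)
    t2 = affine_transform(g2, matrix)
    original = g1.intersects(g2)
    transformed = t1.intersects(t2)
    assert original == transformed, "logic bug: results diverge under affine map"
    return original

# 90-degree rotation plus a translation by (5, 7);
# shapely's 2D matrix layout is [a, b, d, e, xoff, yoff].
matrix = [0, -1, 1, 0, 5, 7]
a = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
b = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])
print(check_aei(a, b, matrix))  # True on a correct engine
```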
Related papers
- Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in both in-domain (ID) and out-of-domain (OOD) scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z)
- BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains.
BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution.
Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z)
- Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios [28.55596803781757]
Database mismatches are more prevalent in real-world scenarios.
We introduce Spider-Mismatch, a new dataset constructed to reflect the condition mismatch problems encountered in real-world scenarios.
Our method achieves the highest performance on the averaged results of the Spider and Spider-Realistic datasets in few-shot settings.
arXiv Detail & Related papers (2024-08-30T03:38:37Z)
- SQLaser: Detecting DBMS Logic Bugs with Clause-Guided Fuzzing [17.421408394486072]
Database Management Systems (DBMSs) are vital components in modern data-driven systems.
Their complexity often leads to logic bugs, which can cause incorrect query results, data exposure, unauthorized access, etc.
Existing detection employs two strategies: rule-based bug detection and coverage-guided fuzzing.
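As a rough illustration of clause-guided generation (the clause pool and the bug-prone patterns below are invented, not drawn from SQLaser), a generator can bias random queries toward clause combinations that resemble historical bug-triggering queries:

```python
# A rough sketch of clause-guided query generation; the clause pool,
# table names, and bug-prone combinations are hypothetical, not SQLaser's.
import random

# Clause sequences drawn from (hypothetical) historical bug reports.
BUG_PRONE_PATTERNS = [
    ("WHERE", "GROUP BY", "HAVING"),
    ("JOIN", "WHERE", "ORDER BY"),
    ("WHERE", "LIMIT"),
]

OPTIONAL_CLAUSES = {
    "JOIN": "JOIN t2 ON t1.id = t2.id",
    "WHERE": "WHERE t1.c0 > 0",
    "GROUP BY": "GROUP BY t1.c0",
    "HAVING": "HAVING COUNT(*) > 1",
    "ORDER BY": "ORDER BY t1.c0",
    "LIMIT": "LIMIT 10",
}

def generate_query(guided=True):
    """With guidance, sample a clause pattern seen in past bugs;
    otherwise pick a random clause subset."""
    if guided:
        pattern = random.choice(BUG_PRONE_PATTERNS)
    else:
        pattern = [c for c in OPTIONAL_CLAUSES if random.random() < 0.5]
    clauses = " ".join(OPTIONAL_CLAUSES[c] for c in pattern)
    return f"SELECT t1.c0 FROM t1 {clauses}"

print(generate_query())
```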
arXiv Detail & Related papers (2024-07-05T06:56:33Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- Testing Database Engines via Query Plan Guidance [6.789710498230718]
We propose the concept of Query Plan Guidance (QPG) for guiding automated testing towards "interesting" test cases.
We applied our method to three mature, widely used, and diverse database systems (SQLite, TiDB, and CockroachDB) and found 53 unique, previously unknown bugs.
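The QPG feedback loop can be sketched against SQLite in a few lines; the candidate pool and plan bookkeeping here are simplified assumptions rather than the paper's actual algorithm:

```python
# A simplified sketch of Query Plan Guidance (QPG) against SQLite; the
# candidate pool below is a placeholder for QPG's real mutation strategy.
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t(a INT, b INT);
    CREATE INDEX i_a ON t(a);
    INSERT INTO t VALUES (1, 2), (3, 4), (5, 6);
""")

def plan_of(query):
    """Serialize the query plan so distinct plans can be counted."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return tuple(row[-1] for row in rows)  # the 'detail' column

def mutate():
    # Placeholder mutation: vary predicates that may flip index usage.
    suffix = random.choice(["", " WHERE a = 3", " WHERE b = 4", " ORDER BY a"])
    return "SELECT * FROM t" + suffix

seen_plans, interesting = set(), []
for _ in range(100):
    candidate = mutate()
    plan = plan_of(candidate)
    if plan not in seen_plans:      # a new plan marks an "interesting" test
        seen_plans.add(plan)
        interesting.append(candidate)

print(len(interesting), "queries with distinct plans kept for the oracle")
```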
arXiv Detail & Related papers (2023-12-29T08:09:47Z)
- Detecting DBMS Bugs with Context-Sensitive Instantiation and Multi-Plan Execution [11.18715154222032]
This paper aims to solve two challenges: how to generate semantically correct SQL queries in a test case, and how to design effective oracles that capture logic bugs.
We have implemented a prototype system called Kangaroo and applied it to three widely used and well-tested DBMSs.
The comparison between our system and state-of-the-art systems shows that ours outperforms them in terms of the number of generated semantically valid queries, the code paths explored during testing, and the number of detected bugs.
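The multi-plan-execution oracle can be emulated minimally in SQLite by forcing two different plans for the same query, e.g., with and without an index (Kangaroo's actual plan-switching mechanism may differ):

```python
# A minimal emulation of a multi-plan-execution oracle in SQLite:
# the same query is run under two different plans (with and without
# an index); diverging results would indicate a logic bug.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t(a INT, b TEXT);
    INSERT INTO t VALUES (1, 'x'), (2, 'y'), (2, 'z'), (3, NULL);
""")

QUERY = "SELECT a FROM t WHERE a >= 2 ORDER BY a"

def run(query):
    return conn.execute(query).fetchall()

no_index = run(QUERY)                  # plan 1: full table scan
conn.execute("CREATE INDEX i_a ON t(a)")
with_index = run(QUERY)                # plan 2: index scan

# The oracle: result sets must match across plans.
assert no_index == with_index, f"logic bug: {no_index} != {with_index}"
print("plans agree:", no_index)
```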
arXiv Detail & Related papers (2023-12-08T10:15:56Z)
- A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports [0.0]
Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs.
We explored several embedding models such as TF-IDF (Baseline), FastText, Gensim, BERT, and ADA.
Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task.
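As an illustration of the TF-IDF baseline for this retrieval task (the toy bug reports below are invented), similar reports can be ranked by cosine similarity:

```python
# A toy TF-IDF baseline for retrieving similar bug reports; the
# reports and query are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [
    "app crashes on startup with null pointer exception",
    "login button unresponsive after password reset",
    "crash at launch: NullPointerException in MainActivity",
]
query = ["application crashes immediately when launched"]

vectorizer = TfidfVectorizer()
report_vecs = vectorizer.fit_transform(reports)
query_vec = vectorizer.transform(query)

# Rank existing reports by cosine similarity to the new one.
scores = cosine_similarity(query_vec, report_vecs)[0]
best = scores.argmax()
print(f"most similar report: {reports[best]!r} (score {scores[best]:.2f})")
```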
arXiv Detail & Related papers (2023-08-17T21:36:56Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
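The core loop can be sketched as follows, with `llm` a hypothetical stand-in for any code-generating model; the prompts and retry bound are illustrative, not the paper's exact setup:

```python
# A hedged sketch of a self-debugging loop; llm() is a hypothetical
# stand-in for any code-generation model, not a real API.
import traceback

def llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real LLM client."""
    raise NotImplementedError

def self_debug(task: str, max_rounds: int = 3) -> str:
    prompt = f"Write a Python function for: {task}"
    for _ in range(max_rounds):
        code = llm(prompt)
        try:
            exec(code, {})          # run the candidate program
            return code             # executed cleanly: accept it
        except Exception:
            # Feed the error back so the model can repair its own code.
            prompt = (f"The following code failed:\n{code}\n"
                      f"Error:\n{traceback.format_exc()}\nFix it.")
    return code                     # last attempt if no round succeeded
```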
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness [115.66421993459663]
Recent studies reveal that text-to-SQL models are vulnerable to task-specific perturbations.
We propose a comprehensive robustness benchmark based on Spider to diagnose model robustness.
We conduct a diagnostic study of state-of-the-art models on this benchmark.
arXiv Detail & Related papers (2023-01-21T03:57:18Z)
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, a major source of system information for troubleshooting.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)