Finding Cross-rule Optimization Bugs in Datalog Engines
- URL: http://arxiv.org/abs/2402.12863v1
- Date: Tue, 20 Feb 2024 09:54:52 GMT
- Title: Finding Cross-rule Optimization Bugs in Datalog Engines
- Authors: Chi Zhang, Linzhang Wang, Manuel Rigger
- Abstract summary: We propose an automated testing approach called Incremental Rule Evaluation (IRE).
IRE tackles the test oracle and test case generation problem.
We implement IRE as a tool named Deopt and evaluate it on four Datalog engines.
- Score: 8.849383195527627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Datalog is a popular and widely-used declarative logic programming language.
Datalog engines apply many cross-rule optimizations; bugs in them can cause
incorrect results. To detect such optimization bugs, we propose an automated
testing approach called Incremental Rule Evaluation (IRE), which
synergistically tackles the test oracle and test case generation problem. The
core idea behind the test oracle is to compare the results of an optimized
program and a program without cross-rule optimization; any difference indicates
a bug in the Datalog engine. Our core insight is that, for an optimized,
incrementally-generated Datalog program, we can evaluate all rules individually
by constructing a reference program to disable the optimizations that are
performed among multiple rules. Incrementally generating test cases not only
allows us to apply the test oracle for every new rule generated, but also lets us
ensure that every newly added rule generates a non-empty result with a given
probability and eschew recomputing already-known facts. We implemented IRE as a
tool named Deopt, and evaluated Deopt on four mature Datalog engines, namely
Soufflé, CozoDB, μZ, and DDlog, and discovered a total of 30 bugs. Of
these, 13 were logic bugs, while the rest were crash and error bugs. Deopt
can detect all bugs found by queryFuzz, a state-of-the-art approach, while
queryFuzz might be unable to detect 5 of the bugs identified by Deopt. Our
incremental test case generation approach is efficient; for example, for test
cases containing 60 rules, our incremental approach can produce 1.17× (for
DDlog) to 31.02× (for Soufflé) as many valid test cases with
non-empty results as the naive random method. We believe that the simplicity
and the generality of the approach will lead to its wide adoption in practice.
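To make the oracle concrete, below is a minimal Python sketch of the idea as stated in the abstract; it is not Deopt's implementation, and `run_datalog`, the engine command line, and the one-fact-per-line text encoding are all hypothetical placeholders.

```python
# Hedged sketch of an IRE-style differential test oracle (not Deopt's code).
import subprocess

def run_datalog(program: str, engine: str = "datalog-engine") -> set[str]:
    """Evaluate a Datalog program with a hypothetical engine CLI and
    return the derived facts, assumed to be printed one per line."""
    proc = subprocess.run([engine], input=program, capture_output=True,
                          text=True, check=True)
    return set(proc.stdout.splitlines())

def oracle_check(base_facts: set[str], rules: list[str]) -> None:
    # Optimized run: give the engine the whole program at once, so it is
    # free to apply cross-rule optimizations (e.g., inlining, magic sets).
    optimized = run_datalog("\n".join(sorted(base_facts) + rules)) | base_facts

    # Reference run: add one rule at a time and materialize its output as
    # plain facts before the next rule is introduced. Since every step
    # contains a single rule, no optimization can span multiple rules.
    facts = set(base_facts)
    for rule in rules:
        facts |= run_datalog("\n".join(sorted(facts)) + "\n" + rule)

    # Any difference between the two results indicates an engine bug.
    if optimized != facts:
        raise AssertionError("candidate cross-rule optimization bug")
```

The incremental generator plugs into the same loop: after appending a rule it can check that the rule derived a non-empty result (resampling with a given probability otherwise) and reuse `facts` rather than recomputing already-known facts.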
Related papers
- SQLaser: Detecting DBMS Logic Bugs with Clause-Guided Fuzzing [17.421408394486072]
Database Management Systems (DBMSs) are vital components in modern data-driven systems.
Their complexity often leads to logic bugs, which can cause incorrect query results, data exposure, unauthorized access, etc.
Existing detection approaches employ two strategies: rule-based bug detection and coverage-guided fuzzing.
arXiv Detail & Related papers (2024-07-05T06:56:33Z)
- Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems [59.72548591120689]
We introduce a new benchmark, SearchBench, containing 11 unique search problem types.
We show that even the most advanced LLMs fail to solve these problems end-to-end in text.
Instructing LLMs to generate code that solves the problem helps, but only slightly, e.g., GPT4's performance rises to 11.7%.
arXiv Detail & Related papers (2024-06-18T00:44:58Z)
- Easy over Hard: A Simple Baseline for Test Failures Causes Prediction [13.759493107661834]
NCChecker is a tool to automatically identify the failure causes for failed test logs.
Our approach has three main stages: log abstraction, lookup table construction, and failure causes prediction.
We have developed a prototype and evaluated our tool on a real-world industrial dataset with more than 10K test logs.
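As a rough illustration of that pipeline (not NCChecker's implementation; the regex templates and cause labels below are invented), the three stages could look like:

```python
import re

def abstract_log(line: str) -> str:
    """Stage 1, log abstraction: replace volatile tokens (hex ids, paths,
    numbers) with placeholders so similar lines map to one template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"/[\w./-]+", "<PATH>", line)
    return re.sub(r"\d+", "<NUM>", line)

# Stage 2, lookup table: templates mined from historically labeled
# failures, mapped to their known causes (entries here are made up).
LOOKUP = {
    "Connection to <PATH> timed out after <NUM> ms": "environment issue",
    "AssertionError: expected <NUM> but got <NUM>": "test or product bug",
}

def predict_cause(log_lines: list[str]) -> str:
    """Stage 3, prediction: return the first known cause, else 'unknown'."""
    for line in log_lines:
        cause = LOOKUP.get(abstract_log(line))
        if cause is not None:
            return cause
    return "unknown"
```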
arXiv Detail & Related papers (2024-05-05T12:59:37Z)
- Evolutionary Generative Fuzzing for Differential Testing of the Kotlin Compiler [14.259471945857431]
We investigate the effectiveness of differential testing in finding bugs within the Kotlin compilers developed at JetBrains.
We propose a black-box generative approach that creates input programs for the K1 and K2 compilers.
Our case study shows that the proposed approach effectively detects bugs in K1 and K2; these bugs have been confirmed and (some) fixed by JetBrains developers.
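A minimal sketch of this style of black-box differential testing follows; the `kotlinc-k1`/`kotlinc-k2` binary names are placeholders, and the generative mutation step is elided.

```python
import pathlib
import subprocess
import tempfile

def compile_with(compiler: str, source: str) -> tuple[int, str]:
    """Compile `source` with one compiler binary; return (exit code, stderr)."""
    src = pathlib.Path(tempfile.mkdtemp()) / "Input.kt"
    src.write_text(source)
    proc = subprocess.run([compiler, str(src)], capture_output=True, text=True)
    return proc.returncode, proc.stderr

def differential_check(source: str) -> bool:
    """Flag inputs on which the two front-ends disagree (accept vs. reject)."""
    rc1, err1 = compile_with("kotlinc-k1", source)  # placeholder binary name
    rc2, err2 = compile_with("kotlinc-k2", source)  # placeholder binary name
    if rc1 != rc2:
        print("divergence on input:\n" + source)
        print("--- K1 ---\n" + err1 + "\n--- K2 ---\n" + err2)
        return True
    return False
```

Inputs that one compiler accepts and the other rejects (or that crash only one of them) become candidate bug reports after triage.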
arXiv Detail & Related papers (2024-01-12T16:01:12Z)
- FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs [92.47146416628965]
FuzzyFlow is a fault localization and test case extraction framework designed to test program optimizations.
We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations.
To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation.
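The paper's own minimization algorithm is not detailed in this summary; as a generic stand-in, the sketch below uses a greedy delta-debugging-style reduction that shrinks a failing input while preserving the failure.

```python
def minimize(items: list, still_fails) -> list:
    """Greedily drop chunks of `items` as long as the failure persists.
    `still_fails` re-runs the test on each candidate input (recomputation
    in exchange for not storing intermediate states)."""
    chunk = len(items) // 2
    while chunk >= 1:
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if still_fails(candidate):
                items = candidate   # smaller input still fails: keep it
            else:
                i += chunk          # this chunk is needed; move past it
        chunk //= 2
    return items

# Example: isolate the single element that triggers a fake "failure".
assert minimize(list(range(100)), lambda xs: 42 in xs) == [42]
```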
arXiv Detail & Related papers (2023-06-28T13:00:17Z)
- ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers [60.6418431624873]
Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems.
We propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness.
Experiments show that when equipped with ALGO, we achieve an 8x better one-submission pass rate over the Codex model and a 2.6x better one-submission pass rate over CodeT.
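A hedged sketch of that oracle-guided verification loop, with a brute-force reference standing in for the LLM-generated oracle (maximum subarray sum here; the task and function names are illustrative only):

```python
import random

def oracle_max_subarray(xs: list[int]) -> int:
    """Exhaustive reference oracle: try every contiguous subarray."""
    return max(sum(xs[i:j]) for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))

def candidate_max_subarray(xs: list[int]) -> int:
    """Candidate under test (Kadane's algorithm)."""
    best = cur = xs[0]
    for v in xs[1:]:
        cur = max(v, cur + v)
        best = max(best, cur)
    return best

def verify(candidate, oracle, trials: int = 200):
    """Compare candidate and oracle on random small inputs; a mismatch
    yields a counterexample that could be fed back to the generator."""
    for _ in range(trials):
        xs = [random.randint(-9, 9) for _ in range(random.randint(1, 8))]
        if candidate(xs) != oracle(xs):
            return False, xs
    return True, None
```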
arXiv Detail & Related papers (2023-05-24T00:10:15Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- (1+1) Genetic Programming With Functionally Complete Instruction Sets Can Evolve Boolean Conjunctions and Disjunctions with Arbitrarily Small Error [16.075339182916128]
GP systems can evolve a conjunction of $n$ variables if they are equipped with the minimal required components.
We show that a GP system can evolve the exact target function in $O(\ell n \log^2 n)$ iterations in expectation, where $\ell \geq n$ is a limit on the size of any accepted tree.
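The tree-based GP system itself is not shown in this summary; as a simplified stand-in, here is a (1+1)-style hill climber over conjunctions of positive literals (sets of variable indices), which illustrates the accept-if-not-worse dynamic but none of the paper's runtime analysis.

```python
import random

def target(x: tuple) -> bool:
    """Target concept: the conjunction of all n variables."""
    return all(x)

def conj(literals: set, x: tuple) -> bool:
    """Candidate: conjunction over a subset of variable indices."""
    return all(x[i] for i in literals)

def error(literals: set, n: int, samples: int = 2000) -> float:
    """Estimated fraction of inputs where candidate and target disagree."""
    xs = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(samples)]
    return sum(conj(literals, x) != target(x) for x in xs) / samples

def one_plus_one(n: int, steps: int = 3000) -> set:
    parent = set()                      # start from the empty conjunction
    parent_err = error(parent, n)
    for _ in range(steps):
        child = set(parent)
        child.symmetric_difference_update({random.randrange(n)})  # flip a literal
        child_err = error(child, n)
        if child_err <= parent_err:     # (1+1) acceptance: never get worse
            parent, parent_err = child, child_err
    return parent                       # ideally all indices 0..n-1
```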
arXiv Detail & Related papers (2023-03-13T20:24:21Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
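A minimal sketch of that selection step: each candidate solution is scored by the generated tests it passes. Both `candidates` and `tests` would come from the language model in the real system, and untrusted code would need sandboxing rather than a bare `exec`.

```python
def passes(solution_src: str, test_src: str) -> bool:
    """Run one generated test (an assert) against one candidate solution."""
    env: dict = {}
    try:
        exec(solution_src, env)   # define the function under test
        exec(test_src, env)       # raises AssertionError on failure
        return True
    except Exception:
        return False

def best_candidate(candidates: list[str], tests: list[str]) -> str:
    # CodeT's full ranking also rewards consensus among solutions that
    # behave the same on the tests; this keeps only the pass-count core.
    return max(candidates, key=lambda c: sum(passes(c, t) for t in tests))
```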
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
- Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations.
We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output.
On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z)
- Revisiting Bayesian Optimization in the light of the COCO benchmark [1.4467794332678539]
This article reports a large investigation of how common and less common design choices affect the performance of (Gaussian process based) BO.
The code developed for this study makes the new version (v2.1.1) of the R package DiceOptim available on CRAN.
arXiv Detail & Related papers (2021-03-30T19:45:18Z)