CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation
- URL: http://arxiv.org/abs/2306.00355v3
- Date: Wed, 10 Jan 2024 02:56:41 GMT
- Title: CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation
- Authors: Jinsheng Ba, Manuel Rigger
- Abstract summary: We propose Cardinality Estimation Restriction Testing (CERT), a technique that finds performance issues through the lens of cardinality estimation.
CERT tests cardinality estimation specifically, because it has been shown to be the most important component of query optimization.
- Score: 6.789710498230718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Database Management Systems (DBMSs) process a given query by creating a query
plan, which is subsequently executed, to compute the query's result. Deriving
an efficient query plan is challenging, and both academia and industry have
invested decades into researching query optimization. Despite this, DBMSs are
prone to performance issues, where a DBMS produces an unexpectedly inefficient
query plan that might lead to the slow execution of a query. Finding such
issues is a longstanding problem and inherently difficult, because no ground
truth information on an expected execution time exists. In this work, we
propose Cardinality Estimation Restriction Testing (CERT), a novel technique
that finds performance issues through the lens of cardinality estimation. Given
a query on a database, CERT derives a more restrictive query (e.g., by
replacing a LEFT JOIN with an INNER JOIN), whose estimated number of rows
should not exceed the estimated number of rows for the original query. CERT
tests cardinality estimation specifically, because it has been shown to be the
most important component of query optimization; thus, we expect that finding and
fixing such issues might result in the highest performance gains. In addition,
we found that other kinds of query optimization issues can be exposed by
unexpected estimated cardinalities, which can also be found by CERT. CERT is a
black-box technique that does not require access to the source code; DBMSs
expose query plans via the EXPLAIN statement. CERT eschews executing queries,
which is costly and prone to performance fluctuations. We evaluated CERT on
three widely used and mature DBMSs, MySQL, TiDB, and CockroachDB. CERT found 13
unique issues, of which 2 issues were fixed and 9 confirmed by the developers.
We expect that this new angle on finding performance bugs will help DBMS
developers improve DBMSs' performance.
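The restriction check described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CERT's implementation: the helper names (`restrictive_variants`, `root_est_rows`, `check_query`) are hypothetical, only the LEFT JOIN to INNER JOIN rule mentioned in the abstract is implemented (with a naive regex that would also match inside string literals), and the DBMS is abstracted as a caller-supplied `estimate_rows` function. We assume the estimate is the root operator's row count from EXPLAIN (for example, the `estRows` column that TiDB reports), and that the rewritten join sits in an unparenthesized chain of LEFT and INNER joins that is not nested under a negation such as NOT EXISTS; only under those conditions is the INNER JOIN variant guaranteed to return a subset of the original rows.

```python
import re
from typing import Callable, Iterator, List, Tuple

# Hypothetical rewrite rule: "LEFT JOIN" / "LEFT OUTER JOIN" -> "INNER JOIN".
# Assumption: the join is part of an unparenthesized chain of LEFT/INNER joins
# and is not nested under a negation (e.g. NOT EXISTS), so the rewritten query
# returns a subset of the original rows. The regex is naive and would also
# match text inside string literals or comments.
_LEFT_JOIN = re.compile(r"\bLEFT(\s+OUTER)?\s+JOIN\b", re.IGNORECASE)


def restrictive_variants(query: str) -> Iterator[str]:
    """Yield one variant per LEFT JOIN, with that join turned into an INNER JOIN."""
    for match in _LEFT_JOIN.finditer(query):
        yield query[: match.start()] + "INNER JOIN" + query[match.end():]


def root_est_rows(explain_rows: List[dict]) -> float:
    """Read the root operator's estimated row count from TiDB-style EXPLAIN rows.

    Assumption: each row is a dict keyed by column name, the first row is the
    plan's root operator, and the estimate lives in the "estRows" column.
    """
    return float(explain_rows[0]["estRows"])


def check_query(
    query: str, estimate_rows: Callable[[str], float]
) -> List[Tuple[str, float, float]]:
    """Return (variant, original_estimate, variant_estimate) for each violation.

    `estimate_rows` is a caller-supplied (hypothetical) function that obtains an
    estimate via EXPLAIN; the queries themselves are never executed.
    """
    original = estimate_rows(query)
    violations = []
    for variant in restrictive_variants(query):
        restricted = estimate_rows(variant)
        # A more restrictive query should not be expected to return more rows.
        if restricted > original:
            violations.append((variant, original, restricted))
    return violations
```

In a real harness, `estimate_rows` would send `EXPLAIN <query>` through a database driver and pass the returned rows to an adapter such as `root_est_rows`, so no test query is ever executed, in line with CERT's EXPLAIN-only, black-box design. A reported violation would still be only a candidate issue that needs manual triage.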
Related papers
- E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [1.187832944550453]
We introduce E-SQL, a novel pipeline designed to address challenges in Text-to-SQL through direct schema linking and candidate predicate augmentation.
E-SQL enhances the natural language query by incorporating relevant database items (i.e., tables, columns, and values) and conditions directly into the question, bridging the gap between the query and the database structure.
We investigate the impact of schema filtering, a technique widely explored in previous work, and demonstrate its diminishing returns when applied alongside advanced large language models.
Date: 2024-09-25T09:02:48Z
- DAC: Decomposed Automation Correction for Text-to-SQL [51.48239006107272]
We introduce Decomposed Automation Correction (DAC), which corrects text-to-SQL by decomposing it into entity linking and skeleton parsing.
We show that our method improves performance by 3.7% on average across Spider, Bird, and KaggleDBQA compared with the baseline method.
Date: 2024-08-16T14:43:15Z
- Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu).
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
Date: 2024-06-23T05:02:21Z
- The Surprising Effectiveness of Rankers Trained on Expanded Queries [4.874071145951159]
We improve the ranking performance of hard or difficult queries without compromising the performance of other queries.
We combine relevance scores from the specialized ranker and the base ranker, along with a query performance score estimated for each query.
In our experiments on the DL-Hard dataset, we find that a principled query performance based scoring method offers a significant improvement of up to 25% on the passage ranking task.
Date: 2024-04-03T09:12:22Z
- Hydro: Adaptive Query Processing of ML Queries [7.317548344184541]
We present Hydro, an adaptive query processing (AQP) system for efficiently processing machine learning (ML) queries.
We demonstrate Hydro's efficacy through four illustrative use cases, delivering up to 11.52x speedup over a baseline system.
Date: 2024-03-22T01:17:07Z
- Testing Database Engines via Query Plan Guidance [6.789710498230718]
We propose the concept of Query Plan Guidance (QPG) for guiding automated testing towards "interesting" test cases.
We apply our method to three mature, widely used, and diverse database systems (SQLite, TiDB, and CockroachDB) and found 53 unique, previously unknown bugs.
Date: 2023-12-29T08:09:47Z
- JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost.
We present JoinGym, a query optimization environment for bushy reinforcement learning (RL).
Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
Date: 2023-07-21T17:00:06Z
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we examine in depth the critical paradigms that influence the performance of tuned LLMs.
Date: 2023-05-26T21:39:05Z
- Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question.
BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks.
Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
Date: 2020-12-23T12:33:52Z
- DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding [90.85913515409275]
Recent studies on open-domain question answering have achieved prominent performance improvement using pre-trained language models such as BERT.
We propose DC-BERT, a contextual encoding framework that has dual BERT models: an online BERT which encodes the question only once, and an offline BERT which pre-encodes all the documents and caches their encodings.
On SQuAD Open and Natural Questions Open datasets, DC-BERT achieves 10x speedup on document retrieval, while retaining most (about 98%) of the QA performance.
Date: 2020-02-28T08:18:37Z
This list is automatically generated from the titles and abstracts of the papers in this site.