"What makes my queries slow?": Subgroup Discovery for SQL Workload
Analysis
- URL: http://arxiv.org/abs/2108.03906v1
- Date: Mon, 9 Aug 2021 09:44:13 GMT
- Title: "What makes my queries slow?": Subgroup Discovery for SQL Workload
Analysis
- Authors: Youcef Remil, Anes Bendimerad, Romain Mathonat, Philippe Chaleat,
Mehdi Kaytoue
- Abstract summary: We introduce an original approach rooted in Subgroup Discovery.
We show how to instantiate and develop this generic data-mining framework.
We also provide a visualization tool for interactive knowledge discovery.
- Score: 1.3124513975412255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Among daily tasks of database administrators (DBAs), the analysis of query
workloads to identify schema issues and improve performance is crucial.
Although DBAs can easily pinpoint queries repeatedly causing performance
issues, it remains challenging to automatically identify subsets of queries
that share some properties only (a pattern) and simultaneously foster some
target measures, such as execution time. Patterns are defined on combinations
of query clauses, environment variables, database alerts and metrics and help
answer questions like what makes SQL queries slow? What makes I/O
communications high? Automatically discovering these patterns in a huge search
space and providing them as hypotheses for helping to localize issues and
root-causes is important in the context of explainable AI. To tackle it, we
introduce an original approach rooted in Subgroup Discovery. We show how to
instantiate and develop this generic data-mining framework to identify
potential causes of SQL workload issues. We believe that such a data-mining
technique is not trivial for DBAs to apply. As such, we also provide a
visualization tool for interactive knowledge discovery. We analyse a one-week
workload from hundreds of databases from our company, make both the dataset and
source code available, and experimentally show that insightful hypotheses can
be discovered.
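The notion of a subgroup, a conjunction of query properties whose matching queries show an unusually high target measure, can be illustrated with a small sketch. This is not the authors' implementation: it enumerates patterns exhaustively over a hypothetical toy workload and scores them with a commonly used mean-shift quality measure (square root of coverage times the gap between the subgroup mean and the overall mean). The feature names, timings, and function names are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy workload (invented for illustration): each query is
# described by a set of boolean properties and a target measure,
# here the execution time in seconds.
QUERIES = [
    {"features": {"has_join", "no_index"}, "time": 9.0},
    {"features": {"has_join", "no_index", "order_by"}, "time": 11.0},
    {"features": {"has_join"}, "time": 2.0},
    {"features": {"order_by"}, "time": 1.0},
    {"features": {"no_index"}, "time": 3.0},
    {"features": set(), "time": 1.0},
]


def quality(pattern, queries):
    """Mean-shift quality: sqrt(coverage) * (subgroup mean - overall mean)."""
    covered = [q["time"] for q in queries if pattern <= q["features"]]
    if not covered:
        return float("-inf")  # a pattern matching no query is never interesting
    overall = mean(q["time"] for q in queries)
    return (len(covered) ** 0.5) * (mean(covered) - overall)


def top_subgroups(queries, max_len=2, k=3):
    """Exhaustively score all patterns of up to max_len properties."""
    items = sorted(set().union(*(q["features"] for q in queries)))
    candidates = [
        frozenset(combo)
        for size in range(1, max_len + 1)
        for combo in combinations(items, size)
    ]
    ranked = sorted(candidates, key=lambda p: quality(p, queries), reverse=True)
    return [(sorted(p), round(quality(p, queries), 3)) for p in ranked[:k]]


if __name__ == "__main__":
    for pattern, score in top_subgroups(QUERIES):
        print(pattern, score)
```

On this toy data the best-ranked pattern is the conjunction of `has_join` and `no_index`, which is the kind of hypothesis a DBA could then investigate. Exhaustive enumeration is only feasible for a handful of properties; a real workload would need a search strategy that prunes or samples the pattern space.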
Related papers
- PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks.
In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z)
- Text2SQL is Not Enough: Unifying AI and Databases with TAG [47.45480855418987]
Table-Augmented Generation (TAG) is a paradigm for answering natural language questions over databases.
We develop benchmarks to study the TAG problem and find that standard methods answer no more than 20% of queries correctly.
arXiv Detail & Related papers (2024-08-27T00:50:14Z)
- AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries [56.82807063333088]
We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL parsers.
Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness).
In each case, the ambiguity persists even when the database context is provided.
This is achieved through a novel approach that involves controlled generation of databases from scratch.
arXiv Detail & Related papers (2024-06-27T10:43:04Z)
- Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu).
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z)
- FeatAug: Automatic Feature Augmentation From One-to-Many Relationship Tables [4.058220332950672]
Feature augmentation from one-to-many relationship tables is a critical but challenging problem in ML model development.
We propose FEATAUG, a new feature augmentation framework that automatically extracts predicate-aware queries from one-to-many relationship tables.
Our experiments on four real-world datasets demonstrate that FeatAug extracts more effective features compared to Featuretools.
arXiv Detail & Related papers (2024-03-11T01:44:14Z)
- Testing Database Engines via Query Plan Guidance [6.789710498230718]
We propose the concept of Query Plan Guidance (QPG) for guiding automated testing towards "interesting" test cases.
We applied our method to three mature, widely-used, and diverse database systems (SQLite, TiDB, and CockroachDB) and found 53 unique, previously unknown bugs.
arXiv Detail & Related papers (2023-12-29T08:09:47Z)
- Searching for Better Database Queries in the Outputs of Semantic Parsers [16.221439565760058]
In this paper, we consider the case when, at the test time, the system has access to an external criterion that evaluates the generated queries.
The criterion can vary from checking that a query executes without errors to verifying the query on a set of tests.
We apply our approach to state-of-the-art semantic parsers and report that it allows us to find many queries passing all the tests on different datasets.
arXiv Detail & Related papers (2022-10-13T17:20:45Z)
- Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and helping users locate their most desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z)
- A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases [67.33560134350427]
TempQA-WD is a benchmark dataset for temporal reasoning.
It is based on Wikidata, which is the most frequently curated, openly available knowledge base.
arXiv Detail & Related papers (2022-01-15T08:49:09Z)
- Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering [78.9863753810787]
A large amount of the world's knowledge is stored in structured databases.
Query languages can answer questions that require complex reasoning, as well as offering full explainability.
arXiv Detail & Related papers (2021-08-05T22:04:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.