BitE : Accelerating Learned Query Optimization in a Mixed-Workload
Environment
- URL: http://arxiv.org/abs/2306.00845v2
- Date: Fri, 2 Jun 2023 01:32:48 GMT
- Title: BitE : Accelerating Learned Query Optimization in a Mixed-Workload
Environment
- Authors: Yuri Kim, Yewon Choi, Yujung Gil, Sanghee Lee, Heesik Shin and Jaehyok
Chong
- Abstract summary: BitE is a novel ensemble learning model that uses database statistics and metadata to tune a learned query optimizer for better performance.
Our model achieves 19.6% more improved queries and 15.8% fewer regressed queries compared to the existing traditional methods.
- Score: 0.36700088931938835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the many efforts to apply deep reinforcement learning to query
optimization in recent years, there remains room for improvement as query
optimizers are complex entities that require hand-designed tuning of workloads
and datasets. Recent research presents learned query optimization results
mostly in bulks of single workloads which focus on picking up the unique traits
of the specific workload. This proves to be problematic in scenarios where the
different characteristics of multiple workloads and datasets are to be mixed
and learned together. Hence, in this paper, we propose BitE, a novel
ensemble learning model using database statistics and metadata to tune a
learned query optimizer for enhancing performance. On the way, we introduce
multiple revisions to solve several challenges: we extend the search space for
the optimal Abstract SQL Plan (represented as a JSON object called ASP) by
expanding hintsets, we steer the model away from the default plans that may be
biased by configuring the experience with all unique plans of queries, and we
deviate from the traditional loss functions and choose an alternative method to
cope with underestimation and overestimation of reward. Our model achieves
19.6% more improved queries and 15.8% fewer regressed queries compared to the
existing traditional methods whilst using a comparable level of resources.
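The abstract mentions extending the plan search space by expanding hintsets. As a rough illustration only, and not the authors' implementation, the sketch below enumerates candidate hintsets as subsets of a small list of optimizer hints. The hint names, the `expand_hintsets` helper, and the `max_size` parameter are all hypothetical; the source does not say which hints BitE uses or how it combines them.

```python
from itertools import combinations

# Hypothetical hint names, chosen for illustration only; they are not from the paper.
HINTS = ("no_hash_join", "no_nested_loop_join", "no_index_scan")


def expand_hintsets(hints, max_size=None):
    """Return every subset of `hints` with at most `max_size` members.

    The empty tuple stands for the optimizer's default plan (no hints applied).
    """
    limit = len(hints) if max_size is None else min(max_size, len(hints))
    return [
        subset
        for size in range(limit + 1)
        for subset in combinations(hints, size)
    ]


small = expand_hintsets(HINTS, max_size=1)  # default plan plus single-hint sets
full = expand_hintsets(HINTS)               # all 2**3 subsets
print(len(small), len(full))  # prints "4 8"
```

Allowing larger subsets grows the candidate set exponentially in the number of hints, which is presumably why a learned model is needed to choose among the resulting plans rather than executing all of them.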
Related papers
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- Roq: Robust Query Optimization Based on a Risk-aware Learned Cost Model [3.0784574277021406]
We propose a holistic framework that enables robust query optimization based on a risk-aware learning approach.
Roq includes a novel formalization of the notion of robustness in the context of query optimization.
We demonstrate experimentally that Roq provides significant improvements to robust query optimization compared to the state-of-the-art.
arXiv Detail & Related papers (2024-01-26T21:16:37Z)
- FOSS: A Self-Learned Doctor for Query Optimizer [20.54782053709538]
Deep reinforcement learning (DRL) can be used to address the query optimization problem in database systems.
We introduce FOSS, a novel DRL-based framework for query optimization.
We show that FOSS outperforms the state-of-the-art methods in terms of latency performance and optimization time.
arXiv Detail & Related papers (2023-12-11T13:05:51Z)
- JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost.
We present JoinGym, a query optimization environment for bushy reinforcement learning (RL).
Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z)
- Kepler: Robust Learning for Faster Parametric Query Optimization [5.6119420695093245]
We propose an end-to-end learning-based approach to parametric query optimization.
Kepler achieves significant improvements in query runtime on multiple datasets.
arXiv Detail & Related papers (2023-06-11T22:39:28Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Consolidated learning -- a domain-specific model-free optimization strategy with examples for XGBoost and MIMIC-IV [4.370097023410272]
This paper proposes a new formulation of the tuning problem, called consolidated learning.
In such settings, we are interested in the total optimization time rather than tuning for a single task.
We demonstrate the effectiveness of this approach through an empirical study of the XGBoost algorithm and a collection of predictive tasks extracted from the MIMIC-IV medical database.
arXiv Detail & Related papers (2022-01-27T21:38:53Z)
- Conditional Sequential Slate Optimization [15.10459152219771]
A search ranking system typically orders the results by independent query-document scores to produce a slate of search results.
We introduce conditional sequential slate optimization (CSSO), which jointly learns to optimize for traditional ranking metrics as well as prescribed distribution criteria of documents within the slate.
The proposed method can be applied to practical real-world problems such as enforcing diversity in e-commerce search results, mitigating bias in top results, and personalizing results.
arXiv Detail & Related papers (2021-08-12T09:14:46Z)
- Learning to Select Base Classes for Few-shot Classification [96.92372639495551]
We use the Similarity Ratio as an indicator for the generalization performance of a few-shot model.
We then formulate the base class selection problem as a submodular optimization problem over Similarity Ratio.
arXiv Detail & Related papers (2020-04-01T09:55:18Z)
- Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)