Related papers: Budget-aware Query Tuning: An AutoML Perspective

Budget-aware Query Tuning: An AutoML Perspective

URL: http://arxiv.org/abs/2404.00137v1
Date: Fri, 29 Mar 2024 20:19:36 GMT
Title: Budget-aware Query Tuning: An AutoML Perspective
Authors: Wentao Wu, Chi Wang,
Abstract summary: Modern database systems rely on cost-based querys to come up with good execution plans for input queries. We show that by varying the costunit values one can obtain query plans that significantly outperform the default query plans.
Score: 14.561951257365953
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern database systems rely on cost-based query optimizers to come up with good execution plans for input queries. Such query optimizers rely on cost models to estimate the costs of candidate query execution plans. A cost model represents a function from a set of cost units to query execution cost, where each cost unit specifies the unit cost of executing a certain type of query processing operation (such as table scan or join). These cost units are traditionally viewed as constants, whose values only depend on the platform configuration where the database system runs on top of but are invariant for queries processed by the database system. In this paper, we challenge this classic view by thinking of these cost units as variables instead. We show that, by varying the cost-unit values one can obtain query plans that significantly outperform the default query plans returned by the query optimizer when viewing the cost units as constants. We term this cost-unit tuning process "query tuning" (QT) and show that it is similar to the well-known hyper-parameter optimization (HPO) problem in AutoML. As a result, any state-of-the-art HPO technologies can be applied to QT. We study the QT problem in the context of anytime tuning, which is desirable in practice by constraining the total time spent on QT within a given budget -- we call this problem budget-aware query tuning. We further extend our study from tuning a single query to tuning a workload with multiple queries, and we call this generalized problem budget-aware workload tuning (WT), which aims for minimizing the execution time of the entire workload. WT is more challenging as one needs to further prioritize individual query tuning within the given time budget. We propose solutions to both QT and WT and experimental evaluation using both benchmark and real workloads demonstrates the efficacy of our proposed solutions.

Related papers

Cost-aware Stopping for Bayesian Optimization [53.34052774820105]
We propose a cost-aware stopping rule for Bayesian optimization that adapts to varying evaluation costs and is free of tuning.<n>We prove a theoretical guarantee bounding the expected cumulative evaluation cost incurred by our stopping rule when paired with state-of-the-art acquisition functions.
arXiv Detail & Related papers (2025-07-16T17:54:14Z)
A Learned Cost Model-based Cross-engine Optimizer for SQL Workloads [3.7960472831772765]
Lakehouse systems enable the same data to be queried with multiple execution engines.<n>We propose a cross-engine that can automate engine selection for diverse queries through a learned cost model.<n>We show that using a query optimized logical plan for cost estimation decreases the average Q-error by even 12.6% over using unoptimized plans as input.
arXiv Detail & Related papers (2025-06-03T12:32:56Z)
BQSched: A Non-intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning [7.738546538164454]
A key issue in minimizing the overall makespan of data pipelines is the efficient scheduling of concurrent queries. To our knowledge, BQSched is the first non-intrusive batch query scheduler via reinforcement learning. Extensive experiments show that BQSched can significantly improve the efficiency and stability of batch query scheduling.
arXiv Detail & Related papers (2025-04-27T07:49:01Z)
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving [89.60370366013142]
We propose PlanGEN, a model-agnostic and easily scalable agent framework with three key components: constraint, verification, and selection agents. Specifically, our approach proposes constraint-guided iterative verification to enhance performance of inference-time algorithms.
arXiv Detail & Related papers (2025-02-22T06:21:56Z)
Improving DBMS Scheduling Decisions with Fine-grained Performance Prediction on Concurrent Queries -- Extended [15.354441937462271]
This work introduces IconqSched, a new, principled non-intrusive scheduler that optimize execution order and timing of queries. IconqSched features a novel fine-grained predictor, Iconq, which treats the system runtime as a black box. We compare IconqSched to other schedulers in terms of end-to-end runtime using real workload traces.
arXiv Detail & Related papers (2025-01-27T17:55:39Z)
The Effect of Scheduling and Preemption on the Efficiency of LLM Inference Serving [8.552242818726347]
INFERMAX is an analytical framework that uses inference cost models to compare various schedulers. Our findings indicate that preempting requests can reduce GPU costs by 30% compared to avoiding preemptions at all.
arXiv Detail & Related papers (2024-11-12T00:10:34Z)
Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs. Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries. We propose an effective Query Instruction Parsing (QIPP) that captures latent query patterns from code-like query instructions.
arXiv Detail & Related papers (2024-10-27T03:18:52Z)
PRICE: A Pretrained Model for Cross-Database Cardinality Estimation [78.30959470441442]
Cardinality estimation (CardEst) is essential for optimizing query execution plans. Recent ML-based CardEst methods achieve high accuracy but face deployment challenges due to high preparation costs. We propose PRICE, a PRetrained multI-table CardEst model, which addresses these limitations.
arXiv Detail & Related papers (2024-06-03T06:21:53Z)
Roq: Robust Query Optimization Based on a Risk-aware Learned Cost Model [3.0784574277021406]
We propose a holistic framework that enables robust query optimization based on a risk-aware learning approach. Roq includes a novel formalization of the notion of robustness in the context of query optimization. We demonstrate experimentally that Roq provides significant improvements to robust query optimization compared to the state-of-the-art.
arXiv Detail & Related papers (2024-01-26T21:16:37Z)
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost. We present JoinGym, a query optimization environment for bushy reinforcement learning (RL) Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z)
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing. LLMs are extremely computationally expensive, even at inference time. We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)
Conjunctive Query Based Constraint Solving For Feature Model Configuration [79.14348940034351]
We show how to apply conjunctive queries to solve constraint satisfaction problems. This approach allows the application of a wide-spread database technology to solve configuration tasks.
arXiv Detail & Related papers (2023-04-26T10:08:07Z)
An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines [4.760079434948197]
We present novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines using existing as well as novel evaluation metrics by using LargeRDFBench queries.
arXiv Detail & Related papers (2021-04-02T11:01:25Z)
Cost-Efficient Online Hyperparameter Optimization [94.60924644778558]
We propose an online HPO algorithm that reaches human expert-level performance within a single run of the experiment. Our proposed online HPO algorithm reaches human expert-level performance within a single run of the experiment, while incurring only modest computational overhead compared to regular training.
arXiv Detail & Related papers (2021-01-17T04:55:30Z)
A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration [17.75042918159419]
A cost-based algorithm is adopted in almost all current database systems. In the cost model, cardinality, the number of the numbers through an operator plays a crucial role. Due to the inaccuracy in cardinality estimation, errors in cost, and the huge plan space model, the algorithm cannot find the optimal execution plan for a complex query in a reasonable time.
arXiv Detail & Related papers (2021-01-05T13:47:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.