Related papers: JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

URL: http://arxiv.org/abs/2307.11704v2
Date: Wed, 18 Oct 2023 03:25:08 GMT
Title: JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning
Authors: Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
Abstract summary: Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost. We present JoinGym, a query optimization environment for bushy reinforcement learning (RL) Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
Score: 58.71541261221863
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost and it is the core NP-hard combinatorial optimization problem of query optimization. In this paper, we present JoinGym, a lightweight and easy-to-use query optimization environment for reinforcement learning (RL) that captures both the left-deep and bushy variants of the JOS problem. Compared to existing query optimization environments, the key advantages of JoinGym are usability and significantly higher throughput which we accomplish by simulating query executions entirely offline. Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset. We release a novel cardinality dataset for $3300$ SQL queries based on real IMDb workloads which may be of independent interest, e.g., for cardinality estimation. Finally, we extensively benchmark four RL algorithms and find that their cost distributions are heavy-tailed, which motivates future work in risk-sensitive RL. In sum, JoinGym enables users to rapidly prototype RL algorithms on realistic database problems without needing to setup and run live systems.

Related papers

GenJoin: Conditional Generative Plan-to-Plan Query Optimizer that Learns from Subplan Hints [1.3108652488669732]
We present GenJoin, a novel learned query that considers the query optimization problem as a symbiotic generative task. GenJoin is the first learned query that significantly and consistently outperforms as well as state-of-the-art methods on two well-known real-world benchmarks.
arXiv Detail & Related papers (2024-11-07T08:31:01Z)
The Unreasonable Effectiveness of LLMs for Query Optimization [4.50924404547119]
We show that embeddings of query text contain useful semantic information for query optimization. We show that a simple binary deciding between alternative query plans, trained on a small number of embedded query vectors, can outperform existing systems.
arXiv Detail & Related papers (2024-11-05T07:10:00Z)
Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition. Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance in 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z)
Reinforcement Learning from Human Feedback with Active Queries [67.27150911254155]
Current reinforcement learning approaches often require a large amount of human-labelled preference data. We propose query-efficient RLHF methods, inspired by the success of active learning. Our experiments show that ADPO, while only making about half of queries for human preference, matches the performance of the state-of-the-art DPO method.
arXiv Detail & Related papers (2024-02-14T18:58:40Z)
Roq: Robust Query Optimization Based on a Risk-aware Learned Cost Model [3.0784574277021406]
We propose a holistic framework that enables robust query optimization based on a risk-aware learning approach. Roq includes a novel formalization of the notion of robustness in the context of query optimization. We demonstrate experimentally that Roq provides significant improvements to robust query optimization compared to the state-of-the-art.
arXiv Detail & Related papers (2024-01-26T21:16:37Z)
Kepler: Robust Learning for Faster Parametric Query Optimization [5.6119420695093245]
We propose an end-to-end learning-based approach to parametric query optimization. Kepler achieves significant improvements in query runtime on multiple datasets.
arXiv Detail & Related papers (2023-06-11T22:39:28Z)
Lero: A Learning-to-Rank Query Optimizer [49.841082217997354]
We introduce a learning to rank query, called Lero, which builds on top of the native query and continuously learns to improve query optimization. Rather than building a learned from scratch, Lero is designed to leverage decades of wisdom of databases and improve the native. Lero achieves near optimal performance on several benchmarks.
arXiv Detail & Related papers (2023-02-14T07:31:11Z)
How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning. We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(fracNk2)$. In addition, we present an adaptive greedy query scheme, which achieves an average rate of $approx 0.2N$ queries per sample with triplet queries.
arXiv Detail & Related papers (2021-10-05T20:15:35Z)
Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations. We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output. On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.