Database Workload Characterization with Query Plan Encoders
- URL: http://arxiv.org/abs/2105.12287v1
- Date: Wed, 26 May 2021 01:17:27 GMT
- Title: Database Workload Characterization with Query Plan Encoders
- Authors: Debjyoti Paul, Jie Cao, Feifei Li, Vivek Srikumar
- Abstract summary: We propose our query plan encoders that learn essential features and their correlations from query plans.
Our pretrained encoders capture the structural features and the computational performance of queries independently.
- Score: 32.941042348628606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Smart databases are adopting artificial intelligence (AI) technologies to
achieve instance optimality, and in the future, databases will come with
prepackaged AI models within their core components. The reason is that every
database runs on different workloads and demands specific resources and settings
to achieve optimal performance. This prompts the need to understand the
workloads running in the system, along with their features, comprehensively,
a problem we dub workload characterization.
To address this workload characterization problem, we propose our query plan
encoders that learn essential features and their correlations from query plans.
Our pretrained encoders capture the structural features and the computational
performance of queries independently. We show that our pretrained encoders are
adaptable to workloads, which expedites the transfer learning process. We
performed independent assessments of the structural and performance
encoders with multiple downstream tasks. For the overall evaluation of our
query plan encoders, we architect two downstream tasks: (i) query latency
prediction and (ii) query classification. These tasks show the importance of
feature-based workload characterization. We also performed extensive
experiments on individual encoders to verify the effectiveness of
representation learning and domain adaptability.
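The two-encoder idea from the abstract can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's actual architecture: the `PlanNode` fields, the operator list, and both feature extractors are illustrative stand-ins for learned encoders, chosen only to show how structural shape and computational cost can be encoded independently from the same query plan tree.

```python
# Hypothetical sketch: two independent encodings of a query plan tree,
# one capturing structure (operator mix, depth), one capturing estimated
# computational load. Real encoders in the paper are learned models.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PlanNode:
    op: str                      # operator name, e.g. "SeqScan", "HashJoin"
    est_rows: float              # optimizer's row-count estimate
    est_cost: float              # optimizer's cost estimate
    children: List["PlanNode"] = field(default_factory=list)

# Illustrative operator vocabulary (unknown ops would need handling).
OPS = ["SeqScan", "IndexScan", "HashJoin", "NestLoop", "Sort", "Aggregate"]

def structural_encoding(node: PlanNode) -> List[float]:
    """Bag-of-operators plus tree depth: plan shape only, no cost info."""
    counts = [0.0] * len(OPS)
    def walk(n: PlanNode, depth: int) -> int:
        counts[OPS.index(n.op)] += 1.0
        return max([walk(c, depth + 1) for c in n.children], default=depth)
    max_depth = walk(node, 0)
    return counts + [float(max_depth)]

def performance_encoding(node: PlanNode) -> List[float]:
    """Aggregate cost statistics: computational load only, no shape info."""
    rows, costs = [], []
    def walk(n: PlanNode) -> None:
        rows.append(n.est_rows)
        costs.append(n.est_cost)
        for c in n.children:
            walk(c)
    walk(node)
    return [sum(rows), max(costs), sum(costs) / max(len(costs), 1)]

# A toy two-way join plan.
plan = PlanNode("HashJoin", 1e4, 250.0, [
    PlanNode("SeqScan", 1e5, 120.0),
    PlanNode("IndexScan", 1e3, 15.0),
])
print(structural_encoding(plan))   # operator counts + depth
print(performance_encoding(plan))  # total rows, max cost, mean cost
```

Keeping the two encodings separate is what lets each be pretrained and assessed independently, as the abstract describes; a downstream task such as latency prediction would consume their concatenation.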
Related papers
- Sibyl: Forecasting Time-Evolving Query Workloads [9.16115447503004]
Database systems often rely on historical query traces to perform workload-based performance tuning.
Real production workloads are time-evolving, making historical queries ineffective for optimizing future workloads.
We propose SIBYL, an end-to-end machine learning-based framework that accurately forecasts a sequence of future queries.
arXiv Detail & Related papers (2024-01-08T08:11:32Z)
- Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM)
The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv Detail & Related papers (2023-07-01T15:44:53Z)
- A Unified Active Learning Framework for Annotating Graph Data with Application to Software Source Code Performance Prediction [4.572330678291241]
We develop a unified active learning framework specializing in software performance prediction.
We investigate the impact of using different levels of information for active and passive learning.
Our approach aims to improve the investment in AI models for different software performance predictions.
arXiv Detail & Related papers (2023-04-06T14:00:48Z)
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z)
- Interpretable by Design: Learning Predictors by Composing Interpretable Queries [8.054701719767293]
We argue that machine learning algorithms should be interpretable by design.
We minimize the expected number of queries needed for accurate prediction.
Experiments on vision and NLP tasks demonstrate the efficacy of our approach.
arXiv Detail & Related papers (2022-07-03T02:40:34Z)
- Self-Supervised Visual Representation Learning Using Lightweight Architectures [0.0]
In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine.
We critically examine the most notable pretext tasks to extract features from image data.
We study the performance of various self-supervised techniques keeping all other parameters uniform.
arXiv Detail & Related papers (2021-10-21T14:13:10Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT)
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
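The ACE entry above describes a controller, trained with reinforcement learning, that searches for good concatenations of embeddings using task accuracy as the reward. A heavily simplified sketch of that idea follows; `EMBEDDINGS`, `TRUE_GAIN`, and `evaluate` are invented stand-ins (the real reward comes from training a task model, not a formula), and the Bernoulli controller with a REINFORCE-style update is an illustration of the search mechanism, not ACE's actual controller.

```python
# Hypothetical sketch of ACE's search loop: sample a subset of embeddings
# to concatenate, score it, and update per-embedding inclusion logits.
import math
import random

EMBEDDINGS = ["word", "char", "bert", "elmo"]

# Simulated usefulness of each embedding (stand-in for real task gains).
TRUE_GAIN = {"word": 0.03, "char": 0.005, "bert": 0.06, "elmo": 0.03}

def evaluate(subset):
    """Stand-in for training a task model: simulated accuracy, with a
    small per-embedding penalty so bigger is not always better."""
    return 0.80 + sum(TRUE_GAIN[e] for e in subset) - 0.02 * len(subset)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Controller: one inclusion logit per embedding (starts at p = 0.5 each).
logits = {e: 0.0 for e in EMBEDDINGS}

def sample():
    return [e for e in EMBEDDINGS if random.random() < sigmoid(logits[e])]

random.seed(0)
baseline = evaluate(EMBEDDINGS)  # reward baseline: concatenate everything
for step in range(200):
    subset = sample()
    reward = evaluate(subset) - baseline
    for e in EMBEDDINGS:
        p = sigmoid(logits[e])
        # REINFORCE gradient of log-prob w.r.t. the logit: x - p.
        grad = (1.0 - p) if e in subset else -p
        logits[e] += 0.5 * reward * grad

# Embeddings the controller learned to prefer.
print([e for e in EMBEDDINGS if logits[e] > 0])
```

The baseline subtraction mirrors the common variance-reduction trick in policy-gradient training: only subsets that beat the use-everything concatenation receive positive reward.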
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.