Sibyl: Forecasting Time-Evolving Query Workloads

URL: http://arxiv.org/abs/2401.03723v1
Date: Mon, 8 Jan 2024 08:11:32 GMT
Title: Sibyl: Forecasting Time-Evolving Query Workloads
Authors: Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesus Camacho-Rodriguez, Yuanyuan Tian
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Database systems often rely on historical query traces to perform workload-based performance tuning. However, real production workloads are time-evolving, making historical queries ineffective for optimizing future workloads. To address this challenge, we propose SIBYL, an end-to-end machine learning-based framework that accurately forecasts a sequence of future queries, with the entire query statements, in various prediction windows. Drawing insights from real-workloads, we propose template-based featurization techniques and develop a stacked-LSTM with an encoder-decoder architecture for accurate forecasting of query workloads. We also develop techniques to improve forecasting accuracy over large prediction windows and achieve high scalability over large workloads with high variability in arrival rates of queries. Finally, we propose techniques to handle workload drifts. Our evaluation on four real workloads demonstrates that SIBYL can forecast workloads with an 87.3% median F1 score, and can result in 1.7× and 1.3× performance improvement when applied to materialized view selection and index selection applications, respectively.
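The abstract names template-based featurization without detailing it. As a rough illustration only, the Python sketch below assumes that a query's template is obtained by replacing string and numeric literals with `?` (using regular expressions rather than a full SQL parser) and that each template's arrivals are counted per fixed time window. A per-template count series like this is one plausible kind of input for a sequence model such as a stacked-LSTM encoder-decoder. The names `templatize` and `template_counts_per_window` are hypothetical and are not taken from the paper, and reconstructing full statements (the abstract says SIBYL forecasts entire query statements) is not covered here.

```python
import re
from collections import defaultdict

# Hypothetical regex-based literal stripping; a real SQL parser would be more robust.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")      # single-quoted strings, '' escapes
_NUMERIC_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")  # standalone integers and decimals
_WHITESPACE = re.compile(r"\s+")


def templatize(sql: str) -> str:
    """Map a concrete query to a template by replacing literals with '?'."""
    template = _STRING_LITERAL.sub("?", sql)
    template = _NUMERIC_LITERAL.sub("?", template)
    template = _WHITESPACE.sub(" ", template).strip()
    return template.lower()  # case-normalize so keyword casing does not split templates


def template_counts_per_window(trace, window_seconds):
    """Count arrivals of each template in fixed-size time windows.

    trace: iterable of (timestamp_in_seconds, sql_text) pairs.
    Returns {template: [count in window 0, count in window 1, ...]}.
    """
    trace = list(trace)
    if not trace:
        return {}
    start = min(ts for ts, _ in trace)
    end = max(ts for ts, _ in trace)
    n_windows = int((end - start) // window_seconds) + 1
    # Every template gets a zero-filled series spanning the whole trace.
    series = defaultdict(lambda: [0] * n_windows)
    for ts, sql in trace:
        series[templatize(sql)][int((ts - start) // window_seconds)] += 1
    return dict(series)


if __name__ == "__main__":
    trace = [
        (0, "SELECT * FROM orders WHERE id = 42"),
        (10, "SELECT * FROM orders WHERE id = 7"),
        (70, "SELECT name FROM users WHERE city = 'Oslo'"),
        (75, "select *  from orders where id = 99"),
    ]
    for template, counts in template_counts_per_window(trace, 60).items():
        print(counts, template)
```

Run as a script with 60-second windows, this prints `[2, 1]` for the `orders` template and `[0, 1]` for the `users` template: the three `orders` queries collapse into one template despite different literals and spacing.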

Related papers

Making Databases Faster with LLM Evolutionary Sampling
Traditional query optimization relies on cost-based models that estimate execution cost. We use our DBPlan harness for the DataFusion engine to propose localized edits that can be applied and executed. We then apply an evolutionary search over these edits to refine candidates across iterations. We obtain up to 4.78× speedups on some queries and we demonstrate a small-to-large workflow.
arXiv (2026-02-11T00:21:51Z)
SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving
We introduce SQS, a novel query-based splatting pre-training for sparse perception models (SPMs). SQS predicts 3D Gaussian representations from sparse queries during pre-training, leveraging self-supervised splatting to learn fine-grained contextual features. Experiments on autonomous driving benchmarks demonstrate that SQS delivers considerable performance gains across multiple query-based 3D perception tasks.
arXiv (2025-09-20T09:25:19Z)
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
FutureX is the largest and most diverse live benchmark for future prediction. It supports real-time daily updates and eliminates data contamination through an automated pipeline for question gathering and answer collection. We evaluate 25 LLM/agent models, including those with reasoning, search capabilities, and integration of external tools.
arXiv (2025-08-16T08:54:08Z)
Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding
Agentic Predictor is a lightweight predictor for efficient agentic workflow evaluation. By learning to approximate task success rates, Agentic Predictor enables fast and accurate selection of optimal agentic workflow configurations.
arXiv (2025-05-26T09:46:50Z)
F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm. By considering domain similarities through task-specific metadata, our model improved generalization, where the excess risk decreases as the number of training tasks increases. Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv (2024-06-23T21:28:50Z)
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv (2024-06-17T04:20:02Z)
PePNet: A Periodicity-Perceived Workload Prediction Network Supporting Rare Occurrence of Heavy Workload
The workload of cloud servers is highly variable, with occasional heavy workload bursts. There are two categories of workload prediction methods: statistical methods and neural-network-based ones. We propose PePNet to improve overall prediction accuracy, especially for heavy workloads.
arXiv (2023-07-11T07:56:27Z)
Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model
We propose a novel model called the Entity-Based Relevance Model (EBRM), which decomposes the query-item relevance problem into multiple query-entity (QE) relevance problems. The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy. We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv (2023-07-01T15:44:53Z)
Kepler: Robust Learning for Faster Parametric Query Optimization
We propose an end-to-end learning-based approach to parametric query optimization. Kepler achieves significant improvements in query runtime on multiple datasets.
arXiv (2023-06-11T22:39:28Z)
BitE: Accelerating Learned Query Optimization in a Mixed-Workload Environment
BitE is a novel ensemble learning model that uses database statistics and metadata to tune a learned query optimizer for enhanced performance. Our model achieves 19.6% more improved queries and 15.8% fewer regressed queries compared to existing traditional methods.
arXiv (2023-06-01T16:05:33Z)
Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future
In real-time forecasting in public health, data collection is a non-trivial and demanding task. The 'backfill' phenomenon and its effect on model performance have barely been studied in prior literature. We formulate a novel problem and a neural framework, Back2Future, that aims to refine a given model's predictions in real time.
arXiv (2021-06-08T14:48:20Z)
Database Workload Characterization with Query Plan Encoders
We propose our query plan encoders that learn essential features and their correlations from query plans. Our pretrained encoders capture the structural and the computational performance of queries independently.
arXiv (2021-05-26T01:17:27Z)
FES: A Fast Efficient Scalable QoS Prediction Framework
One of the primary objectives of designing a prediction algorithm is to achieve satisfactory prediction accuracy. The algorithm must also have a short prediction time so that it can be integrated into a real-time recommendation system. Existing prediction algorithms often compromise on one goal while ensuring the others.
arXiv (2021-03-12T19:28:17Z)
Towards More Fine-grained and Reliable NLP Performance Prediction
We make two contributions to improving performance prediction for NLP tasks. First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU. Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
arXiv (2021-02-10T15:23:20Z)
Automated Concatenation of Embeddings for Structured Prediction
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv (2020-10-10T14:03:20Z)
