Related papers: MERLIN: Multi-stagE query performance prediction for dynamic paRallel oLap pIpeliNe

MERLIN: Multi-stagE query performance prediction for dynamic paRallel oLap pIpeliNe

URL: http://arxiv.org/abs/2412.00749v1
Date: Sun, 01 Dec 2024 09:58:54 GMT
Title: MERLIN: Multi-stagE query performance prediction for dynamic paRallel oLap pIpeliNe
Authors: Kaixin Zhang, Hongzhi Wang, Kunkai Gu, Ziqi Li, Chunyu Zhao, Yingze Li, Yu Yan,
Abstract summary: High-performance OLAP database technology has emerged with the growing demand for massive data analysis.<n>Many executors adopt sophisticated designs including SIMD operators, parallel execution, and dynamic pipeline modification.<n> MERLIN is a multi-stage query performance prediction method for high-performance OLAPs.
Score: 8.024724736461328
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High-performance OLAP database technology has emerged with the growing demand for massive data analysis. To achieve much higher performance, many DBMSs adopt sophisticated designs including SIMD operators, parallel execution, and dynamic pipeline modification. However, such advanced OLAP query execution mechanisms still lack targeted Query Performance Prediction (QPP) methods because most existing methods target conventional tree-shaped query plans and static serial executors. To address this problem, in this paper, we proposed MERLIN a multi-stage query performance prediction method for high-performance OLAP DBMSs. MERLIN first establishes resource cost models for each physical operator. Then, it constructs a DAG that consists of a data-flow tree backbone and resource competition relationships among concurrent operators. After using a GAT with an extra attention mechanism to calibrate the cost, the cost vector tree is extracted and summarized by a TCN, ultimately enabling effective query performance prediction. Experimental results demonstrate that MERLIN yields higher performance prediction precision than existing methods.

Related papers

BQSched: A Non-intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning [7.738546538164454]
A key issue in minimizing the overall makespan of data pipelines is the efficient scheduling of concurrent queries. To our knowledge, BQSched is the first non-intrusive batch query scheduler via reinforcement learning. Extensive experiments show that BQSched can significantly improve the efficiency and stability of batch query scheduling.
arXiv Detail & Related papers (2025-04-27T07:49:01Z)
Uncovering the Limitations of Query Performance Prediction: Failures, Insights, and Implications for Selective Query Processing [3.463527836552468]
This paper provides a comprehensive evaluation of state-of-the-art QPPs (e.g. NQC, UQC) We use diverse sparse rankers (BM25, DFree without and with query expansion) and hybrid or dense (SPLADE and ColBert) rankers and diverse test collections ROBUST, GOV2, WT10G, and MS MARCO. Results show significant variability in predictors accuracy, with collections as the main factor and rankers next.
arXiv Detail & Related papers (2025-04-01T18:18:21Z)
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving [89.60370366013142]
We propose PlanGEN, a model-agnostic and easily scalable agent framework with three key components: constraint, verification, and selection agents. Specifically, our approach proposes constraint-guided iterative verification to enhance performance of inference-time algorithms.
arXiv Detail & Related papers (2025-02-22T06:21:56Z)
Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
Query Performance Explanation through Large Language Model for HTAP Systems [8.278943524339264]
In hybrid transactional and analytical processing systems, users often struggle to understand why query plans from one engine perform slower than those from another.<n>We propose a novel framework that leverages large language models (LLMs) to explain query performance in HTAP systems.
arXiv Detail & Related papers (2024-12-02T16:55:07Z)
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations. Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations. We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z)
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
Localized RETE for Incremental Graph Queries [1.3858051019755282]
We propose an extension semantics that enables local yet fully incremental execution graph queries. The proposed technique can significantly improve performance regarding memory consumption and execution time in favorable cases, but may incur a noticeable linear overhead unfavorable cases.
arXiv Detail & Related papers (2024-05-02T10:00:37Z)
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost. We present JoinGym, a query optimization environment for bushy reinforcement learning (RL) Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z)
Kepler: Robust Learning for Faster Parametric Query Optimization [5.6119420695093245]
We propose an end-to-end learning-based approach to parametric query optimization. Kepler achieves significant improvements in query runtime on multiple datasets.
arXiv Detail & Related papers (2023-06-11T22:39:28Z)
Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution. We employ a DETR-based encoder-decoder conditional queries to significantly reduce the entity label space as well. Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
BitE : Accelerating Learned Query Optimization in a Mixed-Workload Environment [0.36700088931938835]
BitE is a novel ensemble learning model using database statistics and metadata to tune a learned query for enhancing performance. Our model achieves 19.6% more improved queries and 15.8% less regressed queries compared to the existing traditional methods.
arXiv Detail & Related papers (2023-06-01T16:05:33Z)
DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models. We propose to generate a symbolic and ordered sequence from the relation matrix which is deterministic and easier for model to learn. Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z)
Neural-Symbolic Entangled Framework for Complex Query Answering [22.663509971491138]
We propose a Neural and Entangled framework (ENeSy) for complex query answering. It enables the neural and symbolic reasoning to enhance each other to alleviate the cascading error and KG incompleteness. ENeSy achieves the SOTA performance on several benchmarks, especially in the setting of the training model only with the link prediction task.
arXiv Detail & Related papers (2022-09-19T06:07:10Z)
Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner [56.08919422452905]
We propose an architecture called Iterative Retrieval-Generation Reasoner (IRGR) Our model is able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises. We outperform existing benchmarks on premise retrieval and entailment tree generation, with around 300% gain in overall correctness.
arXiv Detail & Related papers (2022-05-18T21:52:11Z)
Multi-task Learning of Order-Consistent Causal Graphs [59.9575145128345]
We consider the problem of discovering $K related Gaussian acyclic graphs (DAGs) Under multi-task learning setting, we propose a $l_1/l$-regularized maximum likelihood estimator (MLE) for learning $K$ linear structural equation models. We theoretically show that the joint estimator, by leveraging data across related tasks, can achieve a better sample complexity for recovering the causal order.
arXiv Detail & Related papers (2021-11-03T22:10:18Z)
Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach. IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language. We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.