An Empirical Evaluation of Cost-based Federated SPARQL Query Processing
Engines
- URL: http://arxiv.org/abs/2104.00984v1
- Date: Fri, 2 Apr 2021 11:01:25 GMT
- Title: An Empirical Evaluation of Cost-based Federated SPARQL Query Processing
Engines
- Authors: Umair Qudus, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Young-koo Lee
- Abstract summary: We present novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query engines.
We evaluate five cost-based federated SPARQL query engines using existing as well as novel evaluation metrics by using LargeRDFBench queries.
- Score: 4.760079434948197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finding a good query plan is key to the optimization of query runtime. This
holds in particular for cost-based federation engines, which make use of
cardinality estimations to achieve this goal. A number of studies compare
SPARQL federation engines across different performance metrics, including query
runtime, result set completeness and correctness, number of sources selected
and number of requests sent. Albeit informative, these metrics are generic and
unable to quantify and evaluate the accuracy of the cardinality estimators of
cost-based federation engines. To thoroughly evaluate cost-based federation
engines, the effect of estimated cardinality errors on the overall query
runtime performance must be measured. In this paper, we address this challenge
by presenting novel evaluation metrics targeted at a fine-grained benchmarking
of cost-based federated SPARQL query engines. We evaluate five cost-based
federated SPARQL query engines using existing as well as novel evaluation
metrics by using LargeRDFBench queries. Our results provide a detailed analysis
of the experimental outcomes that reveal novel insights, useful for the
development of future cost-based federated SPARQL query processing engines.
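The abstract above does not spell out the paper's novel metrics, so the following is only a minimal sketch of the general idea: quantify how far a cardinality estimate is from the true value, then check how that error relates to query runtime. It uses the widely known q-error measure and a plain Pearson correlation as stand-ins, not the authors' metrics. The function names and the per-query numbers are hypothetical and made up purely for illustration.

```python
import math


def q_error(estimated: float, actual: float) -> float:
    """Return the q-error max(est/act, act/est).

    Both values are clamped to at least 1 so that empty results
    do not cause a division by zero (an assumption of this sketch).
    """
    est = max(estimated, 1.0)
    act = max(actual, 1.0)
    return max(est / act, act / est)


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: (estimated cardinality, actual cardinality, runtime in s).
# These numbers are invented for illustration and are not from the paper.
queries = [
    (100, 100, 0.5),
    (10, 1000, 4.0),
    (5000, 50, 9.0),
    (300, 200, 0.8),
]

errors = [q_error(est, act) for est, act, _ in queries]
runtimes = [runtime for _, _, runtime in queries]

print("q-errors:", errors)
print("correlation with runtime:", pearson(errors, runtimes))
```

A q-error of 1 means the estimate was exact, and larger values mean larger under- or overestimation. A strong positive correlation in such an analysis would suggest that estimation errors go hand in hand with slower queries, which is the kind of relationship the abstract argues generic metrics cannot expose.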
Related papers
- UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z)
- NL2KQL: From Natural Language to Kusto Query [1.7931930942711818]
NL2KQL is an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to Kusto Query Language (KQL) queries.
To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics.
arXiv Detail & Related papers (2024-04-03T01:09:41Z)
- Budget-aware Query Tuning: An AutoML Perspective [14.561951257365953]
Modern database systems rely on cost-based query optimizers to come up with good execution plans for input queries.
We show that by varying the cost unit values one can obtain query plans that significantly outperform the default query plans.
arXiv Detail & Related papers (2024-03-29T20:19:36Z)
- Roq: Robust Query Optimization Based on a Risk-aware Learned Cost Model [3.0784574277021406]
We propose a holistic framework that enables robust query optimization based on a risk-aware learning approach.
Roq includes a novel formalization of the notion of robustness in the context of query optimization.
We demonstrate experimentally that Roq provides significant improvements to robust query optimization compared to the state-of-the-art.
arXiv Detail & Related papers (2024-01-26T21:16:37Z)
- JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost.
We present JoinGym, a query optimization environment for reinforcement learning (RL) that supports bushy join plans.
Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z)
- Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM).
The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv Detail & Related papers (2023-07-01T15:44:53Z)
- Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing.
LLMs are extremely computationally expensive, even at inference time.
We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)
- NeuralSearchX: Serving a Multi-billion-parameter Reranker for Multilingual Metasearch at a Low Cost [4.186775801993103]
We describe NeuralSearchX, a metasearch engine based on a multi-purpose large reranking model to merge results and highlight sentences.
We show that our design choices led to a much more cost-effective system with competitive QPS and close to state-of-the-art results on a wide range of public benchmarks.
arXiv Detail & Related papers (2022-10-26T16:36:53Z)
- Learning GraphQL Query Costs (Extended Version) [7.899264246319001]
We propose a machine-learning approach to efficiently and accurately estimate the query cost.
Our framework is efficient and predicts query costs with high accuracy, consistently outperforming the static analysis by a large margin.
arXiv Detail & Related papers (2021-08-25T09:18:31Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Query Focused Multi-Document Summarization with Distant Supervision [88.39032981994535]
Existing work relies heavily on retrieval-style methods for estimating the relevance between queries and text segments.
We propose a coarse-to-fine modeling framework which introduces separate modules for estimating whether segments are relevant to the query.
We demonstrate that our framework outperforms strong comparison systems on standard QFS benchmarks.
arXiv Detail & Related papers (2020-04-06T22:35:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.