Related papers: CardOOD: Robust Query-driven Cardinality Estimation under Out-of-Distribution

CardOOD: Robust Query-driven Cardinality Estimation under Out-of-Distribution

URL: http://arxiv.org/abs/2412.05864v1
Date: Sun, 08 Dec 2024 09:11:11 GMT
Title: CardOOD: Robust Query-driven Cardinality Estimation under Out-of-Distribution
Authors: Rui Li, Kangfei Zhao, Jeffrey Xu Yu, Guoren Wang,
Abstract summary: CardOOD is a general learning framework designed to construct robust query-driven cardinality estimators.<n>We extend classical transfer/robust learning techniques to train query-driven cardinalityestimators.<n>We also propose a new learning algorithm that exploits the property of cardinality estimation.
Score: 36.98419661266168
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Query-driven learned estimators are accurate, flexible, and lightweight alternatives to traditional estimators in query optimization. However, existing query-driven approaches struggle with the Out-of-distribution (OOD) problem, where the test workload distribution differs from the training workload, leading to performancedegradation. In this paper, we present CardOOD, a general learning framework designed to construct robust query-driven cardinality estimators that are resilient against the OOD problem. Our framework focuses on offline training algorithms that develop one-off models from a static workload, suitable for model initialization and periodic retraining. In CardOOD, we extend classical transfer/robust learning techniques to train query-driven cardinalityestimators, and the algorithms fall into three categories: representation learning, data manipulation, and new learning strategies. As these learning techniques are originally evaluated in computervision tasks, we also propose a new learning algorithm that exploits the property of cardinality estimation. This algorithm, lying in the category of new learning strategy, models the partial order constraint of cardinalities by a self-supervised learning task. Comprehensive experimental studies demonstrate the efficacy of the algorithms of CardOOD in mitigating the OOD problem to varying extents. We further integrate CardOOD into PostgreSQL, showcasing its practical utility in query optimization.

Related papers

Self-Evaluation for Job-Shop Scheduling [1.3927943269211593]
Combinatorial optimization problems, such as scheduling and route planning, are crucial in various industries but are computationally intractable due to their NP-hard nature. We propose a novel framework that generates and evaluates subsets of assignments, moving beyond traditional stepwise approaches.
arXiv Detail & Related papers (2025-02-12T11:22:33Z)
Learning Decision Trees and Forests with Algorithmic Recourse [11.401006371457436]
Algorithmic Recourse (AR) aims to provide a recourse action for altering the undesired prediction result given by a model. We formulate the task of learning an accurate classification tree under the constraint of ensuring the existence of reasonable actions for as many instances as possible.
arXiv Detail & Related papers (2024-06-03T08:33:42Z)
PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT. On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt. On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as that in FF. Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples. In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning [29.80680408934347]
We propose an alternative framework to incremental learning where we continually fine-tune the model from a pre-trained representation. Our method takes advantage of linearization technique of a pre-trained neural network for simple and effective continual learning. We show that our method can be applied to general continual learning settings, we evaluate our method in data-incremental, task-incremental, and class-incremental learning problems.
arXiv Detail & Related papers (2022-08-17T06:58:14Z)
Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data. We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z)
An Improved Reinforcement Learning Algorithm for Learning to Branch [12.27934038849211]
Branch-and-bound (B&B) is a general and widely used method for optimization. In this paper, we propose a novel reinforcement learning-based B&B algorithm. We evaluate the performance of the proposed algorithm over three public research benchmarks.
arXiv Detail & Related papers (2022-01-17T04:50:11Z)
BALanCe: Deep Bayesian Active Learning via Equivalence Class Annealing [7.9107076476763885]
BALanCe is a deep active learning framework that mitigates the effect of uncertainty estimates. Batch-BALanCe is a generalization of the sequential algorithm to the batched setting. We show that Batch-BALanCe achieves state-of-the-art performance on several benchmark datasets for active learning.
arXiv Detail & Related papers (2021-12-27T15:38:27Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach [22.958342743597044]
We investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection. We propose a novel and generic method that can be applied to any data type and distance function.
arXiv Detail & Related papers (2020-02-15T20:22:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.