A Unified Deep Model of Learning from both Data and Queries for
Cardinality Estimation
- URL: http://arxiv.org/abs/2107.12295v1
- Date: Mon, 26 Jul 2021 16:09:58 GMT
- Title: A Unified Deep Model of Learning from both Data and Queries for
Cardinality Estimation
- Authors: Peizhi Wu and Gao Cong
- Abstract summary: We propose a new unified deep autoregressive model, UAE, that learns the joint data distribution from both the data and query workload.
UAE achieves single-digit multiplicative error at tail, better accuracies over state-of-the-art methods, and is both space and time efficient.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cardinality estimation is a fundamental problem in database systems. To
capture the rich joint data distributions of a relational table, most of the
existing work either uses data as unsupervised information or uses query
workload as supervised information. Very little work uses both types of
information, and existing approaches cannot fully exploit both to learn the
joint data distribution. In this work, we aim to close the gap
between data-driven and query-driven methods by proposing a new unified deep
autoregressive model, UAE, that learns the joint data distribution from both
the data and query workload. First, to enable using the supervised query
information in the deep autoregressive model, we develop differentiable
progressive sampling using the Gumbel-Softmax trick. Second, UAE is able to
utilize both types of information to learn the joint data distribution in a
single model. Comprehensive experimental results demonstrate that UAE achieves
single-digit multiplicative error at tail, better accuracies over
state-of-the-art methods, and is both space and time efficient.
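The key technical device the abstract names is the Gumbel-Softmax trick, which lets the model backpropagate through what would otherwise be a non-differentiable categorical sampling step during progressive sampling. The sketch below is not the authors' implementation; it is a minimal, generic illustration of the trick using NumPy, where the logits and temperature are illustrative placeholders.

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Draw a differentiable, approximately one-hot sample from a
    categorical distribution given unnormalized log-probabilities.

    As tau -> 0 the output approaches a hard one-hot sample; larger tau
    gives a smoother (more uniform) relaxation.
    """
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
    # Clip u away from 0 and 1 to avoid log(0).
    u = np.clip(rng.uniform(size=logits.shape), 1e-12, 1.0 - 1e-12)
    g = -np.log(-np.log(u))
    # Temperature-scaled softmax (shifted by the max for stability).
    z = (logits + g) / tau
    z = z - z.max()
    y = np.exp(z)
    return y / y.sum()

# Illustrative example: relax a sample over four column values.
logits = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
sample = gumbel_softmax(logits, tau=0.5)
# 'sample' is a probability vector that concentrates on one value,
# yet gradients can flow through it, unlike a hard argmax draw.
```

Because each draw stays a smooth function of the logits, a sampling step embedded inside an autoregressive model can receive gradient signal from a query-driven (supervised) loss, which is what makes "differentiable progressive sampling" trainable end to end.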
Related papers
- Relational Deep Learning: Graph Representation Learning on Relational
Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all input data.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- ASPER: Answer Set Programming Enhanced Neural Network Models for Joint
Entity-Relation Extraction [11.049915720093242]
This paper proposes a new approach, ASP-enhanced Entity-Relation extraction (ASPER).
ASPER jointly recognizes entities and relations by learning from both data and domain knowledge.
In particular, ASPER takes advantage of the factual knowledge (represented as facts in ASP) and derived knowledge (represented as rules in ASP) in the learning process of neural network models.
arXiv Detail & Related papers (2023-05-24T17:32:58Z)
- Deep Transfer Learning for Multi-source Entity Linkage via Domain
Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
arXiv Detail & Related papers (2021-10-27T15:20:41Z)
- Dataset Cartography: Mapping and Diagnosing Datasets with Training
Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z)
- Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for
Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher.
The student model learns from both unlabeled target data and labeled source data via two teacher models.
We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into subspace, we show that our method can address the large-scale and out-of-sample problem.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Multi-Center Federated Learning [62.57229809407692]
This paper proposes a novel multi-center aggregation mechanism for federated learning.
It learns multiple global models from the non-IID user data and simultaneously derives the optimal matching between users and centers.
Our experimental results on benchmark datasets show that our method outperforms several popular federated learning methods.
arXiv Detail & Related papers (2020-05-03T09:14:31Z)
- Have you forgotten? A method to assess if machine learning models have
forgotten data [20.9131206112401]
In the era of deep learning, aggregation of data from several sources is a common approach to ensuring data diversity.
In this paper, we want to address the challenging question of whether data have been forgotten by a model.
We establish statistical methods that compare the target's outputs with outputs of models trained with different datasets.
arXiv Detail & Related papers (2020-04-21T16:13:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.