SQL4NN: Validation and expressive querying of models as data
- URL: http://arxiv.org/abs/2502.14745v1
- Date: Thu, 20 Feb 2025 17:16:10 GMT
- Title: SQL4NN: Validation and expressive querying of models as data
- Authors: Mark Gerarts, Juno Steegmans, Jan Van den Bussche
- Abstract summary: We consider machine learning models, learned from data, to be an important, intensional, kind of data in themselves.
Various analysis tasks on models can be thought of as queries over this intensional data, often combined with extensional data such as data for training or validation.
- Abstract: We consider machine learning models, learned from data, to be an important, intensional, kind of data in themselves. As such, various analysis tasks on models can be thought of as queries over this intensional data, often combined with extensional data such as data for training or validation. We demonstrate that relational database systems and SQL can actually be well suited for many such tasks.
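The abstract's claim can be illustrated with a minimal sketch: a tiny two-layer ReLU network stored in relational tables and evaluated by a single SQL query, run here through Python's sqlite3 driver. The schema and the 2-2-1 network are illustrative assumptions, not SQL4NN's actual design.

```python
import sqlite3

# A tiny 2-2-1 ReLU network stored relationally; illustrative schema only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE weight(layer INT, src INT, dst INT, w REAL);
CREATE TABLE bias(layer INT, neuron INT, b REAL);
CREATE TABLE input(neuron INT, v REAL);
""")
con.executemany("INSERT INTO weight VALUES (?,?,?,?)", [
    (1, 0, 0, 1.0), (1, 1, 0, -1.0),   # hidden neuron 0
    (1, 0, 1, -1.0), (1, 1, 1, 1.0),   # hidden neuron 1
    (2, 0, 0, 1.0), (2, 1, 0, 1.0),    # output neuron
])
con.executemany("INSERT INTO bias VALUES (?,?,?)",
                [(1, 0, 0.0), (1, 1, 0.0), (2, 0, 0.0)])
con.executemany("INSERT INTO input VALUES (?,?)", [(0, 3.0), (1, 1.0)])

# One query performs the forward pass: hidden = ReLU(W1 x + b1), out = W2 h + b2.
out = con.execute("""
WITH hidden AS (
    SELECT w.dst AS neuron, MAX(0.0, SUM(i.v * w.w) + b.b) AS v
    FROM input i
    JOIN weight w ON w.layer = 1 AND w.src = i.neuron
    JOIN bias   b ON b.layer = 1 AND b.neuron = w.dst
    GROUP BY w.dst, b.b
)
SELECT SUM(h.v * w.w) + b.b
FROM hidden h
JOIN weight w ON w.layer = 2 AND w.src = h.neuron
JOIN bias   b ON b.layer = 2 AND b.neuron = w.dst
GROUP BY w.dst, b.b
""").fetchone()[0]
print(out)  # hidden = (ReLU(2), ReLU(-2)) = (2, 0); output = 2.0
```

Once the model lives in tables like these, validation tasks (e.g. checking outputs over a table of test inputs) become ordinary joins against extensional data.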
Related papers
- The Duck's Brain: Training and Inference of Neural Networks in Modern Database Engines [9.450046371705927]
We show how to transform data into a relational representation for training neural networks in SQL.
The evaluation in terms of runtime and memory consumption proves the suitability of modern database systems for matrix algebra.
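The paper's relational transformation is not reproduced here, but the core building block of matrix algebra in SQL can be sketched: matrix multiplication over a coordinate (row, col, value) representation, expressed as a join plus aggregation (shown via sqlite3; any relational engine works the same way).

```python
import sqlite3

# Matrices in (row, col, value) coordinate form; C = A * B via join + GROUP BY.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A(i INT, j INT, v REAL);
CREATE TABLE B(i INT, j INT, v REAL);
""")
# A = [[1, 2], [3, 4]],  B = [[5, 6], [7, 8]]
con.executemany("INSERT INTO A VALUES (?,?,?)",
                [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)])
con.executemany("INSERT INTO B VALUES (?,?,?)",
                [(0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)])

# C[i][k] = sum over j of A[i][j] * B[j][k]: the join matches inner indices.
C = con.execute("""
SELECT A.i, B.j, SUM(A.v * B.v) AS v
FROM A JOIN B ON A.j = B.i
GROUP BY A.i, B.j
ORDER BY A.i, B.j
""").fetchall()
print(C)  # [(0, 0, 19.0), (0, 1, 22.0), (1, 0, 43.0), (1, 1, 50.0)]
```

A side benefit of the coordinate form is that zero entries can simply be omitted, so the same query handles sparse matrices for free.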
arXiv Detail & Related papers (2023-12-28T20:45:06Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- GFS: Graph-based Feature Synthesis for Prediction over Relational Databases [39.975491511390985]
We propose a novel framework called Graph-based Feature Synthesis (GFS).
GFS formulates a relational database as a heterogeneous graph database.
In an experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
arXiv Detail & Related papers (2023-12-04T16:54:40Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
We are instead given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Learning to be a Statistician: Learned Estimator for Number of Distinct Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems.
In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples.
We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
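The paper learns its estimator from data; for contrast, the kind of classical sample-based baseline it competes with can be sketched with the Chao1 estimator, which extrapolates from the counts of values seen exactly once and exactly twice in the sample (this baseline is our illustration, not the paper's method).

```python
from collections import Counter

def chao1_ndv(sample):
    """Classical Chao1 lower-bound estimate of the number of distinct
    values, from the counts of values seen once (f1) and twice (f2)."""
    freq = Counter(sample)          # value -> occurrences in the sample
    d = len(freq)                   # distinct values actually observed
    f = Counter(freq.values())      # occurrences -> number of such values
    f1, f2 = f[1], f[2]
    if f2 == 0:
        return d + f1 * (f1 - 1) / 2.0   # bias-corrected variant
    return d + f1 * f1 / (2.0 * f2)

# Many singletons in the sample suggest many values not yet seen.
sample = ["a", "a", "b", "c", "d", "e"]
print(chao1_ndv(sample))  # 5 observed + 4^2 / (2 * 1) = 13.0
```

Estimators like this depend only on the sample's frequency profile; the supervised framing above instead learns the mapping from such profiles to the true NDV.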
arXiv Detail & Related papers (2022-02-06T15:42:04Z)
- A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation [28.570086492742035]
We propose a new unified deep autoregressive model, UAE, that learns the joint data distribution from both the data and query workload.
UAE achieves single-digit multiplicative error at the tail, higher accuracy than state-of-the-art methods, and is both space- and time-efficient.
arXiv Detail & Related papers (2021-07-26T16:09:58Z)
- When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data [84.87772675171412]
We study the circumstances under which explanations of individual data points can improve modeling performance.
We make use of three existing datasets with explanations: e-SNLI, TACRED, SemEval.
arXiv Detail & Related papers (2021-02-03T18:57:08Z)
- Data from Model: Extracting Data from Non-robust and Robust Models [83.60161052867534]
This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model.
We repeat the process of Data to Model (DtM) and Data from Model (DfM) in sequence and explore the loss of feature mapping information.
Our results show that the accuracy drop is limited even after multiple sequences of DtM and DfM, especially for robust models.
arXiv Detail & Related papers (2020-07-13T05:27:48Z)
- Have you forgotten? A method to assess if machine learning models have forgotten data [20.9131206112401]
In the era of deep learning, aggregation of data from several sources is a common approach to ensuring data diversity.
In this paper, we want to address the challenging question of whether data have been forgotten by a model.
We establish statistical methods that compare the target's outputs with outputs of models trained with different datasets.
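The paper's specific statistical tests are not named in this snippet; as an illustration of the general idea of comparing two models' output distributions, a two-sample Kolmogorov-Smirnov statistic can be sketched (the choice of test here is our assumption):

```python
import bisect

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of two samples of model outputs."""
    xs, ys = sorted(xs), sorted(ys)

    def ecdf(sorted_vals, t):
        # Fraction of the sample that is <= t.
        return bisect.bisect_right(sorted_vals, t) / len(sorted_vals)

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in set(xs + ys))

# Outputs of the target model vs. a model trained without the data in question;
# a large statistic suggests the two output distributions still differ.
target = [0.1, 0.2, 0.2, 0.9]
reference = [0.1, 0.2, 0.3, 0.4]
print(ks_statistic(target, reference))  # 0.25
```

In practice one would calibrate such a statistic against models known to have (or not have) seen the data, rather than reading it as an absolute threshold.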
arXiv Detail & Related papers (2020-04-21T16:13:45Z)
- Learning Over Dirty Data Without Cleaning [12.892359722606681]
Real-world datasets are dirty and contain many errors.
Learning over dirty databases may result in inaccurate models.
We propose DLearn, a novel relational learning system.
arXiv Detail & Related papers (2020-04-05T20:21:13Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.