Selecting Walk Schemes for Database Embedding
- URL: http://arxiv.org/abs/2401.11215v1
- Date: Sat, 20 Jan 2024 11:39:32 GMT
- Title: Selecting Walk Schemes for Database Embedding
- Authors: Yuval Lev Lubarsky, Jan T\"onshoff, Martin Grohe, Benny Kimelfeld
- Abstract summary: We study the embedding of components of a relational database.
We focus on the recent FoRWaRD algorithm that is designed for dynamic databases.
We show that by focusing on a few informative walk schemes, we can obtain embedding significantly faster, while retaining the quality.
- Score: 6.7609045625714925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machinery for data analysis often requires a numeric representation of the
input. Towards that, a common practice is to embed components of structured
data into a high-dimensional vector space. We study the embedding of the tuples
of a relational database, where existing techniques are often based on
optimization tasks over a collection of random walks from the database. The
focus of this paper is on the recent FoRWaRD algorithm that is designed for
dynamic databases, where walks are sampled by following foreign keys between
tuples. Importantly, different walks have different schemas, or "walk schemes",
that are derived by listing the relations and attributes along the walk. Also
importantly, different walk schemes describe relationships of different natures
in the database. We show that by focusing on a few informative walk schemes, we
can obtain tuple embedding significantly faster, while retaining the quality.
We define the problem of scheme selection for tuple embedding, devise several
approaches and strategies for scheme selection, and conduct a thorough
empirical study of the performance over a collection of downstream tasks. Our
results confirm that with effective strategies for scheme selection, we can
obtain high-quality embeddings considerably (e.g., three times) faster,
preserve the extensibility to newly inserted tuples, and even achieve an
increase in the precision of some tasks.
Related papers
- ReSel: N-ary Relation Extraction from Scientific Text and Tables by
Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets.
We propose a novel general and efficient active learning (GEAL) method in this paper.
Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z) - Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z) - Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z) - Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z) - A Markov Decision Process Approach to Active Meta Learning [24.50189361694407]
In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a singular task.
In meta-learning, the data is associated with numerous tasks, and we seek a model that may perform well on all tasks simultaneously.
arXiv Detail & Related papers (2020-09-10T15:45:34Z) - Monotonic Cardinality Estimation of Similarity Selection: A Deep
Learning Approach [22.958342743597044]
We investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection.
We propose a novel and generic method that can be applied to any data type and distance function.
arXiv Detail & Related papers (2020-02-15T20:22:51Z) - Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.