From Specific to Generic Learned Sorted Set Dictionaries: A
Theoretically Sound Paradigm Yielding Competitive Data Structural Boosters in
Practice
- URL: http://arxiv.org/abs/2309.00946v1
- Date: Sat, 2 Sep 2023 13:52:53 GMT
- Title: From Specific to Generic Learned Sorted Set Dictionaries: A
Theoretically Sound Paradigm Yielding Competitive Data Structural Boosters in
Practice
- Authors: Domenico Amato, Giosuè Lo Bosco and Raffaele Giancarlo
- Abstract summary: We focus on Learned Sorted Set Dictionaries.
We propose a novel paradigm that, complementing known specialized ones, can produce Learned versions of any Sorted Set Dictionary.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research concerns Learned Data Structures, a recent area that
has emerged at the crossroads of Machine Learning and Classic Data Structures.
It is methodologically important and has high practical impact. We focus on
Learned Indexes, i.e., Learned Sorted Set Dictionaries. The proposals available
so far are specific in the sense that they can boost, indeed impressively, the
time performance of Table Search Procedures with a sorted layout only, e.g.,
Binary Search. We propose a novel paradigm that, complementing known
specialized ones, can produce Learned versions of any Sorted Set Dictionary,
for instance, Balanced Binary Search Trees or Binary Search on layouts other
than the sorted one, e.g., Eytzinger. Theoretically, based on it, we obtain several
results of interest, such as (a) the first Learned Optimum Binary Search
Forest, with mean access time bounded by the Entropy of the probability
distribution of the accesses to the Dictionary; (b) the first Learned Sorted
Set Dictionary that, in the Dynamic Case and in an amortized analysis setting,
matches the time bounds known for Classic Dictionaries; the latter result holds
under widely accepted assumptions regarding the size of the Universe. The
experimental part, somewhat complex in terms of software development, clearly
indicates the nonobvious finding that the proposed generalization can yield
effective and competitive Learned Data Structural Boosters, even with respect to
specific benchmark models.
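The core idea behind a Learned Sorted Set Dictionary can be illustrated with a minimal sketch. This is my own illustrative code, not the paper's paradigm: a "learning phase" fits a simple linear regression from keys to ranks, and a lookup corrects the predicted rank with a binary search bounded by the model's worst-case training error.

```python
import bisect

class LearnedSortedSet:
    """Sorted set whose lookups are 'boosted' by a learned rank predictor.
    Illustrative sketch only; class and method names are my assumptions."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # "Learning phase": ordinary least-squares fit of rank on key.
        mean_k = sum(self.keys) / n
        mean_r = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_r) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_r - self.slope * mean_k
        # The worst-case prediction error bounds the final search window.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        guess = int(self.slope * key + self.intercept)
        return min(max(guess, 0), len(self.keys) - 1)  # clamp into range

    def __contains__(self, key):
        guess = self._predict(key)
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        # Bounded binary search inside the error window only.
        j = bisect.bisect_left(self.keys, key, lo, hi)
        return j < hi and self.keys[j] == key
```

When the key distribution is close to linear, the error window stays small and each lookup touches far fewer elements than a full binary search over the array.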
Related papers
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
- Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance of a document to a query.
arXiv Detail & Related papers (2024-03-31T13:29:43Z)
- Learning Interpretable Queries for Explainable Image Classification with
Information Pursuit [18.089603786027503]
Information Pursuit (IP) is an explainable prediction algorithm that greedily selects a sequence of interpretable queries about the data.
This paper introduces a novel approach: learning a dictionary of interpretable queries directly from the dataset.
arXiv Detail & Related papers (2023-12-16T21:43:07Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Generalized Time Warping Invariant Dictionary Learning for Time Series
Classification and Clustering [8.14208923345076]
Dynamic time warping (DTW) is commonly used to deal with temporal delays, scaling, transformations, and many other kinds of temporal misalignment issues.
We propose a generalized time warping invariant dictionary learning algorithm in this paper.
The superiority of the proposed method in terms of dictionary learning, classification, and clustering is validated through ten sets of public datasets.
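For readers unfamiliar with the DTW distance this summary refers to, here is a minimal textbook sketch; the paper's warping-invariant dictionary learning algorithm itself is considerably more involved.

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences
    (classic quadratic DP; a textbook sketch, not the paper's method)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Unlike Euclidean distance, DTW can align `[1, 2, 3]` with `[1, 2, 2, 3]` at zero cost, which is exactly the kind of temporal misalignment the summary mentions.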
arXiv Detail & Related papers (2023-06-30T14:18:13Z)
- Clustering Semantic Predicates in the Open Research Knowledge Graph [0.0]
We describe our approach tailoring two AI-based clustering algorithms for recommending predicates about resources in the Open Research Knowledge Graph (ORKG).
Our experiments show very promising results: a high precision with relatively high recall in linear runtime performance.
This work offers novel insights into predicate groups that loosely accrue as generic patterns for the semantification of scholarly knowledge spanning 44 research fields.
arXiv Detail & Related papers (2022-10-05T05:48:39Z)
- OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal
Regression [94.28253749970534]
We propose to learn the rank concepts from the rich semantic CLIP latent space.
OrdinalCLIP consists of learnable context tokens and learnable rank embeddings.
Experimental results show that our paradigm achieves competitive performance in general ordinal regression tasks.
arXiv Detail & Related papers (2022-06-06T03:54:53Z)
- Standard Vs Uniform Binary Search and Their Variants in Learned Static
Indexing: The Case of the Searching on Sorted Data Benchmarking Software
Platform [0.0]
We show that, for Learned Indexes and as far as the SOSD software is concerned, the Standard routine is superior to the Uniform one.
Our experiments also indicate that Uniform Binary and k-ary Search can be advantageous for saving space in Learned Indexes.
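A hedged sketch of the two probe strategies being compared, under my own assumptions rather than the SOSD implementation: Standard Binary Search narrows a half-open range, while Uniform Binary Search moves a single index by a halving, data-independent step (shown here for arrays of 2^k - 1 elements).

```python
def standard_search(a, x):
    """Classic binary search maintaining a shrinking half-open range [lo, hi)."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(a) and a[lo] == x else -1

def uniform_search(a, x):
    """Uniform binary search: a single index moved by a step that halves
    on every iteration regardless of data; assumes len(a) == 2**k - 1."""
    i = len(a) // 2          # start at the middle element
    step = (len(a) + 1) // 4  # first half-step
    while True:
        if a[i] == x:
            return i
        if step == 0:
            return -1
        i = i + step if a[i] < x else i - step
        step //= 2
```

The Uniform variant needs no hi/lo bookkeeping, only the current index and step, which is the source of its potential space savings in a learned setting.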
arXiv Detail & Related papers (2022-01-05T11:46:16Z)
- Learning from Data to Speed-up Sorted Table Search Procedures:
Methodology and Practical Guidelines [0.0]
We study to what extent Machine Learning techniques can contribute to obtaining such a speed-up.
We characterize the scenarios in which those latter can be profitably used with respect to the former, accounting for both CPU and GPU computing.
Indeed, we formalize an Algorithmic Paradigm of Learned Dichotomic Sorted Table Search procedures that naturally complements the Learned one proposed here and that characterizes most of the known Sorted Table Search Procedures as having a "learning phase" that approximates Simple Linear Regression.
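One Sorted Table Search procedure on an alternative layout, mentioned in the main abstract above, is Binary Search on the Eytzinger (implicit-BFS) layout. A minimal sketch under my own assumptions (function names and the lower-bound interface are illustrative, not from the paper):

```python
def eytzinger(sorted_keys):
    """Re-arrange sorted keys into the implicit-BFS (Eytzinger) layout,
    1-indexed; slot 0 is unused."""
    n = len(sorted_keys)
    out = [None] * (n + 1)
    it = iter(sorted_keys)

    def fill(i):  # in-order traversal of the implicit tree
        if i <= n:
            fill(2 * i)
            out[i] = next(it)
            fill(2 * i + 1)

    fill(1)
    return out

def eytzinger_lower_bound(tree, x):
    """Smallest key >= x in a 1-indexed Eytzinger array, or None."""
    n = len(tree) - 1
    k = 1
    while k <= n:
        k = 2 * k + (tree[k] < x)   # descend: go right iff tree[k] < x
    k >>= ((~k) & (k + 1)).bit_length()  # undo the trailing right turns
    return tree[k] if k else None
```

Children of node `i` sit at `2i` and `2i + 1`, so the search touches consecutive cache lines near the root instead of jumping around a sorted array, which is why this layout can compete with plain Binary Search.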
arXiv Detail & Related papers (2020-07-20T16:26:54Z)
- Progressively Pretrained Dense Corpus Index for Open-Domain Question
Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder.
Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)
- Torch-Struct: Deep Structured Prediction Library [138.5262350501951]
We introduce Torch-Struct, a library for structured prediction.
Torch-Struct includes a broad collection of probabilistic structures accessed through a simple and flexible distribution-based API.
arXiv Detail & Related papers (2020-02-03T16:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.