From Specific to Generic Learned Sorted Set Dictionaries: A
Theoretically Sound Paradigm Yielding Competitive Data Structural Boosters in
Practice
- URL: http://arxiv.org/abs/2309.00946v1
- Date: Sat, 2 Sep 2023 13:52:53 GMT
- Title: From Specific to Generic Learned Sorted Set Dictionaries: A
Theoretically Sound Paradigm Yielding Competitive Data Structural Boosters in
Practice
- Authors: Domenico Amato, Giosuè Lo Bosco and Raffaele Giancarlo
- Abstract summary: We focus on Learned Sorted Set Dictionaries.
We propose a novel paradigm that, complementing known specialized ones, can produce Learned versions of any Sorted Set Dictionary.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research concerns Learned Data Structures, a recent area that
has emerged at the crossroads of Machine Learning and Classic Data Structures.
It is methodologically important and has high practical impact. We focus on
Learned Indexes, i.e., Learned Sorted Set Dictionaries. The proposals available
so far are specific in the sense that they can boost, indeed impressively, the
time performance of Table Search Procedures with a sorted layout only, e.g.,
Binary Search. We propose a novel paradigm that, complementing known
specialized ones, can produce Learned versions of any Sorted Set Dictionary,
for instance, Balanced Binary Search Trees or Binary Search on layouts other
than the sorted one, e.g., Eytzinger. Theoretically, based on it, we obtain several
results of interest, such as (a) the first Learned Optimum Binary Search
Forest, with mean access time bounded by the Entropy of the probability
distribution of the accesses to the Dictionary; (b) the first Learned Sorted
Set Dictionary that, in the Dynamic Case and in an amortized analysis setting,
matches the time bounds known for Classic Dictionaries; the latter result holds
under widely accepted assumptions regarding the size of the Universe. The
experimental part, somewhat complex in terms of software development, clearly
indicates the nonobvious finding that the proposed generalization can yield
effective and competitive Learned Data Structural Boosters, even with respect to
specific benchmark models.
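The core idea behind a Learned Sorted Set Dictionary can be illustrated with a minimal sketch. This is my own illustrative code, not the paper's paradigm: a "learning phase" fits a simple linear regression from keys to ranks, and a lookup corrects the predicted rank with a binary search bounded by the model's worst-case training error.

```python
import bisect

class LearnedSortedSet:
    """Sorted set whose lookups are 'boosted' by a learned rank predictor.
    Illustrative sketch only; class and method names are my assumptions."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # "Learning phase": ordinary least-squares fit of rank on key.
        mean_k = sum(self.keys) / n
        mean_r = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_r) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_r - self.slope * mean_k
        # The worst-case prediction error bounds the final search window.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        guess = int(self.slope * key + self.intercept)
        return min(max(guess, 0), len(self.keys) - 1)  # clamp into range

    def __contains__(self, key):
        guess = self._predict(key)
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        # Bounded binary search inside the error window only.
        j = bisect.bisect_left(self.keys, key, lo, hi)
        return j < hi and self.keys[j] == key
```

When the key distribution is close to linear, the error window stays small and each lookup touches far fewer elements than a full binary search over the array.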
Related papers
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision tree reasoning (OCTree).
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
- Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance of a document to a query.
arXiv Detail & Related papers (2024-03-31T13:29:43Z)
- Learning Interpretable Queries for Explainable Image Classification with
Information Pursuit [18.089603786027503]
Information Pursuit (IP) is an explainable prediction algorithm that greedily selects a sequence of interpretable queries about the data.
This paper introduces a novel approach: learning a dictionary of interpretable queries directly from the dataset.
arXiv Detail & Related papers (2023-12-16T21:43:07Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Generalized Time Warping Invariant Dictionary Learning for Time Series
Classification and Clustering [8.14208923345076]
Dynamic time warping (DTW) is commonly used to deal with temporal delays, scaling, transformations, and many other kinds of temporal misalignment issues.
We propose a generalized time warping invariant dictionary learning algorithm in this paper.
The superiority of the proposed method in terms of dictionary learning, classification, and clustering is validated through ten sets of public datasets.
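For readers unfamiliar with the DTW distance this summary refers to, here is a minimal textbook sketch; the paper's warping-invariant dictionary learning algorithm itself is considerably more involved.

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences
    (classic quadratic DP; a textbook sketch, not the paper's method)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Unlike Euclidean distance, DTW can align `[1, 2, 3]` with `[1, 2, 2, 3]` at zero cost, which is exactly the kind of temporal misalignment the summary mentions.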
arXiv Detail & Related papers (2023-06-30T14:18:13Z)
- Clustering Semantic Predicates in the Open Research Knowledge Graph [0.0]
We describe our approach tailoring two AI-based clustering algorithms for recommending predicates about resources in the Open Research Knowledge Graph (ORKG).
Our experiments show very promising results: a high precision with relatively high recall in linear runtime performance.
This work offers novel insights into predicate groups that loosely accrue as generic patterns for the semantification of scholarly knowledge spanning 44 research fields.
arXiv Detail & Related papers (2022-10-05T05:48:39Z)
- OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal
Regression [94.28253749970534]
We propose to learn the rank concepts from the rich semantic CLIP latent space.
OrdinalCLIP consists of learnable context tokens and learnable rank embeddings.
Experimental results show that our paradigm achieves competitive performance in general ordinal regression tasks.
arXiv Detail & Related papers (2022-06-06T03:54:53Z)
- Standard Vs Uniform Binary Search and Their Variants in Learned Static
Indexing: The Case of the Searching on Sorted Data Benchmarking Software
Platform [0.0]
We show that, for Learned Indexes and as far as the SOSD software is concerned, the Standard routine is superior to the Uniform one.
Our experiments also indicate that Uniform Binary and k-ary Search can be advantageous for saving space in Learned Indexes.
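A hedged sketch of the two probe strategies being compared, under my own assumptions rather than the SOSD implementation: Standard Binary Search narrows a half-open range, while Uniform Binary Search moves a single index by a halving, data-independent step (shown here for arrays of 2^k - 1 elements).

```python
def standard_search(a, x):
    """Classic binary search maintaining a shrinking half-open range [lo, hi)."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(a) and a[lo] == x else -1

def uniform_search(a, x):
    """Uniform binary search: a single index moved by a step that halves
    on every iteration regardless of data; assumes len(a) == 2**k - 1."""
    i = len(a) // 2          # start at the middle element
    step = (len(a) + 1) // 4  # first half-step
    while True:
        if a[i] == x:
            return i
        if step == 0:
            return -1
        i = i + step if a[i] < x else i - step
        step //= 2
```

The Uniform variant needs no hi/lo bookkeeping, only the current index and step, which is the source of its potential space savings in a learned setting.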
arXiv Detail & Related papers (2022-01-05T11:46:16Z)
- Learning from Data to Speed-up Sorted Table Search Procedures:
Methodology and Practical Guidelines [0.0]
We study to what extent Machine Learning techniques can contribute to obtaining such a speed-up.
We characterize the scenarios in which those latter can be profitably used with respect to the former, accounting for both CPU and GPU computing.
Indeed, we formalize an Algorithmic Paradigm of Learned Dichotomic Sorted Table Search procedures that naturally complements the Learned one proposed here and that characterizes most of the known Sorted Table Search Procedures as having a "learning phase" that approximates Simple Linear Regression.
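One Sorted Table Search procedure on an alternative layout, mentioned in the main abstract above, is Binary Search on the Eytzinger (implicit-BFS) layout. A minimal sketch under my own assumptions (function names and the lower-bound interface are illustrative, not from the paper):

```python
def eytzinger(sorted_keys):
    """Re-arrange sorted keys into the implicit-BFS (Eytzinger) layout,
    1-indexed; slot 0 is unused."""
    n = len(sorted_keys)
    out = [None] * (n + 1)
    it = iter(sorted_keys)

    def fill(i):  # in-order traversal of the implicit tree
        if i <= n:
            fill(2 * i)
            out[i] = next(it)
            fill(2 * i + 1)

    fill(1)
    return out

def eytzinger_lower_bound(tree, x):
    """Smallest key >= x in a 1-indexed Eytzinger array, or None."""
    n = len(tree) - 1
    k = 1
    while k <= n:
        k = 2 * k + (tree[k] < x)   # descend: go right iff tree[k] < x
    k >>= ((~k) & (k + 1)).bit_length()  # undo the trailing right turns
    return tree[k] if k else None
```

Children of node `i` sit at `2i` and `2i + 1`, so the search touches consecutive cache lines near the root instead of jumping around a sorted array, which is why this layout can compete with plain Binary Search.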
arXiv Detail & Related papers (2020-07-20T16:26:54Z)
- Progressively Pretrained Dense Corpus Index for Open-Domain Question
Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder.
Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)
- Torch-Struct: Deep Structured Prediction Library [138.5262350501951]
We introduce Torch-Struct, a library for structured prediction.
Torch-Struct includes a broad collection of probabilistic structures accessed through a simple and flexible distribution-based API.
arXiv Detail & Related papers (2020-02-03T16:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.