Learning from Data to Speed-up Sorted Table Search Procedures:
Methodology and Practical Guidelines
- URL: http://arxiv.org/abs/2007.10237v3
- Date: Thu, 30 Jul 2020 11:56:44 GMT
- Title: Learning from Data to Speed-up Sorted Table Search Procedures:
Methodology and Practical Guidelines
- Authors: Domenico Amato, Giosuè Lo Bosco, Raffaele Giancarlo
- Abstract summary: We study to what extent Machine Learning Techniques can contribute to obtaining such a speed-up.
We characterize the scenarios in which the latter can be profitably used with respect to the former, accounting for both CPU and GPU computing.
Indeed, we formalize an Algorithmic Paradigm of Learned Dichotomic Sorted Table Search procedures that naturally complements the Learned one proposed here and that characterizes most of the known Sorted Table Search Procedures as having a "learning phase" that approximates Simple Linear Regression.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sorted Table Search Procedures are the quintessential query-answering tool,
with widespread usage that now also includes Web Applications, e.g., Search
Engines (Google Chrome) and ad Bidding Systems (AppNexus). Speeding them up, at
very little cost in space, is still a quite significant achievement. Here we
study to what extent Machine Learning Techniques can contribute to obtaining such
a speed-up via a systematic experimental comparison of known efficient
implementations of Sorted Table Search procedures, with different Data Layouts,
and their Learned counterparts developed here. We characterize the scenarios in
which the latter can be profitably used with respect to the former,
accounting for both CPU and GPU computing. Our approach contributes also to the
study of Learned Data Structures, a recent proposal to improve the time/space
performance of fundamental Data Structures, e.g., B-trees, Hash Tables, Bloom
Filters. Indeed, we also formalize an Algorithmic Paradigm of Learned
Dichotomic Sorted Table Search procedures that naturally complements the
Learned one proposed here and that characterizes most of the known Sorted Table
Search Procedures as having a "learning phase" that approximates Simple Linear
Regression.
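The "learning phase that approximates Simple Linear Regression" can be made concrete with a small sketch: fit a least-squares line mapping keys to table positions, record the worst-case prediction error, and answer queries with a binary search restricted to the predicted error window. This is an illustrative reconstruction under our own naming, not the authors' code.

```python
from bisect import bisect_left

def fit_linear_model(table):
    """Learning phase: least-squares fit of position ~ a*key + b over the
    sorted table, plus the worst-case prediction error (illustrative)."""
    n = len(table)
    mean_x = sum(table) / n
    mean_y = (n - 1) / 2  # positions are 0 .. n-1
    var_x = sum((x - mean_x) ** 2 for x in table)
    cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(table))
    a = cov / var_x if var_x else 0.0
    b = mean_y - a * mean_x
    # The maximal error bounds the window of the correcting search;
    # the +1 guards against truncation when predictions are rounded down.
    err = int(max(abs(i - (a * x + b)) for i, x in enumerate(table))) + 1
    return a, b, err

def learned_search(table, a, b, err, key):
    """Query phase: predict a position, then binary-search only inside
    the [pred - err, pred + err] window of the table."""
    n = len(table)
    pred = int(a * key + b)
    lo = max(0, pred - err)
    hi = min(n, pred + err + 1)
    i = bisect_left(table, key, lo, hi)
    return i if i < n and table[i] == key else -1
```

On nearly linear key distributions the error window is tiny and the final search touches only a few cells; on skewed distributions the window grows, which is exactly the CPU/GPU trade-off space the experiments above explore.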
Related papers
- Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? [62.57689536630933]
We provide experimental results on the BEIR dataset using the open-source Lucene search library.
Our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers.
arXiv Detail & Related papers (2024-09-10T12:46:23Z)
- LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
Top-k kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z)
- Relation-aware Ensemble Learning for Knowledge Graph Embedding [68.94900786314666]
We propose to learn an ensemble by leveraging existing methods in a relation-aware manner.
Exploring these semantics with a relation-aware ensemble leads to a much larger search space than general ensemble methods.
We propose a divide-search-combine algorithm RelEns-DSC that searches the relation-wise ensemble weights independently.
arXiv Detail & Related papers (2023-10-13T07:40:12Z)
- From Specific to Generic Learned Sorted Set Dictionaries: A Theoretically
Sound Paradigm Yelding Competitive Data Structural Boosters in Practice [0.0]
We focus on Learned Sorted Set Dictionaries.
We propose a novel paradigm that, complementing known specialized ones, can produce Learned versions of any Sorted Set Dictionary.
arXiv Detail & Related papers (2023-09-02T13:52:53Z)
- A Learned Index for Exact Similarity Search in Metric Spaces [25.330353637669386]
LIMS uses data clustering and pivot-based data transformation techniques to build learned indexes.
Machine learning models are developed to approximate the position of each data record on the disk.
Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes.
arXiv Detail & Related papers (2022-04-21T11:24:55Z)
- CrossBeam: Learning to Search in Bottom-Up Program Synthesis [51.37514793318815]
We propose training a neural model to learn a hands-on search policy for bottom-up synthesis.
Our approach, called CrossBeam, uses the neural model to choose how to combine previously-explored programs into new programs.
We observe that CrossBeam learns to search efficiently, exploring much smaller portions of the program space compared to the state-of-the-art.
arXiv Detail & Related papers (2022-03-20T04:41:05Z)
- Standard Vs Uniform Binary Search and Their Variants in Learned Static
Indexing: The Case of the Searching on Sorted Data Benchmarking Software
Platform [0.0]
We show that, for Learned and as far as the SOSD software is concerned, the use of the Standard routine is superior to the Uniform one.
Our experiments also indicate that Uniform Binary and k-ary Search can be advantageous to use in order to save space in Learned.
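For concreteness, the two routines compared in that study can be sketched as follows: the Standard textbook binary search keeps two moving bounds, while Uniform binary search (in the style of Knuth's Algorithm U) keeps a single probe index plus a shrinking half-width whose sequence depends only on the table size, so it can be precomputed; this is the property that makes it a candidate for saving space. The code below is our own illustration, not the SOSD implementations, which are considerably more optimized.

```python
def standard_binary_search(table, key):
    """Standard routine: two moving bounds, data-dependent midpoints."""
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if table[mid] < key:
            lo = mid + 1
        elif table[mid] > key:
            hi = mid - 1
        else:
            return mid
    return -1

def uniform_binary_search(table, key):
    """Uniform routine (Knuth-style sketch): one probe index plus a
    half-width that halves each step; the step sequence depends only on
    len(table) and could therefore be stored in a precomputed delta table."""
    n = len(table)
    if n == 0:
        return -1
    i = (n + 1) // 2  # 1-indexed probe position
    m = n // 2        # remaining half-width
    while True:
        # i == 0 plays the role of Knuth's fictitious sentinel K_0 = -inf.
        probe = table[i - 1] if i >= 1 else float("-inf")
        if key == probe:
            return i - 1
        if m == 0:
            return -1
        if key < probe:
            i -= (m + 1) // 2
        else:
            i += (m + 1) // 2
        m //= 2
```

Both routines perform O(log n) comparisons; the Uniform variant trades a possible extra probe for a fixed, data-independent control flow.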
arXiv Detail & Related papers (2022-01-05T11:46:16Z)
- Ranking Cost: Building An Efficient and Scalable Circuit Routing Planner
with Evolution-Based Optimization [49.207538634692916]
We propose a new algorithm for circuit routing, named Ranking Cost, to form an efficient and trainable router.
In our method, we introduce a new set of variables called cost maps, which can help the A* router to find out proper paths.
Our algorithm is trained in an end-to-end manner and does not use any artificial data or human demonstration.
arXiv Detail & Related papers (2021-10-08T07:22:45Z)
- Learned Sorted Table Search and Static Indexes in Small Space:
Methodological and Practical Insights via an Experimental Study [0.0]
Sorted Table Search Procedures are the quintessential query-answering tool, still very useful.
Speeding them up, in small additional space with respect to the table being searched into, is still a quite significant achievement.
To what extent one can enjoy the speed-up of Learned while using constant or nearly constant additional space is a major question.
arXiv Detail & Related papers (2021-07-19T16:06:55Z)
- ReliefE: Feature Ranking in High-dimensional Spaces via Manifold
Embeddings [0.0]
The Relief family of algorithms assigns importance to features by iteratively accounting for the nearest relevant and irrelevant instances.
Recent embedding-based methods learn compact, low-dimensional representations.
The ReliefE algorithm is faster and can produce better feature rankings.
arXiv Detail & Related papers (2021-01-23T20:23:31Z) - Progressively Pretrained Dense Corpus Index for Open-Domain Question
Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder.
Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.