Learned Sorted Table Search and Static Indexes in Small Space:
Methodological and Practical Insights via an Experimental Study
- URL: http://arxiv.org/abs/2107.09480v2
- Date: Wed, 21 Jul 2021 13:56:52 GMT
- Title: Learned Sorted Table Search and Static Indexes in Small Space:
Methodological and Practical Insights via an Experimental Study
- Authors: Domenico Amato and Raffaele Giancarlo and Giosu\`e Lo Bosco
- Abstract summary: Sorted Table Search Procedures are the quintessential query-answering tool, still very useful.
Speeding them up, in small additional space with respect to the table being searched into, is still a quite significant achievement.
To what extent one can enjoy the speed-up of Learned while using constant or nearly constant additional space is a major question.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sorted Table Search Procedures are the quintessential query-answering tool,
still very useful, e.g, Search Engines (Google Chrome). Speeding them up, in
small additional space with respect to the table being searched into, is still
a quite significant achievement. Static Learned Indexes have been very
successful in achieving such a speed-up, but leave open a major question: To
what extent one can enjoy the speed-up of Learned Indexes while using constant
or nearly constant additional space. By generalizing the experimental
methodology of a recent benchmarking study on Learned Indexes, we shed light on
this question, by considering two scenarios. The first, quite elementary, i.e.,
textbook code, and the second using advanced Learned Indexing algorithms and
the supporting sophisticated software platforms. Although in both cases one
would expect a positive answer, its achievement is not as simple as it seems.
Indeed, our extensive set of experiments reveal a complex relationship between
query time and model space. The findings regarding this relationship and the
corresponding quantitative estimates, across memory levels, can be of interest
to algorithm designers and of use to practitioners as well. As an essential
part of our research, we introduce two new models that are of interest in their
own right. The first is a constant space model that can be seen as a
generalization of $k$-ary search, while the second is a synoptic {\bf RMI}, in
which we can control model space usage.
Related papers
- Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? [62.57689536630933]
We provide experimental results on the BEIR dataset using the open-source Lucene search library.
Our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers.
arXiv Detail & Related papers (2024-09-10T12:46:23Z) - LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
List K-kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z) - Searching a High-Performance Feature Extractor for Text Recognition
Network [92.12492627169108]
We design a domain-specific search space by exploring principles for having good feature extractors.
As the space is huge and complexly structured, no existing NAS algorithms can be applied.
We propose a two-stage algorithm to effectively search in the space.
arXiv Detail & Related papers (2022-09-27T03:49:04Z) - A Large Scale Search Dataset for Unbiased Learning to Rank [51.97967284268577]
We introduce the Baidu-ULTR dataset for unbiased learning to rank.
It involves randomly sampled 1.2 billion searching sessions and 7,008 expert annotated queries.
It provides: (1) the original semantic feature and a pre-trained language model for easy usage; (2) sufficient display information such as position, displayed height, and displayed abstract; and (3) rich user feedback on search result pages (SERPs) like dwelling time.
arXiv Detail & Related papers (2022-07-07T02:37:25Z) - Standard Vs Uniform Binary Search and Their Variants in Learned Static
Indexing: The Case of the Searching on Sorted Data Benchmarking Software
Platform [0.0]
We show that for Learned, and as far as the bf SOSD software is concerned, the use of the Standard routine is superior to the Uniform one.
Our experiments also indicate that Uniform Binary and k-ary Search can be advantageous to use in order to save space in Learned.
arXiv Detail & Related papers (2022-01-05T11:46:16Z) - Exploring Complicated Search Spaces with Interleaving-Free Sampling [127.07551427957362]
In this paper, we build the search algorithm upon a complicated search space with long-distance connections.
We present a simple yet effective algorithm named textbfIF-NAS, where we perform a periodic sampling strategy to construct different sub-networks.
In the proposed search space, IF-NAS outperform both random sampling and previous weight-sharing search algorithms by a significant margin.
arXiv Detail & Related papers (2021-12-05T06:42:48Z) - The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques proposed from a state-of-the art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine learned search within a partition is faster by 11.79% to 39.51% than binary search when using filtering on one dimension.
We also refine using machine learned indexes is 1.23x to 1.83x times faster than closest competitor which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z) - Learning from Data to Speed-up Sorted Table Search Procedures:
Methodology and Practical Guidelines [0.0]
We study to what extend Machine Learning Techniques can contribute to obtain such a speed-up.
We characterize the scenarios in which those latter can be profitably used with respect to the former, accounting for both CPU and GPU computing.
Indeed, we formalize an Algorithmic Paradigm of Learned Dichotomic Sorted Table Search procedures that naturally complements the Learned one proposed here and that characterizes most of the known Sorted Table Search Procedures as having a "learning phase" that approximates Simple Linear Regression.
arXiv Detail & Related papers (2020-07-20T16:26:54Z) - Hands-off Model Integration in Spatial Index Structures [8.710716183434918]
In this paper we explore the opportunity to use light-weight machine learning models to accelerate queries on spatial indexes.
We do so by exploring the potential of using and similar techniques on the R-tree, arguably the most broadly used spatial index.
As we show in our analysis, the query execution time can be reduced by up to 60% while simultaneously shrinking the index's memory footprint by over 90%.
arXiv Detail & Related papers (2020-06-29T22:05:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.