Micro-architectural Analysis of a Learned Index
- URL: http://arxiv.org/abs/2109.08495v1
- Date: Fri, 17 Sep 2021 12:13:06 GMT
- Title: Micro-architectural Analysis of a Learned Index
- Authors: Mikkel Møller Andersen, Pınar Tözün
- Abstract summary: ALEX is a tree-based in-memory index structure that consists of a hierarchy of machine learned models.
Our results show that ALEX exhibits fewer stalls and a lower cycles-per-instruction value across different workloads.
On the other hand, the number of instructions required to handle out-of-bound inserts in ALEX can increase the instructions needed per request significantly (10X).
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since the publication of The Case for Learned Index Structures in 2018, there
has been a rise in research that focuses on learned indexes for different
domains and with different functionalities. While the effectiveness of learned
indexes as an alternative to traditional index structures such as B+Trees has
already been demonstrated by several studies, previous work tends to focus on
higher-level performance metrics such as throughput and index size. In this
paper, our goal is to dig deeper and investigate how learned indexes behave at
a micro-architectural level compared to traditional indexes.
More specifically, we focus on previously proposed learned index structure
ALEX, which is a tree-based in-memory index structure that consists of a
hierarchy of machine learned models. Unlike the original proposal for learned
indexes, ALEX is designed from the ground up to allow updates and inserts.
Therefore, it enables more dynamic workloads using learned indexes. In this
work, we perform a micro-architectural analysis of ALEX and compare its
behavior to the tree-based index structures that are not based on learned
models, i.e., ART and B+Tree.
Our results show that ALEX is bound by memory stalls, mainly stalls due to
data misses from the last-level cache. Compared to ART and B+Tree, ALEX
exhibits fewer stalls and a lower cycles-per-instruction value across different
workloads. On the other hand, the number of instructions required to handle
out-of-bound inserts in ALEX can increase the instructions needed per request
significantly (10X) for write-heavy workloads. However, the micro-architectural
behavior shows that this increase in the instruction footprint exhibits high
instruction-level parallelism and, therefore, does not negatively impact the
overall execution time.
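The lookup pattern the paper analyzes can be sketched minimally: a model predicts a key's position in a sorted array, and a bounded local search corrects the prediction. This is a single-linear-model sketch of the general learned-index idea, not ALEX's actual implementation (ALEX uses a hierarchy of such models with gapped arrays); all names here are illustrative.

```python
import bisect

class LinearModelIndex:
    """Minimal single-model learned index: fit a line mapping key -> position,
    then correct the prediction with an error-bounded local search.
    Illustrative sketch only; not ALEX's actual data structure."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        # Least-squares fit of position as a linear function of the key.
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(sorted_keys))
        var = sum((k - mean_k) ** 2 for k in sorted_keys)
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Maximum prediction error bounds the local search window.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(sorted_keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        pos = self._predict(key)
        lo = max(0, pos - self.err)
        hi = min(len(self.keys), pos + self.err + 1)
        # Binary search only inside the error window around the prediction.
        i = lo + bisect.bisect_left(self.keys[lo:hi], key)
        return i if i < len(self.keys) and self.keys[i] == key else None
```

Because the search touches only the small window around the predicted position rather than walking a tree of nodes, this style of lookup can incur fewer last-level-cache data misses than a pointer-chasing B+Tree traversal, which is consistent with the stall behavior reported above.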
Related papers
- Semi-Parametric Retrieval via Binary Token Index [71.78109794895065]
Semi-parametric Vocabulary Disentangled Retrieval (SVDR) is a novel semi-parametric retrieval framework.
It supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval.
It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index.
arXiv Detail & Related papers (2024-05-03T08:34:13Z) - EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval [18.15717995719973]
End-to-end Hierarchical Indexing (EHI) is a novel method for embedding-based retrieval.
EHI outperforms existing state-of-the-art methods by +1.45% in MRR@10 on MS MARCO (Dev) and +8.2% in nDCG@10 on TREC DL19.
arXiv Detail & Related papers (2023-10-13T06:53:02Z) - End-to-End Learning to Index and Search in Large Output Spaces [95.16066833532396]
Extreme multi-label classification (XMC) is a popular framework for solving real-world problems.
In this paper, we propose a novel method which relaxes the tree-based index to a specialized weighted graph-based index.
ELIAS achieves state-of-the-art performance on several large-scale extreme classification benchmarks with millions of labels.
arXiv Detail & Related papers (2022-10-16T01:34:17Z) - LSI: A Learned Secondary Index Structure [24.324528705706104]
We introduce Learned Secondary Index (LSI), a first attempt to use learned indexes for indexing unsorted data.
We show that LSI achieves comparable lookup performance to state-of-the-art secondary indexes while being up to 6x more space efficient.
arXiv Detail & Related papers (2022-05-11T20:49:44Z) - Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - CARMI: A Cache-Aware Learned Index with a Cost-based Construction Algorithm [1.9798034349981157]
We propose a cache-aware learned index (CARMI) design to improve the efficiency of the Recursive Model Index (RMI) framework.
We formulate the problem of finding the optimal design of a learned index as an optimization problem and propose a dynamic programming algorithm for solving it.
Experiments show that our index construction strategy can construct indexes with significantly better performance compared to baselines.
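The Recursive Model Index (RMI) framework that CARMI improves on can be sketched as a two-stage structure: a root model routes each key to one of several leaf structures, each of which resolves the exact position. The sketch below uses simple linear scaling at the root and a sorted bucket at each leaf; it is illustrative of the routing idea only, and the names and structure are assumptions, not CARMI's actual design.

```python
import bisect

def build_rmi(sorted_keys, num_leaves=4):
    """Build a toy two-stage RMI: a linear root 'model' assigns each key
    to one of num_leaves buckets, preserving global positions."""
    lo, hi = sorted_keys[0], sorted_keys[-1]
    span = (hi - lo) or 1
    buckets = [[] for _ in range(num_leaves)]
    for i, k in enumerate(sorted_keys):
        slot = min(num_leaves - 1, (k - lo) * num_leaves // span)
        buckets[slot].append((k, i))  # keep (key, global position) pairs
    return lo, span, buckets

def rmi_lookup(index, key):
    """Route via the root model, then binary-search the chosen leaf."""
    lo, span, buckets = index
    slot = min(len(buckets) - 1, max(0, (key - lo) * len(buckets) // span))
    leaf = buckets[slot]
    keys = [k for k, _ in leaf]
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return leaf[j][1]  # global position of the key
    return None
```

CARMI's contribution, per the summary above, is to choose the node layout of such a structure in a cache-aware way via dynamic programming rather than fixing it by hand as this sketch does.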
arXiv Detail & Related papers (2021-03-01T09:20:53Z) - A Pluggable Learned Index Method via Sampling and Gap Insertion [48.900186573181735]
Database indexes facilitate data retrieval and benefit broad applications in real-world systems.
Recently, a new family of indexes, named learned indexes, has been proposed to learn hidden yet useful data distributions.
We study two general and pluggable techniques to enhance the learning efficiency and learning effectiveness for learned indexes.
arXiv Detail & Related papers (2021-01-04T07:17:23Z) - The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques from a state-of-the-art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension.
We also show that (ii) refining using machine-learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z) - COAX: Correlation-Aware Indexing on Multidimensional Data with Soft Functional Dependencies [3.670422696827525]
We present COAX, a learned index for multidimensional data that learns the correlations between attributes of the dataset.
We show experimentally that by predicting correlated attributes in the data, we can improve the query execution time and reduce the memory overhead of the index.
arXiv Detail & Related papers (2020-06-29T21:22:15Z) - Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads [29.223401893397714]
We introduce Tsunami, which achieves up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes.
arXiv Detail & Related papers (2020-06-23T19:25:51Z) - RadixSpline: A Single-Pass Learned Index [84.84747738666263]
We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data.
RS achieves competitive results on all datasets, despite the fact that it only has two parameters.
arXiv Detail & Related papers (2020-04-30T01:56:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.