A Survey of Learned Indexes for the Multi-dimensional Space
- URL: http://arxiv.org/abs/2403.06456v1
- Date: Mon, 11 Mar 2024 06:32:32 GMT
- Title: A Survey of Learned Indexes for the Multi-dimensional Space
- Authors: Abdullah Al-Mamun, Hao Wu, Qiyang He, Jianguo Wang, Walid G. Aref
- Abstract summary: This survey focuses on learned multi-dimensional index structures.
We present a taxonomy that classifies and categorizes each learned multi-dimensional index.
We highlight several open challenges and future research directions in this emerging and highly active field.
- Score: 7.574538354949901
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A recent research trend involves treating database index structures as
Machine Learning (ML) models. In this domain, single or multiple ML models are
trained to learn the mapping from keys to positions inside a data set. This
class of indexes is known as "Learned Indexes." Learned indexes have
demonstrated improved search performance and reduced space requirements for
one-dimensional data. The concept of one-dimensional learned indexes has
naturally been extended to multi-dimensional (e.g., spatial) data, leading to
the development of "Learned Multi-dimensional Indexes". This survey focuses on
learned multi-dimensional index structures. Specifically, it reviews the
current state of this research area, explains the core concepts behind each
proposed method, and classifies these methods based on several well-defined
criteria. We present a taxonomy that classifies and categorizes each learned
multi-dimensional index, and survey the existing literature on learned
multi-dimensional indexes according to this taxonomy. Additionally, we present
a timeline to illustrate the evolution of research on learned indexes. Finally,
we highlight several open challenges and future research directions in this
emerging and highly active field.
Related papers
- Taxonomy-guided Semantic Indexing for Academic Paper Search [51.07749719327668]
TaxoIndex is a semantic index framework for academic paper search.
It organizes key concepts from papers as a semantic index guided by an academic taxonomy.
It can be flexibly employed to enhance existing dense retrievers.
arXiv Detail & Related papers (2024-10-25T00:00:17Z) - End-to-End Learning to Index and Search in Large Output Spaces [95.16066833532396]
Extreme multi-label classification (XMC) is a popular framework for solving real-world problems.
In this paper, we propose a novel method which relaxes the tree-based index to a specialized weighted graph-based index.
ELIAS achieves state-of-the-art performance on several large-scale extreme classification benchmarks with millions of labels.
arXiv Detail & Related papers (2022-10-16T01:34:17Z) - An Assessment Tool for Academic Research Managers in the Third World [125.99533416395765]
We show how the data in one of the bases can be used to infer the main index of the other one.
Since the information of SCOPUS can be freely scraped from the Web, this approach allows to infer for free the Impact Factor of publications.
arXiv Detail & Related papers (2022-09-07T14:59:25Z) - Bridging the Gap Between Indexing and Retrieval for Differentiable
Search Index with Query Generation [98.02743096197402]
Differentiable Search Index (DSI) is an emerging paradigm for information retrieval.
We propose a simple yet effective indexing framework for DSI, called DSI-QG.
Empirical results on popular mono-lingual and cross-lingual passage retrieval datasets show that DSI-QG significantly outperforms the original DSI model.
arXiv Detail & Related papers (2022-06-21T06:21:23Z) - A Learned Index for Exact Similarity Search in Metric Spaces [25.330353637669386]
LIMS is proposed to use data clustering and pivot-based data transformation techniques to build learned indexes.
Machine learning models are developed to approximate the position of each data record on the disk.
Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes.
arXiv Detail & Related papers (2022-04-21T11:24:55Z) - Learned Sorted Table Search and Static Indexes in Small Space:
Methodological and Practical Insights via an Experimental Study [0.0]
Sorted Table Search Procedures are the quintessential query-answering tool, still very useful.
Speeding them up, in small additional space with respect to the table being searched into, is still a quite significant achievement.
To what extent one can enjoy the speed-up of Learned while using constant or nearly constant additional space is a major question.
arXiv Detail & Related papers (2021-07-19T16:06:55Z) - A Pluggable Learned Index Method via Sampling and Gap Insertion [48.900186573181735]
Database indexes facilitate data retrieval and benefit broad applications in real-world systems.
Recently, a new family of index, named learned index, is proposed to learn hidden yet useful data distribution.
We study two general and pluggable techniques to enhance the learning efficiency and learning effectiveness for learned indexes.
arXiv Detail & Related papers (2021-01-04T07:17:23Z) - The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques proposed from a state-of-the art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine learned search within a partition is faster by 11.79% to 39.51% than binary search when using filtering on one dimension.
We also refine using machine learned indexes is 1.23x to 1.83x times faster than closest competitor which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z) - Tsunami: A Learned Multi-dimensional Index for Correlated Data and
Skewed Workloads [29.223401893397714]
We introduce Tsunami, which achieves up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes.
arXiv Detail & Related papers (2020-06-23T19:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.