Related papers: A Survey of Learned Indexes for the Multi-dimensional Space

A Survey of Learned Indexes for the Multi-dimensional Space

URL: http://arxiv.org/abs/2403.06456v1
Date: Mon, 11 Mar 2024 06:32:32 GMT
Title: A Survey of Learned Indexes for the Multi-dimensional Space
Authors: Abdullah Al-Mamun, Hao Wu, Qiyang He, Jianguo Wang, Walid G. Aref
Abstract summary: This survey focuses on learned multi-dimensional index structures. We present a taxonomy that classifies and categorizes each learned multi-dimensional index. We highlight several open challenges and future research directions in this emerging and highly active field.
Score: 7.574538354949901
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: A recent research trend involves treating database index structures as Machine Learning (ML) models. In this domain, single or multiple ML models are trained to learn the mapping from keys to positions inside a data set. This class of indexes is known as "Learned Indexes." Learned indexes have demonstrated improved search performance and reduced space requirements for one-dimensional data. The concept of one-dimensional learned indexes has naturally been extended to multi-dimensional (e.g., spatial) data, leading to the development of "Learned Multi-dimensional Indexes". This survey focuses on learned multi-dimensional index structures. Specifically, it reviews the current state of this research area, explains the core concepts behind each proposed method, and classifies these methods based on several well-defined criteria. We present a taxonomy that classifies and categorizes each learned multi-dimensional index, and survey the existing literature on learned multi-dimensional indexes according to this taxonomy. Additionally, we present a timeline to illustrate the evolution of research on learned indexes. Finally, we highlight several open challenges and future research directions in this emerging and highly active field.

Related papers

Rethinking ANN-based Retrieval: Multifaceted Learnable Index for Large-scale Recommendation System [46.70111672855811]
MultiFaceted Learnable Index (MFLI) is a scalable, real-time retrieval paradigm that learns multifaceted item embeddings and indices within a unified framework.<n>MFLI improves recall on engagement tasks by up to 11.8%, cold-content delivery by up to 57.29%, and semantic relevance by 13.5% compared with prior state-of-the-art methods.
arXiv Detail & Related papers (2026-02-18T01:31:29Z)
Taxonomy-guided Semantic Indexing for Academic Paper Search [51.07749719327668]
TaxoIndex is a semantic index framework for academic paper search. It organizes key concepts from papers as a semantic index guided by an academic taxonomy. It can be flexibly employed to enhance existing dense retrievers.
arXiv Detail & Related papers (2024-10-25T00:00:17Z)
LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
List K-kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance. There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results. We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z)
End-to-End Learning to Index and Search in Large Output Spaces [95.16066833532396]
Extreme multi-label classification (XMC) is a popular framework for solving real-world problems. In this paper, we propose a novel method which relaxes the tree-based index to a specialized weighted graph-based index. ELIAS achieves state-of-the-art performance on several large-scale extreme classification benchmarks with millions of labels.
arXiv Detail & Related papers (2022-10-16T01:34:17Z)
An Assessment Tool for Academic Research Managers in the Third World [125.99533416395765]
We show how the data in one of the bases can be used to infer the main index of the other one. Since the information of SCOPUS can be freely scraped from the Web, this approach allows to infer for free the Impact Factor of publications.
arXiv Detail & Related papers (2022-09-07T14:59:25Z)
A Learned Index for Exact Similarity Search in Metric Spaces [25.330353637669386]
LIMS is proposed to use data clustering and pivot-based data transformation techniques to build learned indexes. Machine learning models are developed to approximate the position of each data record on the disk. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes.
arXiv Detail & Related papers (2022-04-21T11:24:55Z)
Learned Sorted Table Search and Static Indexes in Small Space: Methodological and Practical Insights via an Experimental Study [0.0]
Sorted Table Search Procedures are the quintessential query-answering tool, still very useful. Speeding them up, in small additional space with respect to the table being searched into, is still a quite significant achievement. To what extent one can enjoy the speed-up of Learned while using constant or nearly constant additional space is a major question.
arXiv Detail & Related papers (2021-07-19T16:06:55Z)
A Pluggable Learned Index Method via Sampling and Gap Insertion [48.900186573181735]
Database indexes facilitate data retrieval and benefit broad applications in real-world systems. Recently, a new family of index, named learned index, is proposed to learn hidden yet useful data distribution. We study two general and pluggable techniques to enhance the learning efficiency and learning effectiveness for learned indexes.
arXiv Detail & Related papers (2021-01-04T07:17:23Z)
The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques proposed from a state-of-the art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries. We show that (i) machine learned search within a partition is faster by 11.79% to 39.51% than binary search when using filtering on one dimension. We also refine using machine learned indexes is 1.23x to 1.83x times faster than closest competitor which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z)
Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads [29.223401893397714]
We introduce Tsunami, which achieves up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes.
arXiv Detail & Related papers (2020-06-23T19:25:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.