The Case for Learned Spatial Indexes
- URL: http://arxiv.org/abs/2008.10349v1
- Date: Mon, 24 Aug 2020 12:09:55 GMT
- Title: The Case for Learned Spatial Indexes
- Authors: Varun Pandey, Alexander van Renen, Andreas Kipf, Ibrahim Sabek, Jialin
Ding, Alfons Kemper
- Abstract summary: We use techniques proposed from a state-of-the art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine learned search within a partition is faster by 11.79% to 39.51% than binary search when using filtering on one dimension.
We also refine using machine learned indexes is 1.23x to 1.83x times faster than closest competitor which filters on two dimensions.
- Score: 62.88514422115702
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatial data is ubiquitous. Massive amounts of data are generated every day
from billions of GPS-enabled devices such as cell phones, cars, sensors, and
various consumer-based applications such as Uber, Tinder, location-tagged posts
in Facebook, Twitter, Instagram, etc. This exponential growth in spatial data
has led the research community to focus on building systems and applications
that can process spatial data efficiently. In the meantime, recent research has
introduced learned index structures. In this work, we use techniques proposed
from a state-of-the art learned multi-dimensional index structure (namely,
Flood) and apply them to five classical multi-dimensional indexes to be able to
answer spatial range queries. By tuning each partitioning technique for optimal
performance, we show that (i) machine learned search within a partition is
faster by 11.79\% to 39.51\% than binary search when using filtering on one
dimension, (ii) the bottleneck for tree structures is index lookup, which could
potentially be improved by linearizing the indexed partitions (iii) filtering
on one dimension and refining using machine learned indexes is 1.23x to 1.83x
times faster than closest competitor which filters on two dimensions, and (iv)
learned indexes can have a significant impact on the performance of low
selectivity queries while being less effective under higher selectivities.
Related papers
- Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? [62.57689536630933]
We provide experimental results on the BEIR dataset using the open-source Lucene search library.
Our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers.
arXiv Detail & Related papers (2024-09-10T12:46:23Z) - iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search [24.85572470526277]
Range-filtering approximate nearest neighbor (RFANN) search is attracting increasing attention in academia and industry.
Recent study proposes to build $O(n2)$ dedicated graph-based indexes for all possible query ranges.
We materialize graph-based indexes, called elemental graphs, for a moderate number of ranges.
arXiv Detail & Related papers (2024-09-04T09:41:52Z) - Semi-Parametric Retrieval via Binary Token Index [71.78109794895065]
Semi-parametric Vocabulary Disentangled Retrieval (SVDR) is a novel semi-parametric retrieval framework.
It supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval.
It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index.
arXiv Detail & Related papers (2024-05-03T08:34:13Z) - LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
List K-kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z) - A Learned Index for Exact Similarity Search in Metric Spaces [25.330353637669386]
LIMS is proposed to use data clustering and pivot-based data transformation techniques to build learned indexes.
Machine learning models are developed to approximate the position of each data record on the disk.
Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes.
arXiv Detail & Related papers (2022-04-21T11:24:55Z) - Towards Improving the Consistency, Efficiency, and Flexibility of
Differentiable Neural Architecture Search [84.4140192638394]
Most differentiable neural architecture search methods construct a super-net for search and derive a target-net as its sub-graph for evaluation.
In this paper, we introduce EnTranNAS that is composed of Engine-cells and Transit-cells.
Our method also spares much memory and computation cost, which speeds up the search process.
arXiv Detail & Related papers (2021-01-27T12:16:47Z) - A Pluggable Learned Index Method via Sampling and Gap Insertion [48.900186573181735]
Database indexes facilitate data retrieval and benefit broad applications in real-world systems.
Recently, a new family of index, named learned index, is proposed to learn hidden yet useful data distribution.
We study two general and pluggable techniques to enhance the learning efficiency and learning effectiveness for learned indexes.
arXiv Detail & Related papers (2021-01-04T07:17:23Z) - Hands-off Model Integration in Spatial Index Structures [8.710716183434918]
In this paper we explore the opportunity to use light-weight machine learning models to accelerate queries on spatial indexes.
We do so by exploring the potential of using and similar techniques on the R-tree, arguably the most broadly used spatial index.
As we show in our analysis, the query execution time can be reduced by up to 60% while simultaneously shrinking the index's memory footprint by over 90%.
arXiv Detail & Related papers (2020-06-29T22:05:28Z) - Tsunami: A Learned Multi-dimensional Index for Correlated Data and
Skewed Workloads [29.223401893397714]
We introduce Tsunami, which achieves up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes.
arXiv Detail & Related papers (2020-06-23T19:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.