Hands-off Model Integration in Spatial Index Structures
- URL: http://arxiv.org/abs/2006.16411v2
- Date: Sun, 9 Aug 2020 19:43:38 GMT
- Title: Hands-off Model Integration in Spatial Index Structures
- Authors: Ali Hadian, Ankit Kumar, Thomas Heinis
- Abstract summary: In this paper we explore the opportunity to use light-weight machine learning models to accelerate queries on spatial indexes.
We do so by exploring the potential of using interpolation and similar techniques on the R-tree, arguably the most broadly used spatial index.
As we show in our analysis, the query execution time can be reduced by up to 60% while simultaneously shrinking the index's memory footprint by over 90%.
- Score: 8.710716183434918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial indexes are crucial for the analysis of the increasing amounts of
spatial data, for example generated through IoT applications. The plethora of
indexes that has been developed in recent decades has primarily been optimised
for disk. With increasing amounts of memory even on commodity machines,
however, moving them to main memory is an option. Doing so opens up the
opportunity to use additional optimizations that are only amenable to main
memory. In this paper we thus explore the opportunity to use light-weight
machine learning models to accelerate queries on spatial indexes. We do so by
exploring the potential of using interpolation and similar techniques on the
R-tree, arguably the most broadly used spatial index. As we show in our
experimental analysis, the query execution time can be reduced by up to 60%
while simultaneously shrinking the index's memory footprint by over 90%.
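The core idea of replacing pointer-chasing inside a node with a light-weight model can be illustrated with interpolation search over a node's entries sorted by one MBR coordinate. This is a minimal sketch under assumed names (`find_first_ge` and the key layout are illustrative, not from the paper): a single interpolation guess over the sorted low-x keys, corrected by a short linear scan.

```python
# Hypothetical sketch: locate the first entry of a large, flat R-tree
# node whose sorted low-x key is >= the query's lower bound. One
# interpolation guess plus a local correction scan stands in for a
# full binary search; no per-entry pointers are needed.

def find_first_ge(keys, q):
    """Return the index of the first key >= q in a sorted list."""
    n = len(keys)
    if n == 0 or q <= keys[0]:
        return 0
    if q > keys[-1]:
        return n
    # One-shot interpolation guess, clamped to valid indices.
    pos = int((q - keys[0]) / (keys[-1] - keys[0]) * (n - 1))
    pos = max(0, min(n - 1, pos))
    # Walk left while the previous key already satisfies >= q.
    while pos > 0 and keys[pos - 1] >= q:
        pos -= 1
    # Walk right while the current key is still < q.
    while pos < n and keys[pos] < q:
        pos += 1
    return pos
```

When the keys are close to uniformly distributed, the guess lands within a few slots of the target, so the correction scan is short; this is the regime where such models shrink both lookup time and the index footprint.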
Related papers
- Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory [14.432536669959218]
Vector searches on large-scale datasets are critical to modern online services like web search and RAG.
We characterize the trade-off between performance and index size in existing SSD-based graph and cluster indexes.
We show that vector indexes can achieve optimal performance with orders of magnitude smaller index amplification on a variety of second-tier memory devices.
arXiv Detail & Related papers (2024-05-06T08:38:14Z) - Semi-Parametric Retrieval via Binary Token Index [71.78109794895065]
Semi-parametric Vocabulary Disentangled Retrieval (SVDR) is a novel semi-parametric retrieval framework.
It supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval.
It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index.
arXiv Detail & Related papers (2024-05-03T08:34:13Z) - AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval [1.099532646524593]
DiskANN achieves a good recall-speed balance on large-scale datasets by using both RAM and storage.
Although it claims to reduce memory usage by loading vectors compressed with product quantization (PQ), its memory footprint still grows in proportion to the dataset's scale.
We propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads the compressed vectors to storage.
arXiv Detail & Related papers (2024-04-09T04:20:27Z) - End-to-End Learning to Index and Search in Large Output Spaces [95.16066833532396]
Extreme multi-label classification (XMC) is a popular framework for solving real-world problems.
In this paper, we propose a novel method which relaxes the tree-based index to a specialized weighted graph-based index.
ELIAS achieves state-of-the-art performance on several large-scale extreme classification benchmarks with millions of labels.
arXiv Detail & Related papers (2022-10-16T01:34:17Z) - Learned Sorted Table Search and Static Indexes in Small Space:
Methodological and Practical Insights via an Experimental Study [0.0]
Sorted Table Search Procedures are the quintessential query-answering tool and remain widely useful.
Speeding them up while using only a small amount of additional space relative to the table being searched is still a significant achievement.
A major open question is to what extent one can enjoy the speed-up of Learned Indexes while using constant or nearly constant additional space.
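The constant-additional-space regime this question targets can be sketched as follows (an illustrative construction, not the paper's exact method): approximate the rank of a key in the sorted table with a least-squares line, record the model's worst-case error once at build time, and confine each lookup's binary search to that error window. The extra space is just two coefficients and one error bound.

```python
# Illustrative learned static index in constant extra space:
# a linear model index ~= a * key + b, plus its maximum error eps.
from bisect import bisect_left

def train(keys):
    n = len(keys)
    mean_k = sum(keys) / n
    mean_i = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    a = (sum((k - mean_k) * (i - mean_i) for i, k in enumerate(keys)) / var
         if var else 0.0)
    b = mean_i - a * mean_k
    # Worst-case prediction error over the table, measured once.
    eps = max(abs(int(a * k + b) - i) for i, k in enumerate(keys))
    return a, b, eps

def lookup(keys, model, q):
    a, b, eps = model
    guess = int(a * q + b)
    lo = max(0, guess - eps)
    hi = min(len(keys), guess + eps + 1)
    # Binary search only inside the guaranteed error window.
    i = bisect_left(keys, q, lo, hi)
    return i if i < len(keys) and keys[i] == q else -1
```

Because eps bounds the error on every table key, a present key is always found inside the window; the lookup cost is O(log eps) rather than O(log n).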
arXiv Detail & Related papers (2021-07-19T16:06:55Z) - IRLI: Iterative Re-partitioning for Learning to Index [104.72641345738425]
Existing methods must trade off accuracy against load balance and scalability in distributed settings.
We propose a novel approach called IRLI, which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data.
We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing.
arXiv Detail & Related papers (2021-03-17T23:13:25Z) - A Pluggable Learned Index Method via Sampling and Gap Insertion [48.900186573181735]
Database indexes facilitate data retrieval and benefit broad applications in real-world systems.
Recently, a new family of indexes, called learned indexes, has been proposed to learn hidden yet useful data distributions.
We study two general and pluggable techniques to enhance the learning efficiency and learning effectiveness for learned indexes.
arXiv Detail & Related papers (2021-01-04T07:17:23Z) - The Case for Learned Spatial Indexes [62.88514422115702]
We apply techniques from a state-of-the-art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension.
We also show that (ii) refining with machine-learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions.
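The filter-and-refine layout that such learned spatial indexes operate over can be sketched briefly (names and parameters here are illustrative): points are bucketed into equal-width columns on x, each column kept sorted by y; a range query filters columns by x, locates the y-interval inside each surviving column, and refines by checking x exactly. Plain `bisect` stands in below where Flood would use a learned model to predict the position within a partition.

```python
# Minimal one-dimensional-filter grid for 2D range queries.
from bisect import bisect_left, bisect_right

def build(points, num_cols, x_min, x_max):
    width = (x_max - x_min) / num_cols
    cols = [[] for _ in range(num_cols)]
    for x, y in points:
        c = min(num_cols - 1, int((x - x_min) / width))
        cols[c].append((y, x))          # sort key (y) first
    for col in cols:
        col.sort()
    return cols, width, x_min

def range_query(index, qx_lo, qx_hi, qy_lo, qy_hi):
    cols, width, x_min = index
    c_lo = max(0, int((qx_lo - x_min) / width))
    c_hi = min(len(cols) - 1, int((qx_hi - x_min) / width))
    out = []
    for c in range(c_lo, c_hi + 1):
        col = cols[c]
        ys = [y for y, _ in col]
        i = bisect_left(ys, qy_lo)      # learned model would predict i, j
        j = bisect_right(ys, qy_hi)
        for y, x in col[i:j]:           # refine: check x exactly
            if qx_lo <= x <= qx_hi:
                out.append((x, y))
    return out
```

The per-partition search on y is exactly the step the paper accelerates: replacing the two bisect calls with a learned position predictor yields the reported 11.79% to 39.51% speed-up over binary search.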
arXiv Detail & Related papers (2020-08-24T12:09:55Z) - Tsunami: A Learned Multi-dimensional Index for Correlated Data and
Skewed Workloads [29.223401893397714]
We introduce Tsunami, which achieves up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes.
arXiv Detail & Related papers (2020-06-23T19:25:51Z) - Progressively Pretrained Dense Corpus Index for Open-Domain Question
Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder.
Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.