WISK: A Workload-aware Learned Index for Spatial Keyword Queries
- URL: http://arxiv.org/abs/2302.14287v2
- Date: Fri, 14 Apr 2023 02:56:43 GMT
- Title: WISK: A Workload-aware Learned Index for Spatial Keyword Queries
- Authors: Yufan Sheng, Xin Cao, Yixiang Fang, Kaiqi Zhao, Jianzhong Qi, Gao
Cong, Wenjie Zhang
- Abstract summary: We propose WISK, a learned index for spatial keyword queries.
We show that WISK achieves up to 8x speedup in querying time with comparable storage overhead.
- Score: 46.96314606580924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatial objects often come with textual information, such as Points of
Interest (POIs) with their descriptions, which are referred to as geo-textual
data. To retrieve such data, spatial keyword queries that take into account
both spatial proximity and textual relevance have been extensively studied.
Existing indexes designed for spatial keyword queries are mostly built based on
the geo-textual data without considering the distribution of queries already
received. However, previous studies have shown that utilizing the known query
distribution can improve the index structure for future query processing. In
this paper, we propose WISK, a learned index for spatial keyword queries, which
self-adapts for optimizing querying costs given a query workload. One key
challenge is how to utilize both structured spatial attributes and unstructured
textual information during learning the index. We first divide the data objects
into partitions, aiming to minimize the processing costs of the given query
workload. We prove the NP-hardness of the partitioning problem and propose a
machine learning model to find the optimal partitions. Then, to achieve more
pruning power, we build a hierarchical structure based on the generated
partitions in a bottom-up manner with a reinforcement learning-based approach.
We conduct extensive experiments on real-world datasets and query workloads
with various distributions, and the results show that WISK outperforms all
competitors, achieving up to 8x speedup in querying time with comparable
storage overhead.
Related papers
- Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? [62.57689536630933]
We provide experimental results on the BEIR dataset using the open-source Lucene search library.
Our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers.
arXiv Detail & Related papers (2024-09-10T12:46:23Z) - QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval [12.543590253664492]
We present a novel, interactive system called $textitQueryBuilder$.
It allows a novice, English-speaking user to create queries with a small amount of effort.
It rapidly develops cross-lingual information retrieval queries corresponding to the user's information needs.
arXiv Detail & Related papers (2024-09-07T00:46:58Z) - User Intent Recognition and Semantic Cache Optimization-Based Query Processing Framework using CFLIS and MGR-LAU [0.0]
This work analyzed the informational, navigational, and transactional-based intents in queries for enhanced QP.
For efficient QP, the data is structured using Epanechnikov Kernel-Ordering Points To Identify the Clustering Structure (EK-OPTICS)
The extracted features, detected intents and structured data are inputted to the Multi-head Gated Recurrent Learnable Attention Unit (MGR-LAU)
arXiv Detail & Related papers (2024-06-06T20:28:05Z) - LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
List K-kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z) - Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
Often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z) - A Learned Index for Exact Similarity Search in Metric Spaces [25.330353637669386]
LIMS is proposed to use data clustering and pivot-based data transformation techniques to build learned indexes.
Machine learning models are developed to approximate the position of each data record on the disk.
Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes.
arXiv Detail & Related papers (2022-04-21T11:24:55Z) - Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimize a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z) - The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques proposed from a state-of-the art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine learned search within a partition is faster by 11.79% to 39.51% than binary search when using filtering on one dimension.
We also refine using machine learned indexes is 1.23x to 1.83x times faster than closest competitor which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.