WISK: A Workload-aware Learned Index for Spatial Keyword Queries
- URL: http://arxiv.org/abs/2302.14287v2
- Date: Fri, 14 Apr 2023 02:56:43 GMT
- Title: WISK: A Workload-aware Learned Index for Spatial Keyword Queries
- Authors: Yufan Sheng, Xin Cao, Yixiang Fang, Kaiqi Zhao, Jianzhong Qi, Gao
Cong, Wenjie Zhang
- Abstract summary: We propose WISK, a learned index for spatial keyword queries.
We show that WISK achieves up to 8x speedup in querying time with comparable storage overhead.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatial objects often come with textual information, such as Points of
Interest (POIs) with their descriptions, which are referred to as geo-textual
data. To retrieve such data, spatial keyword queries that take into account
both spatial proximity and textual relevance have been extensively studied.
Existing indexes designed for spatial keyword queries are mostly built based on
the geo-textual data without considering the distribution of queries already
received. However, previous studies have shown that utilizing the known query
distribution can improve the index structure for future query processing. In
this paper, we propose WISK, a learned index for spatial keyword queries that
self-adapts to optimize query costs for a given query workload. One key
challenge is how to utilize both structured spatial attributes and unstructured
textual information when learning the index. We first divide the data objects
into partitions, aiming to minimize the processing costs of the given query
workload. We prove the NP-hardness of the partitioning problem and propose a
machine learning model to find the optimal partitions. Then, to achieve more
pruning power, we build a hierarchical structure based on the generated
partitions in a bottom-up manner with a reinforcement learning-based approach.
We conduct extensive experiments on real-world datasets and query workloads
with various distributions, and the results show that WISK outperforms all
competitors, achieving up to 8x speedup in querying time with comparable
storage overhead.
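The cost objective sketched in the abstract can be illustrated with a minimal example. This is an illustrative assumption, not WISK's actual cost model or API: here a partition is scanned for a query only if its bounding box overlaps the query rectangle and its keyword set intersects the query keywords, and the workload cost counts all objects in scanned partitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    x: float
    y: float
    keywords: frozenset

@dataclass(frozen=True)
class Query:
    rect: tuple        # (xmin, ymin, xmax, ymax)
    keywords: frozenset

def mbr(objs):
    # minimum bounding rectangle of a partition
    xs, ys = [o.x for o in objs], [o.y for o in objs]
    return (min(xs), min(ys), max(xs), max(ys))

def overlaps(a, b):
    # axis-aligned rectangle intersection test
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def workload_cost(partitions, queries):
    """Total objects scanned: a partition is scanned for a query only if
    its MBR overlaps the query rectangle AND it shares a query keyword."""
    cost = 0
    for q in queries:
        for part in partitions:
            kws = frozenset().union(*(o.keywords for o in part))
            if overlaps(mbr(part), q.rect) and (kws & q.keywords):
                cost += len(part)
    return cost
```

A partitioning that groups objects so that few partitions pass both filters for the expected workload minimizes this count, which is the quantity the paper's learned partitioning targets.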
Related papers
- User Intent Recognition and Semantic Cache Optimization-Based Query Processing Framework using CFLIS and MGR-LAU
This work analyzes informational, navigational, and transactional intents in queries for enhanced query processing (QP).
For efficient QP, the data is structured using the Epanechnikov Kernel-Ordering Points To Identify the Clustering Structure (EK-OPTICS) method.
The extracted features, detected intents, and structured data are input to the Multi-head Gated Recurrent Learnable Attention Unit (MGR-LAU).
arXiv Detail & Related papers (2024-06-06T20:28:05Z)
- STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
We develop STaRK, a large-scale semi-structured retrieval benchmark on textual and relational knowledge bases.
Our benchmark covers three domains/datasets: product search, academic paper search, and queries in precision medicine.
We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use?
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid.
Our results reveal that proposition-based retrieval significantly outperforms traditional passage or sentence-based methods in dense retrieval.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- A Learned Index for Exact Similarity Search in Metric Spaces
LIMS uses data clustering and pivot-based data transformation techniques to build learned indexes.
Machine learning models are developed to approximate the position of each data record on the disk.
Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes.
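The pivot-based transformation mentioned above can be sketched minimally. This is an illustrative example of the general technique, not LIMS's actual algorithm or API: each object is mapped to its vector of distances to a few pivots, and the triangle inequality turns those vectors into a cheap lower bound for pruning exact range queries.

```python
import math

def build_pivot_table(objects, pivots, dist):
    # Precompute each object's distances to the pivots; this vector is the
    # object's position in the transformed "pivot space".
    return [[dist(o, p) for p in pivots] for o in objects]

def range_query(q, radius, objects, pivots, table, dist):
    """Exact metric range query with pivot-based pruning. By the triangle
    inequality, |d(q, p_i) - d(o, p_i)| <= d(q, o), so the largest such gap
    is a lower bound on d(q, o); candidates whose bound exceeds the radius
    are pruned without computing the (possibly expensive) real distance."""
    q_dists = [dist(q, p) for p in pivots]
    hits = []
    for o, row in zip(objects, table):
        lower_bound = max(abs(qd - od) for qd, od in zip(q_dists, row))
        if lower_bound <= radius and dist(q, o) <= radius:
            hits.append(o)
    return hits
```

With Euclidean distance in 2D (`math.dist`), a query at the origin with radius 5 over points (0,0), (3,4), and (10,10) prunes the last point from pivot distances alone.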
arXiv Detail & Related papers (2022-04-21T11:24:55Z)
- Graph Enhanced BERT for Query Understanding
Query understanding plays a key role in exploring users' search intents and helping users locate their most desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z)
- Text Summarization with Latent Queries
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- The Case for Learned Spatial Indexes
We use techniques from a state-of-the-art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries.
We show that (i) machine-learned search within a partition is 11.79% to 39.51% faster than binary search when filtering on one dimension.
We also show that (ii) refining using machine-learned indexes is 1.23x to 1.83x faster than the closest competitor, which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z)
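The "machine-learned search within a partition" idea above can be sketched minimally. This is a generic learned-index sketch under stated assumptions, not Flood itself, and all names are hypothetical: fit a linear model from key to rank over a sorted array, record the maximum prediction error at build time, then answer lookups with a search bounded to that error window instead of a full binary search.

```python
import bisect

class LearnedSortedIndex:
    """Minimal learned index sketch over a sorted array of numeric keys."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of rank ~ a * key + b.
        mean_k = sum(self.keys) / n
        mean_r = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        self.a = (sum((k - mean_k) * (i - mean_r)
                      for i, k in enumerate(self.keys)) / var) if var else 0.0
        self.b = mean_r - self.a * mean_k
        # Maximum prediction error, computed once at build time; it bounds
        # how far a true rank can be from the model's prediction.
        self.eps = max(abs(i - self._predict(k))
                       for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Model prediction, clamped to valid ranks.
        return min(max(int(round(self.a * key + self.b)), 0),
                   len(self.keys) - 1)

    def lookup(self, key):
        # Search only the error window around the prediction.
        p = self._predict(key)
        lo = max(0, p - self.eps)
        hi = min(len(self.keys), p + self.eps + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1
```

The search cost depends on the error bound `eps`, not on the array size: on near-linear key distributions the window stays tiny, which is where the reported speedups over plain binary search come from.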
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.