A Query-Driven Approach to Space-Efficient Range Searching
- URL: http://arxiv.org/abs/2502.13653v1
- Date: Wed, 19 Feb 2025 12:01:00 GMT
- Title: A Query-Driven Approach to Space-Efficient Range Searching
- Authors: Dimitris Fotakis, Andreas Kalavas, Ioannis Psarros,
- Abstract summary: We show that a near-linear sample of queries allows the construction of a partition tree with a near-optimal expected number of nodes visited during querying.
We enhance this approach by treating node processing as a classification problem, leveraging fast classifiers like shallow neural networks to obtain experimentally efficient query times.
Our algorithm, based on a sample of queries, builds a balanced tree with nodes associated with separators that minimize query stabs on expectation.
- Score: 12.760453906939446
- License:
- Abstract: We initiate a study of a query-driven approach to designing partition trees for range-searching problems. Our model assumes that a data structure is to be built for an unknown query distribution that we can access through a sampling oracle, and must be selected such that it optimizes a meaningful performance parameter on expectation. Our first contribution is to show that a near-linear sample of queries allows the construction of a partition tree with a near-optimal expected number of nodes visited during querying. We enhance this approach by treating node processing as a classification problem, leveraging fast classifiers like shallow neural networks to obtain experimentally efficient query times. Our second contribution is to develop partition trees using sparse geometric separators. Our preprocessing algorithm, based on a sample of queries, builds a balanced tree with nodes associated with separators that minimize query stabs on expectation; this yields both fast processing of each node and a small number of visited nodes, significantly reducing query time.
Related papers
- ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels.
Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches.
Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z) - Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? [62.57689536630933]
We provide experimental results on the BEIR dataset using the open-source Lucene search library.
Our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers.
arXiv Detail & Related papers (2024-09-10T12:46:23Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Optimistic Query Routing in Clustering-based Approximate Maximum Inner Product Search [9.01394829787271]
This work bridges the gap by studying routing in clustering-based maximum inner product search.
We present a framework that incorporates the moments of the distribution of inner products within each shard to estimate the maximum inner product.
arXiv Detail & Related papers (2024-05-20T17:47:18Z) - Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection [2.3814052021083354]
This work presents an adaptive group testing framework for the range-based high dimensional near neighbor search problem.
Our method efficiently marks each item in a database as neighbor or non-neighbor of a query point, based on a cosine distance threshold without exhaustive search.
We show that, using softmax-based features, our method achieves a more than ten-fold speed-up over exhaustive search with no loss of accuracy.
arXiv Detail & Related papers (2023-11-05T06:12:03Z) - Towards Personalized Preprocessing Pipeline Search [52.59156206880384]
ClusterP3S is a novel framework for Personalized Preprocessing Pipeline Search via Clustering.
We propose a hierarchical search strategy to jointly learn the clusters and search for the optimal pipelines.
Experiments on benchmark classification datasets demonstrate the effectiveness of enabling feature-wise preprocessing pipeline search.
arXiv Detail & Related papers (2023-02-28T05:45:05Z) - Tree Learning: Optimal Algorithms and Sample Complexity [10.638365461509]
We study the problem of learning a hierarchical tree representation of data from labeled samples taken from an arbitrary distribution.
We present optimal sample bounds for this problem in several learning settings, including (agnostic) learning and online learning.
arXiv Detail & Related papers (2023-02-09T08:35:17Z) - Hypotheses Tree Building for One-Shot Temporal Sentence Localization [53.82714065005299]
One-shot temporal sentence localization (one-shot TSL) learns to retrieve the query information among the entire video with only one annotated frame.
We propose an effective and novel tree-structure baseline for one-shot TSL, called Multiple Hypotheses Segment Tree (MHST)
MHST captures the query-aware discriminative frame-wise information under the insufficient annotations.
arXiv Detail & Related papers (2023-01-05T01:50:43Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient scalable and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent- form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z) - Improving Document Representations by Generating Pseudo Query Embeddings
for Dense Retrieval [11.465218502487959]
We design a method to mimic the queries on each of the documents by an iterative clustering process.
We also optimize the matching function with a two-step score calculation procedure.
Experimental results on several popular ranking and QA datasets show that our model can achieve state-of-the-art results.
arXiv Detail & Related papers (2021-05-08T05:28:24Z) - A Practical Index Structure Supporting Fr\'echet Proximity Queries Among
Trajectories [1.9335262420787858]
We present a scalable approach for range and $k$ nearest neighbor queries under computationally expensive metrics.
Based on clustering for metric indexes, we obtain a dynamic tree structure whose size is linear in the number of trajectories.
We analyze the efficiency and effectiveness of our methods with extensive experiments on diverse synthetic and real-world data sets.
arXiv Detail & Related papers (2020-05-28T04:12:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.