Related papers: NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance

NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance

URL: http://arxiv.org/abs/2506.23397v1
Date: Sun, 29 Jun 2025 21:16:07 GMT
Title: NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance
Authors: Gaurav Sehgal, Semih Salihoglu,
Abstract summary: We present NaviX, a native vector index for graphs (GDBMSs)<n> NaviX is built on the Hierarchical Navigable Small-World (HNSW) graph, which itself is a graph-based structure.
Score: 7.108581652658526
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: There is an increasing demand for extending existing DBMSs with vector indices so that they become unified systems capable of supporting modern predictive applications, which require joint querying of vector embeddings together with the structured properties and connections of objects. We present NaviX, a native vector index for graph DBMSs (GDBMSs) that has two main design goals. First, we aim to implement a disk-based vector index that leverages the core storage and query-processing capabilities of the underlying GDBMS. To this end, NaviX is built on the Hierarchical Navigable Small-World (HNSW) graph, which itself is a graph-based structure. Second, we aim to support predicate-agnostic filtered vector search queries, in which the k nearest neighbors (kNNs) of a query vector vQ are searched only within an arbitrary subset S of vectors defined by an ad-hoc selection sub-query QS. We adopt a prefiltering approach that evaluates QS first and passes the full description of subset S to the kNN search operator. We study how to design a prefiltering search algorithm that remains robust under varying selectivities and under different correlations between subset S and query vector vQ. We propose an adaptive algorithm that uses the local selectivity of each vector in the HNSW graph to choose an appropriate heuristic at every iteration of the kNN search. Finally, We demonstrate NaviX's robustness and efficiency through extensive experiments against both existing prefiltering- and postfiltering-based baselines.

Related papers

Beyond Nearest Neighbors: Semantic Compression and Graph-Augmented Retrieval for Enhanced Vector Search [2.377892000761193]
We introduce a new retrieval paradigm: semantic compression, which aims to select a compact, representative set of vectors that captures the broader semantic structure around a query.<n>To operationalize this idea, we propose graph-augmented vector retrieval, which overlays semantic graphs (e.g., kNN or knowledge-based links) atop vector spaces.<n>Our work outlines a foundation for meaning-centric vector search systems, emphasizing hybrid indexing, diversity-aware querying, and structured semantic retrieval.
arXiv Detail & Related papers (2025-07-25T23:35:11Z)
HAKES: Scalable Vector Database for Embedding Search Service [16.034584281180006]
We build a vector database that achieves high throughput and high recall under concurrent read-write workloads.<n>Our index outperforms index baselines in the high recall region and under concurrent read-write workloads.<n>namesys is scalable and achieves up to $16times$ higher throughputs than the baselines.
arXiv Detail & Related papers (2025-05-18T19:26:29Z)
An Adaptive Vector Index Partitioning Scheme for Low-Latency RAG Pipeline [0.6445605125467574]
Retrieval Augmented Generation (RAG) systems enhance response quality by integrating Large Language Models (LLMs) with vector databases.<n>Existing optimizations for vector search and LLM serving have largely been developed in isolation.<n>This paper introduces VectorLiteRAG, an optimized vector index partitioning mechanism designed for RAG systems.
arXiv Detail & Related papers (2025-04-11T19:18:41Z)
GleanVec: Accelerating vector search with minimalist nonlinear dimensionality reduction [1.1599570446840546]
Cross-modal retrieval (e.g., where a text query is used to find images) is gaining momentum rapidly. It is challenging to achieve high accuracy as the queries often have different statistical distributions than the database vectors. We present new linear and nonlinear methods for dimensionality reduction to accelerate high-dimensional vector search.
arXiv Detail & Related papers (2024-10-14T21:14:27Z)
Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? [62.57689536630933]
We provide experimental results on the BEIR dataset using the open-source Lucene search library. Our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers.
arXiv Detail & Related papers (2024-09-10T12:46:23Z)
Efficient Data Access Paths for Mixed Vector-Relational Search [8.80592433569832]
Machine learning and the adoption of data processing methods using vector embeddings sparked a great interest in creating systems for vector data management. While the predominant approach of vector data management is to use specialized index structures for fast search over the entirety of the vector embeddings, once combined with other (meta)data, the search queries can also become selective on relational attributes. As using vector indexes differs from traditional relational data access, we revisit and analyze alternative access paths for efficient mixed vector-relational search.
arXiv Detail & Related papers (2024-03-23T11:34:17Z)
Vector search with small radiuses [10.880913075221361]
This paper focuses on the common case where a hard decision needs to be taken depending on the vector retrieval results. We show that the value of a range search result can be modeled rigorously based on the query-to-vector distance. This yields a metric for range search, RSM, that is both principled and easy to compute without running an end-to-end evaluation.
arXiv Detail & Related papers (2024-03-16T00:34:25Z)
Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers. Previous work has explored ways to partition the search space into hierarchical structures. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
Knowledge Base Question Answering by Case-based Reasoning over Subgraphs [81.22050011503933]
We show that our model answers queries requiring complex reasoning patterns more effectively than existing KG completion algorithms. The proposed model outperforms or performs competitively with state-of-the-art models on several KBQA benchmarks.
arXiv Detail & Related papers (2022-02-22T01:34:35Z)
Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching [58.39970828272366]
Subgraph matching algorithms enumerate all is embeddings of a query graph in a data graph G. matching order plays a critical role in time efficiency of these backtracking based subgraph matching algorithms. In this paper, for the first time we apply the Reinforcement Learning (RL) and Graph Neural Networks (GNNs) techniques to generate the high-quality matching order for subgraph matching algorithms.
arXiv Detail & Related papers (2022-01-25T00:10:03Z)
Learning Query Expansion over the Nearest Neighbor Graph [94.80212602202518]
Graph Query Expansion (GQE) is presented, which is learned in a supervised manner and performs aggregation over an extended neighborhood of the query. The technique achieves state-of-the-art results over known benchmarks.
arXiv Detail & Related papers (2021-12-05T19:48:42Z)
The Case for Learned Spatial Indexes [62.88514422115702]
We use techniques proposed from a state-of-the art learned multi-dimensional index structure (namely, Flood) to answer spatial range queries. We show that (i) machine learned search within a partition is faster by 11.79% to 39.51% than binary search when using filtering on one dimension. We also refine using machine learned indexes is 1.23x to 1.83x times faster than closest competitor which filters on two dimensions.
arXiv Detail & Related papers (2020-08-24T12:09:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.