Navigable Proximity Graph-Driven Native Hybrid Queries with Structured
and Unstructured Constraints
- URL: http://arxiv.org/abs/2203.13601v1
- Date: Fri, 25 Mar 2022 12:02:37 GMT
- Title: Navigable Proximity Graph-Driven Native Hybrid Queries with Structured
and Unstructured Constraints
- Authors: Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue,
Jiongkang Ni
- Abstract summary: We propose a native hybrid query (NHQ) framework based on proximity graph (PG), which provides the specialized textitcomposite index and joint pruning modules for hybrid queries.
We present two novel navigable PGs with optimized edge selection and routing strategies, which obtain better overall performance than existing PGs.
- Score: 10.842138336245384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As research interest surges, vector similarity search is applied in multiple
fields, including data mining, computer vision, and information retrieval.
{Given a set of objects (e.g., a set of images) and a query object, we can
easily transform each object into a feature vector and apply the vector
similarity search to retrieve the most similar objects. However, the original
vector similarity search cannot well support \textit{hybrid queries}, where
users not only input unstructured query constraint (i.e., the feature vector of
query object) but also structured query constraint (i.e., the desired
attributes of interest). Hybrid query processing aims at identifying these
objects with similar feature vectors to query object and satisfying the given
attribute constraints. Recent efforts have attempted to answer a hybrid query
by performing attribute filtering and vector similarity search separately and
then merging the results later, which limits efficiency and accuracy because
they are not purpose-built for hybrid queries.} In this paper, we propose a
native hybrid query (NHQ) framework based on proximity graph (PG), which
provides the specialized \textit{composite index and joint pruning} modules for
hybrid queries. We easily deploy existing various PGs on this framework to
process hybrid queries efficiently. Moreover, we present two novel navigable
PGs (NPGs) with optimized edge selection and routing strategies, which obtain
better overall performance than existing PGs. After that, we deploy the
proposed NPGs in NHQ to form two hybrid query methods, which significantly
outperform the state-of-the-art competitors on all experimental datasets
(10$\times$ faster under the same \textit{Recall}), including eight public and
one in-house real-world datasets. Our code and datasets have been released at
\url{https://github.com/AshenOn3/NHQ}.
Related papers
- DAGE: DAG Query Answering via Relational Combinator with Logical Constraints [24.60431781360608]
We propose a query embedding method for DAG queries called DAGE.
DAGE combines the possibly multiple paths between two nodes into a single path with a trainable operator.
We show that it is possible to implement DAGE on top of existing query embedding methods.
arXiv Detail & Related papers (2024-10-29T15:02:48Z) - Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu)
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z) - Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance to a query of a document.
arXiv Detail & Related papers (2024-03-31T13:29:43Z) - Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection [2.3814052021083354]
This work presents an adaptive group testing framework for the range-based high dimensional near neighbor search problem.
Our method efficiently marks each item in a database as neighbor or non-neighbor of a query point, based on a cosine distance threshold without exhaustive search.
We show that, using softmax-based features, our method achieves a more than ten-fold speed-up over exhaustive search with no loss of accuracy.
arXiv Detail & Related papers (2023-11-05T06:12:03Z) - High-Throughput Vector Similarity Search in Knowledge Graphs [17.41683819564348]
Recent data management systems propose augmenting query processing with online vector similarity search.
We focus on hybrid vector similarity search (hybrid queries for short) where part of the query corresponds to vector similarity search.
We present our system, HQI, for high- throughput batch processing of hybrid queries.
arXiv Detail & Related papers (2023-04-04T16:19:15Z) - Integrating connection search in graph queries [6.948362325254044]
We show how to integrate connecting tree patterns (CTPs) within a graph query language such as SPARQL or Cypher.
To cope with very large search spaces, we propose an efficient pruning technique and formally establish a large set of cases where our algorithm, MOLESP, is complete even with pruning.
arXiv Detail & Related papers (2022-08-09T14:27:57Z) - Query2Particles: Knowledge Graph Reasoning with Particle Embeddings [49.64006979045662]
We propose a query embedding method to answer complex logical queries on knowledge graphs with missing edges.
The answer entities are selected according to the similarities between the entity embeddings and the query embedding.
A complex KG query answering method, Q2P, is proposed to retrieve diverse answers from different areas over the embedding space.
arXiv Detail & Related papers (2022-04-27T11:16:08Z) - Improving Candidate Retrieval with Entity Profile Generation for
Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z) - Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z) - Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.