Related papers: Annotative Indexing

Annotative Indexing

URL: http://arxiv.org/abs/2411.06256v2
Date: Wed, 20 Nov 2024 15:28:42 GMT
Title: Annotative Indexing
Authors: Charles L. A. Clarke,
Abstract summary: Annotative indexing is a novel framework that unifies and generalizes traditional inverted indexes, column stores, object stores, and graph databases. Annotative indexing can provide the underlying indexing framework for databases that support knowledge graphs, entity, semi-structured data, and ranked.
Score: 8.684302613224338
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces annotative indexing, a novel framework that unifies and generalizes traditional inverted indexes, column stores, object stores, and graph databases. As a result, annotative indexing can provide the underlying indexing framework for databases that support knowledge graphs, entity retrieval, semi-structured data, and ranked retrieval. While we primarily focus on human language data in the form of text, annotative indexing is sufficiently general to support a range of other datatypes, and we provide examples of SQL-like queries over a JSON store that includes numbers and dates. Taking advantage of the flexibility of annotative indexing, we also demonstrate a fully dynamic annotative index incorporating support for ACID properties of transactions with hundreds of multiple concurrent readers and writers.

Related papers

Bridging Queries and Tables through Entities in Table Retrieval [70.13748256886288]
Entities are well-studied in the context of text retrieval, but there is a noticeable lack of research on their applications in table retrieval. We propose an entity-enhanced training framework and design an interaction paradigm based on entity representations. Our proposed framework is plug-and-play and flexible, making it easy to integrate into existing table retriever training processes.
arXiv Detail & Related papers (2025-04-09T03:16:33Z)
Differentially Private Learned Indexes [4.290415158471898]
We address the problem of efficiently answering predicate queries on encrypted databases, those secured by Trusted Execution Environments (TEEs) A common strategy in modern databases to accelerate predicate queries is the use of indexes, which map attribute values (keys) to their corresponding positions in a sorted data array. Unfortunately, indexes cannot be directly applied to encrypted databases due to strong data dependent leakages. We propose leveraging learned indexes, a trending technique that repurposes machine learning models as indexing structures, to build more compact DP indexes.
arXiv Detail & Related papers (2024-10-28T16:04:58Z)
Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu) DAQu augments the original query with various (query-related) metadata across multiple tables. We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z)
Semi-Parametric Retrieval via Binary Token Index [71.78109794895065]
Semi-parametric Vocabulary Disentangled Retrieval (SVDR) is a novel semi-parametric retrieval framework. It supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval. It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index.
arXiv Detail & Related papers (2024-05-03T08:34:13Z)
LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
List K-kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance. There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results. We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z)
How to Index Item IDs for Recommendation Foundation Models [49.425959632372425]
Recommendation foundation model utilizes large language models (LLM) for recommendation by converting recommendation tasks into natural language tasks. To avoid generating excessively long text and hallucinated recommendations, creating LLM-compatible item IDs is essential. We propose four simple yet effective solutions, including sequential indexing, collaborative indexing, semantic (content-based) indexing, and hybrid indexing.
arXiv Detail & Related papers (2023-05-11T05:02:37Z)
Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation [98.02743096197402]
Differentiable Search Index (DSI) is an emerging paradigm for information retrieval. We propose a simple yet effective indexing framework for DSI, called DSI-QG. Empirical results on popular mono-lingual and cross-lingual passage retrieval datasets show that DSI-QG significantly outperforms the original DSI model.
arXiv Detail & Related papers (2022-06-21T06:21:23Z)
A Learned Index for Exact Similarity Search in Metric Spaces [25.330353637669386]
LIMS is proposed to use data clustering and pivot-based data transformation techniques to build learned indexes. Machine learning models are developed to approximate the position of each data record on the disk. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes.
arXiv Detail & Related papers (2022-04-21T11:24:55Z)
Learned Indexes for a Google-scale Disk-based Database [23.93643265060042]
We show how a learned index can be integrated in a distributed, disk-based database system: Google's Bigtable. Our results show that integrating learned index significantly improves the end-to-end read latency and throughput for Bigtable.
arXiv Detail & Related papers (2020-12-23T05:56:45Z)
A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads [29.223401893397714]
We introduce Tsunami, which achieves up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes.
arXiv Detail & Related papers (2020-06-23T19:25:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.