Recursive Abstractive Processing for Retrieval in Dynamic Datasets
- URL: http://arxiv.org/abs/2410.01736v1
- Date: Wed, 2 Oct 2024 16:47:35 GMT
- Title: Recursive Abstractive Processing for Retrieval in Dynamic Datasets
- Authors: Charbel Chucri, Rami Azouz, Joachim Ott,
- Abstract summary: We propose a new algorithm to efficiently maintain the recursive-abstractive tree structure in dynamic datasets.
We also introduce a novel post-retrieval method that applies query-focused abstractive processing to substantially improve context quality.
Our method overcomes the limitations of other approaches by functioning as a black-box post-retrieval layer compatible with any retrieval algorithm.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent retrieval-augmented models enhance basic methods by building a hierarchical structure over retrieved text chunks through recursive embedding, clustering, and summarization. The most relevant information is then retrieved from both the original text and generated summaries. However, such approaches face limitations with dynamic datasets, where adding or removing documents over time complicates the updating of hierarchical representations formed through clustering. We propose a new algorithm to efficiently maintain the recursive-abstractive tree structure in dynamic datasets, without compromising performance. Additionally, we introduce a novel post-retrieval method that applies query-focused recursive abstractive processing to substantially improve context quality. Our method overcomes the limitations of other approaches by functioning as a black-box post-retrieval layer compatible with any retrieval algorithm. Both algorithms are validated through extensive experiments on real-world datasets, demonstrating their effectiveness in handling dynamic data and improving retrieval performance.
Related papers
- HIRO: Hierarchical Information Retrieval Optimization [0.0]
Retrieval-Augmented Generation (RAG) has revolutionized natural language processing by dynamically integrating external knowledge into Large Language Models (LLMs)
Recent implementations of RAG leverage hierarchical data structures, which organize documents at various levels of summarization and information density.
This complexity can cause LLMs to "choke" on information overload, necessitating more sophisticated querying mechanisms.
arXiv Detail & Related papers (2024-06-14T12:41:07Z) - Contextual Categorization Enhancement through LLMs Latent-Space [0.31263095816232184]
We propose leveraging transformer models to distill semantic information from texts in the Wikipedia dataset.
We then explore different approaches based on these encodings to assess and enhance the semantic identity of the categories.
arXiv Detail & Related papers (2024-04-25T09:20:51Z) - Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance to a query of a document.
arXiv Detail & Related papers (2024-03-31T13:29:43Z) - RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval [26.527911244587134]
We introduce the novel approach of embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up.
At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction.
arXiv Detail & Related papers (2024-01-31T18:30:21Z) - Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z) - Boosting Event Extraction with Denoised Structure-to-Text Augmentation [52.21703002404442]
Event extraction aims to recognize pre-defined event triggers and arguments from texts.
Recent data augmentation methods often neglect the problem of grammatical incorrectness.
We propose a denoised structure-to-text augmentation framework for event extraction DAEE.
arXiv Detail & Related papers (2023-05-16T16:52:07Z) - Document-Level Abstractive Summarization [0.0]
We study how efficient Transformer techniques can be used to improve the automatic summarization of very long texts.
We propose a novel retrieval-enhanced approach which reduces the cost of generating a summary of the entire document by processing smaller chunks.
arXiv Detail & Related papers (2022-12-06T14:39:09Z) - ReSel: N-ary Relation Extraction from Scientific Text and Tables by
Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - New advances in enumerative biclustering algorithms with online
partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm is called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.