Enhanced vectors for top-k document retrieval in Question Answering
- URL: http://arxiv.org/abs/2210.10584v1
- Date: Sat, 8 Oct 2022 07:44:24 GMT
- Title: Enhanced vectors for top-k document retrieval in Question Answering
- Authors: Mohammed Hammad
- Abstract summary: We propose a different approach that retrieves the evidence documents efficiently and accurately.
We do so by assigning each document (or passage, in our case) a unique identifier and using these identifiers to create dense vectors.
This approach enables efficient creation of real-time query vectors in 4 milliseconds.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern-day applications, especially information retrieval webapps that
involve "search" as their use case, are gradually moving towards "answering"
modules. Conversational chatbots, which have proved to be more engaging to
users, use Question Answering as their core. Since precise answering is
computationally expensive, several approaches have been developed to prefetch
the most relevant documents/passages from the database that contain the answer.
We propose a different approach that retrieves the evidence documents
efficiently and accurately, making sure that the relevant document for a given
user query is not missed. We do so by assigning each document (or passage, in
our case) a unique identifier and using these identifiers to create dense vectors which can
be efficiently indexed. More precisely, we use the identifier to predict
randomly sampled context window words of the relevant question corresponding to
the passage along with the words of the passage itself. This naturally embeds the
passage identifier into the vector space in such a way that the embedding is
closer to the question without compromising the information content. This
approach enables efficient creation of real-time query vectors in ~4
milliseconds.
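The training idea in the abstract (an identifier vector per passage, trained to predict the passage's words plus a randomly sampled window of its question's words) can be illustrated with a minimal sketch. This is not the author's implementation: the toy data, the full-softmax objective, the hyperparameters, and every function name below are assumptions made for illustration. In particular, the abstract does not say how query vectors are built; averaging the output word vectors of the query terms is an assumption chosen here because it is cheap at query time.

```python
import math
import random

# Toy data (hypothetical): each passage id has one relevant question.
PASSAGES = {
    "p0": "the eiffel tower is located in paris france",
    "p1": "python is a programming language created by guido van rossum",
    "p2": "the mitochondria is the powerhouse of the cell",
}
QUESTIONS = {
    "p0": "where is the eiffel tower located",
    "p1": "who created the python programming language",
    "p2": "what is the powerhouse of the cell",
}

def tokenize(text):
    return text.lower().split()

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(passages, questions, dim=16, window=3, epochs=50, lr=0.05, seed=0):
    """Learn one vector per passage id by predicting target words from it."""
    rng = random.Random(seed)
    texts = list(passages.values()) + list(questions.values())
    vocab = sorted({w for t in texts for w in tokenize(t)})
    index = {w: i for i, w in enumerate(vocab)}
    id_vecs = {pid: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for pid in passages}
    word_vecs = [[0.0] * dim for _ in vocab]  # output (softmax) word vectors
    for _ in range(epochs):
        for pid, text in passages.items():
            q = tokenize(questions[pid])
            # Randomly sampled context window of the relevant question.
            start = rng.randrange(max(1, len(q) - window + 1))
            targets = tokenize(text) + q[start:start + window]
            d = id_vecs[pid]
            for word in targets:
                grad = softmax([dot(w, d) for w in word_vecs])
                grad[index[word]] -= 1.0  # gradient of cross-entropy w.r.t. logits
                grad_d = [sum(g * w[k] for g, w in zip(grad, word_vecs))
                          for k in range(dim)]
                for g, w in zip(grad, word_vecs):
                    for k in range(dim):
                        w[k] -= lr * g * d[k]
                for k in range(dim):
                    d[k] -= lr * grad_d[k]
    return id_vecs, word_vecs, index

def query_vector(query, word_vecs, index):
    """Assumed query encoder: mean of the output vectors of known query words."""
    rows = [word_vecs[index[w]] for w in tokenize(query) if w in index]
    dim = len(word_vecs[0])
    if not rows:
        return [0.0] * dim
    return [sum(r[k] for r in rows) / len(rows) for k in range(dim)]

def top_k(query, id_vecs, word_vecs, index, k=2):
    """Rank passage ids by dot product with the query vector."""
    qv = query_vector(query, word_vecs, index)
    ranked = sorted(id_vecs, key=lambda pid: dot(qv, id_vecs[pid]), reverse=True)
    return ranked[:k]

id_vecs, word_vecs, index = train(PASSAGES, QUESTIONS)
print(top_k("who created python", id_vecs, word_vecs, index))
```

Because the query vector in this sketch is only a lookup and an average, building it needs no gradient steps at query time. A real system would replace the brute-force ranking in `top_k` with an approximate nearest-neighbour index over the identifier vectors.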
Related papers
- Improving RAG Retrieval via Propositional Content Extraction: a Speech Act Theory Approach [0.0]
This paper investigates whether extracting the underlying propositional content from user utterances can improve retrieval quality in Retrieval-Augmented Generation systems.
We propose a practical method for automatically transforming queries into their propositional equivalents before embedding.
arXiv Detail & Related papers (2025-03-07T20:15:40Z)
- Exploring Rewriting Approaches for Different Conversational Tasks [63.56404271441824]
The exact rewriting approach may often depend on the use case and application-specific tasks supported by the conversational assistant.
We systematically investigate two different approaches, denoted as rewriting and fusion, on two fundamentally different generation tasks.
Our results indicate that the specific rewriting or fusion approach highly depends on the underlying use case and generative task.
arXiv Detail & Related papers (2025-02-26T06:05:29Z)
- Question-to-Question Retrieval for Hallucination-Free Knowledge Access: An Approach for Wikipedia and Wikidata Question Answering [0.0]
This paper introduces an approach to question answering over knowledge bases like Wikipedia and Wikidata.
We generate a comprehensive set of questions for each logical content unit using an instruction-tuned LLM.
We demonstrate its effectiveness on Wikipedia and Wikidata, including multimedia content through structured fact retrieval from Wikidata.
arXiv Detail & Related papers (2025-01-20T07:05:15Z)
- Optimization of Retrieval-Augmented Generation Context with Outlier Detection [0.0]
We focus on methods to reduce the size and improve the quality of the prompt context required for question-answering systems.
Our goal is to select the most semantically relevant documents, treating the discarded ones as outliers.
It was found that the greatest improvements were achieved with increasing complexity of the questions and answers.
arXiv Detail & Related papers (2024-07-01T15:53:29Z)
- Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance to a query of a document.
arXiv Detail & Related papers (2024-03-31T13:29:43Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading [63.93888816206071]
We introduce MemWalker, a method that processes the long context into a tree of summary nodes. Upon receiving a query, the model navigates this tree in search of relevant information, and responds once it gathers sufficient information.
We show that, beyond effective reading, MemWalker enhances explainability by highlighting the reasoning steps as it interactively reads the text, pinpointing the relevant text segments related to the query.
arXiv Detail & Related papers (2023-10-08T06:18:14Z)
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- Asking questions on handwritten document collections [35.85762649504866]
This work addresses the problem of Question Answering (QA) on handwritten document collections.
Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies.
We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult.
arXiv Detail & Related papers (2021-10-02T02:40:40Z)
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval [117.07047313964773]
We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions.
Our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers.
Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
arXiv Detail & Related papers (2020-09-27T06:12:29Z)
- Open-Domain Question Answering with Pre-Constructed Question Spaces [70.13619499853756]
Open-domain question answering aims at solving the task of locating the answers to user-generated questions in massive collections of documents.
There are two families of solutions available: retriever-readers, and knowledge-graph-based approaches.
We propose a novel algorithm with a reader-retriever structure that differs from both families.
arXiv Detail & Related papers (2020-06-02T04:31:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.