Enhanced vectors for top-k document retrieval in Question Answering
- URL: http://arxiv.org/abs/2210.10584v1
- Date: Sat, 8 Oct 2022 07:44:24 GMT
- Title: Enhanced vectors for top-k document retrieval in Question Answering
- Authors: Mohammed Hammad
- Abstract summary: We propose a different approach that retrieves the evidence documents efficiently and accurately.
We do so by assigning each document (or passage, in our case) a unique identifier and using these identifiers to create dense vectors.
This approach enables efficient creation of real-time query vectors in 4 milliseconds.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern-day applications, especially information retrieval web apps
that have "search" as their core use case, are gradually moving towards
"answering" modules. Conversational chatbots, which have been shown to be more
engaging to users, use Question Answering at their core. Since precise
answering is computationally expensive, several approaches have been developed
to prefetch the most relevant documents/passages from the database that contain
the answer. We propose a different approach that retrieves the evidence
documents efficiently and accurately, making sure that the relevant document
for a given user query is not missed. We do so by assigning each document (or
passage, in our case) a unique identifier and using these identifiers to create
dense vectors which can be efficiently indexed. More precisely, we use the
identifier to predict randomly sampled context-window words of the relevant
question corresponding to the passage, along with the words of the passage
itself. This naturally embeds the passage identifier into the vector space in
such a way that the embedding is closer to the question without compromising
the information content. This approach enables efficient creation of real-time
query vectors in ~4 milliseconds.
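The prediction objective described above reads like a doc2vec/paragraph-vector setup: the passage identifier is trained to predict the passage's own words plus a few randomly sampled words from the question paired with that passage, which pulls the identifier's embedding toward the question. Below is a minimal sketch of that reading using gensim's Doc2Vec; the toy corpus, the question-word sampling scheme, and all hyperparameters are illustrative assumptions rather than the authors' implementation.

```python
# Sketch: embed passage identifiers so they land near their paired questions.
# Assumes gensim >= 4.0; the toy data and the "sample question words into the
# passage's training context" step are illustrative assumptions.
import random
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    # (passage_id, passage_text, a question known to be answered by the passage)
    ("P0", "the eiffel tower is located in paris france", "where is the eiffel tower located"),
    ("P1", "water boils at 100 degrees celsius at sea level", "at what temperature does water boil"),
]

def make_training_doc(pid, passage, question, n_question_words=3):
    """Tag each passage with its identifier; the identifier vector is trained to
    predict the passage's words plus a few randomly sampled question words."""
    q_words = question.split()
    words = passage.split() + random.sample(q_words, k=min(n_question_words, len(q_words)))
    return TaggedDocument(words=words, tags=[pid])

train_docs = [make_training_doc(*row) for row in corpus]
model = Doc2Vec(train_docs, vector_size=64, window=5, min_count=1, epochs=200)

# At query time a dense query vector is inferred directly from the question words;
# the abstract reports this step taking ~4 ms per query.
query_vec = model.infer_vector("where is the eiffel tower".split(), epochs=50)
print(model.dv.most_similar([query_vec], topn=2))  # passage ids ranked by cosine similarity
```

Indexing model.dv in an approximate nearest-neighbour library would then provide the fast top-k passage lookup described in the abstract.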
Related papers
- Question-to-Question Retrieval for Hallucination-Free Knowledge Access: An Approach for Wikipedia and Wikidata Question Answering [0.0]
This paper introduces an approach to question answering over knowledge bases like Wikipedia and Wikidata.
We generate a comprehensive set of questions for each logical content unit using an instruction-tuned LLM.
We demonstrate its effectiveness on Wikipedia and Wikidata, including multimedia content through structured fact retrieval from Wikidata.
arXiv Detail & Related papers (2025-01-20T07:05:15Z)
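A minimal sketch of the question-to-question idea summarized above: pre-generate questions for each logical content unit, embed them, and answer a user query by matching it against the stored questions rather than the content itself. The generate_questions stub, the sentence-transformers model name, and the toy data are assumptions for illustration.

```python
# Sketch: question-to-question retrieval over pre-generated questions.
# Assumes sentence-transformers is installed; generate_questions() is a stand-in
# for the instruction-tuned LLM mentioned in the summary.
import numpy as np
from sentence_transformers import SentenceTransformer

content_units = {
    "wiki:Eiffel_Tower#height": "The Eiffel Tower is 330 metres tall.",
    "wiki:Eiffel_Tower#location": "The Eiffel Tower stands on the Champ de Mars in Paris.",
}

def generate_questions(text):
    # Placeholder for an instruction-tuned LLM prompted with something like
    # "Write the questions this passage answers." Hard-coded here for illustration.
    return {
        "The Eiffel Tower is 330 metres tall.": ["How tall is the Eiffel Tower?"],
        "The Eiffel Tower stands on the Champ de Mars in Paris.": ["Where is the Eiffel Tower?"],
    }[text]

model = SentenceTransformer("all-MiniLM-L6-v2")
index = []  # (generated question, content unit id) pairs
for unit_id, text in content_units.items():
    for q in generate_questions(text):
        index.append((q, unit_id))

question_vecs = model.encode([q for q, _ in index], normalize_embeddings=True)

def answer_source(user_query):
    """Return the content unit whose pre-generated question best matches the query."""
    qv = model.encode([user_query], normalize_embeddings=True)[0]
    best = int(np.argmax(question_vecs @ qv))
    return index[best][1]

print(answer_source("what is the height of the eiffel tower"))  # expected: wiki:Eiffel_Tower#height
```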
- Optimization of Retrieval-Augmented Generation Context with Outlier Detection [0.0]
We focus on methods to reduce the size and improve the quality of the prompt context required for question-answering systems.
Our goal is to select the most semantically relevant documents, treating the discarded ones as outliers.
It was found that the greatest improvements were achieved with increasing complexity of the questions and answers.
arXiv Detail & Related papers (2024-07-01T15:53:29Z)
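A hedged sketch of the context-pruning idea summarized above: score each candidate document against the query and discard the low-similarity candidates as outliers. The z-score cut-off below is one simple outlier rule chosen for illustration; the paper investigates outlier-detection methods more broadly, and the random embeddings stand in for real ones.

```python
# Sketch: prune a RAG prompt context by treating low-relevance documents as outliers.
# Embeddings are assumed precomputed and L2-normalized; the z-score cut-off is one
# simple outlier rule, not necessarily the one used in the paper.
import numpy as np

def prune_context(query_vec, doc_vecs, docs, z_cutoff=-1.0):
    sims = doc_vecs @ query_vec                      # cosine similarity per candidate
    z = (sims - sims.mean()) / (sims.std() + 1e-9)   # standardize the similarity scores
    keep = z >= z_cutoff                             # discard low-similarity outliers
    return [d for d, k in zip(docs, keep) if k]

rng = np.random.default_rng(0)
query_vec = rng.normal(size=16)
query_vec /= np.linalg.norm(query_vec)
doc_vecs = rng.normal(size=(8, 16))
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
docs = [f"doc_{i}" for i in range(8)]
print(prune_context(query_vec, doc_vecs, docs))
```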
- Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance of a document to a query.
arXiv Detail & Related papers (2024-03-31T13:29:43Z)
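A toy numerical illustration of the prior-work claim above that generative retrieval with atomic identifiers behaves like single-vector dense retrieval: the logit of a document's identifier token is a dot product between the decoder state (acting as the query vector) and that identifier's output embedding (acting as the document vector), and the log-softmax preserves the resulting ranking. The random matrices stand in for a trained model's internals; this does not cover the multi-vector case the paper analyses.

```python
# Toy check: with atomic identifiers, ranking documents by generative id-token
# log-probabilities equals ranking them by a query-vector/document-vector dot product.
import numpy as np

rng = np.random.default_rng(0)
num_docs, hidden = 5, 8
id_embeddings = rng.normal(size=(num_docs, hidden))  # output embeddings of the doc-id tokens
decoder_state = rng.normal(size=hidden)              # decoder state after reading the query

dense_scores = id_embeddings @ decoder_state         # dense-retrieval view: dot products
logits = id_embeddings @ decoder_state               # generative view: logits over identifiers
gen_logprobs = logits - (logits.max() + np.log(np.exp(logits - logits.max()).sum()))  # log-softmax

# Both views rank the documents identically, since log-softmax is monotone in the logit.
assert (np.argsort(-dense_scores) == np.argsort(-gen_logprobs)).all()
print("top-1 doc id:", int(np.argmax(gen_logprobs)))
```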
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
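A minimal sketch of proposition-level indexing as summarized above: break each passage into small self-contained units, index the units, and map retrieved units back to their source passages. The naive sentence split stands in for the paper's learned propositionizer, and the embedding model name is an assumption.

```python
# Sketch: index a corpus at proposition granularity instead of passage granularity.
# The naive sentence split below stands in for a model that rewrites passages into
# atomic, self-contained facts; the embedding model choice is also an assumption.
import numpy as np
from sentence_transformers import SentenceTransformer

passages = {
    "p0": "The Eiffel Tower is 330 metres tall. It was completed in 1889.",
    "p1": "Paris is the capital of France. It lies on the Seine.",
}

def to_propositions(text):
    # Stand-in propositionizer: here just a sentence split.
    return [s.strip() + "." for s in text.split(".") if s.strip()]

units, sources = [], []
for pid, text in passages.items():
    for prop in to_propositions(text):
        units.append(prop)
        sources.append(pid)   # keep a pointer from each proposition back to its passage

model = SentenceTransformer("all-MiniLM-L6-v2")
unit_vecs = model.encode(units, normalize_embeddings=True)

query = "when was the eiffel tower finished"
qv = model.encode([query], normalize_embeddings=True)[0]
best = int(np.argmax(unit_vecs @ qv))
print(units[best], "->", sources[best])   # retrieved proposition and its source passage
```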
- Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading [63.93888816206071]
We introduce MemWalker, a method that processes the long context into a tree of summary nodes. Upon receiving a query, the model navigates this tree in search of relevant information, and responds once it gathers sufficient information.
We show that, beyond effective reading, MemWalker enhances explainability by highlighting the reasoning steps as it interactively reads the text and by pinpointing the text segments relevant to the query.
arXiv Detail & Related papers (2023-10-08T06:18:14Z)
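A hedged sketch of the tree-navigation idea summarized above: leaves hold raw text segments, internal nodes hold summaries of their children, and a query walks down from the root one node at a time. The summarize() placeholder and the lexical-overlap routing below are stand-ins; in MemWalker the LLM itself produces the summaries and decides which child to enter.

```python
# Sketch: build a tree of summary nodes over long text and navigate it with a query.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    text: str                      # summary for internal nodes, raw segment for leaves
    children: List["Node"] = field(default_factory=list)

def summarize(chunks):
    # Placeholder summarizer: in practice an LLM call.
    return " / ".join(c.split(".")[0] for c in chunks)

def build_tree(segments, fanout=2):
    nodes = [Node(s) for s in segments]
    while len(nodes) > 1:
        nodes = [Node(summarize([n.text for n in nodes[i:i + fanout]]),
                      children=nodes[i:i + fanout])
                 for i in range(0, len(nodes), fanout)]
    return nodes[0]

def relevance(query, text):
    # Crude lexical-overlap stand-in for the LLM's routing decision.
    return len(set(query.lower().split()) & set(text.lower().split()))

def navigate(root, query):
    node = root
    while node.children:                                   # descend until a leaf is reached
        node = max(node.children, key=lambda c: relevance(query, c.text))
    return node.text                                       # the segment handed to the reader

segments = [
    "The Eiffel Tower was completed in 1889. It was built for the World's Fair.",
    "The tower is 330 metres tall. It was the tallest structure in the world until 1930.",
    "Paris is the capital of France. The city lies on the Seine.",
    "France is a country in Western Europe. Its population is about 68 million.",
]
print(navigate(build_tree(segments), "how tall is the eiffel tower"))
```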
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval [117.07047313964773]
We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions.
Our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers.
Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
arXiv Detail & Related papers (2020-09-27T06:12:29Z)
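A minimal sketch of iterative multi-hop retrieval in the spirit of the approach summarized above: after each hop the query is re-encoded as the original question concatenated with the evidence retrieved so far, so later hops can reach passages that only earlier evidence links to. The bag-of-words encode() is a toy stand-in for a trained dense encoder, and beam size 1 plus the toy corpus are simplifications.

```python
# Sketch: two-hop dense retrieval where each hop conditions on earlier evidence.
import numpy as np

def encode(text, dim=256):
    # Toy hashed bag-of-words encoder standing in for a trained dense encoder.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok.strip(".,?!")) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

corpus = [
    "Alice Smith wrote the novel River of Glass.",
    "River of Glass was adapted into a film directed by Bob Jones.",
    "Bob Jones was born in Toronto.",
]
doc_vecs = np.stack([encode(d) for d in corpus])

def multi_hop_retrieve(question, hops=2):
    query, retrieved = question, []
    for _ in range(hops):
        scores = doc_vecs @ encode(query)
        scores[[corpus.index(p) for p in retrieved]] = -np.inf   # don't re-retrieve evidence
        best = corpus[int(np.argmax(scores))]
        retrieved.append(best)
        query = question + " " + best        # condition the next hop on the evidence so far
    return retrieved

print(multi_hop_retrieve("Who directed the film adaptation of the novel written by Alice Smith?"))
```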
- Open-Domain Question Answering with Pre-Constructed Question Spaces [70.13619499853756]
Open-domain question answering aims to locate the answers to user-generated questions in massive collections of documents.
There are two families of solutions available: retriever-readers, and knowledge-graph-based approaches.
We propose a novel algorithm with a reader-retriever structure that differs from both families.
arXiv Detail & Related papers (2020-06-02T04:31:09Z)