Question-to-Question Retrieval for Hallucination-Free Knowledge Access: An Approach for Wikipedia and Wikidata Question Answering
- URL: http://arxiv.org/abs/2501.11301v2
- Date: Fri, 07 Feb 2025 06:34:14 GMT
- Title: Question-to-Question Retrieval for Hallucination-Free Knowledge Access: An Approach for Wikipedia and Wikidata Question Answering
- Authors: Santhosh Thottingal
- Abstract summary: This paper introduces an approach to question answering over knowledge bases like Wikipedia and Wikidata.
We generate a comprehensive set of questions for each logical content unit using an instruction-tuned LLM.
We demonstrate its effectiveness on Wikipedia and Wikidata, including multimedia content through structured fact retrieval from Wikidata.
- Score: 0.0
- Abstract: This paper introduces an approach to question answering over knowledge bases like Wikipedia and Wikidata by performing "question-to-question" matching and retrieval from a dense vector embedding store. Instead of embedding document content, we generate a comprehensive set of questions for each logical content unit using an instruction-tuned LLM. These questions are vector-embedded and stored, each mapped to its corresponding content. Vector embeddings of user queries are then matched against this question vector store. The stored question with the highest similarity score leads to direct retrieval of the associated article content, eliminating the need for answer generation. Our method achieves high cosine similarity (> 0.9) for relevant question pairs, enabling highly precise retrieval. This approach offers several advantages, including computational efficiency, rapid response times, and increased scalability. We demonstrate its effectiveness on Wikipedia and Wikidata, including multimedia content through structured fact retrieval from Wikidata, opening up new pathways for multimodal question answering.
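The question-to-question retrieval loop described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation, and it rests on several assumptions. The `embed` function is a stand-in for a dense embedding model: it builds a bag-of-words count vector instead. The LLM question-generation step is replaced by hand-written questions passed to a hypothetical `QuestionIndex.add_unit`. The default `threshold` of 0.9 is borrowed from the similarity level the abstract reports and is not a documented cutoff.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): lowercase word counts, not a dense model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class QuestionIndex:
    """Hypothetical question store: one vector per question, each mapped to a content unit."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Counter, str, str]] = []  # (vector, question, content_id)
        self.content: Dict[str, str] = {}  # content_id -> passage returned verbatim

    def add_unit(self, content_id: str, passage: str, questions: List[str]) -> None:
        # The paper generates `questions` with an instruction-tuned LLM;
        # in this sketch the caller supplies them directly.
        self.content[content_id] = passage
        for question in questions:
            self.entries.append((embed(question), question, content_id))

    def lookup(self, query: str, threshold: float = 0.9) -> Optional[Tuple[str, str, float]]:
        """Return (passage, matched_question, score) for the best match, or None."""
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), question, cid) for vec, question, cid in self.entries]
        if not scored:
            return None
        score, question, cid = max(scored)
        if score < threshold:
            return None  # no stored question is similar enough
        # The stored passage is returned as-is: there is no answer-generation step.
        return self.content[cid], question, score


if __name__ == "__main__":
    index = QuestionIndex()
    index.add_unit(
        "moon-distance",
        "The Moon orbits Earth at an average distance of about 384,400 km.",
        ["How far is the Moon from Earth?",
         "What is the average distance between Earth and the Moon?"],
    )
    index.add_unit(
        "moon-diameter",
        "The Moon has a diameter of about 3,474 km.",
        ["What is the diameter of the Moon?", "How big is the Moon?"],
    )
    print(index.lookup("how far away is the moon from earth"))
    print(index.lookup("Who wrote Hamlet?"))
```

Because the lookup returns a stored passage verbatim, it yields either existing content or nothing, and the threshold controls that trade-off. Bag-of-words cosine scores are not calibrated the way dense-embedding similarities are, so with a real embedding model the threshold would need to be tuned separately.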
Related papers
- Integrating SPARQL and LLMs for Question Answering over Scholarly Data Sources [0.0]
This paper describes a methodology that combines SPARQL queries, divide and conquer algorithms, and a pre-trained extractive question answering model.
It starts with SPARQL queries to gather data, then applies divide and conquer to manage various question types and sources, and uses the model to handle personal author questions.
The approach, evaluated with Exact Match and F-score metrics, shows promise for improving QA accuracy and efficiency in scholarly contexts.
(2024-09-11T14:50:28Z)
- HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs [9.559336828884808]
Large Language Models (LLMs) are adept at answering simple (single-hop) questions.
As the complexity of the questions increases, the performance of LLMs degrades.
Recent methods try to reduce this burden by integrating structured knowledge triples into the raw text.
We propose to use a knowledge graph (KG) that is context-aware and is distilled to contain query-relevant information.
(2024-06-10T05:22:49Z)
- Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata [6.716263690738313]
This paper presents WikiWebQuestions, a high-quality question answering benchmark for Wikidata.
It consists of real-world data with SPARQL.
We modify SPARQL to use the unique domain and property names instead of their IDs.
(2023-05-23T16:20:43Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning.
(2022-12-02T04:08:09Z)
- Enhanced vectors for top-k document retrieval in Question Answering [0.0]
We propose a different approach that retrieves the evidence documents efficiently and accurately.
We do so by assigning each document (or, in our case, each passage) a unique identifier and using these identifiers to create dense vectors.
This approach enables efficient creation of real-time query vectors in 4 milliseconds.
(2022-10-08T07:44:24Z)
- Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
(2022-02-27T17:38:53Z)
- A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases [67.33560134350427]
TempQA-WD is a benchmark dataset for temporal reasoning.
It is based on Wikidata, which is the most frequently curated, openly available knowledge base.
(2022-01-15T08:49:09Z)
- HopRetriever: Retrieve Hops over Wikipedia to Answer Complex Questions [38.89150764309989]
We build HopRetriever which retrieves hops over Wikipedia to answer complex questions.
Our approach also yields quantifiable interpretations of the evidence collection process.
(2020-12-31T10:36:01Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset, Open Table-and-Text Question Answering (OTT-QA), to evaluate performance on this task.
(2020-10-20T16:48:14Z)
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval [117.07047313964773]
We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions.
Our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers.
Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
(2020-09-27T06:12:29Z)
- Open-Domain Question Answering with Pre-Constructed Question Spaces [70.13619499853756]
Open-domain question answering aims at solving the task of locating the answers to user-generated questions in massive collections of documents.
There are two families of solutions available: retriever-readers and knowledge-graph-based approaches.
We propose a novel algorithm with a reader-retriever structure that differs from both families.
(2020-06-02T04:31:09Z)