Automatic Question-Answer Generation for Long-Tail Knowledge
- URL: http://arxiv.org/abs/2403.01382v1
- Date: Sun, 3 Mar 2024 03:06:31 GMT
- Title: Automatic Question-Answer Generation for Long-Tail Knowledge
- Authors: Rohan Kumar, Youngmin Kim, Sunitha Ravi, Haitian Sun, Christos
Faloutsos, Ruslan Salakhutdinov, Minji Yoon
- Abstract summary: We propose an automatic approach to generate specialized QA datasets for tail entities.
We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
- Score: 65.11554185687258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained Large Language Models (LLMs) have gained significant attention for
addressing open-domain Question Answering (QA). While they exhibit high
accuracy in answering questions related to common knowledge, LLMs encounter
difficulties in learning about uncommon long-tail knowledge (tail entities).
Since manually constructing QA datasets demands substantial human resources,
the types of existing QA datasets are limited, leaving us with a scarcity of
datasets to study the performance of LLMs on tail entities. In this paper, we
propose an automatic approach to generate specialized QA datasets for tail
entities and present the associated research challenges. We conduct extensive
experiments by employing pretrained LLMs on our newly generated long-tail QA
datasets, comparing their performance with and without external resources
including Wikipedia and Wikidata knowledge graphs.
Related papers
- DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs [3.24692739098077]
Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning.
We evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting.
We observe that late interaction models and surprisingly lexical models like BM25 perform well compared to other pre-trained dense retrieval models.
arXiv Detail & Related papers (2024-06-24T22:09:50Z) - Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts [50.06633829833144]
Large Language Models (LLMs) are effective in performing various NLP tasks, but struggle to handle tasks that require extensive, real-world knowledge.
We propose a benchmark that requires knowledge of long-tail facts for answering the involved questions.
Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required.
arXiv Detail & Related papers (2024-05-10T15:10:20Z) - CuriousLLM: Elevating Multi-Document Question Answering with LLM-Enhanced Knowledge Graph Reasoning [0.9295048974480845]
We propose CuriousLLM, an enhancement that integrates a curiosity-driven reasoning mechanism into an LLM agent.
This mechanism enables the agent to generate relevant follow-up questions, thereby guiding the information retrieval process more efficiently.
Our experiments show that CuriousLLM significantly boosts LLM performance in multi-document question answering (MD-QA)
arXiv Detail & Related papers (2024-04-13T20:43:46Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - Long-Tailed Question Answering in an Open World [46.67715607552547]
We define Open Long-Tailed QA (OLTQA) as learning from long-tailed distributed data.
We propose an OLTQA model that encourages knowledge sharing between head, tail and unseen tasks.
On a large-scale OLTQA dataset, our model consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2023-05-11T04:28:58Z) - Modern Question Answering Datasets and Benchmarks: A Survey [5.026863544662493]
Question Answering (QA) is one of the most important natural language processing (NLP) tasks.
It aims using NLP technologies to generate a corresponding answer to a given question based on the massive unstructured corpus.
In this paper, we investigate influential QA datasets that have been released in the era of deep learning.
arXiv Detail & Related papers (2022-06-30T05:53:56Z) - Question Answering Survey: Directions, Challenges, Datasets, Evaluation
Matrices [0.0]
The research directions of QA field are analyzed based on the type of question, answer type, source of evidence-answer, and modeling approach.
This detailed followed by open challenges of the field like automatic question generation, similarity detection and, low resource availability for a language.
arXiv Detail & Related papers (2021-12-07T08:53:40Z) - KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT)
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z) - Template-Based Question Generation from Retrieved Sentences for Improved
Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.