Automatic Question-Answer Generation for Long-Tail Knowledge
- URL: http://arxiv.org/abs/2403.01382v1
- Date: Sun, 3 Mar 2024 03:06:31 GMT
- Title: Automatic Question-Answer Generation for Long-Tail Knowledge
- Authors: Rohan Kumar, Youngmin Kim, Sunitha Ravi, Haitian Sun, Christos
Faloutsos, Ruslan Salakhutdinov, Minji Yoon
- Abstract summary: We propose an automatic approach to generate specialized QA datasets for tail entities.
We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
- Score: 65.11554185687258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained Large Language Models (LLMs) have gained significant attention for
addressing open-domain Question Answering (QA). While they exhibit high
accuracy in answering questions related to common knowledge, LLMs encounter
difficulties in learning about uncommon long-tail knowledge (tail entities).
Since manually constructing QA datasets demands substantial human resources,
the types of existing QA datasets are limited, leaving us with a scarcity of
datasets to study the performance of LLMs on tail entities. In this paper, we
propose an automatic approach to generate specialized QA datasets for tail
entities and present the associated research challenges. We conduct extensive
experiments by employing pretrained LLMs on our newly generated long-tail QA
datasets, comparing their performance with and without external resources
including Wikipedia and Wikidata knowledge graphs.
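To make the proposed pipeline concrete, below is a minimal sketch of template-driven QA generation from knowledge-graph triples, assuming Wikidata-style (subject, relation, object) input; the template table and helper function are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch: turn Wikidata-style triples about tail entities into
# QA pairs via relation-specific templates. The templates and triple format
# are assumptions for demonstration, not the paper's pipeline.

# Wikidata property IDs -> question templates (a tiny illustrative subset).
TEMPLATES = {
    "P19": "Where was {subject} born?",              # place of birth
    "P69": "Where was {subject} educated?",          # educated at
    "P800": "What is a notable work of {subject}?",  # notable work
}

def triple_to_qa(subject: str, relation: str, obj: str):
    """Render one (subject, relation, object) triple as a QA pair,
    skipping relations that have no template."""
    template = TEMPLATES.get(relation)
    if template is None:
        return None
    return {"question": template.format(subject=subject), "answer": obj}

# A long-tail entity: few web documents mention it, so a pretrained LLM is
# unlikely to have memorized facts about it.
print(triple_to_qa("Elsa von Freytag-Loringhoven", "P19", "Świnoujście"))
# {'question': 'Where was Elsa von Freytag-Loringhoven born?', 'answer': 'Świnoujście'}
```

Datasets generated this way can then be answered by an LLM alone versus an LLM given the source triples or Wikipedia passages, which is the comparison the abstract describes.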
Related papers
- DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs [3.24692739098077]
Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning.
We evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting.
We observe that late-interaction models and, surprisingly, lexical models like BM25 perform well compared to other pre-trained dense retrieval models (a minimal BM25 sketch follows this entry).
arXiv Detail & Related papers (2024-06-24T22:09:50Z)
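For readers unfamiliar with the lexical baseline mentioned above, here is a self-contained sketch of Okapi BM25 scoring; the corpus, query, and parameter values (k1 = 1.5, b = 0.75) are toy assumptions, and real systems would use proper tokenization and an inverted index.

```python
import math
from collections import Counter

# Toy corpus: each document is a list of tokens.
corpus = [
    "the eiffel tower is in paris".split(),
    "the colosseum is in rome".split(),
    "paris is the capital of france".split(),
]
k1, b = 1.5, 0.75
N = len(corpus)
avgdl = sum(len(d) for d in corpus) / N
df = Counter(t for doc in corpus for t in set(doc))  # document frequencies

def bm25(query, doc):
    """Okapi BM25 score of one document for a tokenized query."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        if t not in tf:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
        score += idf * tf[t] * (k1 + 1) / (
            tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
        )
    return score

query = "where is the eiffel tower".split()
ranked = sorted(corpus, key=lambda d: bm25(query, d), reverse=True)
print(" ".join(ranked[0]))  # -> "the eiffel tower is in paris"
```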
- Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts [50.06633829833144]
Large Language Models (LLMs) are effective in performing various NLP tasks, but struggle to handle tasks that require extensive real-world knowledge.
We propose a benchmark whose questions require knowledge of long-tail facts to answer.
Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required.
arXiv Detail & Related papers (2024-05-10T15:10:20Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Long-Tailed Question Answering in an Open World [46.67715607552547]
We define Open Long-Tailed QA (OLTQA) as learning from data with a long-tailed distribution.
We propose an OLTQA model that encourages knowledge sharing between head, tail and unseen tasks.
On a large-scale OLTQA dataset, our model consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2023-05-11T04:28:58Z)
- Large Language Models Struggle to Learn Long-Tail Knowledge [39.01608375863687]
We study the relationship between the knowledge memorized by large language models and the information in pre-training datasets scraped from the web.
In particular, we show that a language model's ability to answer a fact-based question relates to how many documents associated with that question were seen during pre-training.
arXiv Detail & Related papers (2022-11-15T18:49:27Z)
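The finding above rests on counting, for each question, how many pre-training documents are relevant to it. The toy sketch below illustrates that measurement with plain substring matching; the actual study used entity linking over full pre-training corpora, so this is a simplifying assumption.

```python
# Toy illustration: a question's "relevant document count" is the number of
# pre-training documents mentioning both the question entity and the answer.
# Substring matching stands in for real entity linking.

pretraining_docs = [
    "Paris is the capital of France and its largest city.",
    "The Eiffel Tower in Paris was completed in 1889.",
    "Mount Kosciuszko is the highest peak in mainland Australia.",
]

def relevant_doc_count(question_entity: str, answer: str) -> int:
    return sum(
        1 for doc in pretraining_docs
        if question_entity.lower() in doc.lower()
        and answer.lower() in doc.lower()
    )

# A "head" fact co-occurs in many documents; a "tail" fact rarely or never.
print(relevant_doc_count("France", "Paris"))             # 1
print(relevant_doc_count("Mount Kosciuszko", "2228 m"))  # 0
```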
- Modern Question Answering Datasets and Benchmarks: A Survey [5.026863544662493]
Question Answering (QA) is one of the most important natural language processing (NLP) tasks.
It aims to use NLP techniques to generate an answer to a given question based on massive unstructured corpora.
In this paper, we investigate influential QA datasets that have been released in the era of deep learning.
arXiv Detail & Related papers (2022-06-30T05:53:56Z)
- Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices [0.0]
The research directions of the QA field are analyzed based on question type, answer type, source of evidence and answer, and modeling approach.
This is followed by the field's open challenges, such as automatic question generation, similarity detection, and low resource availability for some languages.
arXiv Detail & Related papers (2021-12-07T08:53:40Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
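As a rough illustration of that baseline's shape, the sketch below builds one shared dense index over passages and retrieves the context a seq2seq reader would consume. It assumes the sentence-transformers library and an off-the-shelf encoder; KILT's actual baseline retriever and generator are not reproduced here.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed installed

# Toy retrieve-then-generate baseline: one dense index over all passages,
# queried by cosine similarity. Model name and passages are placeholders.
model = SentenceTransformer("all-MiniLM-L6-v2")
passages = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "The Colosseum in Rome could hold an estimated 50,000 spectators.",
]

def normalize(m: np.ndarray) -> np.ndarray:
    return m / np.linalg.norm(m, axis=-1, keepdims=True)

index = normalize(model.encode(passages))  # the shared dense vector index

def retrieve(query: str) -> str:
    q = normalize(model.encode([query]))[0]
    return passages[int(np.argmax(index @ q))]  # nearest passage

context = retrieve("When was the Eiffel Tower built?")
# A seq2seq reader (e.g. BART or T5) would then generate the answer from a
# prompt like f"question: When was the Eiffel Tower built? context: {context}"
print(context)
```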
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
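As a closing illustration, here is a minimal sketch of the template trick described in the last entry: treat an entity span in a retrieved sentence as the answer and rewrite the sentence into a question. The capitalized-span heuristic is a crude stand-in for the NER and cloze machinery an actual system would use.

```python
import re

def sentence_to_qa(sentence: str):
    """Turn a retrieved sentence into a (question, answer) pair by masking
    an entity span with a wh-word. Pseudo-NER: first capitalized
    multi-word span."""
    match = re.search(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+", sentence)
    if match is None:
        return None
    answer = match.group(0)
    # Fixed template: replace the answer span with "what", re-punctuate.
    question = sentence.replace(answer, "what").rstrip(".") + "?"
    return {"question": question, "answer": answer}

retrieved = "Dubai's tallest building is Burj Khalifa."
print(sentence_to_qa(retrieved))
# {'question': "Dubai's tallest building is what?", 'answer': 'Burj Khalifa'}
```

Pairs generated this way serve as pseudo-training data, which the entry reports works better when the template is applied to a related retrieved sentence rather than the original context sentence.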