Few-Shot Data Synthesis for Open Domain Multi-Hop Question Answering
- URL: http://arxiv.org/abs/2305.13691v2
- Date: Mon, 12 Feb 2024 20:25:32 GMT
- Title: Few-Shot Data Synthesis for Open Domain Multi-Hop Question Answering
- Authors: Mingda Chen, Xilun Chen, Wen-tau Yih
- Abstract summary: Few-shot learning for open domain multi-hop question answering typically relies on the in-context learning capability of large language models.
We propose a data synthesis framework for multi-hop question answering that requires fewer than 10 human-annotated question-answer pairs.
- Score: 40.86455734818704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning for open domain multi-hop question answering typically
relies on the in-context learning capability of large language models (LLMs).
While powerful, these LLMs usually contain tens or hundreds of billions of
parameters, making them rather inefficient at inference time. To improve the
performance of smaller language models, we propose a data synthesis framework
for multi-hop question answering that requires fewer than 10 human-annotated
question-answer pairs. Our framework depends only on rich, naturally occurring
relationships among documents and is built upon data generation functions
parameterized by LLMs and prompts. We synthesize millions of multi-hop
questions and claims to finetune language models, which we evaluate on popular
benchmarks for multi-hop question answering and fact verification. Empirically,
our approach improves model performance significantly, allowing the finetuned
models to be competitive with GPT-3.5-based approaches while being almost
one-third the size in parameter count.
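To make the idea concrete, here is a minimal sketch of one such data generation function: composing a 2-hop question from a pair of documents connected by a bridge entity. The `llm(prompt) -> str` callable, the prompt wording, and the `synthesize_two_hop_qa` helper are illustrative assumptions, not the authors' actual prompts or code.

```python
from typing import Callable

def synthesize_two_hop_qa(
    doc_a: str,
    bridge_entity: str,
    doc_b: str,
    llm: Callable[[str], str],
) -> dict:
    """Compose a 2-hop question-answer pair from two documents that share
    a bridge entity (e.g. a hyperlink between Wikipedia articles)."""
    prompt = (
        f"Document 1: {doc_a}\n"
        f"Document 2: {doc_b}\n"
        f"The documents are connected by the entity '{bridge_entity}'.\n"
        "Write one question whose answer requires combining both documents, "
        "then its answer, in the format:\nQ: <question>\nA: <answer>"
    )
    output = llm(prompt)
    question, _, answer = output.partition("\nA:")  # split on the answer tag
    return {
        "question": question.removeprefix("Q:").strip(),
        "answer": answer.strip(),
        "sources": (doc_a, doc_b),
    }
```

Run over many linked document pairs, functions like this would produce the millions of synthetic questions and claims the abstract describes, which then serve as finetuning data for a smaller model.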
Related papers
- Prompting-based Synthetic Data Generation for Few-Shot Question Answering [23.97949073816028]
We show that using large language models can improve Question Answering performance on various datasets in the few-shot setting.
We suggest that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme.
arXiv Detail & Related papers (2024-05-15T13:36:43Z)
- FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models [37.34801677290571]
FanOutQA is a high-quality dataset of fan-out question-answer pairs and human-annotated decompositions, with English Wikipedia as the knowledge base.
We formulate three benchmark settings across our dataset and benchmark 7 LLMs, including GPT-4, LLaMA 2, Claude-2.1, and Mixtral-8x7B.
arXiv Detail & Related papers (2024-02-21T20:30:45Z)
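Fan-out questions of the FanOutQA kind call for a decompose-then-aggregate pattern: one sub-question per entity, then a fusion step. A hedged sketch, where the `llm` callable and prompt wording are assumptions rather than anything from the benchmark:

```python
from typing import Callable

def answer_fan_out(question: str, entities: list[str],
                   llm: Callable[[str], str]) -> str:
    # Decompose: ask the same sub-question once per entity.
    findings = {
        entity: llm(f"For the question '{question}', answer for {entity} only.")
        for entity in entities
    }
    # Aggregate: combine the per-entity findings into one final answer.
    notes = "\n".join(f"- {e}: {a}" for e, a in findings.items())
    return llm(f"Question: {question}\nPer-entity findings:\n{notes}\nFinal answer:")
```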
- MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models [70.92847554971065]
We introduce MT-Eval, a comprehensive benchmark designed to evaluate multi-turn conversational abilities.
By analyzing human-LLM conversations, we categorize interaction patterns into four types: recollection, expansion, refinement, and follow-up.
Our evaluation of 11 well-known LLMs shows that while closed-source models generally surpass open-source ones, certain open-source models exceed GPT-3.5-Turbo in specific tasks.
arXiv Detail & Related papers (2024-01-30T04:50:28Z)
- Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning [70.74928578278957]
In open-domain question-answering (ODQA), most existing questions require single-hop reasoning over commonsense knowledge.
Large language models (LLMs) have found significant utility in facilitating ODQA without external corpus.
We propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high quality CoTs.
arXiv Detail & Related papers (2023-10-20T14:51:10Z)
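SP-CoT's core move is to let the LLM mass-produce its own worked demonstrations and then sample a few of them as in-context examples. A loose sketch, where `llm`, the prompts, and the helper names are assumptions rather than the SP-CoT pipeline itself:

```python
import random
from typing import Callable

def build_cot_demos(topics: list[str], llm: Callable[[str], str]) -> list[str]:
    """Have the LLM write its own worked examples, one per seed topic."""
    template = ("Write a two-hop question about {t}, then reason step by step, "
                "then give the final answer, in the format:\n"
                "Q: ...\nReasoning: ...\nA: ...")
    return [llm(template.format(t=topic)) for topic in topics]

def answer_with_cot_demos(question: str, demos: list[str],
                          llm: Callable[[str], str], k: int = 4) -> str:
    """Prepend k self-generated demonstrations, then elicit a new chain."""
    context = "\n\n".join(random.sample(demos, k=min(k, len(demos))))
    return llm(f"{context}\n\nQ: {question}\nReasoning:")
```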
- Multimodal Multi-Hop Question Answering Through a Conversation Between Tools and Efficiently Finetuned Large Language Models [20.52053559484399]
We employ a tool-interacting divide-and-conquer strategy to answer complex multi-hop questions.
To increase the reasoning ability of LLMs, we prompt ChatGPT to generate a tool-interacting divide-and-conquer dataset.
To assess the effectiveness of this approach, we conduct an evaluation on two recently introduced complex question-answering datasets.
arXiv Detail & Related papers (2023-09-16T08:22:22Z)
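The tool-interacting divide-and-conquer strategy above can be pictured as routing LLM-generated sub-questions to specialized tools before composing a final answer. Everything named in this sketch is illustrative, not the paper's interface:

```python
from typing import Callable

Tool = Callable[[str], str]  # e.g. a retriever, calculator, or table reader

def divide_and_conquer(
    question: str,
    decompose: Callable[[str], list[tuple[str, str]]],  # -> (tool, sub-question)
    tools: dict[str, Tool],
    compose: Callable[[str, list[str]], str],  # fuse facts into a final answer
) -> str:
    facts = [tools[tool](sub_q) for tool, sub_q in decompose(question)]
    return compose(question, facts)
```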
- Enhancing In-Context Learning with Answer Feedback for Multi-Span Question Answering [9.158919909909146]
In this paper, we propose a novel way of employing labeled data such that it informs the LLM of undesired outputs.
Experiments on three multi-span question answering datasets and a keyphrase extraction dataset show that our new prompting strategy consistently improves LLM's in-context learning performance.
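A minimal sketch of that prompting strategy, assuming each labeled example carries one undesired output; the template wording is a guess, not the paper's:

```python
def feedback_prompt(examples: list[dict], test_question: str) -> str:
    """Build a prompt whose demonstrations pair each gold answer with an
    undesired output the model should avoid reproducing."""
    parts = [
        f"Question: {ex['question']}\n"
        f"Undesired answer: {ex['undesired']}\n"
        f"Correct answer: {ex['answer']}"
        for ex in examples
    ]
    parts.append(f"Question: {test_question}\nCorrect answer:")
    return "\n\n".join(parts)
```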
arXiv Detail & Related papers (2023-06-07T15:20:24Z)
- Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
arXiv Detail & Related papers (2022-12-16T18:23:43Z)
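A hedged sketch of the Self-Prompting recipe: with no training data, the LLM first writes pseudo QA demonstrations from its own parametric knowledge, which are then prepended at inference time. The `llm` callable and prompts are assumptions:

```python
from typing import Callable

def self_generate_demo(topic: str, llm: Callable[[str], str]) -> str:
    """Elicit a pseudo passage, then a QA pair grounded in it."""
    passage = llm(f"Write a short factual passage about {topic}.")
    return llm(f"Passage: {passage}\n"
               "Write one question answerable from the passage and its short "
               "answer, in the format:\nQ: ...\nA: ...")

def zero_shot_answer(question: str, demos: list[str],
                     llm: Callable[[str], str]) -> str:
    return llm("\n\n".join(demos) + f"\n\nQ: {question}\nA:")
```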
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
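UniKGQA's unified architecture is beyond a short snippet, but the hop structure that KGQA is defined over is easy to illustrate with a breadth-first search over a triple store; the toy triples below are invented for the example:

```python
from collections import defaultdict

def k_hop_neighbors(triples: list[tuple[str, str, str]],
                    topic: str, k: int) -> set[str]:
    """Entities exactly k hops from the topic entity in a triple store."""
    graph = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)  # undirected, for reachability only
    frontier, seen = {topic}, {topic}
    for _ in range(k):
        frontier = {n for e in frontier for n in graph[e]} - seen
        seen |= frontier
    return frontier

# Toy example: "Hawaii" is two hops from "Obama".
kg = [("Obama", "born_in", "Honolulu"), ("Honolulu", "located_in", "Hawaii")]
assert k_hop_neighbors(kg, "Obama", 2) == {"Hawaii"}
```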
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that chain-of-thought explanations improve the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)