Is a Question Decomposition Unit All We Need?
- URL: http://arxiv.org/abs/2205.12538v1
- Date: Wed, 25 May 2022 07:24:09 GMT
- Title: Is a Question Decomposition Unit All We Need?
- Authors: Pruthvi Patel, Swaroop Mishra, Mihir Parmar, Chitta Baral
- Abstract summary: We investigate if humans can decompose a hard question into a set of simpler questions that are relatively easier for models to solve.
We analyze a range of datasets involving various forms of reasoning and find that it is indeed possible to significantly improve model performance.
Our findings indicate that Human-in-the-loop Question Decomposition (HQD) can potentially provide an alternate path to building large LMs.
- Score: 20.66688303609522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LMs) have achieved state-of-the-art performance on
many Natural Language Processing (NLP) benchmarks. With the growing number of
new benchmarks, we build bigger and more complex LMs. However, building new LMs
may not be an ideal option owing to the cost, time and environmental impact
associated with it. We explore an alternative route: can we modify data by
expressing it in terms of the model's strengths, so that a question becomes
easier for models to answer? We investigate if humans can decompose a hard
question into a set of simpler questions that are relatively easier for models
to solve. We analyze a range of datasets involving various forms of reasoning
and find that it is indeed possible to significantly improve model performance
(24% for GPT-3 and 29% for RoBERTa-SQuAD along with a symbolic calculator) via
decomposition. Our approach provides a viable option to involve people in NLP
research in a meaningful way. Our findings indicate that Human-in-the-loop
Question Decomposition (HQD) can potentially provide an alternate path to
building large LMs.
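As a rough illustration of the approach described in the abstract, the sketch below answers one hard arithmetic question by (1) taking human-written sub-questions, (2) answering each with a pluggable QA model, and (3) combining the numeric sub-answers with a simple symbolic calculator. All function names and the toy QA stub are illustrative assumptions; this is not the authors' released code.

```python
"""Minimal sketch of human-in-the-loop question decomposition (HQD).

A human rewrites one hard question as simpler sub-questions; each
sub-question goes to a QA model, and the numeric sub-answers are combined
with a symbolic calculator. Names here are illustrative assumptions, not
the paper's actual implementation.
"""

import re
from typing import Callable, Dict


def symbolic_calculator(expression: str) -> float:
    """Evaluate a plain arithmetic expression over the sub-answers."""
    if not re.fullmatch(r"[\d+\-*/(). ]+", expression):
        raise ValueError(f"unsupported expression: {expression!r}")
    return eval(expression)  # safe-ish: the regex restricts input to arithmetic


def answer_by_decomposition(
    subquestions: Dict[str, str],
    combine_template: str,
    qa_model: Callable[[str], str],
) -> float:
    """Answer each human-written sub-question, then combine the results."""
    sub_answers = {name: qa_model(q) for name, q in subquestions.items()}
    return symbolic_calculator(combine_template.format(**sub_answers))


if __name__ == "__main__":
    # Stand-in for an extractive QA model such as RoBERTa-SQuAD.
    toy_qa = {
        "How many apples were bought?": "12",
        "How many apples were eaten?": "5",
    }
    result = answer_by_decomposition(
        subquestions={
            "bought": "How many apples were bought?",
            "eaten": "How many apples were eaten?",
        },
        combine_template="{bought} - {eaten}",
        qa_model=lambda q: toy_qa.get(q, "0"),
    )
    print(result)  # -> 7
```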
Related papers
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore using less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z) - Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models [58.57279229066477]
We study how language models (LMs) solve retrieval tasks in diverse situations.
We introduce ORION, a collection of structured retrieval tasks spanning six domains.
We find that LMs internally decompose retrieval tasks in a modular way.
arXiv Detail & Related papers (2023-12-13T18:36:43Z) - Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
arXiv Detail & Related papers (2023-12-11T18:17:43Z) - Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning [41.03267013352519]
Large Language Models (LLMs) prompted to generate chain-of-thought exhibit impressive reasoning capabilities.
We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps.
We show that DaSLaM is not limited by the solver's capabilities as a function of scale.
arXiv Detail & Related papers (2023-10-21T15:23:20Z) - An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - Successive Prompting for Decomposing Complex Questions [50.00659445976735]
Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting.
We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution.
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset; a minimal sketch of this iterative loop appears after this list.
arXiv Detail & Related papers (2022-12-08T06:03:38Z) - ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
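Read literally, the successive-prompting recipe summarized above is a loop: ask the LM for the next simple sub-question, answer it, fold both back into the prompt, and repeat until the model is ready to answer. The sketch below shows that loop under the assumption of a generic `call_lm` text-completion callable; the prompt wording and stopping convention are hypothetical, not taken from the paper.

```python
"""Sketch of a successive-prompting loop. `call_lm` is a hypothetical
stand-in for any few-shot LM completion API; the prompt format below is
an assumption for illustration only."""

from typing import Callable


def successive_prompting(
    complex_question: str,
    call_lm: Callable[[str], str],
    max_steps: int = 8,
) -> str:
    prompt = f"Question: {complex_question}\n"
    for _ in range(max_steps):
        # Decomposition step: ask for the next simple sub-question,
        # or a signal that the model is ready to answer.
        sub_q = call_lm(prompt + "Next sub-question (or FINAL):")
        if sub_q.strip().upper().startswith("FINAL"):
            break
        # Solution step: solve the simple sub-question in isolation.
        sub_a = call_lm(f"Answer briefly: {sub_q}")
        # Fold the solved step back into the context for the next iteration.
        prompt += f"Sub-question: {sub_q}\nSub-answer: {sub_a}\n"
    return call_lm(prompt + "Final answer:")
```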
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.