Elaboration-Generating Commonsense Question Answering at Scale
- URL: http://arxiv.org/abs/2209.01232v2
- Date: Fri, 14 Jul 2023 21:43:36 GMT
- Title: Elaboration-Generating Commonsense Question Answering at Scale
- Authors: Wenya Wang, Vivek Srikumar, Hanna Hajishirzi, Noah A. Smith
- Abstract summary: In question answering requiring common sense, language models (e.g., GPT-3) have been used to generate text expressing background knowledge.
We finetune smaller language models to generate useful intermediate context, referred to here as elaborations.
Our framework alternates between updating two language models -- an elaboration generator and an answer predictor -- allowing each to influence the other.
- Score: 77.96137534751445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In question answering requiring common sense, language models (e.g., GPT-3)
have been used to generate text expressing background knowledge that helps
improve performance. Yet the cost of working with such models is very high; in
this work, we finetune smaller language models to generate useful intermediate
context, referred to here as elaborations. Our framework alternates between
updating two language models -- an elaboration generator and an answer
predictor -- allowing each to influence the other. Using less than 0.5% of the
parameters of GPT-3, our model outperforms alternatives with similar sizes and
closes the gap on GPT-3 on four commonsense question answering benchmarks.
Human evaluations show that the quality of the generated elaborations is high.
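The abstract above describes a framework that alternates between updating an elaboration generator and an answer predictor so that each influences the other. Below is a minimal, hypothetical sketch of that alternating loop, assuming Hugging Face T5 models and a crude usefulness signal for the generator; the prompts, model choices, and losses are illustrative assumptions, not the paper's actual objectives.
```python
# Hypothetical sketch of an alternating-update loop: a small elaboration
# generator proposes background text, the answer predictor is trained on
# (question + elaboration), and the generator is reinforced only when its
# elaboration helped. Not the paper's actual training procedure.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("t5-small")
generator = AutoModelForSeq2SeqLM.from_pretrained("t5-small").to(device)  # elaboration generator
predictor = AutoModelForSeq2SeqLM.from_pretrained("t5-small").to(device)  # answer predictor
gen_opt = torch.optim.AdamW(generator.parameters(), lr=1e-4)
pred_opt = torch.optim.AdamW(predictor.parameters(), lr=1e-4)

# One toy CommonsenseQA-style training pair (illustrative only).
question = "Where would you put a dollar bill if you want to carry it with you?"
answer = "wallet"

def seq2seq_loss(model, source, target):
    enc = tok(source, return_tensors="pt", truncation=True).to(device)
    labels = tok(target, return_tensors="pt", truncation=True).input_ids.to(device)
    return model(**enc, labels=labels).loss

for step in range(4):  # alternate between the two models
    # 1) Sample an elaboration from the current generator.
    gen_in = tok("elaborate: " + question, return_tensors="pt").to(device)
    with torch.no_grad():
        elab_ids = generator.generate(**gen_in, do_sample=True, max_new_tokens=32)
    elaboration = tok.decode(elab_ids[0], skip_special_tokens=True)

    # 2) Update the answer predictor on (question + elaboration) -> answer.
    pred_loss = seq2seq_loss(predictor, f"question: {question} context: {elaboration}", answer)
    pred_opt.zero_grad()
    pred_loss.backward()
    pred_opt.step()

    # 3) Update the generator only when the elaboration lowered the predictor's
    #    loss relative to the question alone (a crude stand-in for the feedback
    #    signal the predictor provides in the paper).
    with torch.no_grad():
        baseline = seq2seq_loss(predictor, f"question: {question}", answer)
    if elaboration and pred_loss.item() < baseline.item():
        gen_loss = seq2seq_loss(generator, "elaborate: " + question, elaboration)
        gen_opt.zero_grad()
        gen_loss.backward()
        gen_opt.step()
```
However simplified, the loop reflects the alternation described in the abstract: the predictor is trained on the generator's current elaborations, and the generator is updated only when its output makes the predictor's job easier.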
Related papers
- Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging [25.078498180620425]
We present a step-by-step evaluation framework, Fennec, capable of fine-grained evaluation extended through branching and bridging.
We employ the fine-grained correction capabilities induced by the evaluation model to refine multiple model responses, leading to an improvement of 1-2 points on the MT-Bench.
arXiv Detail & Related papers (2024-05-20T16:47:22Z)
- Negated Complementary Commonsense using Large Language Models [3.42658286826597]
This work focuses on finding answers to negated complementary questions in commonsense scenarios.
We propose a model-agnostic methodology to improve the performance in negated complementary scenarios.
arXiv Detail & Related papers (2023-07-13T15:03:48Z)
- PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing [64.53242758625922]
PanGu-Σ is trained on a cluster of Ascend 910 AI processors using the MindSpore framework.
It provides state-of-the-art performance in zero-shot learning of various Chinese NLP downstream tasks.
arXiv Detail & Related papers (2023-03-20T03:39:27Z)
- STREET: A Multi-Task Structured Reasoning and Explanation Benchmark [56.555662318619135]
We introduce a unified multi-task and multi-domain natural language reasoning and explanation benchmark.
We expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer.
arXiv Detail & Related papers (2023-02-13T22:34:02Z)
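The STREET entry above asks models to produce step-by-step structured explanations in which premises from the question feed intermediate conclusions that justify the answer. A small, hypothetical illustration of such a reasoning chain as data follows; the field names and layout are assumptions, not STREET's actual annotation schema.
```python
# Hypothetical illustration of a structured, step-by-step explanation: premises
# drawn from the question feed intermediate conclusions that support the final
# answer. The field names and layout are assumptions, not STREET's schema.
reasoning_chain = {
    "question": "A 3 kg object accelerates at 2 m/s^2. What is the net force on it?",
    "premises": {
        "P1": "The object's mass is 3 kg.",
        "P2": "The object's acceleration is 2 m/s^2.",
        "P3": "Newton's second law states F = m * a.",
    },
    "steps": [
        {"uses": ["P1", "P2", "P3"], "conclusion": "C1: F = 3 kg * 2 m/s^2 = 6 N."},
        {"uses": ["C1"], "conclusion": "Answer: the net force is 6 N."},
    ],
}
print(reasoning_chain["steps"][-1]["conclusion"])
```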
- Emergent Analogical Reasoning in Large Language Models [1.5469452301122177]
We show that GPT-3 has a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings.
Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.
arXiv Detail & Related papers (2022-12-19T00:04:56Z)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [84.33607245023049]
We propose and develop a family of language models named GLaM (Generalist Language Model)
GLaM uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants.
It consumes only 1/3 of the energy used to train GPT-3 and requires half the FLOPs for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
arXiv Detail & Related papers (2021-12-13T18:58:19Z)
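The GLaM entry above attributes its efficiency to a sparsely activated mixture-of-experts architecture. Below is a minimal, generic PyTorch sketch of that idea; the layer sizes, top-k routing rule, and module layout are illustrative assumptions, not GLaM's implementation.
```python
# Minimal, generic sketch of a sparsely activated mixture-of-experts layer:
# a learned router sends each token to its top-k experts, so only a small
# fraction of the layer's parameters is active per input. Illustration of the
# general technique only, not GLaM's architecture or routing algorithm.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)      # routing probabilities per token
        weights, idx = gate.topk(self.top_k, dim=-1)  # keep only the top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                 # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(SparseMoE()(tokens).shape)  # torch.Size([10, 64])
```
Because each token activates only top_k of the num_experts feed-forward blocks, total parameter count can grow with the number of experts while per-token compute stays roughly constant, which is the scaling property the GLaM summary highlights.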
- It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners [14.264737570114631]
We show that performance similar to GPT-3 can be obtained with language models that are much "greener."
We identify key factors required for successful natural language understanding with small language models.
arXiv Detail & Related papers (2020-09-15T14:18:53Z)
- Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
- TuringAdvice: A Generative and Dynamic Evaluation of Language Use [90.3029315711237]
We propose TuringAdvice, a new challenge task and dataset for language understanding models.
Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language.
Empirical results show that today's models struggle at TuringAdvice.
arXiv Detail & Related papers (2020-04-07T18:00:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.