STREET: A Multi-Task Structured Reasoning and Explanation Benchmark
- URL: http://arxiv.org/abs/2302.06729v1
- Date: Mon, 13 Feb 2023 22:34:02 GMT
- Title: STREET: A Multi-Task Structured Reasoning and Explanation Benchmark
- Authors: Danilo Ribeiro, Shen Wang, Xiaofei Ma, Henry Zhu, Rui Dong, Deguang
Kong, Juliette Burger, Anjelica Ramos, William Wang, Zhiheng Huang, George
Karypis, Bing Xiang, Dan Roth
- Abstract summary: We introduce a unified multi-task and multi-domain natural language reasoning and explanation benchmark.
We expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer.
- Score: 56.555662318619135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce STREET, a unified multi-task and multi-domain natural language
reasoning and explanation benchmark. Unlike most existing question-answering
(QA) datasets, we expect models to not only answer questions, but also produce
step-by-step structured explanations describing how premises in the question
are used to produce intermediate conclusions that can prove the correctness of
a certain answer. We perform extensive evaluation with popular language models
such as few-shot prompting GPT-3 and fine-tuned T5. We find that these models
still lag behind human performance when producing such structured reasoning
steps. We believe this work will provide a way for the community to better
train and test systems on multi-step reasoning and explanations in natural
language.
Related papers
- STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering [8.525847131940031]
Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question.
Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts.
We propose STOC-TOT, a tree-of-thought reasoning prompting method with constrained decoding for MHQA.
arXiv Detail & Related papers (2024-07-04T07:17:53Z) - Leveraging Structured Information for Explainable Multi-hop Question
Answering and Reasoning [14.219239732584368]
In this work, we investigate constructing and leveraging extracted semantic structures (graphs) for multi-hop question answering.
Empirical results and human evaluations show that our framework: generates more faithful reasoning chains and substantially improves the QA performance on two benchmark datasets.
arXiv Detail & Related papers (2023-11-07T05:32:39Z) - Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning
Framework that Supports Diverse Compositional Reasoning [41.99368317059466]
We present Explainable Verbal Reasoner Plus (EVR+), a reasoning framework that enhances language models' compositional reasoning ability.
Our framework supports more diverse types of reasoning such as nested loops and different types of recursion.
Results show that our reasoning framework can enhance the language model's compositional generalization performance on the five tasks.
arXiv Detail & Related papers (2023-04-28T19:27:26Z) - Explanations from Large Language Models Make Small Reasoners Better [61.991772773700006]
We show that our method can consistently and significantly outperform finetuning baselines across different settings.
As a side benefit, human evaluation shows that our method can generate high-quality explanations to justify its predictions.
arXiv Detail & Related papers (2022-10-13T04:50:02Z) - Complexity-Based Prompting for Multi-Step Reasoning [72.0057198610614]
We study the task of prompting large-scale language models to perform multi-step reasoning.
A central question is which reasoning examples make the most effective prompts.
We propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning.
arXiv Detail & Related papers (2022-10-03T05:33:27Z) - Learn to Explain: Multimodal Reasoning via Thought Chains for Science
Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that SQA improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z) - Faithful Reasoning Using Large Language Models [12.132449274592668]
We show how LMs can be made to perform faithful multi-step reasoning via a process whose causal structure mirrors the underlying logical structure of the problem.
Our approach works by chaining together reasoning steps, where each step results from calls to two fine-tuned LMs.
We demonstrate the effectiveness of our model on multi-step logical deduction and scientific question-answering, showing that it outperforms baselines on final answer accuracy.
arXiv Detail & Related papers (2022-08-30T13:44:41Z) - Teaching Broad Reasoning Skills via Decomposition-Guided Contexts [50.114651561111245]
Question-answering datasets require a broad set of reasoning skills.
We show how to use question decompositions to teach these broad reasoning skills in a robust fashion.
arXiv Detail & Related papers (2022-05-25T05:13:21Z) - Text Modular Networks: Learning to Decompose Tasks in the Language of
Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
arXiv Detail & Related papers (2020-09-01T23:45:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.