Getting MoRE out of Mixture of Language Model Reasoning Experts
- URL: http://arxiv.org/abs/2305.14628v2
- Date: Fri, 20 Oct 2023 05:16:29 GMT
- Title: Getting MoRE out of Mixture of Language Model Reasoning Experts
- Authors: Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, Jordan
Boyd-Graber
- Abstract summary: We propose a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse specialized language models.
We specialize the backbone language model with prompts optimized for different reasoning categories, including factual, multihop, mathematical, and commonsense reasoning.
Our human study confirms that presenting expert predictions and the answer selection process helps annotators more accurately calibrate when to trust the system's output.
- Score: 71.61176122960464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent large language models (LLMs) improve on various question
answering (QA) datasets, it remains difficult for a single model to generalize
across question types that require distinct reasoning abilities. We provide
empirical evidence that state-of-the-art LLMs suffer from poor generalizability
on reasoning types beyond those seen in the prompt. To remedy this, we propose
a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse
specialized language models. We specialize the backbone language model with
prompts optimized for different reasoning categories, including factual,
multihop, mathematical, and commonsense reasoning. Our key insight is to
leverage agreement among the specialized experts to select the best answer for
each question, or to abstain from answering. This gives MoRE higher accuracy
than any single specialized model on a collection of 12 QA datasets from four
reasoning types. Beyond generalizability, the interpretable design of MoRE
improves selective question answering results compared to baselines without
incorporating inter-expert agreement. This framework is also more interpretable
and useful to human consumers of QA outputs. Our human study confirms that
presenting expert predictions and the answer selection process helps annotators
more accurately calibrate when to trust the system's output. We release all
code and data to facilitate future work.
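To make the selection idea concrete, here is a minimal sketch of agreement-based answer selection with abstention, assuming a generic `generate(prompt)` call into any LLM. The prompt strings and the 0.5 abstention threshold are illustrative placeholders, and the paper's actual answer selector is richer than plain majority voting, so treat this as a sketch of the idea rather than the method itself.

```python
from collections import Counter

# Illustrative prompt prefixes only; MoRE's real prompts are optimized per
# reasoning category (factual, multihop, mathematical, commonsense).
EXPERT_PROMPTS = {
    "factual": "Answer the factual question concisely.\nQ: {q}\nA:",
    "multihop": "Break the question into hops, answer each, then conclude.\nQ: {q}\nA:",
    "math": "Reason step by step and give the final number.\nQ: {q}\nA:",
    "commonsense": "Use everyday commonsense knowledge to answer.\nQ: {q}\nA:",
}

def more_answer(question, generate, abstain_threshold=0.5):
    """Query each specialized expert and keep the answer they agree on most.

    `generate` is an assumed callable mapping a prompt string to a model
    answer string. If inter-expert agreement falls below `abstain_threshold`,
    abstain (return None) instead of answering -- the selective-QA behavior.
    """
    predictions = {
        name: generate(prompt.format(q=question)).strip().lower()
        for name, prompt in EXPERT_PROMPTS.items()
    }
    answer, votes = Counter(predictions.values()).most_common(1)[0]
    if votes / len(predictions) < abstain_threshold:
        return None, predictions   # abstain, but still expose per-expert outputs
    return answer, predictions     # answer plus the evidence shown to users
```

Returning the per-expert predictions alongside the chosen answer mirrors the interpretability claim: users can inspect each expert's output and how the final answer was selected.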
Related papers
- LRQ-Fact: LLM-Generated Relevant Questions for Multimodal Fact-Checking [14.647261841209767]
We propose a fully-automated framework, LRQ-Fact, for multimodal fact-checking.
It generates comprehensive questions and answers for probing multimodal content.
It then evaluates both the original content and the generated questions and answers to assess the overall veracity.
arXiv Detail & Related papers (2024-10-06T20:33:22Z)
- Differentiating Choices via Commonality for Multiple-Choice Question Answering [54.04315943420376]
In multiple-choice question answering, the answer choices themselves can provide valuable clues for identifying the correct answer.
Existing models often rank each choice separately, overlooking the context provided by other choices.
We propose a novel model by differentiating choices through identifying and eliminating their commonality, called DCQA.
arXiv Detail & Related papers (2024-08-21T12:05:21Z)
- STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering [8.525847131940031]
Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question.
Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts.
We propose STOC-TOT, a tree-of-thought reasoning prompting method with constrained decoding for MHQA.
arXiv Detail & Related papers (2024-07-04T07:17:53Z)
- Multi-LLM QA with Embodied Exploration [55.581423861790945]
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment.
Multiple LLM-based agents independently explore and then answer queries about a household environment.
We analyze different aggregation methods to generate a single, final answer for each query.
arXiv Detail & Related papers (2024-06-16T12:46:40Z)
- A Study on Large Language Models' Limitations in Multiple-Choice Question Answering [0.0]
We analyze 26 small open-source models and find that 65% of the models do not understand the task.
Only 4 models properly select an answer from the given choices, and only 5 of these models are choice order independent.
arXiv Detail & Related papers (2024-01-15T20:42:16Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that generating lectures and explanations as a chain of thought improves question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- Mixture of Experts for Biomedical Question Answering [34.92691831878302]
We propose a Mixture-of-Expert (MoE) based question answering method called MoEBQA.
MoEBQA decouples the computation for different types of questions by sparse routing.
We evaluate MoEBQA on three Biomedical Question Answering (BQA) datasets constructed based on real examinations.
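The sparse-routing idea mentioned above can be illustrated with a generic top-k mixture-of-experts layer. The sketch below uses PyTorch with assumed sizes (`dim`, four linear experts, top-1 gating) purely for illustration; it is not MoEBQA's actual architecture.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Toy top-k expert layer: each token is routed to k experts, so most
    experts stay idle for any given input (the 'sparse routing' idea)."""

    def __init__(self, dim=768, num_experts=4, k=1):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)  # scores each expert per token
        self.k = k

    def forward(self, x):                      # x: (num_tokens, dim)
        scores = self.gate(x)                  # (num_tokens, num_experts)
        weights, idx = torch.topk(scores.softmax(dim=-1), self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```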
arXiv Detail & Related papers (2022-04-15T14:11:40Z)
- Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model performs better (4.9% higher than the baseline) on the adversarial held-out set.
arXiv Detail & Related papers (2021-04-18T07:00:48Z)
- Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
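As a rough sketch of the decomposition idea (not ModularQA's actual implementation), the loop below assumes two hypothetical callables: `next_subquestion`, which yields the next sub-question given the answers so far (or None when finished), and `factoid_qa`, which stands in for a neural single-span QA model; arithmetic sub-questions are handed to a symbolic calculator instead.

```python
import re

def modular_qa(question, next_subquestion, factoid_qa):
    """Toy decomposition loop in the spirit of ModularQA (not its actual code)."""
    answers = []
    while (sub_q := next_subquestion(question, answers)) is not None:
        if re.fullmatch(r"[0-9+\-*/(). ]+", sub_q):
            answers.append(str(eval(sub_q)))     # symbolic calculator stand-in
        else:
            answers.append(factoid_qa(sub_q))    # neural factoid QA stand-in
    return answers[-1] if answers else None
```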
arXiv Detail & Related papers (2020-09-01T23:45:42Z)