Getting MoRE out of Mixture of Language Model Reasoning Experts
- URL: http://arxiv.org/abs/2305.14628v2
- Date: Fri, 20 Oct 2023 05:16:29 GMT
- Title: Getting MoRE out of Mixture of Language Model Reasoning Experts
- Authors: Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, Jordan Boyd-Graber
- Abstract summary: We propose a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse specialized language models.
We specialize the backbone language model with prompts optimized for different reasoning categories, including factual, multihop, mathematical, and commonsense reasoning.
Our human study confirms that presenting expert predictions and the answer selection process helps annotators more accurately calibrate when to trust the system's output.
- Score: 71.61176122960464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent large language models (LLMs) improve on various question
answering (QA) datasets, it remains difficult for a single model to generalize
across question types that require distinct reasoning abilities. We provide
empirical evidence that state-of-the-art LLMs suffer from poor generalizability
on reasoning types beyond those seen in the prompt. To remedy this, we propose
a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse
specialized language models. We specialize the backbone language model with
prompts optimized for different reasoning categories, including factual,
multihop, mathematical, and commonsense reasoning. Our key insight is to
leverage agreement among the specialized experts to select the best answer for
each question, or to abstain from answering. This gives MoRE higher accuracy
than any single specialized model on a collection of 12 QA datasets from four
reasoning types. Beyond generalizability, the interpretable design of MoRE
improves selective question answering results compared to baselines without
incorporating inter-expert agreement. This framework is also more interpretable
and useful to human consumers of QA outputs. Our human study confirms that
presenting expert predictions and the answer selection process helps annotators
more accurately calibrate when to trust the system's output. We release all
code and data to facilitate future work.
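The key idea in the abstract is to use agreement among the specialized experts either to pick an answer or to abstain. The sketch below illustrates that idea under stated assumptions. It is not MoRE's actual answer selector: it swaps in plain majority voting over normalized answer strings with a hypothetical `min_agreement` threshold. The names `ExpertPrediction`, `select_or_abstain`, and `coverage_and_selective_accuracy` are illustrative. The second function reports the two quantities that selective question answering trades off: coverage, meaning the fraction of questions answered, and accuracy on the answered subset.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ExpertPrediction:
    expert: str  # hypothetical label, e.g. "factual", "multihop", "math", "commonsense"
    answer: str  # the answer string this specialized expert produced


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def select_or_abstain(
    predictions: Sequence[ExpertPrediction], min_agreement: float = 0.5
) -> Tuple[Optional[str], float]:
    """Majority vote over expert answers, abstaining on low agreement.

    Returns (answer, agreement), where agreement is the fraction of experts
    that produced the winning normalized answer. The answer is None (abstain)
    when agreement is below the hypothetical min_agreement threshold.
    Ties are broken in favor of the answer seen first.
    """
    if not predictions:
        return None, 0.0
    counts = Counter(normalize(p.answer) for p in predictions)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(predictions)
    if agreement < min_agreement:
        return None, agreement
    return answer, agreement


def coverage_and_selective_accuracy(
    decisions: Sequence[Optional[str]], gold: Sequence[str]
) -> Tuple[float, float]:
    """Coverage = fraction of questions answered; accuracy is over answered ones only."""
    answered = [(d, g) for d, g in zip(decisions, gold) if d is not None]
    coverage = len(answered) / len(gold) if gold else 0.0
    if not answered:
        return coverage, 0.0
    # Decisions are already normalized, so normalize gold answers before comparing.
    correct = sum(d == normalize(g) for d, g in answered)
    return coverage, correct / len(answered)


if __name__ == "__main__":
    preds = [
        ExpertPrediction("factual", "Paris"),
        ExpertPrediction("multihop", "paris"),
        ExpertPrediction("math", "42"),
        ExpertPrediction("commonsense", "Paris "),
    ]
    print(select_or_abstain(preds))  # ('paris', 0.75)
```

Raising `min_agreement` lowers coverage so that the system answers only questions where most experts concur. Sweeping the threshold traces the coverage/accuracy trade-off that selective QA evaluates.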
Related papers
- SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models [4.328173053224842]
This paper introduces SQuARE, a novel prompting technique designed to improve reasoning through a self-interrogation paradigm.
Building upon CoT frameworks, SQuARE prompts models to generate and resolve multiple auxiliary questions before tackling the main query.
Our evaluations, conducted with Llama 3 and GPT-4o models across multiple question-answering datasets, demonstrate that SQuARE significantly surpasses traditional CoT prompts and existing rephrase-and-respond methods.
Date: 2025-02-13T15:07:20Z
- LRQ-Fact: LLM-Generated Relevant Questions for Multimodal Fact-Checking [14.647261841209767]
We propose a fully-automated framework, LRQ-Fact, for multimodal fact-checking.
It generates comprehensive questions and answers for probing multimodal content.
It then evaluates both the original content and the generated questions and answers to assess the overall veracity.
Date: 2024-10-06T20:33:22Z
- Differentiating Choices via Commonality for Multiple-Choice Question Answering [54.04315943420376]
In multiple-choice question answering, the other choices can provide valuable clues for choosing the right answer.
Existing models often rank each choice separately, overlooking the context provided by other choices.
We propose DCQA, a novel model that differentiates choices by identifying and eliminating their commonality.
Date: 2024-08-21T12:05:21Z
- STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering [8.525847131940031]
Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question.
Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts.
We propose STOC-TOT, a tree-of-thought reasoning prompting method with constrained decoding for MHQA.
Date: 2024-07-04T07:17:53Z
- Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering [55.295699268654545]
We propose a novel Chain-of-Discussion framework to leverage the synergy among open-source Large Language Models.
Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
Date: 2024-02-26T05:31:34Z
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple-choice questions covering a diverse set of science topics, with answers annotated with corresponding lectures and explanations.
We show that generating lectures and explanations as a chain of thought (CoT) improves question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from less data, achieving the same performance with just 40% of the data.
Date: 2022-09-20T07:04:24Z
- Mixture of Experts for Biomedical Question Answering [34.92691831878302]
We propose a Mixture-of-Experts (MoE) based question answering method called MoEBQA.
MoEBQA decouples the computation for different types of questions by sparse routing.
We evaluate MoEBQA on three Biomedical Question Answering (BQA) datasets constructed based on real examinations.
Date: 2022-04-15T14:11:40Z
- Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model performs better (4.9% higher than the baseline) on an adversarial held-out set.
Date: 2021-04-18T07:00:48Z
- Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
Date: 2020-09-01T23:45:42Z
This list is automatically generated from the titles and abstracts of the papers on this site.