AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit
- URL: http://arxiv.org/abs/2409.13447v2
- Date: Mon, 23 Sep 2024 08:43:06 GMT
- Title: AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit
- Authors: Mohanna Hoveyda, Arjen P. de Vries, Maarten de Rijke, Harrie Oosterhuis, Faegheh Hasibi
- Abstract summary: In question answering (QA), different questions can be effectively addressed with different answering strategies.
We develop a dynamic method that adaptively selects the most suitable QA strategy for each question.
Our experiments show that the proposed solution is viable for adaptive orchestration of a QA system with multiple modules.
- Score: 59.10281630985958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In question answering (QA), different questions can be effectively addressed with different answering strategies. Some require a simple lookup, while others need complex, multi-step reasoning to be answered adequately. This observation motivates the development of a dynamic method that adaptively selects the most suitable QA strategy for each question, enabling more efficient and effective systems capable of addressing a broader range of question types. To this aim, we build on recent advances in the orchestration of multiple large language models (LLMs) and formulate adaptive QA as a dynamic orchestration challenge. We define this as a contextual multi-armed bandit problem, where the context is defined by the characteristics of the incoming question and the action space consists of potential communication graph configurations among the LLM agents. We then train a linear upper confidence bound model to learn an optimal mapping between different question types and their corresponding optimal multi-LLM communication graph representation. Our experiments show that the proposed solution is viable for adaptive orchestration of a QA system with multiple modules, as it combines the superior performance of more complex strategies while avoiding their costs when simpler strategies suffice.
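The contextual bandit formulation in the abstract can be illustrated with a minimal sketch of disjoint LinUCB in pure Python. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three arm names stand in for communication graph configurations, the context is a one-hot question-type vector, and the reward table (answer quality minus a cost penalty) is invented.

```python
import math

# Hypothetical arms standing in for multi-LLM communication graph configurations.
ARMS = ["no_retrieval", "single_step_retrieval", "multi_step_reasoning"]

# Invented rewards (quality minus cost) per question type; illustration only.
REWARDS = {
    "simple":  [1.0, 0.8, 0.5],   # cheap strategy already suffices
    "complex": [0.0, 0.3, 0.6],   # only the expensive strategy answers well
}
CONTEXTS = {"simple": [1.0, 0.0], "complex": [0.0, 1.0]}  # one-hot features


class LinUCB:
    """Disjoint LinUCB: an independent ridge-regression model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.dim = dim
        # A_inv starts as the identity, the inverse of the ridge prior I_d.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)]
                       for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    def score(self, arm, x):
        Ax = self._matvec(self.A_inv[arm], x)                # A^{-1} x
        theta = self._matvec(self.A_inv[arm], self.b[arm])   # A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = sum(a * xi for a, xi in zip(Ax, x))          # x^T A^{-1} x
        return mean + self.alpha * math.sqrt(max(width, 0.0))

    def select(self, x):
        scores = [self.score(a, x) for a in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        A_inv = self.A_inv[arm]
        Ax = self._matvec(A_inv, x)
        denom = 1.0 + sum(a * xi for a, xi in zip(Ax, x))
        # Sherman-Morrison rank-one update of (A + x x^T)^{-1}.
        for i in range(self.dim):
            for j in range(self.dim):
                A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


def train(rounds=400, alpha=1.0):
    """Alternate between the two question types and learn online."""
    bandit = LinUCB(n_arms=len(ARMS), dim=2, alpha=alpha)
    for t in range(rounds):
        qtype = "simple" if t % 2 == 0 else "complex"
        x = CONTEXTS[qtype]
        arm = bandit.select(x)
        bandit.update(arm, x, REWARDS[qtype][arm])
    return bandit


if __name__ == "__main__":
    bandit = train()
    bandit.alpha = 0.0  # act greedily on the learned estimates
    for qtype, x in CONTEXTS.items():
        print(qtype, "->", ARMS[bandit.select(x)])
```

Setting `alpha` to zero after training removes the exploration bonus, so the policy picks the arm with the highest estimated reward for each question type. In a realistic setting the context would be a learned question representation and the reward would come from evaluated answers and measured cost.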
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- Optimal Decision Making Through Scenario Simulations Using Large Language Models [0.0]
Large Language Models (LLMs) have transformed how complex problems are approached and solved.
This paper proposes an innovative approach to bridge this capability gap.
By enabling LLMs to request multiple potential options and their respective parameters from users, our system introduces a dynamic framework.
This function is designed to analyze the provided options, simulate potential outcomes, and determine the most advantageous solution.
arXiv Detail & Related papers (2024-07-09T01:23:09Z)
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
- Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering [55.295699268654545]
We propose a novel Chain-of-Discussion framework to leverage the synergy among open-source Large Language Models.
Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
arXiv Detail & Related papers (2024-02-26T05:31:34Z)
- In-Context Ability Transfer for Question Decomposition in Complex QA [6.745884231594893]
We propose ICAT (In-Context Ability Transfer) to solve complex question-answering tasks.
We transfer the ability to decompose complex questions to simpler questions or generate step-by-step rationales to LLMs.
We conduct large-scale experiments on a variety of complex QA tasks involving numerical reasoning, compositional complex QA, and heterogeneous complex QA.
arXiv Detail & Related papers (2023-10-26T11:11:07Z)
- Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning [34.568072559937455]
Large Language Models (LLMs) are showcasing impressive ability in handling complex reasoning tasks.
Most methodologies that leverage LLMs tend to adopt a uniform approach.
Their inflexibility can bring unnecessary computational overhead or sub-optimal performance.
We introduce an Adaptive-Solver framework that strategically modulates solving strategies based on the difficulties of the problems.
arXiv Detail & Related papers (2023-10-01T12:28:36Z)
- How Many Answers Should I Give? An Empirical Study of Multi-Answer Reading Comprehension [64.76737510530184]
We design a taxonomy to categorize commonly-seen multi-answer MRC instances.
We analyze how well different paradigms of current multi-answer MRC models deal with different types of multi-answer instances.
arXiv Detail & Related papers (2023-06-01T08:22:21Z)
- Active Prompting with Chain-of-Thought for Large Language Models [26.5029080638055]
This paper proposes a new method, Active-Prompt, to adapt large language models to different tasks.
By borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics to characterize the uncertainty.
Experimental results demonstrate the superiority of our proposed method, achieving state-of-the-art on eight complex reasoning tasks.
arXiv Detail & Related papers (2023-02-23T18:58:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.