Multimodal Multi-Hop Question Answering Through a Conversation Between
Tools and Efficiently Finetuned Large Language Models
- URL: http://arxiv.org/abs/2309.08922v1
- Date: Sat, 16 Sep 2023 08:22:22 GMT
- Authors: Hossein Rajabzadeh, Suyuchen Wang, Hyock Ju Kwon, Bang Liu
- Abstract summary: We employ a tool-interacting divide-and-conquer strategy to answer complex multi-hop questions.
To increase the reasoning ability of LLMs, we prompt ChatGPT to generate a tool-interacting divide-and-conquer dataset.
To assess the effectiveness of this approach, we conduct an evaluation on two recently introduced complex question-answering datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We employ a tool-interacting divide-and-conquer strategy enabling large
language models (LLMs) to answer complex multimodal multi-hop questions. In
particular, we harness the power of large language models to divide a given
multimodal multi-hop question into unimodal single-hop sub-questions to be
answered by the appropriate tool from a predefined set of tools. After all
corresponding tools provide the LLM with their answers, the LLM generates the
next relevant unimodal single-hop question. To increase the reasoning ability
of LLMs, we prompt ChatGPT to generate a tool-interacting divide-and-conquer
dataset. This dataset is then used to efficiently finetune the corresponding
LLM. To assess the effectiveness of this approach, we conduct an evaluation on
two recently introduced complex question-answering datasets. The experimental
analysis demonstrates substantial improvements over existing state-of-the-art
solutions, indicating the efficacy and generality of our strategy.
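The divide-and-conquer loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption, not the authors' code: the names `answer_multihop`, `SubQuestion`, `FinalAnswer`, and `scripted_llm`, the tool names `image_qa` and `text_qa`, and the stubbed tool replies. The scripted function only stands in for the finetuned LLM; what the sketch shows is the control flow: the LLM emits a batch of unimodal single-hop sub-questions, each is routed to its named tool, and only after all of those tools have answered does the LLM produce the next sub-question or the final answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class SubQuestion:
    """A unimodal single-hop sub-question addressed to one named tool."""
    tool: str
    question: str


@dataclass
class FinalAnswer:
    text: str


# One LLM turn: either a batch of sub-questions for the tools, or the final answer.
Step = Union[List[SubQuestion], FinalAnswer]
History = List[Tuple[SubQuestion, str]]


def answer_multihop(
    question: str,
    llm: Callable[[str, History], Step],
    tools: Dict[str, Callable[[str], str]],
    max_hops: int = 8,
) -> Tuple[str, History]:
    """Alternate between the LLM and the tools until the LLM commits to an answer."""
    history: History = []
    for _ in range(max_hops):
        step = llm(question, history)
        if isinstance(step, FinalAnswer):
            return step.text, history
        # Every tool in this batch answers before the LLM is consulted again.
        for sub in step:
            if sub.tool not in tools:
                raise ValueError(f"unknown tool: {sub.tool}")
            history.append((sub, tools[sub.tool](sub.question)))
    raise RuntimeError("no final answer within max_hops")


# Hypothetical stand-in for the finetuned LLM: a fixed two-hop script.
def scripted_llm(question: str, history: History) -> Step:
    if not history:
        return [SubQuestion("image_qa", "What landmark is shown in the image?")]
    if len(history) == 1:
        landmark = history[0][1]
        return [SubQuestion("text_qa", f"In which city is {landmark}?")]
    return FinalAnswer(history[-1][1])


# Hypothetical unimodal tools, stubbed with fixed responses.
tools = {
    "image_qa": lambda q: "the Eiffel Tower",
    "text_qa": lambda q: "Paris" if "Eiffel Tower" in q else "unknown",
}

answer, trace = answer_multihop(
    "In which city is the landmark shown in the image?", scripted_llm, tools
)
for sub, reply in trace:
    print(f"[{sub.tool}] {sub.question} -> {reply}")
print("final:", answer)
```

In a real system, `scripted_llm` would be replaced by a call to the finetuned model and the lambdas by actual image and text question-answering models. The `max_hops` cap is an added safeguard, not something the abstract describes, so that a model that never commits to an answer cannot loop forever.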
Related papers
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
(2024-08-20T03:06:48Z)
- UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models
Multimodal Entity Linking (MEL) is a crucial task that aims to link ambiguous mentions within multimodal contexts to referent entities in a multimodal knowledge base, such as Wikipedia.
Existing methods overcomplicate the MEL task and overlook the visual semantic information, which makes them costly and hard to scale.
We propose UniMEL, a unified framework which establishes a new paradigm to process multimodal entity linking tasks using Large Language Models.
(2024-07-23T03:58:08Z)
- Towards Completeness-Oriented Tool Retrieval for Large Language Models
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools.
(2024-05-25T06:41:23Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the capabilities of an LLM agent into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
(2024-01-14T16:17:07Z)
- From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting
We introduce IMP-TIP: Improving Math Reasoning with Tool-augmented Interleaf Prompting.
(2023-12-18T06:31:23Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
(2023-11-30T03:59:31Z)
- Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark
This study introduces MultiAPI, a pioneering, comprehensive, large-scale API benchmark dataset.
It consists of 235 diverse API calls and 2,038 contextual prompts, offering a unique platform for evaluating tool-augmented LLMs on multimodal tasks.
Our findings reveal that while LLMs demonstrate proficiency in API call decision-making, they face challenges in domain identification, function selection, and argument generation.
(2023-11-21T23:26:05Z)
- Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
We introduce Parrot, a solution aiming to enhance multi-turn instruction following for large language models (LLMs).
First, we introduce an efficient but effective method for collecting multi-turn instructions that feature human-like queries, such as anaphora and ellipsis.
Second, we propose a context-aware preference optimization strategy to further enhance LLMs for complex queries in multi-turn interaction.
(2023-10-11T08:36:43Z)
- On the Performance of Multimodal Language Models
This study conducts a comparative analysis of different multimodal instruction tuning approaches.
We reveal key insights for guiding architectural choices when incorporating multimodal capabilities into large language models.
(2023-10-04T23:33:36Z)
- MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
We introduce MINT, a benchmark that evaluates large language models' ability to solve tasks with multi-turn interactions.
LLMs generally benefit from tools and language feedback, with performance gains of 1-8% for each turn of tool use.
Among the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities.
(2023-09-19T15:25:42Z)