Enhancing Subtask Performance of Multi-modal Large Language Model
- URL: http://arxiv.org/abs/2308.16474v1
- Date: Thu, 31 Aug 2023 05:37:21 GMT
- Title: Enhancing Subtask Performance of Multi-modal Large Language Model
- Authors: Yongqiang Zhao, Zhenyu Li, Feng Zhang, Xinhai Xu, Donghong Liu
- Abstract summary: Multi-modal Large Language Model (MLLM) refers to a model expanded from a Large Language Model (LLM) that possesses the capability to handle and infer multi-modal data.
This study selects multiple pre-trained models focused on the same subtask based on distinct evaluation approaches.
The results from multiple pre-trained models for the same subtask are compared using the LLM, and the best result is chosen as the outcome for that subtask.
- Score: 12.033301861738952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal Large Language Model (MLLM) refers to a model expanded from a
Large Language Model (LLM) that possesses the capability to handle and infer
multi-modal data. Current MLLMs typically begin by using an LLM to decompose a
task into multiple subtasks, then employ individual pre-trained models to
complete the specific subtasks, and finally use the LLM to integrate the
results of each subtask into the result of the overall task. In real-world
scenarios, when dealing with large projects, it is common practice to break
down the project into smaller sub-projects, with different teams providing
corresponding solutions or results. The project owner then decides which
solution or result to use, ensuring the best possible outcome for each subtask
and, consequently, for the entire project. Inspired by this, this study
considers selecting multiple pre-trained models to complete the same subtask.
By combining the results from multiple pre-trained models, the optimal subtask
result is obtained, enhancing the performance of the MLLM. Specifically, this
study first selects multiple pre-trained models focused on the same subtask
based on distinct evaluation approaches, and then invokes these models in
parallel to process input data and generate corresponding subtask results.
Finally, the results from multiple pre-trained models for the same subtask are
compared using the LLM, and the best result is chosen as the outcome for that
subtask. Extensive experiments are conducted using GPT-4-annotated datasets and
human-annotated datasets, and the results across various evaluation metrics
demonstrate the effectiveness of the proposed approach.
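To make the pipeline concrete, here is a minimal sketch of the subtask-level selection step described above: several pre-trained models handle the same subtask in parallel, and an LLM then compares their outputs and picks the best one. The candidate-model dictionary, the judge prompt, and the `call_llm` helper are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the paper's idea: run several pre-trained models on the
# same subtask in parallel, then let an LLM judge which result is best.
from concurrent.futures import ThreadPoolExecutor


def run_candidates(subtask_input, candidate_models):
    """Invoke every candidate model on the same subtask input in parallel."""
    with ThreadPoolExecutor(max_workers=len(candidate_models)) as pool:
        futures = {name: pool.submit(model, subtask_input)
                   for name, model in candidate_models.items()}
        return {name: fut.result() for name, fut in futures.items()}


def select_best(subtask_description, results, call_llm):
    """Ask the LLM to compare the candidate outputs and return the best one."""
    listing = "\n".join(f"[{name}] {out}" for name, out in results.items())
    prompt = (
        f"Subtask: {subtask_description}\n"
        f"Candidate results:\n{listing}\n"
        "Reply only with the bracketed name of the best result."
    )
    choice = call_llm(prompt).strip().strip("[]")
    # Fall back to the first candidate if the judge's reply is not recognized.
    return results.get(choice, next(iter(results.values())))
```

In the paper's setting, the candidate models would be the ones pre-selected for that subtask according to distinct evaluation approaches, and `call_llm` would wrap the controlling LLM.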
Related papers
- SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models [8.558834738072363]
Large language models (LLMs) have gained increased popularity due to their remarkable success across various tasks.
However, individual LLMs have limitations when applied to complex tasks because of such factors as training biases, model sizes, and the datasets used.
We introduce SelectLLM, a novel algorithm that directs input queries to the most suitable subset of LLMs from a large pool.
arXiv Detail & Related papers (2024-08-16T06:11:21Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources.
Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z) - MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z) - UniDM: A Unified Framework for Data Manipulation with Large Language Models [66.61466011795798]
Large Language Models (LLMs) can resolve multiple data manipulation tasks.
LLMs show clear performance benefits but still require customized designs to fit each specific task.
We propose UniDM, a unified framework which establishes a new paradigm to process data manipulation tasks.
arXiv Detail & Related papers (2024-05-10T14:44:04Z) - On Inter-dataset Code Duplication and Data Leakage in Large Language Models [4.148857672591562]
This paper explores the phenomenon of inter-dataset code duplication and its impact on evaluating large language models (LLMs).
Our findings reveal a potential threat to the evaluation of LLMs across multiple SE tasks, stemming from the inter-dataset code duplication phenomenon.
We provide evidence that open-source models could be affected by inter-dataset duplication.
arXiv Detail & Related papers (2024-01-15T19:46:40Z) - Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z) - Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to track the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - Large Language Model Routing with Benchmark Datasets [40.42044096089315]
No single model typically achieves the best accuracy in all tasks and use cases.
We propose a new formulation for the problem, in which benchmark datasets are repurposed to learn a "router" model for this selection.
We show that this problem can be reduced to a collection of binary classification tasks; a code sketch of this reduction follows after this list.
arXiv Detail & Related papers (2023-09-27T17:08:40Z) - MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation.
Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results.
For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data.
For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
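As referenced in the routing entry above, one plausible reading of "reduced to a collection of binary classification tasks" is sketched below: fit one classifier per candidate LLM that predicts whether that model answers a given benchmark query correctly, then route new queries to the model with the highest predicted success probability. The feature extraction, data layout, and scikit-learn components are assumptions for illustration, not that paper's actual implementation.

```python
# Hypothetical router-as-binary-classification sketch: one classifier per
# candidate LLM predicts "will this model answer the query correctly?".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def fit_router(benchmark_queries, correctness_per_model):
    """Fit one binary classifier per model on benchmark correctness labels."""
    routers = {}
    for model_name, labels in correctness_per_model.items():
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(benchmark_queries, labels)  # labels: 1 = correct, 0 = incorrect
        routers[model_name] = clf
    return routers


def route(query, routers):
    """Send a new query to the model with the highest predicted success probability."""
    scores = {name: clf.predict_proba([query])[0, 1]
              for name, clf in routers.items()}
    return max(scores, key=scores.get)
```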