Related papers: FusionFactory: Fusing LLM Capabilities with Multi-LLM Log Data

URL: http://arxiv.org/abs/2507.10540v2
Date: Sat, 27 Sep 2025 22:59:32 GMT
Title: FusionFactory: Fusing LLM Capabilities with Multi-LLM Log Data
Authors: Tao Feng, Haozhen Zhang, Zijie Lei, Pengrui Han, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jiaxuan You
Abstract summary: The rapid advancement of large language models (LLMs) has created a diverse landscape of models, each excelling at different tasks. This diversity drives researchers to employ multiple LLMs in practice, leaving behind valuable multi-LLM log data. We argue that practical fusion must meet two essential requirements: (1) compatibility with real-world serving scenarios (e.g., local and API-based serving), and (2) flexibility to operate at different stages of the LLM pipeline to meet varied user needs.
Score: 60.09659670497899
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid advancement of large language models (LLMs) has created a diverse landscape of models, each excelling at different tasks. This diversity drives researchers to employ multiple LLMs in practice, leaving behind valuable multi-LLM log data. This naturally leads to the question of whether such logs can be fully leveraged to fuse LLMs' complementary capabilities. Although prior work has explored various strategies for integrating multiple LLMs, we argue that practical fusion must meet two essential requirements: (1) compatibility with real-world serving scenarios (e.g., local and API-based serving), and (2) flexibility to operate at different stages of the LLM pipeline to meet varied user needs (e.g., fine-tuning and inference stages). To this end, we introduce LLMFusionBench, a large-scale benchmark for LLM fusion that spans 14 tasks across five domains, with responses from 20 open-source LLMs (8B to 671B) totaling 103M tokens. Building on LLMFusionBench, we propose FusionFactory, a systematic framework with three elaborated levels: (1) query-level fusion via tailored LLM routers, (2) thought-level fusion leveraging retrieved abstract reasoning templates, and (3) model-level fusion via distillation from top-ranked responses. Experiments show that FusionFactory consistently outperforms the best individual LLM across all 14 benchmarks, with the optimal fusion configuration varying across benchmarks, highlighting the promise of multi-LLM log data as a practical foundation for fusing diverse LLM capabilities.
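
To make the idea of query-level fusion from log data concrete, here is a minimal sketch of a router that picks, for a new query, the model that scored best on the most similar logged queries. This is not the FusionFactory router: the log records, the model names (model_a, model_b), the scores, the Jaccard word-overlap similarity, and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical multi-LLM log records: (query text, model name, quality score in [0, 1]).
LOGS = [
    ("solve the integral of x squared", "model_a", 0.9),
    ("solve the integral of x squared", "model_b", 0.4),
    ("write a python function to sort a list", "model_a", 0.3),
    ("write a python function to sort a list", "model_b", 0.8),
]


def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route(query, logs, k=1):
    """Return the model with the best similarity-weighted logged score
    over the k logged queries most similar to `query`."""
    q = tokens(query)
    sims = {text: jaccard(q, tokens(text)) for text, _, _ in logs}
    # Rank distinct logged queries by similarity; break ties by text for determinism.
    nearest = set(sorted(sims, key=lambda t: (-sims[t], t))[:k])

    weighted = defaultdict(float)
    weight_sum = defaultdict(float)
    for text, model, score in logs:
        if text in nearest:
            weighted[model] += sims[text] * score
            weight_sum[model] += sims[text]

    def average(model):
        total = weight_sum[model]
        return weighted[model] / total if total else 0.0

    # Iterate models in sorted order so ties resolve deterministically.
    return max(sorted(weighted), key=average)


print(route("solve the integral of sin x", LOGS))
print(route("python function to reverse a list", LOGS))
```

A learned router would typically replace the word-overlap similarity with embeddings and a trained scoring function; the sketch only shows how logged per-model outcomes can drive the routing decision.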

Related papers

Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling [33.90190836855285]
MMER integrates existing MLLMs for effective multimodal expansion while retaining their original performance. MMER can also mitigate catastrophic forgetting by applying a similar process to MLLMs fine-tuned on new tasks.
arXiv Detail & Related papers (2025-05-21T12:40:07Z)
QUPID: Quantified Understanding for Enhanced Performance, Insights, and Decisions in Korean Search Engines [4.94507535566914]
We show that combining two distinct small language models (SLMs) with different architectures can outperform large language models (LLMs) in relevance assessment. Our approach, QUPID, integrates a generative SLM with an embedding-based SLM, achieving higher relevance judgment accuracy.
arXiv Detail & Related papers (2025-05-12T08:35:09Z)
Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions [59.5243730853157]
Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets. This article conducts a comparative analysis of three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues.
arXiv Detail & Related papers (2025-01-08T11:37:06Z)
FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data [56.08867996209236]
Fine-tuning Multimodal Large Language Models (MLLMs) with Federated Learning (FL) allows for expanding the training data scope by including private data sources. We introduce a benchmark to evaluate the performance of federated fine-tuning of MLLMs across various multimodal heterogeneous scenarios. We develop a general FedMLLM framework that integrates classic FL methods alongside two modality-agnostic strategies.
arXiv Detail & Related papers (2024-11-22T04:09:23Z)
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets. The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method. The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity [7.945893812374361]
We introduce the focal diversity metric to capture the diversity-performance correlation among component LLMs of an ensemble. We develop a diversity-optimized ensemble pruning algorithm to select the top-k sub-ensembles from a pool of $N$ base LLMs. Our pruning method recommends top-performing LLM sub-ensembles of size $S$, often much smaller than $N$.
arXiv Detail & Related papers (2024-10-04T22:31:15Z)
NVLM: Open Frontier-Class Multimodal LLMs [64.00053046838225]
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks. We propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities. We develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks.
arXiv Detail & Related papers (2024-09-17T17:59:06Z)
TensorOpera Router: A Multi-Model Router for Efficient LLM Inference [27.2803289964386]
TO-Router is a non-monolithic LLM querying system. It seamlessly integrates various LLM experts into a single query interface. It dynamically routes incoming queries to the highest-performing expert based on each query's requirements.
arXiv Detail & Related papers (2024-08-22T11:57:07Z)
Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement [72.97553348776425]
We make a pioneering effort to broaden the applicability of merging techniques from FT to PT LLMs. We introduce an approach based on WeIght DisENtanglement (WIDEN) to effectively extend the merging scope. We merge Qwen1.5-Chat (an FT LLM with instruction-following skills) with Sailor (a PT LLM with multilingual abilities) across 7B and 14B model scales.
arXiv Detail & Related papers (2024-08-06T10:46:46Z)
SoupLM: Model Integration in Large Language and Multi-Modal Models [51.12227693121004]
Training large language models (LLMs) requires significant computing resources. Existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
arXiv Detail & Related papers (2024-07-11T05:38:15Z)
Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization [18.73637736606997]
Pack of LLMs (PackLLM) is an effective method for test-time fusion that leverages each LLM's expertise, given an input prompt. We conduct experiments with over 100 total Large Language Models (LLMs) on a diverse set of tasks. PackLLM outperforms test-time fusion baselines by 1.89% accuracy points.
arXiv Detail & Related papers (2024-04-17T16:24:07Z)
Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs). We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM. Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models [69.51130760097818]
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function. We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
arXiv Detail & Related papers (2023-11-15T04:40:43Z)
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses the challenges of federated fine-tuning of LLMs, and introduces our package FS-LLM as a main contribution. We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation. Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results. For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data. For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
Generative Multimodal Entity Linking [24.322540112710918]
Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to referent entities from a knowledge base. Existing MEL methods mainly focus on designing complex multimodal interaction mechanisms and require fine-tuning all model parameters. We propose GEMEL, a Generative Multimodal Entity Linking framework based on Large Language Models (LLMs). Our framework is compatible with any off-the-shelf language model, paving the way towards an efficient and general solution.
arXiv Detail & Related papers (2023-06-22T07:57:19Z)