LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and
Generative Fusion
- URL: http://arxiv.org/abs/2306.02561v3
- Date: Fri, 30 Jun 2023 21:39:54 GMT
- Title: LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and
Generative Fusion
- Authors: Dongfu Jiang, Xiang Ren, Bill Yuchen Lin
- Abstract summary: Our framework consists of two modules: PairRanker and GenFuser.
PairRanker employs a specialized pairwise comparison method to distinguish subtle differences between candidate outputs.
GenFuser aims to merge the top-ranked candidates, generating an improved output.
- Score: 33.73671362609599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present LLM-Blender, an ensembling framework designed to attain
consistently superior performance by leveraging the diverse strengths of
multiple open-source large language models (LLMs). Our framework consists of
two modules: PairRanker and GenFuser, addressing the observation that optimal
LLMs for different examples can significantly vary. PairRanker employs a
specialized pairwise comparison method to distinguish subtle differences
between candidate outputs. It jointly encodes the input text and a pair of
candidates, using cross-attention encoders to determine the superior one. Our
results demonstrate that PairRanker exhibits the highest correlation with
ChatGPT-based ranking. Then, GenFuser aims to merge the top-ranked candidates,
generating an improved output by capitalizing on their strengths and mitigating
their weaknesses. To facilitate large-scale evaluation, we introduce a
benchmark dataset, MixInstruct, which is a mixture of multiple instruction
datasets featuring oracle pairwise comparisons. Our LLM-Blender significantly
outperform individual LLMs and baseline methods across various metrics,
establishing a substantial performance gap.
Related papers
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z) - Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Models scaling guideline.
Our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging, and variants of mixture.
Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters through a model mixture.
arXiv Detail & Related papers (2024-10-07T15:55:55Z) - Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation [56.75665429851673]
This paper introduces a novel instruction curation algorithm, derived from two unique perspectives, human and LLM preference alignment.
Experiments demonstrate that we can maintain or even improve model performance by compressing synthetic multimodal instructions by up to 90%.
arXiv Detail & Related papers (2024-09-27T08:20:59Z) - PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars [1.450405446885067]
Self-ensembling techniques with diverse reasoning paths have demonstrated remarkable performance gains in text generation with Large Language Models (LLMs)
We introduce PEDAL, a hybrid self-ensembling approach that combines the strengths of diverse exemplar based prompts and LLM based aggregation to achieve improvement in overall performance.
arXiv Detail & Related papers (2024-08-16T17:54:09Z) - SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models [8.558834738072363]
Large language models (LLMs) have gained increased popularity due to their remarkable success across various tasks.
However, individual LLMs have limitations when applied to complex tasks because of such factors as training biases, model sizes, and the datasets used.
We introduce SelectLLM, a novel algorithm that directs input queries to the most suitable subset of LLMs from a large pool.
arXiv Detail & Related papers (2024-08-16T06:11:21Z) - Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching [47.01589023992927]
We design a compound entity matching framework (ComEM) that leverages the composition of multiple strategies and large language models (LLMs)
ComEM benefits from the advantages of different sides and achieves improvements in both effectiveness and efficiency.
Experimental results on 8 ER datasets and 9 LLMs verify the superiority of incorporating record interactions through the selecting strategy.
arXiv Detail & Related papers (2024-05-27T07:05:27Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond [24.151927600694066]
Commit Message Generation (CMG) approaches aim to automatically generate commit messages based on given code diffs.
This paper conducts the first comprehensive experiment to investigate how far we have been in applying Large Language Models (LLMs) to generate high-quality commit messages.
arXiv Detail & Related papers (2024-04-23T08:24:43Z) - Routing to the Expert: Efficient Reward-guided Ensemble of Large
Language Models [69.51130760097818]
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function.
We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
arXiv Detail & Related papers (2023-11-15T04:40:43Z) - MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation.
Within each loop, the MLLM-DataEngine first analyze the weakness of the model based on the evaluation results.
For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data.
For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.