Large Language Model Routing with Benchmark Datasets
- URL: http://arxiv.org/abs/2309.15789v1
- Date: Wed, 27 Sep 2023 17:08:40 GMT
- Title: Large Language Model Routing with Benchmark Datasets
- Authors: Tal Shnitzer, Anthony Ou, Mírian Silva, Kate Soule, Yuekai Sun,
Justin Solomon, Neil Thompson, Mikhail Yurochkin
- Abstract summary: No single model typically achieves the best accuracy in all tasks and use cases.
We propose a new formulation for the problem, in which benchmark datasets are repurposed to learn a "router" model for this selection.
We show that this problem can be reduced to a collection of binary classification tasks.
- Score: 40.42044096089315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a rapidly growing number of open-source Large Language Models (LLMs)
and benchmark datasets to compare them. While some models dominate these
benchmarks, no single model typically achieves the best accuracy in all tasks
and use cases. In this work, we address the challenge of selecting the best LLM
out of a collection of models for new tasks. We propose a new formulation for
the problem, in which benchmark datasets are repurposed to learn a "router"
model for this LLM selection, and we show that this problem can be reduced to a
collection of binary classification tasks. We demonstrate the utility and
limitations of learning model routers from various benchmark datasets, where we
consistently improve performance over using any single model for all tasks.
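As a concrete illustration of the reduction above, here is a minimal router sketch: one binary classifier per candidate LLM predicts whether that model answers a given prompt correctly, and each new prompt is routed to the model with the highest predicted probability of success. The TF-IDF features, logistic-regression classifiers, and toy labels are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: reduce LLM routing to per-model binary classification.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Benchmark records: prompts plus, per candidate LLM, whether its answer
# to each prompt was scored correct (hypothetical toy labels).
prompts = ["What is 2+2?", "Translate 'chat' to English.", "Name a prime > 10."]
correct = {
    "model_a": [1, 0, 1],
    "model_b": [0, 1, 1],
}

vec = TfidfVectorizer().fit(prompts)
X = vec.transform(prompts)

# One binary classifier per model: P(model answers this prompt correctly).
routers = {m: LogisticRegression().fit(X, y) for m, y in correct.items()}

def route(prompt: str) -> str:
    """Send the prompt to the model with the highest predicted P(correct)."""
    x = vec.transform([prompt])
    scores = {m: clf.predict_proba(x)[0, 1] for m, clf in routers.items()}
    return max(scores, key=scores.get)

print(route("What is 3+5?"))
```

Note that routing itself never invokes a candidate model: selecting one costs only a feature transform and a probability lookup per classifier.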
Related papers
- Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Model (LLM) scaling guideline.
Our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging and variants of model mixture.
Our methodology involves clustering mergeable models, selecting an optimal merging strategy, and integrating the clusters through a model mixture.
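A toy sketch of the "cluster mergeable models, then merge within each cluster" step, assuming cosine similarity over flattened weights and uniform weight averaging; Model-GLUE's actual selection criteria are richer.

```python
# Toy sketch: group checkpoints whose weights are similar enough to average.
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=1000)
checkpoints = {                       # hypothetical fine-tuned variants
    "math_a": base + 0.01 * rng.normal(size=1000),
    "math_b": base + 0.01 * rng.normal(size=1000),
    "code_x": rng.normal(size=1000),  # too far from the others to merge
}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Greedy clustering: a checkpoint joins a cluster if it is close to its seed.
clusters: list[list[str]] = []
for name, w in checkpoints.items():
    for cl in clusters:
        if cos(checkpoints[cl[0]], w) > 0.9:
            cl.append(name)
            break
    else:
        clusters.append([name])

# Merge each cluster by uniform weight averaging ("model soup" style).
merged = [np.mean([checkpoints[n] for n in cl], axis=0) for cl in clusters]
print([len(cl) for cl in clusters])   # e.g. [2, 1]
```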
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
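A minimal sketch in the spirit of such model embeddings: one learnable vector per LLM, decoded against fixed question features to predict per-question correctness. The dimensions, loss, and synthetic labels are assumptions for illustration only.

```python
# Sketch: learn a compact embedding per LLM that decodes to correctness.
import torch

n_models, n_questions, d = 5, 200, 16
labels = (torch.rand(n_models, n_questions) > 0.5).float()  # fake correctness
q_feats = torch.randn(n_questions, d)                       # fixed question features

model_emb = torch.nn.Parameter(torch.randn(n_models, d) * 0.1)
opt = torch.optim.Adam([model_emb], lr=0.05)

for _ in range(200):
    logits = model_emb @ q_feats.T            # decode: does model i solve q j?
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Route a new question to the model with the highest predicted success.
new_q = torch.randn(d)
print(int(torch.argmax(model_emb @ new_q)))
```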
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- A Comprehensive Benchmark of Machine and Deep Learning Across Diverse Tabular Datasets [0.6144680854063939]
We introduce a benchmark aimed at better characterizing types of datasets where Deep Learning models excel.
We evaluate 111 datasets with 20 different models, including both regression and classification tasks.
Building on the results of this benchmark, we train a meta-model that predicts, with 86.1% accuracy, the scenarios where DL models outperform alternative methods.
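A hedged sketch of such a meta-model: predict from simple dataset meta-features whether a deep model is likely to win. The meta-features and training outcomes below are made-up placeholders, not the benchmark's data.

```python
# Sketch: a meta-model over dataset meta-features predicting "does DL win?".
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Meta-features per dataset: [n_rows, n_features, fraction_categorical]
X_meta = np.array([[1e5, 20, 0.1], [500, 40, 0.8], [2e5, 8, 0.0], [1e3, 60, 0.5]])
dl_wins = np.array([1, 0, 1, 0])     # hypothetical benchmark outcomes

meta_model = RandomForestClassifier(n_estimators=100, random_state=0)
meta_model.fit(X_meta, dl_wins)

print(meta_model.predict([[5e4, 15, 0.2]]))  # should DL be preferred here?
```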
arXiv Detail & Related papers (2024-08-27T06:58:52Z)
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
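A hedged sketch of the self-synthetic loop: prompt the student model for new input-output pairs, filter low-quality ones, and fine-tune on the survivors. The `generate` and `finetune` names are hypothetical stand-ins for a model API; SELF-GUIDE's actual prompts and filters are more elaborate.

```python
# Sketch: synthesize task-specific training pairs from the student LLM itself.
def generate(prompt: str) -> str:
    ...  # hypothetical call into the student LLM

def synthesize_pairs(task_instruction: str, n: int) -> list[tuple[str, str]]:
    pairs = []
    for _ in range(n):
        x = generate(f"{task_instruction}\nWrite one new example input:")
        y = generate(f"{task_instruction}\nInput: {x}\nOutput:")
        if x and y and len(y) < 2000:      # crude quality/noise filter
            pairs.append((x, y))
    return pairs

# pairs = synthesize_pairs("Classify the sentiment of the review.", 100)
# finetune(student_model, pairs)  # standard supervised fine-tuning step
```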
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation [51.99752147380505]
This paper presents a benchmark self-evolving framework to dynamically evaluate Large Language Models (LLMs).
We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence.
Our framework widens performance discrepancies both between different models and within the same model across various tasks.
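A toy sketch of the evolving step: a rewriter agent perturbs a benchmark question and a verifier agent checks that the gold answer still holds, keeping only verified instances. The `llm` function is a hypothetical placeholder for any model call.

```python
# Sketch: two-agent loop that evolves benchmark instances with verification.
def llm(prompt: str) -> str:
    ...  # hypothetical call into an LLM of your choice

def evolve(question: str, answer: str):
    new_q = llm("Rewrite this question, changing surface details but not "
                f"the answer:\n{question}")
    verdict = llm(f"Question: {new_q}\nIs the answer still '{answer}'? yes/no")
    if verdict and verdict.strip().lower().startswith("yes"):
        return new_q          # keep only instances the verifier confirms
    return None
```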
arXiv Detail & Related papers (2024-02-18T03:40:06Z)
- DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality.
We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data.
Our framework avoids handpicked notions of data quality, and instead explicitly models how the learning process uses training datapoints to predict on the target tasks.
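A linear-datamodel sketch of this idea: regress target-task loss on train-subset inclusion indicators and keep the datapoints whose estimated effect is most helpful. The synthetic data and plain least-squares estimator are simplifying assumptions; DsDm's estimators are more sophisticated.

```python
# Sketch: fit per-datapoint effects on target loss, then select the best ones.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_subsets = 50, 400
masks = (rng.random((n_subsets, n_train)) < 0.5).astype(float)  # random subsets
true_effect = rng.normal(size=n_train)                          # hidden effects
target_loss = masks @ true_effect + 0.1 * rng.normal(size=n_subsets)

# Least-squares fit of per-datapoint effects (the "datamodel" weights).
theta, *_ = np.linalg.lstsq(masks, target_loss, rcond=None)

k = 10
selected = np.argsort(theta)[:k]   # the k most loss-reducing datapoints
print(selected)
```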
arXiv Detail & Related papers (2024-01-23T17:22:00Z)
- GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks [3.9638110494107095]
In-context Learning (ICL) is the ability of Large Language Models (LLMs) to perform new tasks when conditioned on prompts.
We propose Example Gisting, a novel approach for training example encoders through supervised fine-tuning.
We show that our fine-tuned models get state-of-the-art ICL performance with over 20% absolute gain over off-the-shelf retrievers.
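A sketch of encoder-based example selection: embed the query and all candidate demonstrations, then take the top-k by similarity. A plain TF-IDF encoder stands in here for the fine-tuned gist encoder the paper trains.

```python
# Sketch: retrieve the most relevant in-context demonstrations for a query.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pool = ["2+2=4", "cat -> chat", "capital of France is Paris"]  # candidate demos
vec = TfidfVectorizer().fit(pool)

def select_examples(query: str, k: int = 2) -> list[str]:
    sims = cosine_similarity(vec.transform([query]), vec.transform(pool))[0]
    return [pool[i] for i in np.argsort(sims)[::-1][:k]]

print(select_examples("what is the capital of Germany"))
```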
arXiv Detail & Related papers (2023-11-16T06:28:05Z)
- Enhancing Subtask Performance of Multi-modal Large Language Model [12.033301861738952]
Multi-modal Large Language Model (MLLM) refers to a model expanded from a Large Language Model (LLM) that possesses the capability to handle and infer multi-modal data.
This study selects multiple pre-trained models that focus on the same subtask, based on distinct evaluation approaches.
The results from multiple pre-trained models for the same subtask are compared using the LLM, and the best result is chosen as the outcome for that subtask.
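A sketch of that comparison step: run every candidate model on the subtask, then ask a judge LLM to pick the best output. Both `judge_llm` and the candidate callables are hypothetical placeholders.

```python
# Sketch: LLM-as-judge selection among multiple models' subtask outputs.
def judge_llm(prompt: str) -> str:
    ...  # hypothetical call into the comparing LLM

def best_subtask_result(subtask: str, candidates: dict) -> str:
    results = {name: model(subtask) for name, model in candidates.items()}
    listing = "\n".join(f"{n}: {r}" for n, r in results.items())
    choice = judge_llm(f"Task: {subtask}\nCandidate outputs:\n{listing}\n"
                       "Reply with the name of the best output only.")
    return results.get(choice.strip() if choice else "", "")
```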
arXiv Detail & Related papers (2023-08-31T05:37:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.