Tryage: Real-time, intelligent Routing of User Prompts to Large Language
Models
- URL: http://arxiv.org/abs/2308.11601v2
- Date: Wed, 23 Aug 2023 17:34:17 GMT
- Title: Tryage: Real-time, intelligent Routing of User Prompts to Large Language
Models
- Authors: Surya Narayanan Hari, Matt Thomson
- Abstract summary: With over 200, 000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted and data domains.
Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library.
- Score: 1.0878040851637998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The introduction of the transformer architecture and the self-attention
mechanism has led to an explosive production of language models trained on
specific downstream tasks and data domains. With over 200, 000 models in the
Hugging Face ecosystem, users grapple with selecting and optimizing models to
suit multifaceted workflows and data domains while addressing computational,
security, and recency concerns. There is an urgent need for machine learning
frameworks that can eliminate the burden of model selection and customization
and unleash the incredible power of the vast emerging model library for end
users. Here, we propose a context-aware routing system, Tryage, that leverages
a language model router for optimal selection of expert models from a model
library based on analysis of individual input prompts. Inspired by the thalamic
router in the brain, Tryage employs a perceptive router to predict down-stream
model performance on prompts and, then, makes a routing decision using an
objective function that integrates performance predictions with user goals and
constraints that are incorporated through flags (e.g., model size, model
recency). Tryage allows users to explore a Pareto front and automatically
trade-off between task accuracy and secondary goals including minimization of
model size, recency, security, verbosity, and readability. Across heterogeneous
data sets that include code, text, clinical data, and patents, the Tryage
framework surpasses Gorilla and GPT3.5 turbo in dynamic model selection
identifying the optimal model with an accuracy of 50.9% , compared to 23.6% by
GPT 3.5 Turbo and 10.8% by Gorilla. Conceptually, Tryage demonstrates how
routing models can be applied to program and control the behavior of
multi-model LLM systems to maximize efficient use of the expanding and evolving
language model ecosystem.
Related papers
- Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a novel sandbox suite tailored for integrated data-model co-development.
This sandbox provides a comprehensive experimental platform, enabling rapid iteration and insight-driven refinement of both data and models.
We also uncover fruitful insights gleaned from exhaustive benchmarks, shedding light on the critical interplay between data quality, diversity, and model behavior.
arXiv Detail & Related papers (2024-07-16T14:40:07Z) - AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z) - REFRESH: Responsible and Efficient Feature Reselection Guided by SHAP Values [17.489279048199304]
REFRESH is a method to reselect features so that additional constraints that are desirable towards model performance can be achieved without having to train several new models.
REFRESH's underlying algorithm is a novel technique using SHAP values and correlation analysis that can approximate for the predictions of a model without having to train these models.
arXiv Detail & Related papers (2024-03-13T18:06:43Z) - Budgeted Online Model Selection and Fine-Tuning via Federated Learning [26.823435733330705]
Online model selection involves selecting a model from a set of candidate models 'on the fly' to perform prediction on a stream of data.
The choice of candidate models henceforth has a crucial impact on the performance.
The present paper proposes an online federated model selection framework where a group of learners (clients) interacts with a server with sufficient memory.
Using the proposed algorithm, clients and the server collaborate to fine-tune models to adapt them to a non-stationary environment.
arXiv Detail & Related papers (2024-01-19T04:02:49Z) - Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z) - Green Runner: A tool for efficient model selection from model
repositories [3.0378875015087563]
GreenRunnerGPT is a novel tool for selecting deep learning models based on specific use cases.
It employs a large language model to suggest weights for quality indicators, optimizing resource utilization.
We demonstrate that GreenRunnerGPT is able to identify a model suited to a target use case without wasteful computations.
arXiv Detail & Related papers (2023-05-26T12:00:37Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Scaling Language Models: Methods, Analysis & Insights from Training
Gopher [83.98181046650664]
We present an analysis of Transformer-based language model performance across a wide range of model scales.
Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language.
We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv Detail & Related papers (2021-12-08T19:41:47Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - Error Detection in Large-Scale Natural Language Understanding Systems
Using Transformer Models [0.0]
Large-scale conversational assistants like Alexa, Siri, Cortana and Google Assistant process every utterance using multiple models for domain, intent and named entity recognition.
We address this challenge to detect domain classification errors using offline Transformer models.
We combine utterance encodings from a RoBERTa model with the Nbest hypothesis produced by the production system. We then fine-tune end-to-end in a multitask setting using a small dataset of humanannotated utterances with domain classification errors.
arXiv Detail & Related papers (2021-09-04T00:10:48Z) - Model Selection for Cross-Lingual Transfer [15.197350103781739]
We propose a machine learning approach to model selection that uses the fine-tuned model's own internal representations to predict its cross-lingual capabilities.
In extensive experiments we find that this method consistently selects better models than English validation data across twenty five languages.
arXiv Detail & Related papers (2020-10-13T02:36:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.