RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models
- URL: http://arxiv.org/abs/2409.19886v1
- Date: Mon, 30 Sep 2024 02:31:40 GMT
- Title: RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models
- Authors: Shuhao Chen, Weisen Jiang, Baijiong Lin, James T. Kwok, Yu Zhang
- Abstract summary: We show that a query-based Router trained by Dual Contrastive learning (RouterDC) is effective in assembling large language models (LLMs).
RouterDC largely outperforms individual top-performing LLMs as well as existing routing methods on both in-distribution and out-of-distribution tasks.
- Score: 24.113223576205932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works show that assembling multiple off-the-shelf large language models (LLMs) can harness their complementary abilities. To achieve this, routing is a promising method, which learns a router to select the most suitable LLM for each query. However, existing routing models are ineffective when multiple LLMs perform well for a query. To address this problem, in this paper, we propose a method called query-based Router by Dual Contrastive learning (RouterDC). The RouterDC model consists of an encoder and LLM embeddings, and we propose two contrastive learning losses to train the RouterDC model. Experimental results show that RouterDC is effective in assembling LLMs and largely outperforms individual top-performing LLMs as well as existing routing methods on both in-distribution (+2.76%) and out-of-distribution (+1.90%) tasks. Source code is available at https://github.com/shuhao02/RouterDC.
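The abstract outlines the architecture: a text encoder plus one learnable embedding per candidate LLM, trained with two contrastive losses. Below is a minimal, hypothetical PyTorch sketch of the scoring step and a query-LLM contrastive loss; the encoder, embedding dimension, temperature, and loss form are assumptions, and the paper's second (sample-sample) loss is not reproduced here.

```python
# Minimal sketch of a RouterDC-style router; NOT the authors' exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualContrastiveRouter(nn.Module):
    def __init__(self, encoder, num_llms, dim=768, tau=0.07):
        super().__init__()
        self.encoder = encoder                                   # maps queries -> (B, dim) embeddings
        self.llm_emb = nn.Parameter(torch.randn(num_llms, dim))  # one learnable embedding per LLM
        self.tau = tau                                           # softmax temperature (assumed value)

    def scores(self, queries):
        q = F.normalize(self.encoder(queries), dim=-1)           # (B, dim) query embeddings
        k = F.normalize(self.llm_emb, dim=-1)                    # (K, dim) LLM embeddings
        return q @ k.t() / self.tau                              # (B, K) routing logits

    def query_llm_loss(self, queries, pos_mask):
        # Contrastive loss: pull each query toward the LLMs that answered it
        # well (pos_mask[b, k] == 1) and push it away from the ones that failed.
        log_p = F.log_softmax(self.scores(queries), dim=-1)
        pos = (log_p * pos_mask).sum(-1) / pos_mask.sum(-1).clamp(min=1)
        return -pos.mean()

# At inference, route each query to the top-scoring LLM:
#   choice = router.scores(queries).argmax(dim=-1)
```

Treating every well-performing LLM as a positive is what distinguishes this setup from routers trained with a single "best LLM" label per query.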
Related papers
- MasRouter: Learning to Route LLMs for Multi-Agent Systems [14.029698552632107]
Multi-agent systems powered by Large Language Models (LLMs) have been demonstrated to push the boundaries of LLM capabilities.
Current routing methods effectively reduce overhead in single-agent scenarios by customizing LLM selection for each query.
We first introduce the problem of Multi-Agent System Routing (MASR), which integrates all components of MAS into a unified routing framework.
MasRouter is (1) high-performing, achieving a 1.8%–8.2% improvement over the state-of-the-art method on MBPP; (2) economical, reducing overhead by up to 52.07% compared to S
arXiv Detail & Related papers (2025-02-16T14:00:59Z)
- Universal Model Routing for Efficient LLM Inference [72.65083061619752]
We consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time.
We propose a new approach to this problem that relies on representing each LLM as a feature vector, derived based on predictions on a set of representative prompts.
We prove that these strategies are estimates of a theoretically optimal routing rule, and provide an excess risk bound to quantify their errors.
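As a rough illustration of the feature-vector idea, the sketch below represents each LLM by its observed quality on a set of representative prompts and scores a new query by soft assignment to those prompts; the softmax assignment rule is an assumption, not the paper's exact estimator, and the cluster-based variant and excess risk bounds are omitted.

```python
# Hedged sketch: routing to previously unseen LLMs via per-prompt feature vectors.
import numpy as np

def route(query_vec, prompt_vecs, llm_features):
    """query_vec:    (d,)   embedding of the incoming query.
    prompt_vecs:  (P, d) embeddings of representative prompts.
    llm_features: (K, P) llm_features[k, p] is LLM k's observed quality on
    prompt p -- this row is all we store per LLM, so a new LLM can be added
    at test time by evaluating it once on the P prompts, with no retraining."""
    sims = prompt_vecs @ query_vec            # similarity of query to each prompt
    w = np.exp(sims - sims.max())
    w /= w.sum()                              # soft assignment over prompts
    expected_quality = llm_features @ w       # (K,) predicted quality per LLM
    return int(np.argmax(expected_quality))
```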
arXiv Detail & Related papers (2025-02-12T20:30:28Z)
- Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models [56.93608812478369]
We present L2R, a method that isolates the training of new PEFT modules to ensure their task specialization.
L2R then learns to compose the learned modules by training a network of routers that leverages a small memory containing examples of previously seen tasks.
Our results demonstrate that L2R provides an effective composition of PEFT modules, leading to improved generalization and performance compared to other methods.
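As a hedged illustration of the memory component, the sketch below stores a few embedded examples per seen task and turns nearest-neighbor similarity into mixture weights over PEFT modules; this k-NN voting rule is a simplification, since L2R trains a network of routers rather than voting directly.

```python
# Simplified sketch of memory-based module routing (not L2R's actual router).
import torch
import torch.nn.functional as F

class MemoryRouter:
    def __init__(self, num_modules, k=5):
        self.keys, self.task_ids = [], []
        self.num_modules, self.k = num_modules, k

    def add(self, example_emb, task_id):
        # Store a small number of example embeddings per previously seen task.
        self.keys.append(example_emb)
        self.task_ids.append(task_id)

    def route(self, query_emb):
        keys = torch.stack(self.keys)                                # (N, d)
        sims = F.cosine_similarity(keys, query_emb.unsqueeze(0), dim=-1)
        top = sims.topk(min(self.k, len(self.keys))).indices
        weights = torch.zeros(self.num_modules)
        for i in top:
            weights[self.task_ids[i]] += 1.0
        return weights / weights.sum()      # mixture weights over PEFT modules
```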
arXiv Detail & Related papers (2024-08-16T23:57:29Z)
- Towards Modular LLMs by Building and Reusing a Library of LoRAs [64.43376695346538]
We study how to best build a library of adapters given multi-task data.
We introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters.
To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters.
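Under stated assumptions, the sketch below illustrates both pieces: MBC-style grouping of tasks by the similarity of their flattened adapter weights, and an Arrow-style zero-shot route that uses each LoRA update's top singular direction as a prototype (details such as per-layer routing are simplified away).

```python
# Illustrative sketch of MBC clustering and Arrow-style routing.
import numpy as np
from sklearn.cluster import KMeans

def cluster_adapters(adapter_params, n_clusters):
    """adapter_params: (T, P) flattened LoRA weights, one row per task."""
    X = adapter_params / np.linalg.norm(adapter_params, axis=1, keepdims=True)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)

def arrow_prototype(A, B):
    """The LoRA update is dW = B @ A; take its top right singular vector."""
    _, _, vt = np.linalg.svd(B @ A, full_matrices=False)
    return vt[0]                                  # (d_in,) prototype direction

def arrow_route(hidden, prototypes, top_k=2):
    scores = np.abs(prototypes @ hidden)          # alignment with each adapter
    return np.argsort(-scores)[:top_k]            # indices of most relevant adapters
```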
arXiv Detail & Related papers (2024-05-18T03:02:23Z)
- RouterBench: A Benchmark for Multi-LLM Routing System [25.515453832224804]
No single model can optimally address all tasks and applications, particularly when balancing performance with cost.
This limitation has led to the development of LLM routing systems, which combine the strengths of various models to overcome the constraints of individual LLMs.
We present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems.
arXiv Detail & Related papers (2024-03-18T17:59:04Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- Many or Few Samples? Comparing Transfer, Contrastive and Meta-Learning in Encrypted Traffic Classification [68.19713459228369]
We compare transfer learning, meta-learning and contrastive learning against reference Machine Learning (ML) tree-based and monolithic DL models.
We show that (i) large datasets enable learning more general representations and (ii) contrastive learning is the best methodology.
While tree-based ML models fit small tasks well but cannot handle large ones, DL methods, by reusing learned representations, reach tree-based performance even on small tasks.
arXiv Detail & Related papers (2023-05-21T11:20:49Z)
- StableMoE: Stable Routing Strategy for Mixture of Experts [109.0602120199226]
The Mixture-of-Experts (MoE) technique can scale up the model size of Transformers with an affordable computational overhead.
We propose StableMoE with two training stages to address the routing fluctuation problem.
Results show that StableMoE outperforms existing MoE methods in terms of both convergence speed and performance.
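A toy sketch of the two-stage idea, under assumptions: stage 1 trains a gate while distilling its decisions into a lightweight token-level router, and stage 2 freezes that router so expert assignments stop fluctuating (real MoE details such as expert capacity and dispatch are omitted).

```python
# Toy two-stage router in the spirit of StableMoE; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStageRouter(nn.Module):
    def __init__(self, vocab_size, num_experts, dim):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)                  # stage-1 trainable gate
        self.distilled = nn.Embedding(vocab_size, num_experts)   # lightweight token router

    def forward(self, token_ids, hidden, stage):
        if stage == 1:
            logits = self.gate(hidden)                           # (B, E) gate decisions
            # Distill the gate's current routing into the token-level router.
            distill_loss = F.cross_entropy(self.distilled(token_ids),
                                           logits.argmax(-1).detach())
            return logits.argmax(-1), distill_loss
        # Stage 2: the distilled router is frozen, so each token's expert
        # assignment is fixed and no longer fluctuates during training.
        with torch.no_grad():
            return self.distilled(token_ids).argmax(-1), None
```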
arXiv Detail & Related papers (2022-04-18T16:48:19Z)
- Boosting Share Routing for Multi-task Learning [0.12891210250935145]
Multi-task learning (MTL) aims to make full use of the knowledge contained in multi-task supervision signals to improve the overall performance.
How to share the knowledge of multiple tasks appropriately is an open problem in MTL.
We propose a general framework called Multi-Task Neural Architecture Search (MTNAS) to efficiently find a suitable sharing route for a given MTL problem.
arXiv Detail & Related papers (2020-09-01T12:37:19Z)