ICL-Router: In-Context Learned Model Representations for LLM Routing
- URL: http://arxiv.org/abs/2510.09719v2
- Date: Tue, 14 Oct 2025 06:58:41 GMT
- Title: ICL-Router: In-Context Learned Model Representations for LLM Routing
- Authors: Chenxu Wang, Hao Li, Yiqun Zhang, Linyao Chen, Jianhao Chen, Ping Jian, Peng Ye, Qiaosheng Zhang, Shuyue Hu
- Abstract summary: We propose a novel routing method using in-context vectors to represent model capabilities. Our method achieves state-of-the-art routing performance in both in-distribution and out-of-distribution tasks.
- Score: 30.759446235510467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) often exhibit complementary strengths. Model routing harnesses these strengths by dynamically directing each query to the most suitable model, given a candidate model pool. However, routing performance relies on accurate model representations, and adding new models typically requires retraining, limiting scalability. To address these challenges, we propose a novel routing method using in-context vectors to represent model capabilities. The method proceeds in two stages. First, queries are embedded and projected into vectors, with a projector and LLM-based router trained to reconstruct the original queries, aligning vector representations with the router's semantic space. Second, each candidate model is profiled on a query set, and the router learns -- based on in-context vectors of query and model performance -- to predict whether each model can correctly answer new queries. Extensive experiments demonstrate that our method achieves state-of-the-art routing performance in both in-distribution and out-of-distribution tasks. Moreover, our method allows for seamless integration of new models without retraining the router. The code is available at https://github.com/lalalamdbf/ICL-Router.
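The two-stage pipeline in the abstract can be sketched in miniature. The snippet below is a rough illustration only: a hashing "embedder", a random linear projector, and a k-nearest-neighbor vote stand in for the trained projector and LLM-based router from the paper, and all names, dimensions, and profiling queries are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the paper trains the projector via query reconstruction.
EMBED_DIM, PROJ_DIM = 16, 4
W_proj = rng.normal(size=(EMBED_DIM, PROJ_DIM))

def embed(query):
    # Deterministic toy embedding: bucket tokens into a fixed-size vector.
    v = np.zeros(EMBED_DIM)
    for tok in query.lower().split():
        v[sum(map(ord, tok)) % EMBED_DIM] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def project(query):
    # Stage 1: map the query embedding into the router's vector space.
    return embed(query) @ W_proj

def predict_correct(profile, query, k=3):
    # Stage 2: given in-context (query vector, correct?) pairs from profiling
    # a candidate model, estimate whether it can answer the new query.
    vecs = np.stack([v for v, _ in profile])
    labels = np.array([y for _, y in profile], dtype=float)
    q = project(query)
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    return labels[top].mean()

def route(profiles, query):
    # Send the query to the model with the highest predicted success rate.
    # Adding a new model only requires a new profile, not router retraining.
    scores = {m: predict_correct(p, query) for m, p in profiles.items()}
    return max(scores, key=scores.get)

# Hypothetical profiles: model_a answered its profiling queries correctly,
# model_b did not, so the router should prefer model_a.
queries = ["solve the integral", "prove the lemma", "factor the polynomial"]
profiles = {
    "model_a": [(project(q), 1) for q in queries],
    "model_b": [(project(q), 0) for q in queries],
}
choice = route(profiles, "compute the derivative")
```

Note how the scalability claim falls out of the design: profiling a new model produces a new in-context profile, and no component of the router itself is retrained.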
Related papers
- HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning [11.03159148013318]
Large Language Models (LLMs) deliver state-of-the-art performance across many tasks but impose high computational and memory costs. We propose HierRouter, a hierarchical routing approach that dynamically assembles inference pipelines from a pool of specialized, lightweight language models.
arXiv Detail & Related papers (2025-11-13T02:12:14Z) - Lookahead Routing for Large Language Models [24.082620717301477]
Lookahead is a routing framework that "foresees" potential model outputs and uses these predictions to guide model selection. Empirical evaluations across seven public benchmarks show that Lookahead consistently outperforms existing routing baselines.
arXiv Detail & Related papers (2025-10-22T12:00:21Z) - Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach [65.6966065843227]
Iterative Reweight-then-Optimize (IRO) is a framework that performs RL-style alignment of a frozen base model without touching its parameters. At test time, the value functions are used to guide the base model's generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT).
arXiv Detail & Related papers (2025-06-21T21:49:02Z) - Arch-Router: Aligning LLM Routing with Human Preferences [1.859931123372708]
Routing has become an essential technique for operationalizing the use of different models. We propose a preference-aligned routing framework that guides model selection by matching queries to user-defined domains. Our approach captures subjective evaluation criteria and makes routing decisions more transparent and flexible.
arXiv Detail & Related papers (2025-06-19T23:57:41Z) - Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning [12.878608250420832]
We present Router-R1, a reinforcement learning framework that formulates multi-LLM routing and aggregation as a sequential decision process. To facilitate learning, we employ a lightweight rule-based reward comprising format rewards, final outcome rewards, and a novel cost reward for optimizing the balance between performance and cost.
arXiv Detail & Related papers (2025-06-10T17:56:45Z) - Query Routing for Retrieval-Augmented Language Models [38.05904245087491]
Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks. We observe that external documents dynamically affect an LLM's ability to answer queries, while existing routing methods exhibit suboptimal performance in RAG scenarios. We propose a parametric RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts.
arXiv Detail & Related papers (2025-05-29T03:44:56Z) - OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging [124.91183814854126]
Model merging seeks to combine multiple expert models into a single model. We introduce a benchmark for model merging research that clearly divides the tasks for MLLM training and evaluation. We find that model merging offers a promising way to build improved MLLMs without requiring training data.
arXiv Detail & Related papers (2025-05-26T12:23:14Z) - Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing [64.38277118982698]
Large Language Models (LLMs) have demonstrated human-like instruction-following abilities. In this work, we explore how to route each instruction to the best-performing LLM to achieve better overall performance. We develop a new paradigm, constructing capability instructions with model capability representation, user instruction, and performance inquiry prompts to assess performance.
arXiv Detail & Related papers (2025-02-24T16:10:53Z) - Universal Model Routing for Efficient LLM Inference [69.86195589350264]
Model routing is a technique for reducing the inference cost of large language models (LLMs). We propose UniRoute, a new approach to the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We show that the resulting routers are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound.
arXiv Detail & Related papers (2025-02-12T20:30:28Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models [24.113223576205932]
We show that RouterDC, a query-based router trained by dual contrastive learning, is effective in assembling large language models (LLMs). RouterDC largely outperforms individual top-performing LLMs as well as existing routing methods on both in-distribution and out-of-distribution tasks.
arXiv Detail & Related papers (2024-09-30T02:31:40Z) - RouterRetriever: Routing over a Mixture of Expert Embedding Models [58.987116118425995]
We introduce RouterRetriever, a retrieval model that leverages a mixture of domain-specific experts by using a routing mechanism. RouterRetriever is the first work to demonstrate the advantages of routing over a mixture of domain-specific expert embedding models.
arXiv Detail & Related papers (2024-09-04T13:16:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.