ICL-Router: In-Context Learned Model Representations for LLM Routing
- URL: http://arxiv.org/abs/2510.09719v2
- Date: Tue, 14 Oct 2025 06:58:41 GMT
- Title: ICL-Router: In-Context Learned Model Representations for LLM Routing
- Authors: Chenxu Wang, Hao Li, Yiqun Zhang, Linyao Chen, Jianhao Chen, Ping Jian, Peng Ye, Qiaosheng Zhang, Shuyue Hu
- Abstract summary: We propose a novel routing method using in-context vectors to represent model capabilities. Our method achieves state-of-the-art routing performance in both in-distribution and out-of-distribution tasks.
- Score: 30.759446235510467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) often exhibit complementary strengths. Model routing harnesses these strengths by dynamically directing each query to the most suitable model, given a candidate model pool. However, routing performance relies on accurate model representations, and adding new models typically requires retraining, limiting scalability. To address these challenges, we propose a novel routing method using in-context vectors to represent model capabilities. The method proceeds in two stages. First, queries are embedded and projected into vectors, with a projector and LLM-based router trained to reconstruct the original queries, aligning vector representations with the router's semantic space. Second, each candidate model is profiled on a query set, and the router learns -- based on in-context vectors of query and model performance -- to predict whether each model can correctly answer new queries. Extensive experiments demonstrate that our method achieves state-of-the-art routing performance in both in-distribution and out-of-distribution tasks. Moreover, our method allows for seamless integration of new models without retraining the router. The code is available at https://github.com/lalalamdbf/ICL-Router.
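The two-stage pipeline in the abstract can be sketched in miniature. The snippet below is a rough illustration only: a hashing "embedder", a random linear projector, and a k-nearest-neighbor vote stand in for the trained projector and LLM-based router from the paper, and all names, dimensions, and profiling queries are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the paper trains the projector via query reconstruction.
EMBED_DIM, PROJ_DIM = 16, 4
W_proj = rng.normal(size=(EMBED_DIM, PROJ_DIM))

def embed(query):
    # Deterministic toy embedding: bucket tokens into a fixed-size vector.
    v = np.zeros(EMBED_DIM)
    for tok in query.lower().split():
        v[sum(map(ord, tok)) % EMBED_DIM] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def project(query):
    # Stage 1: map the query embedding into the router's vector space.
    return embed(query) @ W_proj

def predict_correct(profile, query, k=3):
    # Stage 2: given in-context (query vector, correct?) pairs from profiling
    # a candidate model, estimate whether it can answer the new query.
    vecs = np.stack([v for v, _ in profile])
    labels = np.array([y for _, y in profile], dtype=float)
    q = project(query)
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    return labels[top].mean()

def route(profiles, query):
    # Send the query to the model with the highest predicted success rate.
    # Adding a new model only requires a new profile, not router retraining.
    scores = {m: predict_correct(p, query) for m, p in profiles.items()}
    return max(scores, key=scores.get)

# Hypothetical profiles: model_a answered its profiling queries correctly,
# model_b did not, so the router should prefer model_a.
queries = ["solve the integral", "prove the lemma", "factor the polynomial"]
profiles = {
    "model_a": [(project(q), 1) for q in queries],
    "model_b": [(project(q), 0) for q in queries],
}
choice = route(profiles, "compute the derivative")
```

Note how the scalability claim falls out of the design: profiling a new model produces a new in-context profile, and no component of the router itself is retrained.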
Related papers
- HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning [11.03159148013318]
Large Language Models (LLMs) deliver state-of-the-art performance across many tasks but impose high computational and memory costs. We propose HierRouter, a hierarchical routing approach that dynamically assembles inference pipelines from a pool of specialized, lightweight language models.
arXiv Detail & Related papers (2025-11-13T02:12:14Z) - Lookahead Routing for Large Language Models [24.082620717301477]
Lookahead is a routing framework that "foresees" potential model outputs and uses these predictions to guide model selection. Empirical evaluations across seven public benchmarks show that Lookahead consistently outperforms existing routing baselines.
arXiv Detail & Related papers (2025-10-22T12:00:21Z) - Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach [65.6966065843227]
Iterative Reweight-then-Optimize (IRO) is a framework that performs RL-style alignment of a frozen base model without touching its parameters. At test time, the value functions are used to guide the base model's generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT).
arXiv Detail & Related papers (2025-06-21T21:49:02Z) - Arch-Router: Aligning LLM Routing with Human Preferences [1.859931123372708]
Routing has become an essential technique for operationalizing the use of different models. We propose a preference-aligned routing framework that guides model selection by matching queries to user-defined domains. Our approach captures subjective evaluation criteria and makes routing decisions more transparent and flexible.
arXiv Detail & Related papers (2025-06-19T23:57:41Z) - Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning [12.878608250420832]
We present Router-R1, a reinforcement learning framework that formulates multi-LLM routing and aggregation as a sequential decision process. To facilitate learning, we employ a lightweight rule-based reward comprising format rewards, final outcome rewards, and a novel cost reward for optimizing the balance between performance and cost.
arXiv Detail & Related papers (2025-06-10T17:56:45Z) - Query Routing for Retrieval-Augmented Language Models [38.05904245087491]
Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks. We observe that external documents dynamically affect an LLM's ability to answer queries, while existing routing methods exhibit suboptimal performance in RAG scenarios. We propose a parametric RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts.
arXiv Detail & Related papers (2025-05-29T03:44:56Z) - OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging [124.91183814854126]
Model merging seeks to combine multiple expert models into a single model. We introduce a benchmark for model merging research that clearly divides the tasks for MLLM training and evaluation. We find that model merging offers a promising way to build improved MLLMs without requiring training data.
arXiv Detail & Related papers (2025-05-26T12:23:14Z) - Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing [64.38277118982698]
Large Language Models (LLMs) have demonstrated human-like instruction-following abilities. In this work, we explore how to route each instruction to the best-performing LLM to achieve better overall performance. We develop a new paradigm, constructing capability instructions with model capability representation, user instruction, and performance inquiry prompts to assess performance.
arXiv Detail & Related papers (2025-02-24T16:10:53Z) - Universal Model Routing for Efficient LLM Inference [69.86195589350264]
Model routing is a technique for reducing the inference cost of large language models (LLMs). We propose UniRoute, a new approach to the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We show that the resulting routers are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound.
arXiv Detail & Related papers (2025-02-12T20:30:28Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models [24.113223576205932]
We show that RouterDC, a query-based router trained by dual contrastive learning, is effective in assembling large language models (LLMs). RouterDC largely outperforms individual top-performing LLMs as well as existing routing methods on both in-distribution and out-of-distribution tasks.
arXiv Detail & Related papers (2024-09-30T02:31:40Z) - RouterRetriever: Routing over a Mixture of Expert Embedding Models [58.987116118425995]
We introduce RouterRetriever, a retrieval model that leverages a mixture of domain-specific experts by using a routing mechanism. RouterRetriever is the first work to demonstrate the advantages of routing over a mixture of domain-specific expert embedding models.
arXiv Detail & Related papers (2024-09-04T13:16:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.