HAPS: Hierarchical LLM Routing with Joint Architecture and Parameter Search
- URL: http://arxiv.org/abs/2601.05903v1
- Date: Fri, 09 Jan 2026 16:22:25 GMT
- Title: HAPS: Hierarchical LLM Routing with Joint Architecture and Parameter Search
- Authors: Zihang Tian, Rui Li, Jingsen Zhang, Xiaohe Bo, Wei Huo, Xu Chen,
- Abstract summary: Large language model (LLM) routing aims to exploit the specialized strengths of different LLMs for diverse tasks.<n>We introduce HAPS, a hierarchical LLM routing framework that jointly searches over model architectures and parameters.<n> Experiments on two commonly used benchmarks show that HAPS consistently outperforms strong routing baselines.
- Score: 13.177031415523302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) routing aims to exploit the specialized strengths of different LLMs for diverse tasks. However, existing approaches typically focus on selecting LLM architectures while overlooking parameter settings, which are critical for task performance. In this paper, we introduce HAPS, a hierarchical LLM routing framework that jointly searches over model architectures and parameters. Specifically, we use a high-level router to select among candidate LLM architectures, and then search for the optimal parameters for the selected architectures based on a low-level router. We design a parameter generation network to share parameters between the two routers to mutually enhance their capabilities. In the training process, we design a reward-augmented objective to effectively optimize our framework. Experiments on two commonly used benchmarks show that HAPS consistently outperforms strong routing baselines. We have released our code at https://github.com/zihangtian/HAPS.
Related papers
- Relatron: Automating Relational Machine Learning over Relational Databases [50.94254514286021]
We present a study that unifies RDL and DFS in a shared design space and conducts architecture-centric searches across diverse RDB tasks.<n>Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and accuracy is an unreliable guide for choice architecture.
arXiv Detail & Related papers (2026-02-26T02:45:22Z) - LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing [44.046399484829635]
Large language model (LLM) routing assigns each query to the most suitable model from an ensemble.<n>We introduce LLMBench, a large-scale benchmark and unified framework for LLM routing.<n>It comprises over 400K instances from 21 datasets and 33 models.
arXiv Detail & Related papers (2026-01-12T05:01:15Z) - HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning [11.03159148013318]
Large Language Models (LLMs) deliver state-of-the-art performance across many tasks but impose high computational and memory costs.<n>We propose Hier, a hierarchical routing approach that dynamically assembles inference pipelines from a pool of specialized, lightweight language models.
arXiv Detail & Related papers (2025-11-13T02:12:14Z) - LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding [55.5535016040221]
LM-Searcher is a novel framework for cross-domain neural architecture optimization.<n>Central to our approach is NCode, a universal numerical string representation for neural architectures.<n>Our dataset, encompassing a wide range of architecture-performance pairs, encourages robust and transferable learning.
arXiv Detail & Related papers (2025-09-06T09:26:39Z) - RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR is a role-aware context routing framework for multi-agent large language model (LLM) systems.<n>It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage.<n>A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z) - RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing [27.481573948464987]
Radial is a novel framework for large language models routing.<n>It uses a lightweight Transformer-based backbone with a radial structure named RadialFormer to articulate the query-LLMs relationship.<n>It significantly outperforms existing routing methods by 9.2% and 5.8% in the Balance and Cost First scenarios.
arXiv Detail & Related papers (2025-06-04T12:16:41Z) - RAGRouter: Learning to Route Queries to Multiple Retrieval-Augmented Language Models [45.58601993849455]
Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks.<n>We observe that external documents dynamically affect LLMs' ability to answer queries, while existing routing methods exhibit suboptimal performance in RAG scenarios.<n>We propose RAG, a parametric RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts.
arXiv Detail & Related papers (2025-05-29T03:44:56Z) - Universal Model Routing for Efficient LLM Inference [69.86195589350264]
Model routing is a technique for reducing the inference cost of large language models (LLMs)<n>We propose UniRoute, a new approach to the problem of dynamic routing, where new, previously unobserved LLMs are available at test time.<n>We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound.
arXiv Detail & Related papers (2025-02-12T20:30:28Z) - RouterBench: A Benchmark for Multi-LLM Routing System [25.515453832224804]
No single model can optimally address all tasks and applications, particularly when balancing performance with cost.
This limitation has led to the development of LLM routing systems, which combine the strengths of various models to overcome the constraints of individual LLMs.
We present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems.
arXiv Detail & Related papers (2024-03-18T17:59:04Z) - Multi-Agent Reinforcement Learning for Microprocessor Design Space
Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high-performance and energy-efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.