Arch-Router: Aligning LLM Routing with Human Preferences
- URL: http://arxiv.org/abs/2506.16655v1
- Date: Thu, 19 Jun 2025 23:57:41 GMT
- Title: Arch-Router: Aligning LLM Routing with Human Preferences
- Authors: Co Tran, Salman Paracha, Adil Hafeez, Shuguang Chen
- Abstract summary: Routing has become an essential technique to operationalize the use of different models. We propose a preference-aligned routing framework that guides model selection by matching queries to user-defined domains. Our approach captures subjective evaluation criteria and makes routing decisions more transparent and flexible.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid proliferation of large language models (LLMs) -- each optimized for different strengths, style, or latency/cost profile -- routing has become an essential technique to operationalize the use of different models. However, existing LLM routing approaches are limited in two key ways: they evaluate performance using benchmarks that often fail to capture human preferences driven by subjective evaluation criteria, and they typically select from a limited pool of models. In this work, we propose a preference-aligned routing framework that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing) -- offering a practical mechanism to encode preferences in routing decisions. Specifically, we introduce \textbf{Arch-Router}, a compact 1.5B model that learns to map queries to domain-action preferences for model routing decisions. Our approach also supports seamlessly adding new models for routing without requiring retraining or architectural modifications. Experiments on conversational datasets demonstrate that our approach achieves state-of-the-art (SOTA) results in matching queries with human preferences, outperforming top proprietary models. Our approach captures subjective evaluation criteria and makes routing decisions more transparent and flexible. Our model is available at: \texttt{https://huggingface.co/katanemo/Arch-Router-1.5B}.
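The abstract describes routing as a two-step mapping: a compact model classifies a query into a user-defined domain or action label, and a preference table maps that label to a model. A minimal sketch of that flow, with an invented keyword-based `classify()` standing in for the 1.5B router (all labels, model names, and rules here are hypothetical, not the paper's):

```python
# Minimal sketch of preference-aligned routing. classify() is a toy
# stand-in for the Arch-Router model; all names are invented.

def classify(query: str) -> str:
    """Map a query to a user-defined domain or action label."""
    rules = {
        "flight": "travel",
        "hotel": "travel",
        "crop": "image_editing",
        "resize": "image_editing",
    }
    for keyword, label in rules.items():
        if keyword in query.lower():
            return label
    return "general"

# User-supplied preference table: label -> model. Adding a new model
# only means editing this table; the router itself is not retrained.
PREFERENCES = {
    "travel": "model-travel-expert",
    "image_editing": "model-vision",
    "general": "model-default",
}

def route(query: str) -> str:
    return PREFERENCES[classify(query)]

print(route("Find me a cheap flight to Lisbon"))  # model-travel-expert
```

The decoupling shown here mirrors the paper's claim that new models can be added without retraining: only the label-to-model table changes, not the classifier.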
Related papers
- Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach [65.6966065843227]
Iterative Reweight-then-Optimize (IRO) is a framework that performs RL-style alignment of a frozen base model without touching its parameters. At test time, the value functions are used to guide the base model's generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT).
arXiv Detail & Related papers (2025-06-21T21:49:02Z)
- Query Routing for Retrieval-Augmented Language Models [38.05904245087491]
Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks. We observe that external documents dynamically affect an LLM's ability to answer queries, while existing routing methods exhibit suboptimal performance in RAG scenarios. We propose a parametric, RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts.
arXiv Detail & Related papers (2025-05-29T03:44:56Z)
- Causal LLM Routing: End-to-End Regret Minimization from Observational Data [3.3580884064577616]
LLM routing aims to select the most appropriate model for each query. Prior approaches typically adopt a decoupled strategy, where metrics are first predicted and the model is then selected based on these estimates. We propose a causal end-to-end framework that learns routing policies by minimizing decision-making regret from observational data.
arXiv Detail & Related papers (2025-05-21T21:34:18Z)
- How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark (Diverse, Simple, and Categorized), an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z)
- Universal Model Routing for Efficient LLM Inference [69.86195589350264]
Model routing is a technique for reducing the inference cost of large language models (LLMs). We propose UniRoute, a new approach to the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound.
arXiv Detail & Related papers (2025-02-12T20:30:28Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback [87.37721254914476]
We introduce HyPER, a Hybrid Preference routER that defers an annotation to either humans or language models (LMs). We show that the selected hybrid mixture of synthetic and direct human preferences using HyPER achieves better RM performance than using either one exclusively, by 7-13% on RewardBench. We also analyze features from HyPER and find that prompts with moderate safety concerns or complexity benefit the most from human feedback.
arXiv Detail & Related papers (2024-10-24T20:04:15Z)
- A Unified Approach to Routing and Cascading for LLMs [5.653106385738822]
Large language models (LLMs) embedded in various agentic systems have increased the potential of model selection strategies to improve the cost-performance tradeoff. Existing strategies involve either routing, where a single model is chosen per query, or cascading, which sequentially runs increasingly larger models until a satisfactory answer is found. We derive a novel optimal strategy for cascading and prove the optimality of an existing routing strategy. We propose cascade routing, a unified framework that integrates routing and cascading into a theoretically optimal strategy.
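The cascading strategy described above can be sketched in a few lines: run models in order of increasing capability and stop at the first answer that passes a quality check. The model list and the quality heuristic here are toy placeholders, not the paper's actual formulation:

```python
# Toy sketch of cascading: try increasingly capable (and costly) models
# until a quality check passes. Models and heuristic are invented.

from typing import Callable

def cascade(query: str,
            models: list[Callable[[str], str]],
            good_enough: Callable[[str], bool]) -> str:
    """Run models in order of increasing size; return the first
    answer that passes the quality check (or the last one tried)."""
    answer = ""
    for model in models:
        answer = model(query)
        if good_enough(answer):
            break
    return answer

# Dummy models: the "small" one only handles short queries well.
small = lambda q: "short answer" if len(q) < 20 else "unsure"
large = lambda q: "detailed answer"

result = cascade("What is 2+2?", [small, large],
                 good_enough=lambda a: a != "unsure")
print(result)  # short answer
```

Routing, by contrast, would pick one model up front from the query alone; the paper's cascade routing unifies the two decisions into a single policy.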
arXiv Detail & Related papers (2024-10-14T10:00:49Z)
- RouterRetriever: Routing over a Mixture of Expert Embedding Models [58.987116118425995]
We introduce RouterRetriever, a retrieval model that leverages a mixture of domain-specific experts by using a routing mechanism. RouterRetriever is the first work to demonstrate the advantages of routing over a mixture of domain-specific expert embedding models.
arXiv Detail & Related papers (2024-09-04T13:16:55Z)
- Exploring Domain Robust Lightweight Reward Models based on Router Mechanism [1.3624495460189863]
We explore the utilization of small language models operating in a domain-specific manner based on router mechanisms.
Our three approaches are: 1) forming a single reward model from a mixture of experts by modularizing an internal router and experts; 2) employing an external router to select the appropriate reward model from multiple domain-specific models; and 3) reducing parameter size by loading reward models and router adapters onto a single small language model using adapters.
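Approach (2) above, an external router selecting among domain-specific reward models, can be sketched as a dispatch table. The router and the scorers here are toy placeholders (invented names and rules), not the paper's actual models:

```python
# Sketch of an external router over domain-specific reward models.
# Both the router rule and the scorers are invented for illustration.

def router(prompt: str) -> str:
    """Toy domain classifier standing in for the external router."""
    return "code" if "def " in prompt or "bug" in prompt else "chat"

# One lightweight reward model per domain.
REWARD_MODELS = {
    "code": lambda prompt, response: float("return" in response),
    "chat": lambda prompt, response: min(len(response) / 50, 1.0),
}

def reward(prompt: str, response: str) -> float:
    """Route the prompt to its domain, then score with that model."""
    return REWARD_MODELS[router(prompt)](prompt, response)

print(reward("fix this bug", "add a return statement"))  # 1.0
```

The appeal of this design is that each domain model stays small and specialized, while the router bears the (cheaper) burden of domain identification.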
arXiv Detail & Related papers (2024-07-24T17:25:12Z)
- Inverse Optimization for Routing Problems [3.282021317933024]
We propose a method for learning decision-makers' behavior in routing problems using Inverse Optimization (IO).
Our examples and results showcase the flexibility and real-world potential of the proposed IO methodology to learn from decision-makers' decisions in routing problems.
arXiv Detail & Related papers (2023-07-14T14:03:47Z)
- Learn-n-Route: Learning implicit preferences for vehicle routing [9.434400627011108]
We investigate a learning decision support system for vehicle routing, where the routing engine learns implicit preferences that human planners have when manually creating route plans (or routings).
The goal is to use these learned subjective preferences on top of the distance-based objective criterion in vehicle routing systems.
arXiv Detail & Related papers (2021-01-11T14:57:46Z)
- AutoRC: Improving BERT Based Relation Classification Models via Architecture Search [50.349407334562045]
BERT based relation classification (RC) models have achieved significant improvements over the traditional deep learning models.
However, no consensus has been reached on the optimal architecture. We design a comprehensive search space for BERT-based RC models and employ a neural architecture search (NAS) method to automatically discover the design choices.
arXiv Detail & Related papers (2020-09-22T16:55:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.