Causal LLM Routing: End-to-End Regret Minimization from Observational Data
- URL: http://arxiv.org/abs/2505.16037v1
- Date: Wed, 21 May 2025 21:34:18 GMT
- Title: Causal LLM Routing: End-to-End Regret Minimization from Observational Data
- Authors: Asterios Tsiourvas, Wei Sun, Georgia Perakis
- Abstract summary: LLM routing aims to select the most appropriate model for each query. Prior approaches typically adopt a decoupled strategy, where the metrics are first predicted and the model is then selected based on these estimates. We propose a causal end-to-end framework that learns routing policies by minimizing decision-making regret from observational data.
- Score: 3.3580884064577616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM routing aims to select the most appropriate model for each query, balancing competing performance metrics such as accuracy and cost across a pool of language models. Prior approaches typically adopt a decoupled strategy, where the metrics are first predicted and the model is then selected based on these estimates. This setup is prone to compounding errors and often relies on full-feedback data, where each query is evaluated by all candidate models, which is costly to obtain and maintain in practice. In contrast, we learn from observational data, which records only the outcome of the model actually deployed. We propose a causal end-to-end framework that learns routing policies by minimizing decision-making regret from observational data. To enable efficient optimization, we introduce two theoretically grounded surrogate objectives: a classification-based upper bound, and a softmax-weighted regret approximation shown to recover the optimal policy at convergence. We further extend our framework to handle heterogeneous cost preferences via an interval-conditioned architecture. Experiments on public benchmarks show that our method outperforms existing baselines, achieving state-of-the-art performance across different embedding models.
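The softmax-weighted regret idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes that a utility estimate is already available for every candidate model on a query (in the observational setting these would have to be estimated, since only the deployed model's outcome is recorded), and that utility takes the hypothetical form accuracy minus `lam` times cost. The function names, the numbers, and `lam` are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def regrets(utilities):
    """Per-model regret: gap between the best utility and each model's utility."""
    best = max(utilities)
    return [best - u for u in utilities]


def softmax_weighted_regret(logits, utilities):
    """Expected regret when routing with probabilities softmax(logits)."""
    probs = softmax(logits)
    return sum(p * r for p, r in zip(probs, regrets(utilities)))


# Hypothetical query with three candidate models (all values are made up).
accuracy = [0.62, 0.81, 0.90]
cost = [0.1, 0.5, 2.0]
lam = 0.2  # assumed cost-sensitivity weight
utilities = [a - lam * c for a, c in zip(accuracy, cost)]

# A router that is indifferent between models versus one that strongly
# prefers the second model, which has the highest utility here.
uniform = softmax_weighted_regret([0.0, 0.0, 0.0], utilities)
peaked = softmax_weighted_regret([0.0, 8.0, 0.0], utilities)
print(uniform, peaked)
```

In this sketch the surrogate is differentiable in the router's logits, and it shrinks toward zero as the softmax concentrates on the highest-utility model, which is the intuition behind using it as a training objective.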
Related papers
- Maximally-Informative Retrieval for State Space Model Generation [59.954191072042526]
We introduce Retrieval In-Context Optimization (RICO) to minimize model uncertainty for a particular query at test-time. Unlike traditional retrieval-augmented generation (RAG), which relies on externals for document retrieval, our approach leverages direct feedback from the model. We show that standard top-$k$ retrieval with model gradients can approximate our optimization procedure, and provide connections to the leave-one-out loss.
arXiv Detail & Related papers (2025-06-13T18:08:54Z)
- Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
- Exploring Query Efficient Data Generation towards Data-free Model Stealing in Hard Label Setting [38.755154033324374]
Data-free model stealing involves replicating the functionality of a target model into a substitute model without accessing the target model's structure, parameters, or training data. This paper presents a new data-free model stealing approach called Query Efficient Data Generation (QEDG). We introduce two distinct loss functions to ensure the generation of sufficient samples that closely and uniformly align with the target model's decision boundary.
arXiv Detail & Related papers (2024-12-18T03:03:15Z)
- Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms.
We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO).
arXiv Detail & Related papers (2024-10-17T11:47:56Z)
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives [17.10165955576643]
Current state-of-the-art empirical techniques offer sub-optimal performance on practical, non-decomposable performance objectives.
We propose SelMix, a selective mixup-based inexpensive fine-tuning technique for pre-trained models.
We find that proposed SelMix fine-tuning significantly improves the performance for various practical non-decomposable objectives across benchmarks.
arXiv Detail & Related papers (2024-03-27T06:55:23Z)
- Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective [21.945745750737952]
We show that a reverse behavior appears when the model class is well-specified and there is sufficient data. We also demonstrate how standard sample average approximation (SAA) performs the worst when the model class is well-specified in terms of regret.
arXiv Detail & Related papers (2023-04-13T21:54:53Z)
- Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data due to inaccurate model estimation for better policy optimization.
We propose a novel model-based reinforcement learning framework AMPO, which introduces unsupervised model adaptation.
Our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2020-10-19T14:19:42Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)