Bi-directional Model Cascading with Proxy Confidence
- URL: http://arxiv.org/abs/2504.19391v1
- Date: Sun, 27 Apr 2025 23:48:14 GMT
- Title: Bi-directional Model Cascading with Proxy Confidence
- Authors: David Warren, Mark Dras,
- Abstract summary: We propose a bi-directional approach to deferral that considers the confidence of small and large models in the cascade simultaneously.<n>We use an analysis of hidden states to improve post-invocation confidence of the small model.<n>We then combine this with a tiny proxy model to estimate pre-invocation confidence of the large model.
- Score: 3.1890398692194326
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Model Cascading, recently applied successfully to LLMs, is a simple but powerful technique that improves the efficiency of inference by selectively applying models of varying sizes. Models are used in sequence from smallest to largest, only deferring samples to large, costly models when smaller models are not sufficiently confident. Existing approaches to deferral use only limited small model confidence estimates because of the inaccessibility of the large model, although large model confidence is known to be important. We therefore propose a bi-directional approach to deferral that considers the confidence of small and large models in the cascade simultaneously through the use of a proxy for the large model. This requires a richer representation of model confidence to enable comparative calibration: we use an analysis of hidden states to improve post-invocation confidence of the small model, which in itself improves cascading results over prior approaches. We then combine this with a tiny proxy model to estimate pre-invocation confidence of the large model. We examine the proposed cascading system over challenging, multiple-choice datasets, finding improvements over standard cascading baselines reflected in reductions in deferrals to more costly models.
Related papers
- SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models [63.63254955809224]
We propose a binary router that distinguishes hard examples from easy ones.<n>Our method selectively applies the larger safety guard model to the data that the router considers hard, improving efficiency while maintaining accuracy.<n> Experimental results on multiple benchmark datasets demonstrate that our adaptive model selection significantly enhances the trade-off between computational cost and safety performance.
arXiv Detail & Related papers (2025-02-18T02:51:17Z) - A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z) - Model Stock: All we need is just a few fine-tuned models [34.449901046895185]
This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance.
We employ significantly fewer models to achieve final weights yet yield superior accuracy.
We demonstrate the efficacy of Model Stock with fine-tuned models based upon pre-trained CLIP architectures.
arXiv Detail & Related papers (2024-03-28T15:57:20Z) - Tiny Models are the Computational Saver for Large Models [1.8350044465969415]
This paper introduces TinySaver, an early-exit-like dynamic model compression approach which employs tiny models to substitute large models adaptively.<n>Our evaluation of this approach in ImageNet-1k classification demonstrates its potential to reduce the number of compute operations by up to 90%, with only negligible losses in performance.
arXiv Detail & Related papers (2024-03-26T14:14:30Z) - Investigating Ensemble Methods for Model Robustness Improvement of Text
Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z) - Predicting on the Edge: Identifying Where a Larger Model Does Better [61.793778186198864]
We show that large models have the largest improvement on examples where the small model is most uncertain.
We show that a switcher model which defers examples to a larger model when a small model is uncertain can achieve striking improvements in performance and resource usage.
arXiv Detail & Related papers (2022-02-15T18:53:14Z) - Sample-Efficient Reinforcement Learning via Conservative Model-Based
Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z) - When Ensembling Smaller Models is More Efficient than Single Large
Models [52.38997176317532]
We show that ensembles can outperform single models with both higher accuracy and requiring fewer total FLOPs to compute.
This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.