Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models
- URL: http://arxiv.org/abs/2411.09837v1
- Date: Thu, 14 Nov 2024 23:02:30 GMT
- Title: Real-time Adapting Routing (RAR): Improving Efficiency Through Continuous Learning in Software Powered by Layered Foundation Models
- Authors: Kirill Vasilevski, Dayi Lin, Ahmed Hassan
- Abstract summary: Existing routing models rely on learning the optimal routing decision from carefully curated data.
We propose Real-time Adaptive Routing (RAR), an approach to continuously adapt FM routing decisions.
RAR routes 50.2% fewer requests to computationally expensive models while maintaining around 90.5% of the general response quality.
- Score: 5.716829002003189
- Abstract: To balance the quality and inference cost of software powered by Foundation Models (FMs, such as large language models (LLMs)), people often opt to train a routing model that routes requests to FMs of different sizes and capabilities. Existing routing models rely on learning the optimal routing decision from carefully curated data, require complex computations to be updated, and do not consider the potential evolution of weaker FMs. In this paper, we propose Real-time Adaptive Routing (RAR), an approach that continuously adapts FM routing decisions while using guided in-context learning to enhance the capabilities of the weaker FM. The goal is to reduce reliance on stronger, more expensive FMs. We evaluate our approach on different subsets of the popular MMLU benchmark. Over time, our approach routes 50.2% fewer requests to computationally expensive models while maintaining around 90.5% of the general response quality. In addition, the guides generated from stronger models show intra-domain generalization and lead to a better quality of responses compared to an equivalent approach with a standalone weaker FM.
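To make the mechanism described in the abstract concrete, here is a minimal, hypothetical sketch of such a routing loop in Python. The names (Router, Guide, confidence_fn) and the threshold-based escalation rule are illustrative assumptions, not the authors' implementation; RAR's actual adaptation and guide-generation logic is described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Guide:
    """An in-context 'guide' distilled from a stronger model's answer (assumed format)."""
    domain: str
    text: str


@dataclass
class Router:
    threshold: float = 0.7                       # below this, escalate to the strong FM (assumed rule)
    guides: dict = field(default_factory=dict)   # domain -> list of Guide

    def route(self, request, domain, weak_fm, strong_fm, confidence_fn):
        # 1. Try the weaker FM first, prepending any guides gathered for this domain.
        context = "\n".join(g.text for g in self.guides.get(domain, []))
        prompt = f"{context}\n{request}" if context else request
        weak_answer = weak_fm(prompt)

        # 2. Estimate answer quality; keep the cheap answer when confidence is high.
        if confidence_fn(request, weak_answer) >= self.threshold:
            return weak_answer, "weak"

        # 3. Otherwise fall back to the stronger, more expensive FM and distill a
        #    guide so that future in-domain requests can stay on the weaker model.
        strong_answer = strong_fm(request)
        self.guides.setdefault(domain, []).append(
            Guide(domain=domain, text=f"Q: {request}\nA: {strong_answer}")
        )
        return strong_answer, "strong"


# Toy usage with stand-in callables (real FMs would be API or local inference calls).
if __name__ == "__main__":
    weak = lambda prompt: "weak-FM answer"
    strong = lambda prompt: "strong-FM answer"
    confidence = lambda req, ans: 0.4 if "hard" in req else 0.9

    router = Router()
    print(router.route("an easy question", "history", weak, strong, confidence))
    print(router.route("a hard question", "history", weak, strong, confidence))
```

The key design idea this sketch tries to capture is that escalations are not wasted: each answer from the strong FM is recycled as in-context guidance, so the share of requests the weak FM can handle grows over time.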
Related papers
- DRL Optimization Trajectory Generation via Wireless Network Intent-Guided Diffusion Models for Optimizing Resource Allocation [58.62766376631344]
We propose a customized wireless network intent (WNI-G) model to address different state variations of wireless communication networks.
Extensive simulations demonstrate greater stability in spectral efficiency than traditional DRL models in dynamic communication systems.
arXiv Detail & Related papers (2024-10-18T14:04:38Z) - DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation [10.645244994430483]
We propose a novel offline reinforcement learning (offline RL) approach, introducing the Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation framework.
We leverage diffusion models to learn state-action sequence distributions and incorporate value functions for more balanced and adaptive decision-making.
As demonstrated in tasks like Maze2D, AntMaze, and Kitchen, DIAR consistently outperforms state-of-the-art algorithms in long-horizon, sparse-reward environments.
arXiv Detail & Related papers (2024-10-15T07:09:56Z) - Large Model for Small Data: Foundation Model for Cross-Modal RF Human Activity Recognition [7.351361666395708]
We introduce FM-Fi, a cross-modal framework engineered to translate the knowledge of vision-based FMs for enhancing RF-based HAR systems.
FM-Fi involves a novel cross-modal contrastive knowledge distillation mechanism, enabling an RF encoder to inherit the interpretative power of FMs.
It also employs the intrinsic capabilities of FM and RF to remove extraneous features for better alignment between the two modalities.
arXiv Detail & Related papers (2024-10-13T03:43:59Z) - Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning [5.421492821020181]
Federated Learning (FL) is a distributed machine learning approach that enables devices to collaboratively train models without sharing their local data.
Applying FL to real-world data presents challenges, particularly as most existing FL research focuses on unimodal data.
We propose FlexMod, a novel approach to enhance computational efficiency in multimodal federated learning (MFL) by adaptively allocating training resources to each modality encoder.
arXiv Detail & Related papers (2024-08-13T01:14:27Z) - RouteLLM: Learning to Route LLMs with Preference Data [41.687640419561504]
Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost.
We propose several efficient router models that dynamically select between a stronger and a weaker LLM during inference.
We develop a training framework for these routers leveraging human preference data and data augmentation techniques to enhance performance.
arXiv Detail & Related papers (2024-06-26T18:10:22Z) - Learn From Model Beyond Fine-Tuning: A Survey [78.80920533793595]
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FM) based on the model interface.
The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing.
This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM.
arXiv Detail & Related papers (2023-10-12T10:20:36Z) - Decision-Focused Model-based Reinforcement Learning for Reward Transfer [27.899494428456048]
We propose a novel robust decision-focused (RDF) algorithm that learns a transition model that achieves high returns while being robust to changes in the reward function.
We provide theoretical and empirical evidence, on a variety of simulators and real patient data, that RDF can learn simple yet effective models that can be used to plan personalized policies.
arXiv Detail & Related papers (2023-04-06T20:47:09Z) - Automated Federated Learning in Mobile Edge Networks -- Fast Adaptation and Convergence [83.58839320635956]
Federated Learning (FL) can be used in mobile edge networks to train machine learning models in a distributed manner.
Recently, FL has been interpreted within a Model-Agnostic Meta-Learning (MAML) framework, which brings FL significant advantages in fast adaptation and convergence over heterogeneous datasets.
This paper addresses how much benefit MAML brings to FL and how to maximize such benefit over mobile edge networks.
arXiv Detail & Related papers (2023-03-23T02:42:10Z) - Improving and generalizing flow-based generative models with minibatch optimal transport [90.01613198337833]
We introduce the generalized conditional flow matching (CFM) technique for continuous normalizing flows (CNFs).
CFM features a stable regression objective like that used to train the flow in diffusion models but enjoys the efficient inference of deterministic flow models (a sketch of this objective appears after this list).
A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference.
arXiv Detail & Related papers (2023-02-01T14:47:17Z) - Model-Augmented Q-learning [112.86795579978802]
We propose a model-free RL (MFRL) framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling individual scattering elements' phase shifts.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance optimization in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.