Online and Scalable Model Selection with Multi-Armed Bandits
- URL: http://arxiv.org/abs/2101.10385v1
- Date: Mon, 25 Jan 2021 20:12:52 GMT
- Title: Online and Scalable Model Selection with Multi-Armed Bandits
- Authors: Jiayi Xie, Michael Tashman, John Hoffman, Lee Winikor, Rouzbeh Gerami
- Abstract summary: We present Automatic Model Selector (AMS), a system for scalable online selection of bidding strategies based on real-world performance metrics.
AMS allocates the most traffic to the best-performing models while decreasing traffic to those with poorer online performance.
In live-traffic tests on multiple ad campaigns, the AMS system proved highly effective at improving ad campaign performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many online applications running on live traffic are powered by machine
learning models, for which training, validation, and hyper-parameter tuning are
conducted on historical data. However, it is common for models demonstrating
strong performance in offline analysis to yield poorer performance when
deployed online. This problem is a consequence of the difficulty of training on
historical data in non-stationary environments. Moreover, the machine learning
metrics used for model selection may not sufficiently correlate with real-world
business metrics used to determine the success of the applications being
tested. These problems are particularly prominent in the Real-Time Bidding
(RTB) domain, in which ML models power bidding strategies, and a change in
models is likely to affect the performance of advertising campaigns. In this
work, we present Automatic Model Selector (AMS), a system for scalable online
selection of RTB bidding strategies based on real-world performance metrics.
AMS employs Multi-Armed Bandits (MAB) to near-simultaneously run and evaluate
multiple models against live traffic, allocating the most traffic to the
best-performing models while decreasing traffic to those with poorer online
performance, thereby minimizing the impact of inferior models on overall
campaign performance. The reliance on offline data is avoided, instead making
model selections on a case-by-case basis according to actionable business
goals. AMS allows new models to be safely introduced into live campaigns as
soon as they are developed, minimizing the risk to overall performance. In
live-traffic tests on multiple ad campaigns, the AMS system proved highly
effective at improving ad campaign performance.
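The abstract describes the MAB mechanism only at a high level and does not name a specific algorithm. As a minimal sketch, assuming a binary per-request success metric (e.g., a conversion) and hypothetical model names, Bernoulli Thompson sampling is one standard bandit scheme that produces exactly this allocate-more-to-winners behavior:

```python
import random

# Hypothetical illustration of MAB-based traffic allocation (not the
# authors' implementation): each arm is a candidate bidding model, and
# each bid request is routed via Bernoulli Thompson sampling on a
# binary business metric (e.g., whether the impression converted).

class TrafficAllocator:
    def __init__(self, model_names):
        # Beta(1, 1) prior per model: alpha counts successes, beta failures.
        self.stats = {name: {"alpha": 1.0, "beta": 1.0} for name in model_names}

    def choose_model(self):
        # Sample a plausible success rate per model and route the request
        # to the model with the highest sampled rate. Better-performing
        # models win more draws, so they receive more traffic.
        draws = {
            name: random.betavariate(s["alpha"], s["beta"])
            for name, s in self.stats.items()
        }
        return max(draws, key=draws.get)

    def record_outcome(self, model_name, success):
        # Update the chosen model's posterior from live feedback.
        s = self.stats[model_name]
        if success:
            s["alpha"] += 1.0
        else:
            s["beta"] += 1.0

# Usage: route simulated bid requests across three hypothetical models.
allocator = TrafficAllocator(["model_a", "model_b", "model_c"])
true_rates = {"model_a": 0.02, "model_b": 0.05, "model_c": 0.03}
for _ in range(10_000):
    chosen = allocator.choose_model()
    allocator.record_outcome(chosen, random.random() < true_rates[chosen])
```

In this toy run, model_b's higher success rate means it wins most of the sampled draws and accumulates most of the traffic, while the weaker models are throttled but never fully cut off, preserving some exploration.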
Related papers
- MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We present a novel immersion-aware model trading framework that incentivizes metaverse users (MUs) to contribute learning models for augmented reality (AR) services in the vehicular metaverse.
Considering dynamic network conditions and privacy concerns, we formulate the reward decisions of metaverse service providers (MSPs) as a multi-agent Markov decision process.
Experimental results demonstrate that the proposed framework can effectively provide higher-value models for object detection and classification in AR services on real AR-related vehicle datasets.
arXiv Detail & Related papers (2024-10-25T16:20:46Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits [43.65904435249823]
We propose a time-increasing bandit algorithm, TI-UCB, which effectively predicts increases in model performance.
Our results highlight the importance of exploiting the increasing-then-converging pattern of model performance for more efficient and economical model selection (a loose sketch follows this entry).
arXiv Detail & Related papers (2024-03-11T23:52:46Z)
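The entry above gives only the one-line idea. As a loose, hypothetical sketch (not the published TI-UCB algorithm), one way to exploit an increasing-then-converging reward pattern is to fit a saturating curve to each model's reward history and apply a UCB-style bonus to the predicted converged value; all function and variable names below are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

# Loose sketch of convergence-aware model selection (an interpretation
# of the summary, not the TI-UCB algorithm itself).

def saturating(t, c, a, b):
    # c is the asymptotic (converged) reward; a and b shape the rise.
    return c - a * np.exp(-b * t)

def predicted_value(rewards):
    # Fit an increasing-then-converging curve and return its asymptote.
    t = np.arange(1, len(rewards) + 1, dtype=float)
    if len(rewards) < 4:
        return float(np.mean(rewards))  # too little data to fit a trend
    try:
        (c, _, _), _ = curve_fit(
            saturating, t, rewards, p0=[rewards[-1], 1.0, 0.1], maxfev=2000
        )
        return float(c)
    except RuntimeError:
        return float(np.mean(rewards))

def select_arm(histories, total_pulls):
    # histories: dict mapping each arm (model) to its observed rewards.
    scores = {}
    for arm, rewards in histories.items():
        bonus = np.sqrt(2.0 * np.log(total_pulls + 1) / (len(rewards) + 1))
        scores[arm] = predicted_value(rewards) + bonus
    return max(scores, key=scores.get)
```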
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning [0.0]
GISTEmbed is a novel strategy that enhances in-batch negative selection during contrastive training through a guide model.
Benchmarked against the Massive Text Embedding Benchmark (MTEB), GISTEmbed shows consistent performance improvements across various model sizes (a sketch of guided negative selection follows this entry).
arXiv Detail & Related papers (2024-02-26T18:55:15Z)
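The following is only an interpretation of the summary, not GISTEmbed's actual code: a frozen guide model scores in-batch pairs, and off-diagonal pairs that the guide rates above the true positive are masked out of the contrastive loss as probable false negatives. Shapes and names are assumptions:

```python
import torch
import torch.nn.functional as F

# Hedged sketch of guide-model-based negative filtering in contrastive
# training. Row i of p_* is assumed to be the positive for row i of q_*.

def guided_contrastive_loss(q_emb, p_emb, guide_q, guide_p, temperature=0.05):
    # q_emb, p_emb: (B, D) embeddings from the model being fine-tuned.
    # guide_q, guide_p: (B, D) embeddings of the same texts from a
    # frozen guide model.
    q_emb, p_emb = F.normalize(q_emb, dim=-1), F.normalize(p_emb, dim=-1)
    guide_q, guide_p = F.normalize(guide_q, dim=-1), F.normalize(guide_p, dim=-1)

    sim = q_emb @ p_emb.T / temperature            # (B, B) trainee scores
    guide_sim = guide_q @ guide_p.T                # (B, B) guide scores
    pos_guide = guide_sim.diagonal().unsqueeze(1)  # guide score of true pairs

    # Off-diagonal pairs the guide rates above the true pair are treated
    # as probable false negatives and removed from the softmax.
    false_neg = (guide_sim > pos_guide) & ~torch.eye(
        sim.size(0), dtype=torch.bool, device=sim.device
    )
    sim = sim.masked_fill(false_neg, float("-inf"))

    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```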
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Discriminator-Guided Model-Based Offline Imitation Learning [11.856949845359853]
Offline imitation learning (IL) is a powerful method for solving decision-making problems from expert demonstrations without reward labels.
We propose the Discriminator-guided Model-based offline Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data.
Experimental results show that DMIL and its extension achieve superior performance and robustness compared to state-of-the-art offline IL methods under small datasets.
arXiv Detail & Related papers (2022-07-01T07:28:18Z)
- UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning [1.1339580074756188]
Offline reinforcement learning (RL) provides a framework for learning decision-making from offline data.
Self-driving vehicles (SDVs) can learn a policy that potentially even outperforms the behavior in the sub-optimal dataset.
This motivates the use of model-based offline RL approaches, which leverage planning.
arXiv Detail & Related papers (2021-11-22T10:37:52Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
- Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition.
MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method.
The experimental results demonstrate that MPL effectively improves over the base model and scales to different semi-supervised scenarios (see the sketch after this entry).
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
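MPL's online/offline pairing is in the mean-teacher family, so a minimal sketch of the momentum update is possible from the summary alone; everything below (names, the alpha value, the loss choice) is an assumption rather than the paper's exact recipe:

```python
import copy
import torch

# Minimal sketch of the momentum update behind MPL-style training: the
# offline model is a slow exponential moving average (EMA) of the online
# model and supplies pseudo-labels for unlabeled audio.

def make_offline_model(online_model):
    offline = copy.deepcopy(online_model)
    for p in offline.parameters():
        p.requires_grad_(False)  # the offline model is never backpropped
    return offline

@torch.no_grad()
def momentum_update(online_model, offline_model, alpha=0.999):
    # offline <- alpha * offline + (1 - alpha) * online
    for p_off, p_on in zip(offline_model.parameters(), online_model.parameters()):
        p_off.mul_(alpha).add_(p_on, alpha=1.0 - alpha)

# Usage with a placeholder model (a real system would use an ASR model):
online = torch.nn.Linear(4, 2)
offline = make_offline_model(online)
momentum_update(online, offline)

# Training loop outline (decoding and loss details omitted):
#   1. offline model decodes unlabeled speech into pseudo-labels.
#   2. online model is trained on those pseudo-labels (e.g., CTC loss).
#   3. momentum_update(online, offline) runs after each training step.
```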
This list is automatically generated from the titles and abstracts of the papers on this site.