Design and Scheduling of an AI-based Queueing System
- URL: http://arxiv.org/abs/2406.06855v1
- Date: Tue, 11 Jun 2024 00:01:42 GMT
- Title: Design and Scheduling of an AI-based Queueing System
- Authors: Jiung Lee, Hongseok Namkoong, Yibo Zeng
- Abstract summary: We consider a large queueing system where the class of a job is estimated using a prediction model.
By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner.
- Score: 12.763457245603824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising many single-server queues where the class of a job is estimated using a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure with downstream queueing performance as a central concern, and offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by finetuning large language models.
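To make the idea of an index-based policy over predicted classes concrete, here is a minimal sketch. It is not the policy derived in the paper: it assumes a generic cμ-style index in which each predicted class is scored by its expected holding cost divided by its expected service time, averaging over the true classes that the prediction might hide. All names (`predicted_class_index`, `pick_next`, the class labels, and the numeric values) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def predicted_class_index(
    posterior: Dict[str, float],
    cost: Dict[str, float],
    rate: Dict[str, float],
) -> float:
    """Score a predicted class with a cmu-style index (illustrative assumption).

    posterior[k] is P(true class = k | predicted class), cost[k] is the
    per-unit-time delay cost of true class k, and rate[k] is its service rate.
    """
    expected_cost = sum(p * cost[k] for k, p in posterior.items())
    expected_service_time = sum(p / rate[k] for k, p in posterior.items())
    return expected_cost / expected_service_time


def pick_next(
    queue_lengths: Dict[str, int], indices: Dict[str, float]
) -> Optional[str]:
    """Serve the non-empty predicted-class queue with the highest index."""
    waiting = [k for k, n in queue_lengths.items() if n > 0]
    if not waiting:
        return None
    return max(waiting, key=lambda k: indices[k])


# Hypothetical content-moderation numbers: toxic comments are costlier to
# delay and slower to review than benign ones.
cost = {"toxic": 4.0, "benign": 1.0}
rate = {"toxic": 1.0, "benign": 2.0}

# Hypothetical classifier quality: P(true class | predicted class).
posteriors = {
    "pred_toxic": {"toxic": 0.8, "benign": 0.2},
    "pred_benign": {"toxic": 0.1, "benign": 0.9},
}

indices = {
    pred: predicted_class_index(post, cost, rate)
    for pred, post in posteriors.items()
}
next_queue = pick_next({"pred_toxic": 2, "pred_benign": 5}, indices)
```

With a perfect classifier the posterior puts all mass on one true class and the index reduces to the classical cμ value for that class; as mispredictions grow, the indices of the predicted classes move closer together and the priority ordering carries less information.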
Related papers
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- Third-Party Language Model Performance Prediction from Instruction [59.574169249307054]
Language model-based instruction-following systems have lately shown increasing performance on many benchmark tasks.
A user may easily prompt a model with an instruction without any idea of whether the responses should be expected to be accurate.
We propose a third party performance prediction framework, where a separate model is trained to predict the metric resulting from evaluating an instruction-following system on a task.
arXiv Detail & Related papers (2024-03-19T03:53:47Z)
- Unleash the Power of Context: Enhancing Large-Scale Recommender Systems with Context-Based Prediction Models [2.3267858167388775]
A Context-Based Prediction Model determines the probability of a user's action solely by relying on user and contextual features.
We have identified numerous valuable applications for this modeling approach, including training an auxiliary context-based model to estimate click probability.
arXiv Detail & Related papers (2023-07-25T07:57:12Z)
- GNN-based Passenger Request Prediction [0.3480973072524161]
This paper develops a Graph Neural Network framework along with the Attention Mechanism to predict the Origin-Destination (OD) flow of passengers.
The proposed framework exploits various linear and non-linear dependencies that arise among requests originating from different locations.
The optimal size of the grid cell that covers the road network preserves the complexity and accuracy of the model.
arXiv Detail & Related papers (2023-01-06T14:04:46Z)
- Non-Clairvoyant Scheduling with Predictions Revisited [77.86290991564829]
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements.
We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design.
We show that these predictions have desired properties, admit a natural error measure as well as algorithms with strong performance guarantees.
arXiv Detail & Related papers (2022-02-21T13:18:11Z)
- Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Forethought and Hindsight in Credit Assignment [62.05690959741223]
We work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models.
We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated.
arXiv Detail & Related papers (2020-10-26T16:00:47Z)
- A Meta-learning based Distribution System Load Forecasting Model Selection Framework [6.499433762038562]
The framework includes the following processes: feature extraction, candidate model labeling, offline training, and online model recommendation.
Using user load forecasting needs as input features, multiple meta-learners are used to rank the available load forecast models based on their forecasting accuracy.
A scoring-voting mechanism weights recommendations from each meta-learner to make the final recommendations.
arXiv Detail & Related papers (2020-09-25T01:53:25Z)