Online Resource Allocation: Bandits feedback and Advice on Time-varying
Demands
- URL: http://arxiv.org/abs/2302.04182v2
- Date: Mon, 12 Jun 2023 10:52:47 GMT
- Title: Online Resource Allocation: Bandits feedback and Advice on Time-varying
Demands
- Authors: Lixing Lyu and Wang Chi Cheung
- Abstract summary: We consider a general online resource allocation model with bandit feedback and time-varying demands.
Motivated by the recent Online Algorithms with Advice framework, we explore how online advice can inform policy design.
Our proposed algorithm is shown to have both theoretical performance guarantees and promising numerical results compared with other algorithms in the literature.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a general online resource allocation model with bandit feedback
and time-varying demands. While online resource allocation has been well
studied in the literature, most existing works make the strong assumption that
the demand arrival process is stationary. In practical applications, such as
online advertisement and revenue management, however, this process may be
exogenous and non-stationary, like the constantly changing internet traffic.
Motivated by the recent Online Algorithms with Advice framework [Mitzenmacher
and Vassilvitskii, \emph{Commun. ACM} 2022], we explore how online advice can
inform policy design. We establish an impossibility result: without advice, any
algorithm must perform poorly in terms of regret in our setting. In contrast, we
design a robust online algorithm that leverages the online
predictions on the total demand volumes. Empowered with online advice, our
proposed algorithm is shown to have both theoretical performance guarantees and
promising numerical results compared with other algorithms in the literature. We
also provide two explicit examples of time-varying demand scenarios and derive
corresponding theoretical performance guarantees. Finally, we adapt our model
to a network revenue management problem, and numerically demonstrate that our
algorithm still performs competitively compared to existing baselines.
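To make the advice mechanism concrete, here is a minimal Python sketch of one way a prediction of the total demand volume could inform allocation under bandit feedback: a UCB learner over unknown per-action rewards, with spending paced against the advised demand share. The pacing rule, instance values, and all names are our illustration, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance; values and the pacing rule are illustrative only.
T, K, budget = 5000, 4, 900.0
mu_reward = np.array([0.9, 0.7, 0.5, 0.3])  # unknown mean reward per unit demand
mu_cost = np.array([1.0, 0.7, 0.4, 0.2])    # unknown mean resource use per unit
demand = 1.0 + 0.8 * np.sin(np.linspace(0.0, 4.0 * np.pi, T))  # time-varying
advice = 1.03 * demand.sum()                # online advice: predicted total volume

counts = np.zeros(K)
r_hat = np.zeros(K)                         # empirical per-unit reward estimates
remaining, seen, reward = budget, 0.0, 0.0

for t in range(T):
    seen += demand[t]
    if remaining <= 0.0:
        break                               # resource exhausted
    # Pacing: never let cumulative spend exceed the advised demand share,
    # so an early demand spike cannot drain the budget prematurely.
    if budget - remaining > budget * seen / advice:
        continue                            # ahead of schedule; hold back
    bonus = np.sqrt(2.0 * np.log(t + 2.0) / np.maximum(counts, 1.0))
    ucb = np.where(counts > 0, r_hat + bonus, np.inf)
    a = int(np.argmax(ucb))                 # optimistic choice under bandit feedback
    r = demand[t] * max(mu_reward[a] + 0.1 * rng.standard_normal(), 0.0)
    c = demand[t] * max(mu_cost[a] + 0.1 * rng.standard_normal(), 0.0)
    counts[a] += 1
    r_hat[a] += (r / demand[t] - r_hat[a]) / counts[a]  # running per-unit mean
    remaining -= c
    reward += r

print(f"total reward {reward:.1f}, budget left {remaining:.1f}")
```

With accurate advice the pacing constraint spreads the budget across the whole horizon; with no advice (advice set far too small or large), the same learner either stalls or exhausts the budget early, which mirrors the impossibility result above.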
Related papers
- PageRank Bandits for Link Prediction [72.61386754332776]
Link prediction is a critical problem in graph learning with broad applications such as recommender systems and knowledge graph completion.
This paper reformulates link prediction as a sequential decision-making process, in which link prediction interactions occur one at a time.
We propose a novel fusion algorithm, PRB (PageRank Bandits), which is the first to combine contextual bandits with PageRank for collaborative exploitation and exploration.
arXiv Detail & Related papers (2024-11-03T02:39:28Z)
- Online Distributional Regression [0.0]
Large-scale streaming data are common in modern machine learning applications.
Many fields, such as supply chain management, weather and meteorology, have pivoted towards using probabilistic forecasts.
We present a methodology for online estimation of regularized, linear distributional models.
arXiv Detail & Related papers (2024-06-26T16:04:49Z)
- Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
Offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z)
- Online Optimization for Network Resource Allocation and Comparison with Reinforcement Learning Techniques [0.6466206145151128]
In this paper, we tackle an online network resource allocation problem with job transfers.
We propose a randomized online algorithm based on the exponentially weighted method (a minimal sketch follows this entry).
We prove that our algorithm enjoys sublinear regret in time, which indicates that it adapts and learns from its experiences.
arXiv Detail & Related papers (2023-11-16T17:08:27Z)
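For reference, a minimal sketch of the exponentially weighted (Hedge) update named above, on a toy instance with per-option losses revealed after each decision; the paper's algorithm additionally handles job transfers and network constraints, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance: K allocation options, per-round losses in [0, 1].
K, T = 5, 2000
eta = np.sqrt(8.0 * np.log(K) / T)   # standard Hedge learning rate
weights = np.ones(K)
cum_loss = np.zeros(K)               # cumulative loss of each fixed option
alg_loss = 0.0

for t in range(T):
    probs = weights / weights.sum()
    a = rng.choice(K, p=probs)           # randomized allocation decision
    losses = rng.uniform(0.0, 1.0, K)    # losses revealed after committing
    alg_loss += losses[a]
    cum_loss += losses
    weights *= np.exp(-eta * losses)     # exponentially weighted update

# Expected regret vs. the best fixed option is O(sqrt(T log K)), i.e. sublinear.
print(f"regret = {alg_loss - cum_loss.min():.1f} over T = {T} rounds")
```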
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- Online Caching with no Regret: Optimistic Learning via Recommendations [15.877673959068458]
We build upon the Follow-the-Regularized-Leader (FTRL) framework to include predictions for the file requests (see the sketch after this entry).
We extend the framework to learn and utilize the best request predictor in cases where many are available.
We prove that the proposed optimistic learning caching policies can achieve sub-zero performance loss (regret) for perfect predictions.
arXiv Detail & Related papers (2022-04-20T09:29:47Z)
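A minimal sketch of optimistic FTRL with request predictions, specialized to a fractional cache of capacity one so that an entropic regularizer yields a closed-form update; the file universe, predictor accuracy, and request distribution below are hypothetical, and the paper's policies cover general cache capacities.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical instance: N files, fractional cache of total capacity 1, so the
# feasible set is the probability simplex and entropic FTRL has a closed form.
N, T, eta = 20, 3000, 0.1
popularity = rng.dirichlet(0.3 * np.ones(N))   # unknown request distribution
grad_sum = np.zeros(N)                         # accumulated utility gradients
hits = 0.0

for t in range(T):
    req = rng.choice(N, p=popularity)          # next request (hidden from learner)
    hint = req if rng.random() < 0.8 else rng.integers(N)
    pred = np.zeros(N)
    pred[hint] = 1.0                           # recommendation: predicted gradient
    # Optimistic step: act on past gradients plus the prediction.
    scores = eta * (grad_sum + pred)
    y = np.exp(scores - scores.max())
    y /= y.sum()                               # cache configuration on the simplex
    hits += y[req]                             # utility = cached fraction of request
    grad_sum[req] += 1.0                       # true gradient revealed afterwards

print(f"hit ratio = {hits / T:.3f}")
```

When the predictor is accurate, the optimistic term steers the cache toward the upcoming request before it arrives, which is how such policies can drive regret below zero against the best fixed configuration.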
- Single-Leg Revenue Management with Advice [27.21119630468227]
We look at the single-leg revenue management problem through the lens of the algorithms-with-advice framework.
We characterize the tradeoff between consistency (performance when advice is accurate) and competitiveness (performance when advice is inaccurate) for every advice.
We also study the class of protection level policies, which is the most widely-deployed technique for single-leg revenue management.
arXiv Detail & Related papers (2022-02-18T00:32:11Z)
- An Online Learning Approach to Optimizing Time-Varying Costs of AoI [26.661352924641285]
We consider systems that require timely monitoring of sources over a communication network.
For the single source monitoring problem, we design algorithms that achieve sublinear regret compared to the best fixed policy in hindsight.
For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader.
arXiv Detail & Related papers (2021-05-27T18:10:56Z)
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online Algorithms [85.97516436641533]
We study the problem of improving the performance of online algorithms by incorporating machine-learned predictions.
The goal is to design algorithms that are both consistent and robust (these notions are stated after this list).
We provide the first set of non-trivial lower bounds for competitive analysis using machine-learned predictions.
arXiv Detail & Related papers (2020-10-22T04:51:01Z)
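For context, the consistency and robustness notions referenced in the last entry are commonly stated as follows (our paraphrase of the standard learning-augmented definitions, not this paper's exact bounds):

```latex
% ALG(\epsilon): the algorithm's cost under prediction error \epsilon;
% OPT: the offline optimal cost. An algorithm is \beta-consistent and
% \gamma-robust if
\[
  \mathrm{ALG}(0) \le \beta \cdot \mathrm{OPT}
  \quad (\beta\text{-consistency}),
  \qquad
  \mathrm{ALG}(\epsilon) \le \gamma \cdot \mathrm{OPT} \ \text{ for all } \epsilon
  \quad (\gamma\text{-robustness}).
\]
```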