Non-monotonic Resource Utilization in the Bandits with Knapsacks Problem
- URL: http://arxiv.org/abs/2209.12013v1
- Date: Sat, 24 Sep 2022 14:02:05 GMT
- Title: Non-monotonic Resource Utilization in the Bandits with Knapsacks Problem
- Authors: Raunak Kumar and Robert Kleinberg
- Abstract summary: Bandits with knapsacks (BwK) is an influential model of sequential decision-making under uncertainty.
We introduce a natural generalization of the BwK problem that allows non-monotonic resource utilization.
- Score: 6.227074051959886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bandits with knapsacks (BwK) is an influential model of sequential
decision-making under uncertainty that incorporates resource consumption
constraints. In each round, the decision-maker observes an outcome consisting
of a reward and a vector of nonnegative resource consumptions, and the budget
of each resource is decremented by its consumption. In this paper we introduce
a natural generalization of the stochastic BwK problem that allows
non-monotonic resource utilization. In each round, the decision-maker observes
an outcome consisting of a reward and a vector of resource drifts that can be
positive, negative or zero, and the budget of each resource is incremented by
its drift. Our main result is a Markov decision process (MDP) policy that has
constant regret against a linear programming (LP) relaxation when the
decision-maker knows the true outcome distributions. We build upon this to
develop a learning algorithm that has logarithmic regret against the same LP
relaxation when the decision-maker does not know the true outcome
distributions. We also present a reduction from BwK to our model that shows our
regret bound matches existing results.
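To make the round-by-round protocol concrete, below is a minimal simulation sketch of the model described in the abstract, assuming hypothetical Gaussian outcome distributions and a placeholder uniform policy (not the authors' MDP policy): budgets are incremented by signed drifts, and the process stops once a resource is depleted.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, n_resources, horizon = 3, 2, 1000
budget = np.full(n_resources, 10.0)        # initial budget of each resource

# Hypothetical outcome distributions (unknown to the learner in the paper):
# each arm has a mean reward and a mean drift per resource; drifts can be
# positive (replenishment), negative (consumption), or zero.
mean_reward = rng.uniform(0.0, 1.0, n_arms)
mean_drift = rng.uniform(-0.5, 0.5, (n_arms, n_resources))

total_reward = 0.0
for t in range(horizon):
    arm = rng.integers(n_arms)             # placeholder policy: uniform
    total_reward += rng.normal(mean_reward[arm], 0.1)
    budget += rng.normal(mean_drift[arm], 0.1)   # budgets move up OR down
    if np.any(budget <= 0):                # a resource is depleted: stop
        break
```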
Related papers
- Fair Resource Allocation in Weakly Coupled Markov Decision Processes [3.824858358548714]
We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes.
We adopt a fairness definition using the generalized Gini function instead of the traditional utilitarian (total-sum) objective.
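As a hedged illustration of the objective, one standard form of the generalized Gini function is a weighted sum of utilities sorted from worst-off to best-off; the weights below are a common textbook choice, not necessarily the paper's.

```python
import numpy as np

def generalized_gini_welfare(utilities, weights=None):
    """Weighted sum of utilities sorted from worst-off to best-off.

    With strictly decreasing weights the worst-off utility counts most, so
    maximizing this welfare promotes fairness. weights=None uses the classic
    Gini weights w_i = (2(n - i) - 1) / n^2 (an assumed default).
    """
    u = np.sort(np.asarray(utilities, dtype=float))  # ascending order
    n = len(u)
    if weights is None:
        weights = (2.0 * (n - np.arange(n)) - 1.0) / n**2
    return float(np.dot(weights, u))

# The utilitarian sum is indifferent between (1, 9) and (5, 5);
# the Gini welfare prefers the more equal allocation.
print(generalized_gini_welfare([1, 9]))   # 3.0
print(generalized_gini_welfare([5, 5]))   # 5.0
```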
arXiv Detail & Related papers (2024-11-14T20:40:55Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important for solving sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
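Value distributions are often represented by a vector of quantiles trained with the pinball (quantile-regression) loss; the sketch below shows that generic loss, not the EQR algorithm itself.

```python
import numpy as np

def pinball_loss(pred_quantiles, target, taus):
    """Quantile-regression (pinball) loss over quantile levels taus.

    Under-estimates of the target are penalized with weight tau and
    over-estimates with weight (1 - tau), so minimizing the loss drives
    pred_quantiles[i] toward the taus[i]-quantile of the target returns.
    """
    diff = target - pred_quantiles         # positive => under-estimate
    return float(np.mean(np.maximum(taus * diff, (taus - 1.0) * diff)))

taus = np.array([0.1, 0.5, 0.9])           # quantile levels to learn
print(pinball_loss(np.array([0.0, 1.0, 2.0]), target=1.5, taus=taus))
```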
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Bandits with Replenishable Knapsacks: the Best of both Worlds [26.786438972889208]
We study a natural generalization of the BwK framework which allows non-monotonic resource utilization.
Under an input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound.
arXiv Detail & Related papers (2023-06-14T12:34:00Z)
- Policy Evaluation in Distributional LQR [70.63903506291383]
We provide a closed-form expression of the distribution of the random return.
We show that this distribution can be approximated by a finite number of random variables.
Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR.
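A zeroth-order policy gradient uses only evaluations of the objective, never its analytic gradient. The following is a generic two-point Gaussian-smoothing estimator with a made-up quadratic objective standing in for a risk-averse LQR cost; it illustrates the estimator class, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def zeroth_order_grad(J, theta, delta=1e-2, n_samples=32):
    """Two-point Gaussian-smoothing estimate of the gradient of J.

    grad ~ E[(J(theta + delta*u) - J(theta - delta*u)) / (2*delta) * u]
    with u ~ N(0, I); only evaluations of J are needed.
    """
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        u = rng.standard_normal(theta.shape)
        grad += (J(theta + delta * u) - J(theta - delta * u)) / (2 * delta) * u
    return grad / n_samples

# Placeholder concave objective standing in for a risk-averse LQR cost.
J = lambda k: -np.sum((k - 1.0) ** 2)
theta = np.zeros(3)
for _ in range(200):                       # simple gradient ascent
    theta += 0.05 * zeroth_order_grad(J, theta)
print(theta)                               # approaches [1, 1, 1]
```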
arXiv Detail & Related papers (2023-03-23T20:27:40Z)
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
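A common ingredient in variance-adaptive linear bandit algorithms is variance-weighted ridge regression, which down-weights noisy rounds; a self-contained sketch of that estimator follows (the paper's algorithm additionally builds confidence sets and estimates the unknown variances).

```python
import numpy as np

def weighted_ridge(X, rewards, variances, lam=1.0):
    """Variance-weighted ridge estimate of the reward parameter.

    Round k contributes its feature x_k and reward r_k scaled by
    1 / sigma_k^2, so low-noise rounds dominate:
      theta_hat = (sum_k x_k x_k^T / sigma_k^2 + lam I)^{-1}
                  (sum_k x_k r_k / sigma_k^2)
    """
    w = 1.0 / np.asarray(variances)
    A = X.T @ (X * w[:, None]) + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ (rewards * w))

rng = np.random.default_rng(2)
theta = np.array([1.0, -2.0])
X = rng.standard_normal((500, 2))
sigma = rng.uniform(0.1, 2.0, 500)         # heteroscedastic noise levels
r = X @ theta + sigma * rng.standard_normal(500)
print(weighted_ridge(X, r, sigma**2))      # close to [1, -2]
```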
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
- Non-stationary Bandits with Knapsacks [6.2006721306998065]
We study the problem of bandits with knapsacks (BwK) in a non-stationary environment.
We employ two non-stationarity measures to derive upper and lower bounds for the problem.
arXiv Detail & Related papers (2022-05-25T01:22:36Z)
- Universal Off-Policy Evaluation [64.02853483874334]
We take the first steps towards a universal off-policy estimator (UnO).
We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns.
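All of these quantities are functionals of the return distribution; given return samples, they can be read off the empirical CDF, as in this sketch (which omits the off-policy importance weighting and the high-confidence bounds that are the paper's contribution).

```python
import numpy as np

returns = np.random.default_rng(3).normal(1.0, 2.0, 10_000)  # stand-in samples

mean = returns.mean()
variance = returns.var()
median = np.quantile(returns, 0.5)
iqr = np.quantile(returns, 0.75) - np.quantile(returns, 0.25)

alpha = 0.1                                  # CVaR tail level (assumed)
var_alpha = np.quantile(returns, alpha)
cvar = returns[returns <= var_alpha].mean()  # mean of the worst alpha-tail

# The empirical CDF itself: the fraction of returns at or below a threshold.
ecdf = lambda x: float(np.mean(returns <= x))
print(mean, variance, median, iqr, cvar, ecdf(0.0))
```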
arXiv Detail & Related papers (2021-04-26T18:54:31Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
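At the core of this approach is the Stein variational gradient descent (SVGD) update, which moves a set of particles toward high probability while a kernel term keeps them spread out; below is a generic single SVGD step on a toy log-density, not the paper's MPC formulation.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step=0.1, bandwidth=1.0):
    """One SVGD update of a particle set with an RBF kernel.

    phi(x_i) = mean_j [k(x_j, x_i) grad_log_p(x_j) + grad_{x_j} k(x_j, x_i)]
    The first term pulls particles toward high density; the kernel-gradient
    term pushes them apart, preserving diversity.
    """
    diff = particles[:, None, :] - particles[None, :, :]  # diff[j,i] = x_j - x_i
    K = np.exp(-np.sum(diff**2, axis=-1) / bandwidth)     # K[j,i]
    grads = np.stack([grad_log_p(x) for x in particles])
    attract = K.T @ grads                                 # sum_j K[j,i] g_j
    repulse = -(2.0 / bandwidth) * np.einsum('ji,jid->id', K, diff)
    return particles + step * (attract + repulse) / len(particles)

# Toy target: standard normal. Particles drift toward it without collapsing.
pts = np.random.default_rng(4).uniform(2.0, 3.0, (50, 2))
for _ in range(200):
    pts = svgd_step(pts, grad_log_p=lambda x: -x)
print(pts.mean(axis=0))                    # near [0, 0]
```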
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- Regularized Online Allocation Problems: Fairness and Beyond [7.433931244705934]
We introduce the *regularized online allocation problem*, a variant that includes a non-linear regularizer acting on the total resource consumption.
In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources.
The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints.
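Schematically, each request can be scored by its reward plus the marginal change in the regularizer, and accepted only if the resource budgets permit; the sketch below uses a made-up request stream, a placeholder max-min-style regularizer, and an arbitrary acceptance threshold.

```python
import numpy as np

rng = np.random.default_rng(5)

n_resources, horizon = 3, 200
budget = np.full(n_resources, 50.0)
consumed = np.zeros(n_resources)

# Placeholder concave regularizer on total consumption: favors balanced use
# of the resources. The paper allows general non-linear regularizers.
regularizer = lambda c: float(np.min(c))

total_reward = 0.0
for t in range(horizon):
    reward = rng.uniform(0.0, 1.0)               # request's reward if served
    cost = rng.uniform(0.0, 1.0, n_resources)    # its resource consumption
    score = reward + regularizer(consumed + cost) - regularizer(consumed)
    feasible = np.all(consumed + cost <= budget)
    if feasible and score > 0.5:                 # arbitrary greedy threshold
        consumed += cost
        total_reward += reward
print(total_reward, consumed)
```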
arXiv Detail & Related papers (2020-07-01T14:24:58Z)
- Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to square loss, given access to draws from some unknown distribution.
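As a concrete instance, one can fit relu(w·x) to noisy samples by subgradient descent on the empirical square loss; this toy sketch illustrates the problem setup only, while the paper's contribution is approximation schemes with provable guarantees, which plain gradient descent does not give.

```python
import numpy as np

rng = np.random.default_rng(6)
relu = lambda z: np.maximum(z, 0.0)

# Synthetic data: labels from a ground-truth ReLU plus Gaussian noise.
w_true = np.array([1.5, -2.0])
X = rng.standard_normal((2000, 2))
y = relu(X @ w_true) + 0.1 * rng.standard_normal(2000)

w = 0.1 * rng.standard_normal(2)   # small random init (w = 0 is a dead point:
                                   # every ReLU is inactive, gradient is zero)
for _ in range(500):
    z = X @ w
    residual = relu(z) - y                        # prediction error
    # Subgradient of the mean square loss; relu'(z) = 1 only where z > 0.
    grad = 2.0 * (X * ((z > 0) * residual)[:, None]).mean(axis=0)
    w -= 0.1 * grad
print(w)                                          # close to w_true
```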
arXiv Detail & Related papers (2020-05-26T16:26:17Z)