Sequential Fair Resource Allocation under a Markov Decision Process Framework
- URL: http://arxiv.org/abs/2301.03758v2
- Date: Fri, 16 Jun 2023 18:12:40 GMT
- Title: Sequential Fair Resource Allocation under a Markov Decision Process Framework
- Authors: Parisa Hassanzadeh, Eleonora Kreacic, Sihan Zeng, Yuchen Xiao, Sumitra Ganesh
- Abstract summary: We study the sequential decision-making problem of allocating a limited resource to agents that reveal their demands on arrival over a finite horizon.
We propose a new algorithm, SAFFE, that makes fair allocations with respect to the entire demands revealed over the horizon.
We show that SAFFE leads to fairer and more efficient allocations and achieves close-to-optimal performance in settings with dense arrivals.
- Score: 9.440900386313213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the sequential decision-making problem of allocating a limited resource to agents that reveal their stochastic demands on arrival over a finite horizon. Our goal is to design fair allocation algorithms that exhaust the available resource budget. This is challenging in sequential settings, where information on future demands is unavailable at the time of decision-making. We formulate the problem as a discrete-time Markov decision process (MDP). We propose a new algorithm, SAFFE, that makes fair allocations with respect to the entire demands revealed over the horizon by accounting for expected future demands at each arrival time. The algorithm introduces a regularization term that prioritizes currently revealed demands over potential future demands, depending on the uncertainty in agents' future demands. Using the MDP formulation, we show that SAFFE optimizes allocations based on an upper bound on the Nash Social Welfare fairness objective, and we bound its gap to optimality using concentration bounds on total future demands. On synthetic and real data, we compare the performance of SAFFE against existing approaches and a reinforcement learning policy trained on the MDP. We show that SAFFE leads to fairer and more efficient allocations and achieves close-to-optimal performance in settings with dense arrivals.
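At each arrival time, the allocation step can be pictured as a Nash Social Welfare (water-filling) allocation over each agent's revealed demand pooled with its discounted expected future demand. Below is a minimal Python sketch of that idea; the function names, the additive pooling rule, and the scalar regularizer `reg` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def nsw_waterfill(demands, budget):
    """Maximize sum_i log(x_i) subject to sum_i x_i <= budget and
    0 <= x_i <= demands[i]; the optimum is water-filling,
    x_i = min(demands[i], level) for a common water level."""
    if budget >= demands.sum():
        return demands.copy()
    lo, hi = 0.0, demands.max()
    for _ in range(60):  # bisection on the water level
        level = 0.5 * (lo + hi)
        if np.minimum(demands, level).sum() > budget:
            hi = level
        else:
            lo = level
    return np.minimum(demands, lo)

def saffe_step(current_demands, expected_future_demands, budget, reg=1.0):
    """One SAFFE-style step (illustrative): pool revealed demands with
    discounted expected future demands, water-fill the remaining budget,
    and pay out only the share owed to demands revealed so far.
    reg in [0, 1] down-weights uncertain future demands; the paper's
    regularizer may take a different form."""
    pooled = current_demands + reg * expected_future_demands
    alloc = nsw_waterfill(pooled, budget)
    return np.minimum(alloc, current_demands)

# Example: agent 0 has revealed its full demand; agent 1 is expected to
# request 3 more units later, so part of the budget is held back for it.
print(saffe_step(np.array([4.0, 2.0]), np.array([0.0, 3.0]), budget=5.0, reg=0.5))
```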
Related papers
- Best Arm Identification for Stochastic Rising Bandits [84.55453174601826]
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the expected reward of the available options increases every time they are selected.
This paper focuses on the fixed-budget Best Arm Identification (BAI) problem for SRBs.
We propose two algorithms to tackle the above-mentioned setting, namely R-UCBE and R-SR.
arXiv Detail & Related papers (2023-02-15T08:01:37Z)
- Differentially Private Deep Q-Learning for Pattern Privacy Preservation in MEC Offloading [76.0572817182483]
In MEC offloading, attackers may eavesdrop on the offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy that jointly minimizes the latency, the ES's energy consumption, and the task dropping rate, while preserving pattern privacy (PP).
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem while addressing the PP issue by injecting noise into the generated offloading decisions.
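The noise-injection step can be instantiated with a standard report-noisy-max mechanism over the Q-values that generate offloading decisions. The sketch below is a generic differentially private action selection under assumed parameters (`epsilon`, `sensitivity`), not the paper's exact DP-DQO construction:

```python
import numpy as np

def dp_noisy_argmax(q_values, epsilon, sensitivity=1.0, rng=None):
    """Report-noisy-max: perturb each Q-value with Laplace noise before
    taking the argmax, so the published offloading decision reveals less
    about the ES's queue state and users' usage patterns."""
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.laplace(scale=2.0 * sensitivity / epsilon, size=len(q_values))
    return int(np.argmax(np.asarray(q_values) + noise))
```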
arXiv Detail & Related papers (2023-02-09T12:50:18Z)
- Online Resource Allocation under Horizon Uncertainty [27.21119630468227]
In revenue management and online advertising, the number of requests can vary widely because of fluctuations in demand or user traffic intensity.
In this work, we develop online algorithms that are robust to horizon uncertainty.
In particular, our competitive ratio attains the optimal rate of growth (up to logarithmic factors) as the horizon uncertainty grows large.
arXiv Detail & Related papers (2022-06-27T19:54:10Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP (termination Markov decision process) and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Optimal Admission Control for Multiclass Queues with Time-Varying Arrival Rates via State Abstraction [16.99621896314678]
We consider a novel queuing problem where the decision-maker must choose to accept or reject randomly arriving tasks.
The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon.
We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly.
arXiv Detail & Related papers (2022-03-14T12:38:13Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the sender's optimal long-term cumulative utility.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Model-Free Reinforcement Learning for Optimal Control of Markov Decision Processes Under Signal Temporal Logic Specifications [7.842869080999489]
We present a model-free reinforcement learning algorithm to find an optimal policy for a finite-horizon Markov decision process.
We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.
arXiv Detail & Related papers (2021-09-27T22:44:55Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Regularized Online Allocation Problems: Fairness and Beyond [7.433931244705934]
We introduce the regularized online allocation problem, a variant that includes a non-linear regularizer acting on the total resource consumption.
In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources.
The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints.
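A compact way to write this objective, with notation assumed here (per-request rewards f_t, consumption vectors b_t, regularizer r, budget B) rather than copied from the paper:

```latex
\max_{x_1,\dots,x_T}\ \sum_{t=1}^{T} f_t(x_t) + r\!\left(\sum_{t=1}^{T} b_t x_t\right)
\quad \text{subject to} \quad \sum_{t=1}^{T} b_t x_t \le B .
```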
arXiv Detail & Related papers (2020-07-01T14:24:58Z)
- Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance.
We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
- Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes [5.081241420920605]
Markov decision processes (MDPs) are the de facto framework for sequential decision-making in the presence of stochastic uncertainty.
We consider MDPs with discounted-sum payoffs and failure states, which represent catastrophic outcomes.
Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP.
arXiv Detail & Related papers (2020-02-27T13:36:36Z)