Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement
Learning Approach
- URL: http://arxiv.org/abs/2202.12797v2
- Date: Sun, 25 Feb 2024 08:43:57 GMT
- Title: Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement
Learning Approach
- Authors: Shuang Qiu, Boxiang Lyu, Qinglin Meng, Zhaoran Wang, Zhuoran Yang,
Michael I. Jordan
- Abstract summary: We propose novel learning algorithms to recover the dynamic Vickrey-Clarke-Groves (VCG) mechanism over multiple rounds of interaction.
A key contribution of our approach is incorporating reward-free online Reinforcement Learning (RL) to aid exploration over a rich policy space.
- Score: 130.9259586568977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic mechanism design studies how mechanism designers should allocate
resources among agents in a time-varying environment. We consider the problem
where the agents interact with the mechanism designer according to an unknown
episodic Markov Decision Process (MDP), in which the agents' rewards and the
mechanism designer's state evolve under unknown reward functions and
transition kernels. We focus on the online setting with linear function
approximation and propose novel learning algorithms to recover the dynamic
Vickrey-Clarke-Groves (VCG) mechanism over multiple rounds of interaction. A key
contribution of our approach is incorporating reward-free online Reinforcement
Learning (RL) to aid exploration over a rich policy space to estimate prices in
the dynamic VCG mechanism. We show that the regret of our proposed method is
upper bounded by $\tilde{\mathcal{O}}(T^{2/3})$ and further devise a lower
bound to show that our algorithm is efficient: its regret matches the
$\tilde{\Omega}(T^{2/3})$ lower bound, where $T$ is the total number of
rounds. Our work establishes the regret guarantee for online RL
in solving dynamic mechanism design problems without prior knowledge of the
underlying model.
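To make the learning target concrete, here is a minimal illustrative sketch (not the paper's algorithm) of the dynamic VCG quantities in a small tabular episodic MDP whose rewards and transitions are assumed known: the mechanism plans the social-welfare-maximizing policy and then charges each agent a Clarke-pivot-style price equal to the externality it imposes on the other agents. The paper's setting is harder, since the model is unknown and these value functions must be estimated online via reward-free exploration under linear function approximation. All function and variable names below (plan, evaluate, dynamic_vcg, and so on) are hypothetical.

```python
# Illustrative sketch only: dynamic VCG allocation and Clarke-pivot prices by
# exact planning in a known tabular episodic MDP. The paper instead learns
# these quantities online when rewards and transitions are unknown.
import numpy as np

def plan(P, R, H):
    """Finite-horizon value iteration.

    P: (H, S, A, S) transition kernels; R: (H, S, A) rewards.
    Returns optimal values V (H+1, S) and a greedy deterministic policy pi (H, S).
    """
    S = P.shape[1]
    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = R[h] + P[h] @ V[h + 1]          # (S, A) state-action values
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, pi

def evaluate(P, R, pi, H):
    """Value of a fixed deterministic policy pi under reward R."""
    S = P.shape[1]
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        Q = R[h] + P[h] @ V[h + 1]
        V[h] = Q[np.arange(S), pi[h]]
    return V

def dynamic_vcg(P, rewards, s0, H):
    """Welfare-maximizing policy plus a Clarke-pivot price for each agent.

    rewards: list of per-agent reward tensors, each of shape (H, S, A).
    Agent i pays the externality it imposes on the others: the best welfare the
    other agents could obtain if i were ignored in planning, minus what they
    actually obtain under the joint welfare-maximizing policy.
    """
    R_total = sum(rewards)
    _, pi_star = plan(P, R_total, H)                 # welfare-maximizing policy
    prices = []
    for R_i in rewards:
        R_minus_i = R_total - R_i
        V_without_i, _ = plan(P, R_minus_i, H)       # others' welfare, i ignored
        V_others_at_star = evaluate(P, R_minus_i, pi_star, H)
        prices.append(V_without_i[0, s0] - V_others_at_star[0, s0])
    return pi_star, prices

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, S, A, n_agents = 3, 4, 2, 2
    P = rng.dirichlet(np.ones(S), size=(H, S, A))    # random (H, S, A, S) kernels
    rewards = [rng.random((H, S, A)) for _ in range(n_agents)]
    pi_star, prices = dynamic_vcg(P, rewards, s0=0, H=H)
    print("welfare-maximizing policy:\n", pi_star)
    print("Clarke-pivot prices:", prices)
```

Under this simplification, the prices computed above correspond to the quantities that the paper's reward-free exploration phase is used to estimate when the underlying model is unknown.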
Related papers
- Nash Incentive-compatible Online Mechanism Learning via Weakly Differentially Private Online Learning [6.869373893556194]
We study a multi-round mechanism design problem, where we interact with a set of agents over a sequence of rounds.
We wish to design an incentive-compatible (IC) online learning scheme to maximize an application-specific objective.
arXiv Detail & Related papers (2024-07-06T00:02:25Z)
- GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems [2.867517731896504]
We propose GOVerned Reward Engineering Kernels (GOV-REK), which dynamically assign reward distributions to agents in multi-agent reinforcement learning systems.
We also introduce governance kernels, which exploit the underlying structure in either state or joint action space for assigning meaningful agent reward distributions.
Our experiments demonstrate that our meaningful reward priors robustly jumpstart the learning process for effectively learning different MARL problems.
arXiv Detail & Related papers (2024-04-01T14:19:00Z) - Refined Mechanism Design for Approximately Structured Priors via Active
Regression [50.71772232237571]
We consider the problem of a revenue-maximizing seller with a large number of items for sale to $n$ strategic bidders.
It is well-known that optimal and even approximately-optimal mechanisms for this setting are notoriously difficult to characterize or compute.
arXiv Detail & Related papers (2023-10-11T20:34:17Z)
- On Reward Structures of Markov Decision Processes [4.13365552362244]
A Markov decision process can be parameterized by a transition kernel and a reward function.
We study various kinds of "costs" associated with reinforcement learning inspired by the demands in robotic applications.
We develop a novel estimator with an instance-specific error bound of $\tilde{O}(\sqrt{\frac{\tau_s}{n}})$ for estimating a single state value.
arXiv Detail & Related papers (2023-08-28T22:29:16Z)
- Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms.
Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback [104.06766271716774]
We study a multi-round welfare-maximising mechanism design problem in instances where agents do not know their values.
We first define three notions of regret for the welfare, the individual utilities of each agent and that of the mechanism.
Our framework also provides flexibility to control the pricing scheme so as to trade-off between the agent and seller regrets.
arXiv Detail & Related papers (2020-04-19T18:00:58Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)