Related papers: Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

URL: http://arxiv.org/abs/2202.12797v3
Date: Wed, 18 Dec 2024 09:54:22 GMT
Title: Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach
Authors: Shuang Qiu, Boxiang Lyu, Qinglin Meng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan,
Abstract summary: We propose novel learning algorithms to recover the dynamic Vickrey-Clarke-Grove (VCG) mechanism over multiple rounds of interaction.<n>A key contribution of our approach is incorporating reward-free online Reinforcement Learning (RL) to aid exploration over a rich policy space.
Score: 123.55983746427572
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dynamic mechanism design studies how mechanism designers should allocate resources among agents in a time-varying environment. We consider the problem where the agents interact with the mechanism designer according to an unknown Markov Decision Process (MDP), where agent rewards and the mechanism designer's state evolve according to an episodic MDP with unknown reward functions and transition kernels. We focus on the online setting with linear function approximation and propose novel learning algorithms to recover the dynamic Vickrey-Clarke-Grove (VCG) mechanism over multiple rounds of interaction. A key contribution of our approach is incorporating reward-free online Reinforcement Learning (RL) to aid exploration over a rich policy space to estimate prices in the dynamic VCG mechanism. We show that the regret of our proposed method is upper bounded by $\tilde{\mathcal{O}}(T^{2/3})$ and further devise a lower bound to show that our algorithm is efficient, incurring the same $\Omega(T^{2 / 3})$ regret as the lower bound, where $T$ is the total number of rounds. Our work establishes the regret guarantee for online RL in solving dynamic mechanism design problems without prior knowledge of the underlying model.

Related papers

Nash Incentive-compatible Online Mechanism Learning via Weakly Differentially Private Online Learning [6.869373893556194]
We study a multi-round mechanism design problem, where we interact with a set of agents over a sequence of rounds. We wish to design an incentive-compatible (IC) online learning scheme to maximize an application-specific objective.
arXiv Detail & Related papers (2024-07-06T00:02:25Z)
GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems [2.867517731896504]
We propose GOVerned Reward Engineering Kernels (GOV-REK), which dynamically assign reward distributions to agents in multi-agent reinforcement learning systems. We also introduce governance kernels, which exploit the underlying structure in either state or joint action space for assigning meaningful agent reward distributions. Our experiments demonstrate that our meaningful reward priors robustly jumpstart the learning process for effectively learning different MARL problems.
arXiv Detail & Related papers (2024-04-01T14:19:00Z)
REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world. Recent methods aim to mitigate misalignment by learning reward functions from human preferences. We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
Refined Mechanism Design for Approximately Structured Priors via Active Regression [50.71772232237571]
We consider the problem of a revenue-maximizing seller with a large number of items for sale to $n$ strategic bidders. It is well-known that optimal and even approximately-optimal mechanisms for this setting are notoriously difficult to characterize or compute.
arXiv Detail & Related papers (2023-10-11T20:34:17Z)
On Reward Structures of Markov Decision Processes [4.13365552362244]
A Markov decision process can be parameterized by a transition kernel and a reward function. We study various kinds of "costs" associated with reinforcement learning inspired by the demands in robotic applications. We develop a novel estimator with an instance-specific error bound to $tildeO(sqrtfractau_sn)$ for estimating a single state value.
arXiv Detail & Related papers (2023-08-28T22:29:16Z)
Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms. Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback [104.06766271716774]
We study a multi-round welfare-maximising mechanism design problem in instances where agents do not know their values. We first define three notions of regret for the welfare, the individual utilities of each agent and that of the mechanism. Our framework also provides flexibility to control the pricing scheme so as to trade-off between the agent and seller regrets.
arXiv Detail & Related papers (2020-04-19T18:00:58Z)
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs) We propose a new emphupper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model. Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.