Bidding Games on Markov Decision Processes with Quantitative Reachability Objectives
- URL: http://arxiv.org/abs/2412.19609v1
- Date: Fri, 27 Dec 2024 12:10:00 GMT
- Title: Bidding Games on Markov Decision Processes with Quantitative Reachability Objectives
- Authors: Guy Avni, Martin Kurečka, Kaushik Mallik, Petr Novotný, Suman Sadhukhan
- Abstract summary: We study a new family of graph games which combine environmental uncertainties and auction-based interactions among the agents. We devise value-iteration algorithms that approximate thresholds and optimal policies for general MDPs. We show that finding thresholds is at least as hard as solving simple-stochastic games.
- Score: 3.4486432774139355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph games are fundamental in strategic reasoning of multi-agent systems and their environments. We study a new family of graph games which combine stochastic environmental uncertainties and auction-based interactions among the agents, formalized as bidding games on (finite) Markov decision processes (MDP). Normally, on MDPs, a single decision-maker chooses a sequence of actions, producing a probability distribution over infinite paths. In bidding games on MDPs, two players -- called the reachability and safety players -- bid for the privilege of choosing the next action at each step. The reachability player's goal is to maximize the probability of reaching a target vertex, whereas the safety player's goal is to minimize it. These games generalize traditional bidding games on graphs, and the existing analysis techniques do not extend. For instance, the central property of traditional bidding games is the existence of a threshold budget, which is a necessary and sufficient budget to guarantee winning for the reachability player. For MDPs, the threshold becomes a relation between the budgets and probabilities of reaching the target. We devise value-iteration algorithms that approximate thresholds and optimal policies for general MDPs, and compute the exact solutions for acyclic MDPs, and show that finding thresholds is at least as hard as solving simple-stochastic games.
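The single-decision-maker setting that the abstract contrasts with can be made concrete. The sketch below is a minimal, hypothetical example (not the paper's algorithm): classic value iteration for the maximal reachability probability in a finite MDP. In the bidding-game setting of the paper, the value at each vertex becomes a relation between budget and reachability probability rather than a single number.

```python
def reach_value_iteration(states, actions, P, target, iters=1000):
    """P[s][a] is a list of (next_state, prob) pairs; returns
    V[s], an approximation of the maximal probability of
    eventually reaching `target` from s."""
    V = {s: 1.0 if s == target else 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            if s == target:
                continue  # target is absorbing with value 1
            # Bellman update: best action by expected successor value
            V[s] = max(
                sum(p * V[t] for t, p in P[s][a]) for a in actions[s]
            )
    return V

# Toy 3-state MDP: from s0, action 'safe' loops forever, while action
# 'risky' reaches the target with probability 0.5 and gets stuck otherwise.
states = ["s0", "goal", "sink"]
actions = {"s0": ["safe", "risky"], "goal": ["stay"], "sink": ["stay"]}
P = {
    "s0": {"safe": [("s0", 1.0)], "risky": [("goal", 0.5), ("sink", 0.5)]},
    "goal": {"stay": [("goal", 1.0)]},
    "sink": {"stay": [("sink", 1.0)]},
}
V = reach_value_iteration(states, actions, P, "goal")
print(V["s0"])  # 0.5
```

With two bidding players, the reachability player would additionally need a sufficient budget to win the auctions along a path realizing this value, which is what the paper's threshold relation captures.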
Related papers
- Multi-Objective Multi-Agent Path Finding with Lexicographic Cost Preferences [7.18523391773903]
Multi-objective multi-agent path finding (MO-MAPF) algorithms produce conflict-free plans. We propose a lexicographic framework for modeling MO-MAPF, along with an algorithm, Lexicographic Conflict-Based Search (LCBS). LCBS integrates a priority-aware low-level $A^*$ search with conflict-based search. We provide insights into optimality and scalability, and empirically demonstrate that LCBS computes optimal solutions while scaling to instances with up to ten objectives -- far beyond the limits of existing MO-MAPF methods.
arXiv Detail & Related papers (2025-10-08T17:40:41Z) - Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games [40.05960121330012]
Multi-agent reinforcement learning (MARL) lies at the heart of a plethora of applications involving the interaction of a group of agents in a shared unknown environment.
We propose a novel model-based algorithm, called VMG, that incentivizes exploration via biasing the empirical estimate of the model parameters.
arXiv Detail & Related papers (2025-02-13T21:28:51Z) - Solving Robust Markov Decision Processes: Generic, Reliable, Efficient [3.789219860006095]
Markov decision processes (MDPs) are a well-established model for sequential decision-making under probabilistic uncertainty. We provide a framework for solving robust MDPs (RMDPs) that is generic, reliable, and efficient. Our prototype implementation outperforms existing tools by several orders of magnitude.
arXiv Detail & Related papers (2024-12-13T14:55:48Z) - A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z) - Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z) - Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents [52.75161794035767]
We introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously.
We show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second price auctions with our algorithms.
arXiv Detail & Related papers (2023-12-13T06:54:49Z) - Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets [76.20705291443208]
We view adversarial attacks as a bargaining game in which different players negotiate to reach an agreement on a joint direction of parameter updating.
We design a novel framework that adjusts the budgets of different adversaries to avoid any player dominance.
Experiments on standard benchmarks show that employing the proposed framework to the existing approaches significantly advances multi-target robustness.
arXiv Detail & Related papers (2023-06-27T14:02:10Z) - Multi-Target Multiplicity: Flexibility and Fairness in Target Specification under Resource Constraints [76.84999501420938]
We introduce a conceptual and computational framework for assessing how the choice of target affects individuals' outcomes.
We show that the level of multiplicity that stems from target variable choice can be greater than that stemming from nearly-optimal models of a single target.
arXiv Detail & Related papers (2023-06-23T18:57:14Z) - Efficiently Computing Nash Equilibria in Adversarial Team Markov Games [19.717850955051837]
We introduce a class of games in which a team of identically interested players competes against an adversarial player.
This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games.
Our main contribution is the first algorithm for computing stationary $\epsilon$-approximate Nash equilibria in adversarial team Markov games.
arXiv Detail & Related papers (2022-08-03T16:41:01Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite).
The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
arXiv Detail & Related papers (2022-06-08T14:30:38Z) - Improving Bidding and Playing Strategies in the Trick-Taking game Wizard using Deep Q-Networks [0.0]
The trick-taking game Wizard, with separate bidding and playing phases, is modeled by two interleaved partially observable Markov decision processes (POMDPs).
Deep Q-Networks (DQN) are used to empower self-improving agents, which are capable of tackling the challenges of a highly non-stationary environment.
The trained DQN agents achieve accuracies between 66% and 87% in self-play, leaving behind both a random baseline and a rule-based heuristic.
arXiv Detail & Related papers (2022-05-27T08:59:42Z) - Gradient play in stochastic games: stationary points, convergence, and sample complexity [6.97785632069611]
We study the performance of the gradient play algorithm for stochastic games (SGs).
We show that Nash equilibria (NEs) and first-order stationary policies are equivalent in this setting.
For a subclass of SGs called Markov potential games, we design a sample-based reinforcement learning algorithm.
arXiv Detail & Related papers (2021-06-01T03:03:45Z) - Equilibria for Games with Combined Qualitative and Quantitative Objectives [15.590197778287616]
We study concurrent games in which each player is a process that is assumed to act independently and strategically.
Our main result is that deciding the existence of a strict $\epsilon$ Nash equilibrium in such games is 2ExpTime-complete.
arXiv Detail & Related papers (2020-08-13T01:56:24Z) - Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.