Mixing Any Cocktail with Limited Ingredients: On the Structure of Payoff Sets in Multi-Objective MDPs and its Impact on Randomised Strategies
- URL: http://arxiv.org/abs/2502.18296v1
- Date: Tue, 25 Feb 2025 15:33:59 GMT
- Title: Mixing Any Cocktail with Limited Ingredients: On the Structure of Payoff Sets in Multi-Objective MDPs and its Impact on Randomised Strategies
- Authors: James C. A. Main, Mickael Randour
- Abstract summary: We study the structure of the set of expected payoff vectors of all strategies given a multi-dimensional payoff function. For any payoff for which the expected payoff is finite under all strategies, any expected payoff can be obtained exactly by mixing finitely many strategies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider multi-dimensional payoff functions in Markov decision processes, and ask whether a given expected payoff vector can be achieved or not. In general, pure strategies (i.e., not resorting to randomisation) do not suffice for this problem. We study the structure of the set of expected payoff vectors of all strategies given a multi-dimensional payoff function and its consequences regarding randomisation requirements for strategies. In particular, we prove that for any payoff for which the expectation is well-defined under all strategies, it is sufficient to mix (i.e., randomly select a pure strategy at the start of a play and commit to it for the rest of the play) finitely many pure strategies to approximate any expected payoff vector up to any precision. Furthermore, for any payoff for which the expected payoff is finite under all strategies, any expected payoff can be obtained exactly by mixing finitely many strategies.
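To make the mixing idea concrete: each pure strategy induces an expected payoff vector in $\mathbb{R}^d$, and mixing pure strategies with weights $\alpha_i$ achieves the convex combination $\sum_i \alpha_i v_i$ of their payoff vectors. By Carathéodory's theorem, any point in the convex hull of a set in $\mathbb{R}^d$ is already a convex combination of at most $d+1$ of its points, which is the geometric engine behind finite mixing. The sketch below is illustrative only (the payoff matrix `V` and `target` are made-up inputs, not from the paper); it recovers mixing weights for a target vector with a feasibility linear program.

```python
# Illustrative sketch only: the payoff vectors and target below are made up,
# not taken from the paper. Each row of V is the expected payoff vector
# (a point in R^d) of one pure strategy; mixing pure strategies with weights
# alpha achieves the expected payoff V.T @ alpha. Hitting a target vector
# therefore means solving:  alpha >= 0,  sum(alpha) = 1,  V.T @ alpha = target.
import numpy as np
from scipy.optimize import linprog

V = np.array([
    [1.0, 0.0],  # pure strategy 1: strong on objective 1 only
    [0.0, 1.0],  # pure strategy 2: strong on objective 2 only
    [0.4, 0.4],  # pure strategy 3: balanced
])
target = np.array([0.5, 0.5])  # hypothetical expected payoff vector to achieve

n, d = V.shape
# Equality constraints: V.T @ alpha = target  and  sum(alpha) = 1.
A_eq = np.vstack([V.T, np.ones((1, n))])
b_eq = np.concatenate([target, [1.0]])

# Feasibility LP with a zero objective; a basic feasible solution has at most
# d + 1 nonzero weights (one per equality constraint), matching Caratheodory.
res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, None)] * n, method="highs")

if res.success:
    print("mixing weights:", np.round(res.x, 6))  # e.g. [0.5, 0.5, 0.0]
    print("achieved payoff:", V.T @ res.x)        # [0.5, 0.5]
else:
    print("target lies outside the convex hull of pure-strategy payoffs")
```

A vertex solution of this LP uses at most $d+1$ pure strategies, matching the Carathéodory bound in dimension $d$; the paper establishes finite-mixing guarantees of this flavour at the level of general (possibly infinite) strategy spaces in multi-dimensional MDPs.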
Related papers
- Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and Algorithms
This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth. We introduce a novel framework for decision-making in combining strategies, irrespective of market conditions. We show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios.
arXiv Detail & Related papers (2024-06-05T23:08:57Z)
- Paths to Equilibrium in Games
We study sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning.
Our analysis reveals the counterintuitive insight that reward-deteriorating strategic updates are key to driving play to equilibrium along a satisficing path.
arXiv Detail & Related papers (2024-03-26T19:58:39Z)
- Generalized Hybrid Search and Applications to Blockchain and Hash Function Security
We first examine the hardness of solving various search problems by hybrid quantum-classical strategies.
We then construct a hybrid quantum-classical search algorithm and analyze its success probability.
arXiv Detail & Related papers (2023-11-07T04:59:02Z)
- Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits
We propose an algorithm that minimizes the regret over a time horizon $T$.
The proposed algorithm is applicable to domains such as recommendation systems and transportation.
arXiv Detail & Related papers (2023-01-31T03:49:00Z)
- Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a Generative Model
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy with a sample complexity that is minimax-optimal up to logarithmic factors.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z)
- Near-Optimal No-Regret Learning for General Convex Games
We show that near-optimal regret bounds can be obtained for general convex and compact strategy sets.
Our dynamics are an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z)
- Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
We propose a strategy-wise concentration principle which builds a confidence interval for the joint strategy.
For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose an efficient algorithm.
All of our algorithms can naturally take a pre-specified strategy class $\Pi$ as input and output a strategy close to the best strategy in $\Pi$.
arXiv Detail & Related papers (2022-06-01T00:18:15Z)
- Strategy Complexity of Point Payoff, Mean Payoff and Total Payoff Objectives in Countable MDPs
We study countably infinite Markov decision processes (MDPs) with real-valued transition rewards.
For each payoff type, the objective is to maximize the probability that the $\liminf$ is non-negative.
Some cases can be won with memoryless deterministic strategies, while others require a step counter, a reward counter, or both.
arXiv Detail & Related papers (2022-03-10T19:17:05Z)
- Stochastic Strategic Patient Buyers: Revenue maximization using posted prices
We consider a seller faced with buyers who have the ability to delay their decision.
Each buyer's type is composed of a value and a patience level, sampled i.i.d. from a distribution.
We characterize both the optimal pure strategy of the seller and the buyer's best response strategy to any seller's mixed strategy.
arXiv Detail & Related papers (2022-02-12T21:11:31Z)
- Strategy Complexity of Mean Payoff, Total Payoff and Point Payoff Objectives in Countable MDPs
We study countably infinite Markov decision processes (MDPs) with real-valued transition rewards.
For each payoff type, the objective is to maximize the probability that the $\liminf$ is non-negative.
Some cases can be won with memoryless deterministic strategies, while others require a step counter, a reward counter, or both.
arXiv Detail & Related papers (2021-07-01T21:13:39Z)
- Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games
Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment.
It captures the online decision-making problems that each player faces in an extensive-form game, as well as Markov decision processes and partially-observable Markov decision processes where the agent conditions on observed history.
In this paper, we give the first algorithm for the bandit linear optimization problem for TFSDM that offers both (i) linear-time iterations and (ii) $O(\sqrt{T})$ cumulative regret in expectation.
arXiv Detail & Related papers (2021-03-08T05:00:13Z)
- On the Impossibility of Convergence of Mixed Strategies with No Regret Learning
We study convergence properties of the mixed strategies that result from a general class of optimal no regret learning strategies.
We consider the class of strategies whose information set at each step is the empirical average of the opponent's realized play.
arXiv Detail & Related papers (2020-12-03T18:02:40Z)