Counterfactual Strategies for Markov Decision Processes
- URL: http://arxiv.org/abs/2505.09412v1
- Date: Wed, 14 May 2025 14:07:27 GMT
- Title: Counterfactual Strategies for Markov Decision Processes
- Authors: Paul Kobialka, Lina Gerlach, Francesco Leofante, Erika Ábrahám, Silvia Lizeth Tapia Tarifa, Einar Broch Johnsen
- Abstract summary: We introduce counterfactual strategies for Markov Decision Processes (MDPs). During MDP execution, a strategy decides which of the enabled actions to execute next. Given an initial strategy that reaches an undesired outcome with a probability above some limit, we identify minimal changes to the initial strategy to reduce that probability below the limit.
- Score: 3.42834279186368
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Counterfactuals are widely used in AI to explain how minimal changes to a model's input can lead to a different output. However, established methods for computing counterfactuals typically focus on one-step decision-making, and are not directly applicable to sequential decision-making tasks. This paper fills this gap by introducing counterfactual strategies for Markov Decision Processes (MDPs). During MDP execution, a strategy decides which of the enabled actions (with known probabilistic effects) to execute next. Given an initial strategy that reaches an undesired outcome with a probability above some limit, we identify minimal changes to the initial strategy to reduce that probability below the limit. We encode such counterfactual strategies as solutions to non-linear optimization problems, and further extend our encoding to synthesize diverse counterfactual strategies. We evaluate our approach on four real-world datasets and demonstrate its practical viability in sophisticated sequential decision-making tasks.
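As a rough illustration of the abstract's non-linear optimization encoding, the sketch below sets up a tiny two-decision-state MDP and searches for a strategy that stays close to a given initial strategy while pushing the probability of reaching a bad state below a limit. The toy MDP, the squared-distance objective, and all variable names are assumptions made for this example only; the paper's actual encoding (including its distance measure and the diversity extension) may differ.

```python
# Minimal sketch, not the authors' encoding: find a counterfactual strategy
# close to an initial one such that Pr(reach "bad") drops below a limit.
import numpy as np
from scipy.optimize import minimize

# Toy MDP: decision states s0, s1; absorbing states "bad" and "safe".
#   s0: action a0 -> s1 (prob 1.0);       action b0 -> bad 0.5 / safe 0.5
#   s1: action a1 -> bad 0.9 / safe 0.1;  action b1 -> bad 0.1 / safe 0.9
sigma0 = np.array([1.0, 0.0, 1.0, 0.0])  # initial strategy: (s0,a0),(s0,b0),(s1,a1),(s1,b1)
LIMIT = 0.3                              # required bound on Pr(reach bad from s0)

# Decision variables x = [sigma(s0,a0), sigma(s0,b0), sigma(s1,a1), sigma(s1,b1), p(s0), p(s1)]
def objective(x):
    # proximity to the initial strategy (squared L2 distance, a smooth stand-in)
    return np.sum((x[:4] - sigma0) ** 2)

constraints = [
    # strategies are probability distributions over the enabled actions
    {"type": "eq", "fun": lambda x: x[0] + x[1] - 1.0},
    {"type": "eq", "fun": lambda x: x[2] + x[3] - 1.0},
    # Bellman-style reachability equations (bilinear in sigma and p -> non-linear program)
    {"type": "eq", "fun": lambda x: x[5] - (x[2] * 0.9 + x[3] * 0.1)},
    {"type": "eq", "fun": lambda x: x[4] - (x[0] * x[5] + x[1] * 0.5)},
    # the undesired outcome must become less likely than the limit
    {"type": "ineq", "fun": lambda x: LIMIT - x[4]},
]
bounds = [(0.0, 1.0)] * 6

x_init = np.concatenate([sigma0, [0.9, 0.9]])  # start from the factual strategy
res = minimize(objective, x_init, method="SLSQP", bounds=bounds, constraints=constraints)
print("counterfactual strategy:", np.round(res.x[:4], 3), "Pr(bad):", round(res.x[4], 3))
```

On this toy instance the solver should leave the strategy at s0 essentially unchanged and shift the strategy at s1 to roughly [0.25, 0.75], bringing the reachability probability down to the 0.3 limit.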
Related papers
- PATS: Process-Level Adaptive Thinking Mode Switching [53.53401063490537]
Current large language models (LLMs) typically adopt a fixed reasoning strategy, either simple or complex, for all questions, regardless of their difficulty. This neglect of variation in task and reasoning process complexity leads to an imbalance between performance and efficiency. Existing methods attempt training-free fast-slow thinking system switching to handle problems of varying difficulty, but are limited by coarse-grained solution-level strategy adjustments. We propose a novel reasoning paradigm: Process-Level Adaptive Thinking Mode Switching (PATS), which enables LLMs to dynamically adjust their reasoning strategy based on the difficulty of each step, optimizing the balance between performance and efficiency.
arXiv Detail & Related papers (2025-05-25T17:58:50Z) - EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. EPO provides strategies in an open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. Experiments across social and physical domains demonstrate EPO's ability to achieve long-term goal alignment.
arXiv Detail & Related papers (2025-02-18T03:15:55Z) - SMART: Self-learning Meta-strategy Agent for Reasoning Tasks [44.45037694899524]
We introduce SMART (Self-learning Meta-strategy Agent for Reasoning Tasks), a novel framework that enables LMs to learn and select the most effective strategies for various reasoning tasks.
We model the strategy selection process as a Markov Decision Process and leverage reinforcement learning-driven continuous self-improvement.
Our experiments demonstrate that SMART significantly enhances the ability of models to choose optimal strategies without external guidance.
arXiv Detail & Related papers (2024-10-21T15:55:04Z) - Deep Reinforcement Learning for Online Optimal Execution Strategies [49.1574468325115]
This paper tackles the challenge of learning non-Markovian optimal execution strategies in dynamic financial markets.
We introduce a novel actor-critic algorithm based on Deep Deterministic Policy Gradient (DDPG)
We show that our algorithm successfully approximates the optimal execution strategy.
arXiv Detail & Related papers (2024-10-17T12:38:08Z) - Beyond Average Return in Markov Decision Processes [49.157108194438635]
We prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL).
We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations.
arXiv Detail & Related papers (2023-10-31T08:36:41Z) - On strategies for risk management and decision making under uncertainty shared across multiple fields [55.2480439325792]
The paper identifies more than 110 examples of such strategies; this approach to risk is termed RDOT: Risk-reducing Design and Operations Toolkit. RDOT strategies fall into six broad categories: structural, reactive, formal, adversarial, multi-stage and positive. Overall, RDOT represents an overlooked class of versatile responses to uncertainty.
arXiv Detail & Related papers (2023-09-06T16:14:32Z) - Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access [3.441021278275805]
In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions is known in terms of successor states, but not the probabilities involved.
In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that utilizes interval MDPs as its internal model.
arXiv Detail & Related papers (2023-03-22T16:58:44Z) - Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach [0.3093890460224435]
We address the solution of the popular Wordle puzzle, using new reinforcement learning methods.
For the Wordle puzzle, these methods yield online solution strategies that are very close to optimal at relatively modest computational cost.
arXiv Detail & Related papers (2022-11-15T03:46:41Z) - Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)