Strategically Linked Decisions in Long-Term Planning and Reinforcement Learning
- URL: http://arxiv.org/abs/2505.16833v1
- Date: Thu, 22 May 2025 16:04:17 GMT
- Title: Strategically Linked Decisions in Long-Term Planning and Reinforcement Learning
- Authors: Alihan Hüyük, Finale Doshi-Velez
- Abstract summary: Long-term planning involves finding strategies that work toward a goal rather than individually optimizing their immediate outcomes. In this paper, we quantify such dependencies between planned actions with strategic link scores. We demonstrate the utility of strategic link scores through three practical applications.
- Score: 33.879584051748346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-term planning, as in reinforcement learning (RL), involves finding strategies: actions that collectively work toward a goal rather than individually optimizing their immediate outcomes. As part of a strategy, some actions are taken at the expense of short-term benefit to enable future actions with even greater returns. These actions are only advantageous if followed up by the actions they facilitate; consequently, they would not have been taken if those follow-ups were not available. In this paper, we quantify such dependencies between planned actions with strategic link scores: the drop in the likelihood of one decision under the constraint that a follow-up decision is no longer available. We demonstrate the utility of strategic link scores through three practical applications: (i) explaining black-box RL agents by identifying strategically linked pairs among decisions they make, (ii) improving the worst-case performance of decision support systems by distinguishing whether recommended actions can be adopted as standalone improvements or whether they are strategically linked, hence requiring a commitment to a broader strategy to be effective, and (iii) characterizing the planning processes of non-RL agents purely through interventions aimed at measuring strategic link scores. As an example, we consider a realistic traffic simulator and analyze, through road closures, the effective planning horizon of the emergent routing behavior of many drivers.
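The definition in the abstract (the drop in the likelihood of one decision when a follow-up decision is made unavailable) can be illustrated with a small sketch. This is not the authors' implementation: the toy MDP, the softmax policy over Q-values, and the names `MDP`, `q_values`, `policy`, and `link_score` are all assumptions made for illustration only.

```python
import math

# Hypothetical toy MDP: state -> action -> (reward, next_state); None means terminal.
# "setup" sacrifices immediate reward to enable the later "payoff" action.
MDP = {
    "start": {"safe": (1.0, None), "setup": (0.0, "mid")},
    "mid": {"payoff": (5.0, None), "idle": (0.0, None)},
}

def q_values(state, blocked=frozenset()):
    """Q-values at `state`, ignoring every (state, action) pair in `blocked`."""
    qs = {}
    for action, (reward, nxt) in MDP[state].items():
        if (state, action) in blocked:
            continue
        future = 0.0
        if nxt is not None:
            # Greedy backup over the actions that remain available downstream.
            future = max(q_values(nxt, blocked).values(), default=0.0)
        qs[action] = reward + future
    return qs

def policy(state, blocked=frozenset(), temperature=1.0):
    """Softmax policy over Q-values (an assumed stand-in for decision likelihood)."""
    qs = q_values(state, blocked)
    z = sum(math.exp(q / temperature) for q in qs.values())
    return {a: math.exp(q / temperature) / z for a, q in qs.items()}

def link_score(decision, follow_up):
    """Drop in the likelihood of `decision` once `follow_up` is unavailable."""
    state, action = decision
    unconstrained = policy(state)[action]
    constrained = policy(state, blocked=frozenset([follow_up]))[action]
    return unconstrained - constrained

if __name__ == "__main__":
    print(link_score(("start", "setup"), ("mid", "payoff")))
    print(link_score(("start", "setup"), ("mid", "idle")))
```

In this toy setting, removing "payoff" lowers the likelihood of "setup" by roughly 0.71, while removing "idle" leaves it unchanged, so only the first pair would count as strategically linked under this sketch.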
Related papers
- Counterfactual Strategies for Markov Decision Processes [3.42834279186368]
We introduce counterfactual strategies for Markov Decision Processes (MDPs). During MDP execution, a strategy decides which of the enabled actions to execute next. We identify minimal changes to the initial strategy to reduce that probability below the limit.
arXiv Detail & Related papers (2025-05-14T14:07:27Z)
- EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning. EPO provides strategies in open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. Experiments across social and physical domains demonstrate EPO's ability of long-term goal alignment.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
- Game-Of-Goals: Using adversarial games to achieve strategic resilience [2.0902176621159128]
We assume that competitor agents are behaving in a maximally adversarial fashion. We use game tree search methods to select an optimal execution strategy. Our evaluation function is based on the idea that we want to make our execution plans defensible.
arXiv Detail & Related papers (2025-02-16T22:34:59Z)
- Indefinite causal order strategy nor adaptive strategy does not improve the estimation of group action [53.64687146666141]
We consider estimation of unknown unitary operation when the set of possible unitary operations is given by a projective unitary representation of a compact group. We show that indefinite causal order strategy nor adaptive strategy does not improve the performance of this estimation when error function satisfies group covariance.
arXiv Detail & Related papers (2025-01-16T06:00:57Z)
- Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer [12.252515483035737]
Current recommendation strategies grapple with two significant hurdles. We introduce a future-conditioned strategy for multi-objective controllable recommendations. We present the Multi-Objective Controllable Decision Transformer (MocDT), an offline Reinforcement Learning (RL) model capable of autonomously learning the mapping from multiple objectives to item sequences.
arXiv Detail & Related papers (2025-01-13T11:12:43Z)
- Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and Algorithms [0.0]
This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth. We introduce a novel framework for decision-making in combining strategies, irrespective of market conditions. We show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios.
arXiv Detail & Related papers (2024-06-05T23:08:57Z)
- LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with Large Language Models.
It underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- On strategies for risk management and decision making under uncertainty shared across multiple fields [55.2480439325792]
The paper finds more than 110 examples of such strategies and this approach to risk is termed RDOT: Risk-reducing Design and Operations Toolkit. RDOT strategies fall into six broad categories: structural, reactive, formal, adversarial, multi-stage and positive. Overall, RDOT represents an overlooked class of versatile responses to uncertainty.
arXiv Detail & Related papers (2023-09-06T16:14:32Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.