On Sequential Fault-Intolerant Process Planning
- URL: http://arxiv.org/abs/2502.04998v1
- Date: Fri, 07 Feb 2025 15:20:35 GMT
- Title: On Sequential Fault-Intolerant Process Planning
- Authors: Andrzej Kaczmarczyk, Davin Choo, Niclas Boehmer, Milind Tambe, Haifeng Xu
- Abstract summary: We propose and study a planning problem we call Sequential Fault-Intolerant Process Planning (SFIPP).
SFIPP captures a reward structure common in many sequential multi-stage decision problems where the planning is deemed successful only if all stages succeed.
We design provably tight online algorithms for settings in which we need to pick between different actions with unknown success chances at each stage.
- Abstract: We propose and study a planning problem we call Sequential Fault-Intolerant Process Planning (SFIPP). SFIPP captures a reward structure common in many sequential multi-stage decision problems where the planning is deemed successful only if all stages succeed. Such reward structures are different from classic additive reward structures and arise in important applications such as drug/material discovery, security, and quality-critical product design. We design provably tight online algorithms for settings in which we need to pick between different actions with unknown success chances at each stage. We do so both for the foundational case in which the behavior of actions is deterministic, and the case of probabilistic action outcomes, where we effectively balance exploration for learning and exploitation for planning through the usage of multi-armed bandit algorithms. In our empirical evaluations, we demonstrate that the specialized algorithms we develop, which leverage additional information about the structure of the SFIPP instance, outperform our more general algorithm.
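The abstract's combination of an all-stages-must-succeed reward with per-stage action selection under unknown success chances can be illustrated with a small simulation. The sketch below is not the paper's provably tight algorithm; it simply runs an off-the-shelf UCB1 bandit at each stage of a hypothetical instance (the stage count, action counts, and success probabilities are made up), with each episode ending at the first failed stage.

```python
import math
import random

# Minimal illustrative sketch (not the paper's algorithm): one UCB1 bandit per
# stage of an SFIPP-like process. Each stage offers actions with unknown
# Bernoulli success probabilities; an episode stops at the first failed stage,
# and the process counts as successful only if every stage succeeds.

class StageBandit:
    def __init__(self, num_actions: int):
        self.counts = [0] * num_actions      # times each action was tried
        self.successes = [0] * num_actions   # observed successes per action

    def select(self) -> int:
        # Try every action once before applying the UCB1 rule.
        for a, c in enumerate(self.counts):
            if c == 0:
                return a
        total = sum(self.counts)
        ucb = [self.successes[a] / self.counts[a]
               + math.sqrt(2 * math.log(total) / self.counts[a])
               for a in range(len(self.counts))]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, action: int, success: bool) -> None:
        self.counts[action] += 1
        self.successes[action] += int(success)


def run_episode(bandits, true_probs, rng):
    """Walk the stages in order; stop at the first failure (fault-intolerant)."""
    for stage, bandit in enumerate(bandits):
        action = bandit.select()
        success = rng.random() < true_probs[stage][action]
        bandit.update(action, success)
        if not success:
            return False  # later stages are never reached
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical instance: 3 stages, 2 candidate actions per stage.
    true_probs = [[0.9, 0.5], [0.4, 0.8], [0.7, 0.6]]
    bandits = [StageBandit(len(p)) for p in true_probs]
    wins = sum(run_episode(bandits, true_probs, rng) for _ in range(5000))
    print(f"end-to-end success rate: {wins / 5000:.3f}")
```

Note that later stages only receive feedback on episodes that reach them, which is exactly the interaction between exploration and the fault-intolerant reward structure that the paper's specialized algorithms are designed to handle better than a generic bandit baseline.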
Related papers
- Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning [12.608461657195367]
We study multi-task structured bandit problem where the goal is to learn a near-optimal algorithm that minimizes cumulative regret.
We use a transformer as a decision-making algorithm to learn this shared structure so as to generalize to the test task.
We show that our algorithm, without the knowledge of the underlying problem structure, can learn a near-optimal policy in-context.
arXiv Detail & Related papers (2024-06-07T16:34:31Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - Heuristic Search for Multi-Objective Probabilistic Planning [0.0]
Heuristic search is a powerful approach that has successfully been applied to a broad class of planning problems.
Here, we extend the reach of heuristic search to a more expressive class of problems, namely multi-objective stochastic shortest paths (MOSSPs).
We design new search algorithms MOLAO* and MOLRTDP, which extend well-known SSP algorithms to the multi-objective problem.
arXiv Detail & Related papers (2023-03-25T05:18:22Z) - Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z) - Iterative Depth-First Search for Fully Observable Non-Deterministic Planning [25.2935633334145]
We develop a novel iterative depth-first search algorithm that solves FOND planning tasks and produces strong cyclic policies.
Our algorithm is explicitly designed for FOND planning, addressing more directly the non-deterministic aspect of FOND planning.
arXiv Detail & Related papers (2022-04-08T23:10:30Z) - Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure [57.90236104782219]
We study regret in non-episodic factored Markov decision processes (FMDPs).
All existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance.
We provide the first algorithm that learns the structure of the FMDP while minimizing the regret.
arXiv Detail & Related papers (2020-09-13T12:30:35Z) - A Unifying Framework for Reinforcement Learning and Planning [2.564530030795554]
This paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP).
At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions.
arXiv Detail & Related papers (2020-06-26T14:30:41Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot task allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z) - STRIPS Action Discovery [67.73368413278631]
Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing.
We propose a new algorithm to unsupervisedly synthesize STRIPS action models with a classical planner when action signatures are unknown.
arXiv Detail & Related papers (2020-01-30T17:08:39Z)