Opportunistic Qualitative Planning in Stochastic Systems with Incomplete
Preferences over Reachability Objectives
- URL: http://arxiv.org/abs/2210.01878v1
- Date: Tue, 4 Oct 2022 19:53:08 GMT
- Title: Opportunistic Qualitative Planning in Stochastic Systems with Incomplete
Preferences over Reachability Objectives
- Authors: Abhishek N. Kulkarni and Jie Fu
- Abstract summary: Preferences play a key role in determining what goals/constraints to satisfy when not all constraints can be satisfied simultaneously.
We present an algorithm to synthesize the SPI and SASI strategies that induce multiple sequential improvements.
- Score: 24.11353445650682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preferences play a key role in determining what goals/constraints to satisfy
when not all constraints can be satisfied simultaneously. In this paper, we
study how to synthesize preference satisfying plans in stochastic systems,
modeled as an MDP, given a (possibly incomplete) combinative preference model
over temporally extended goals. We start by introducing new semantics to
interpret preferences over infinite plays of the stochastic system. Then, we
introduce a new notion of improvement to enable comparison between two prefixes
of an infinite play. Based on this, we define two solution concepts called safe
and positively improving (SPI) and safe and almost-surely improving (SASI) that
enforce improvements with a positive probability and with probability one,
respectively. We construct a model called an improvement MDP, in which the
synthesis of SPI and SASI strategies that guarantee at least one improvement
reduces to computing positive and almost-sure winning strategies in an MDP. We
present an algorithm to synthesize the SPI and SASI strategies that induce
multiple sequential improvements. We demonstrate the proposed approach using a
robot motion planning problem.
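Since the paper reduces SPI/SASI synthesis to positive and almost-sure winning in an MDP, the following minimal sketch (not the paper's construction; the toy MDP, state names, and dictionary encoding are invented for illustration) shows the standard fixed-point computations of the positive and almost-sure winning regions for a reachability target:

```python
# Minimal sketch (not the paper's algorithm): qualitative reachability in a finite MDP.
# `trans[s][a]` is the set of successors with positive probability under action a;
# exact probabilities are irrelevant for the qualitative (positive / almost-sure) analysis.

def positive_reach(trans, target, allowed=None):
    """States from which `target` is reached with positive probability,
    using only actions permitted by `allowed[s]` (all actions if None)."""
    win = set(target)
    changed = True
    while changed:
        changed = False
        for s, acts in trans.items():
            if s in win:
                continue
            for a, succs in acts.items():
                if allowed is not None and a not in allowed.get(s, set()):
                    continue
                if succs & win:          # some successor already wins
                    win.add(s)
                    changed = True
                    break
    return win

def almost_sure_reach(trans, target):
    """Greatest-fixed-point computation of the almost-sure winning region."""
    candidate = set(trans) | set(target)
    while True:
        # keep only actions whose successors all stay inside the candidate set
        allowed = {s: {a for a, succs in acts.items() if succs <= candidate}
                   for s, acts in trans.items()}
        new = positive_reach(trans, set(target) & candidate, allowed)
        if new == candidate:
            return candidate
        candidate = new

# Toy MDP: from s0, action 'a' may slip to the sink s3; action 'b' surely stays in {s0, s1}.
trans = {
    "s0": {"a": {"s1", "s3"}, "b": {"s0", "s1"}},
    "s1": {"a": {"s2"}},
    "s2": {"a": {"s2"}},
    "s3": {"a": {"s3"}},
}
print(positive_reach(trans, {"s2"}))     # {'s0', 's1', 's2'}
print(almost_sure_reach(trans, {"s2"}))  # {'s0', 's1', 's2'}: s0 must use action 'b'
```

In the sketch, an almost-surely winning strategy can be read off by picking, in each winning state, any allowed action whose successors remain inside the returned region.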
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that plans the agent's response against the inferred goals.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
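HOP's architecture is not reproduced here; purely as an illustration of the opponent-modeling step (inferring an opponent's goal from observed actions, given goal-conditioned policies), a minimal Bayesian sketch with invented goals, policies, and probabilities might look like:

```python
import numpy as np

# Illustration only (not HOP itself): Bayesian inference of an opponent's goal
# from observed actions, assuming goal-conditioned policies pi[g][a] =
# P(opponent takes action a | goal g). All numbers are made up.
goals = ["collect", "attack"]
pi = {"collect": np.array([0.7, 0.2, 0.1]),   # action probabilities per goal
      "attack":  np.array([0.1, 0.2, 0.7])}
belief = {g: 0.5 for g in goals}              # uniform prior over goals

def update_belief(belief, action):
    post = {g: belief[g] * pi[g][action] for g in goals}
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

for a in [0, 0, 1]:                           # observed opponent actions
    belief = update_belief(belief, a)
print(belief)                                 # posterior mass shifts toward "collect"
```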
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
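The adaptive DRO-based loss itself is not spelled out in this summary; as a generic, hedged illustration of scaling a pairwise preference loss per example (standard Bradley-Terry form with a hypothetical per-pair scale, not the paper's formulation):

```python
import torch
import torch.nn.functional as F

def scaled_preference_loss(r_chosen, r_rejected, scale):
    """Bradley-Terry style pairwise loss with a per-pair scaling factor.
    Generic illustration only, not the adaptive DRO loss from the paper."""
    margin = r_chosen - r_rejected                  # reward-model margins, shape (batch,)
    return (scale * F.softplus(-margin)).mean()     # softplus(-x) == -log(sigmoid(x))

# Toy usage with made-up reward-model outputs and scales.
r_c = torch.tensor([1.2, 0.3, 2.0])
r_r = torch.tensor([0.5, 0.6, 1.0])
scale = torch.tensor([1.0, 2.0, 0.5])               # e.g., larger weight on ambiguous pairs
print(scaled_preference_loss(r_c, r_r, scale))
```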
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
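To make the update rule concrete, here is a hedged sketch of an entropy-regularized multiplicative-weights step on a single zero-sum matrix game; the payoff matrix and constants are invented, and the paper's actual method is an optimistic, single-loop variant for Markov games that this simplification does not capture:

```python
import numpy as np

# Illustration only: entropy-regularized multiplicative-weights updates for a
# zero-sum matrix game max_x min_y x^T A y.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])              # matching-pennies-style payoff
eta, tau = 0.05, 0.2                    # step size, entropy regularization (made up)

def reg_mwu_step(p, payoff):
    # p_{t+1}(a) proportional to p_t(a)^(1 - eta*tau) * exp(eta * payoff(a))
    logits = (1.0 - eta * tau) * np.log(p) + eta * payoff
    w = np.exp(logits - logits.max())
    return w / w.sum()

x = np.array([0.9, 0.1])                # max player
y = np.array([0.2, 0.8])                # min player
for _ in range(3000):
    x, y = reg_mwu_step(x, A @ y), reg_mwu_step(y, -(A.T @ x))  # simultaneous update
print(x, y)                             # both approach (0.5, 0.5), the regularized equilibrium
```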
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Policy Optimization for Stochastic Shortest Path [43.2288319750466]
We study policy optimization for the stochastic shortest path (SSP) problem.
We propose a goal-oriented reinforcement learning model that strictly generalizes the finite-horizon model.
For most settings, our algorithm is shown to achieve a near-optimal regret bound.
arXiv Detail & Related papers (2022-02-07T16:25:14Z)
- Improving Hyperparameter Optimization by Planning Ahead [3.8673630752805432]
We propose a novel transfer learning approach, defined within the context of model-based reinforcement learning.
We propose a new variant of model predictive control which employs a simple look-ahead strategy as a policy.
Our experiments on three meta-datasets comparing to state-of-the-art HPO algorithms show that the proposed method can outperform all baselines.
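The transfer-learning and MPC details are in the paper; the toy sketch below only illustrates the generic idea of a one-step look-ahead policy that ranks candidate hyperparameter configurations with a stand-in surrogate model of validation loss (all names and numbers are invented):

```python
import random

# Illustration of a one-step look-ahead policy for hyperparameter optimization.
# `surrogate` stands in for a learned model of validation loss; the search space
# and the scoring functions are a made-up toy example.

def surrogate(config, history):
    # Hypothetical predictor: pretend configs near lr=0.01, depth=4 are best,
    # with a penalty that discourages re-trying an already evaluated config.
    lr, depth = config
    penalty = 0.2 * sum(1 for c, _ in history if c == config)
    return abs(lr - 0.01) * 10 + abs(depth - 4) * 0.05 + penalty

def evaluate(config):
    # Stand-in for an expensive training run returning a validation loss.
    lr, depth = config
    return abs(lr - 0.01) * 10 + abs(depth - 4) * 0.05 + random.gauss(0, 0.01)

space = [(lr, d) for lr in (0.1, 0.03, 0.01, 0.003) for d in (2, 4, 8)]
history = []
for _ in range(5):                                           # budget of 5 evaluations
    best = min(space, key=lambda c: surrogate(c, history))   # look ahead, pick best
    history.append((best, evaluate(best)))
print(min(history, key=lambda h: h[1]))                      # best config found so far
```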
arXiv Detail & Related papers (2021-10-15T11:46:14Z)
- PASTO: Strategic Parameter Optimization in Recommendation Systems -- Probabilistic is Better than Deterministic [33.174973495620215]
We show that a probabilistic strategic parameter regime can achieve better value compared to the standard regime of finding a single deterministic parameter.
Our approach is applied in a popular social network platform with hundreds of millions of daily users.
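Why a probabilistic parameter regime can beat any deterministic one is easy to see with a toy constrained example (numbers invented, not from the paper): randomizing between two parameter values can satisfy a constraint that the higher-reward value alone violates, while earning more than the feasible deterministic choice:

```python
# Toy illustration of randomizing over a strategic parameter under a constraint.
# Parameter A: reward 10, cost 8.  Parameter B: reward 2, cost 2.
# Constraint: expected cost must stay <= 5.

def expected(p_a, reward=(10.0, 2.0), cost=(8.0, 2.0)):
    r = p_a * reward[0] + (1 - p_a) * reward[1]
    c = p_a * cost[0] + (1 - p_a) * cost[1]
    return r, c

best_deterministic = max((expected(p) for p in (0.0, 1.0)),
                         key=lambda rc: rc[0] if rc[1] <= 5 else float("-inf"))
best_mixed = expected(0.5)                # serve parameter A to half the traffic
print(best_deterministic)                 # (2.0, 2.0): only B is feasible on its own
print(best_mixed)                         # (6.0, 5.0): feasible and 3x the reward
```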
arXiv Detail & Related papers (2021-08-20T09:02:58Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA (limit-deterministic generalized Büchi automaton) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
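The EP-MDP and its LDGBA acceptance bookkeeping are specific to the paper; as a generic illustration of composing a labelled MDP with a deterministic automaton, a product construction over invented states and labels might look like:

```python
# Generic product of a labelled MDP with a deterministic automaton (illustration
# only; the paper's EP-MDP additionally tracks LDGBA accepting-set information).

mdp_trans = {                         # state -> action -> {successor: probability}
    "s0": {"a": {"s0": 0.5, "s1": 0.5}},
    "s1": {"a": {"s1": 1.0}},
}
label = {"s0": "safe", "s1": "goal"}  # one atomic proposition per state
aut_trans = {("q0", "safe"): "q0",    # deterministic automaton for "eventually goal"
             ("q0", "goal"): "q1",
             ("q1", "safe"): "q1",
             ("q1", "goal"): "q1"}

def product_mdp(mdp_trans, label, aut_trans, q_init="q0"):
    aut_states = {q_init} | set(aut_trans.values())
    prod = {}
    for s, acts in mdp_trans.items():
        for q in aut_states:
            for a, dist in acts.items():
                for s2, p in dist.items():
                    q2 = aut_trans[(q, label[s2])]   # automaton reads the label of s2
                    prod.setdefault((s, q), {}).setdefault(a, {})[(s2, q2)] = p
    return prod

for state, acts in sorted(product_mdp(mdp_trans, label, aut_trans).items()):
    print(state, acts)
```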
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning [1.0928470926399565]
A novel reinforcement learning scheme to synthesize policies for continuous-space Markov decision processes (MDPs) is proposed.
A key contribution of the paper is to leverage the classical convergence results for reinforcement learning on finite MDPs.
We present a novel potential-based reward shaping technique to produce dense rewards to speed up learning.
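Potential-based shaping has a standard form that preserves optimal policies (Ng, Harada, and Russell, 1999); a minimal sketch with an invented potential function (not the one used in the paper) is:

```python
# Potential-based reward shaping: F(s, s') = gamma * phi(s') - phi(s).
# Adding F to the environment reward leaves optimal policies unchanged.
# The potential below is a made-up example rewarding progress toward a goal cell.

GAMMA = 0.99
GOAL = (5, 5)

def phi(state):
    # Hypothetical potential: negative Manhattan distance to the goal.
    x, y = state
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))

def shaped_reward(env_reward, state, next_state, gamma=GAMMA):
    return env_reward + gamma * phi(next_state) - phi(state)

# A step that moves closer to the goal earns a dense positive bonus:
print(shaped_reward(0.0, (0, 0), (1, 0)))   # about 1.09
```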
arXiv Detail & Related papers (2020-03-02T08:29:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.