Reinforcement Learning with a Terminator
- URL: http://arxiv.org/abs/2205.15376v2
- Date: Thu, 5 Oct 2023 19:02:39 GMT
- Title: Reinforcement Learning with a Terminator
- Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit,
Gal Chechik, Assaf Hallak, and Gal Dalal
- Abstract summary: We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
- Score: 80.34572413850186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the problem of reinforcement learning with exogenous termination.
We define the Termination Markov Decision Process (TerMDP), an extension of the
MDP framework, in which episodes may be interrupted by an external
non-Markovian observer. This formulation accounts for numerous real-world
situations, such as a human interrupting an autonomous driving agent for
reasons of discomfort. We learn the parameters of the TerMDP and leverage the
structure of the estimation problem to provide state-wise confidence bounds. We
use these to construct a provably-efficient algorithm, which accounts for
termination, and bound its regret. Motivated by our theoretical analysis, we
design and implement a scalable approach, which combines optimism (w.r.t.
termination) and a dynamic discount factor, incorporating the termination
probability. We deploy our method on high-dimensional driving and MinAtar
benchmarks. Additionally, we test our approach on human data in a driving
setting. Our results demonstrate fast convergence and significant improvement
over various baseline approaches.
Related papers
- Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives [16.101435842520473]
This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem.
Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP.
We present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space.
arXiv Detail & Related papers (2024-06-05T02:33:50Z) - Discriminant Distance-Aware Representation on Deterministic Uncertainty
Quantification Methods [2.309984352134254]
We introduce a novel and efficient method for deterministic uncertainty estimation called Discriminant Distance-Awareness Representation (DDAR)
By leveraging a distinction layer over optimal trainable prototypes, DDAR can learn a discriminant distance-awareness representation.
Our experiments show that DDAR is a flexible and architecture-agnostic method that can be easily integrated as a pluggable layer with distance-sensitive metrics.
arXiv Detail & Related papers (2024-02-20T02:26:48Z) - Modeling Boundedly Rational Agents with Latent Inference Budgets [56.24971011281947]
We introduce a latent inference budget model (L-IBM) that models agents' computational constraints explicitly.
L-IBMs make it possible to learn agent models using data from diverse populations of suboptimal actors.
We show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty.
arXiv Detail & Related papers (2023-12-07T03:55:51Z) - On the Foundation of Distributionally Robust Reinforcement Learning [19.621038847810198]
We contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL)
This framework obliges the decision maker to choose an optimal policy under the worst-case distributional shift orchestrated by an adversary.
Within this DRMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP)
arXiv Detail & Related papers (2023-11-15T15:02:23Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - End-to-End Policy Gradient Method for POMDPs and Explainable Agents [2.1700203922407493]
We propose an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph.
Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
arXiv Detail & Related papers (2023-04-19T15:45:52Z) - Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter.
arXiv Detail & Related papers (2021-06-27T23:41:57Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via
Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that (i) this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.