Reinforcement Learning with a Terminator
- URL: http://arxiv.org/abs/2205.15376v2
- Date: Thu, 5 Oct 2023 19:02:39 GMT
- Title: Reinforcement Learning with a Terminator
- Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit,
Gal Chechik, Assaf Hallak, and Gal Dalal
- Abstract summary: We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
- Score: 80.34572413850186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the problem of reinforcement learning with exogenous termination.
We define the Termination Markov Decision Process (TerMDP), an extension of the
MDP framework, in which episodes may be interrupted by an external
non-Markovian observer. This formulation accounts for numerous real-world
situations, such as a human interrupting an autonomous driving agent for
reasons of discomfort. We learn the parameters of the TerMDP and leverage the
structure of the estimation problem to provide state-wise confidence bounds. We
use these to construct a provably-efficient algorithm, which accounts for
termination, and bound its regret. Motivated by our theoretical analysis, we
design and implement a scalable approach, which combines optimism (w.r.t.
termination) and a dynamic discount factor, incorporating the termination
probability. We deploy our method on high-dimensional driving and MinAtar
benchmarks. Additionally, we test our approach on human data in a driving
setting. Our results demonstrate fast convergence and significant improvement
over various baseline approaches.
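A minimal sketch of the dynamic-discount idea described in the abstract is given below, assuming tabular counts of observed terminations. The names (optimistic_termination_prob, td_target, delta) and the generic Hoeffding-style confidence radius are illustrative assumptions, not the authors' implementation or their state-wise bounds; the sketch only shows how an optimistic termination estimate can shrink the effective discount used when bootstrapping a value target.

```python
import numpy as np

def optimistic_termination_prob(term_counts, visit_counts, delta=0.05):
    """Lower-confidence (optimistic) estimate of the per-(state, action)
    termination probability: optimism w.r.t. termination means acting as if
    the external observer is unlikely to interrupt."""
    n = np.maximum(visit_counts, 1)
    p_hat = term_counts / n
    bonus = np.sqrt(np.log(1.0 / delta) / (2.0 * n))  # generic Hoeffding radius
    return np.clip(p_hat - bonus, 0.0, 1.0)

def td_target(reward, next_value, gamma, p_term):
    """One-step bootstrapped target with a dynamic discount factor: future
    value is only collected if the episode survives the external observer."""
    effective_gamma = gamma * (1.0 - p_term)
    return reward + effective_gamma * next_value
```

For example, with gamma = 0.99 and an optimistic termination estimate of 0.1, the target bootstraps with an effective discount of about 0.89, so states where interruption looks likely contribute less long-horizon value.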
Related papers
- Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on a multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results.
We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
arXiv Detail & Related papers (2024-08-25T16:43:47Z)
- Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives [16.101435842520473]
This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem.
Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP.
We present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space.
arXiv Detail & Related papers (2024-06-05T02:33:50Z)
- Discriminant Distance-Aware Representation on Deterministic Uncertainty Quantification Methods [2.309984352134254]
We introduce a novel and efficient method for deterministic uncertainty estimation called Discriminant Distance-Awareness Representation (DDAR).
By leveraging a distinction layer over optimal trainable prototypes, DDAR can learn a discriminant distance-awareness representation.
Our experiments show that DDAR is a flexible and architecture-agnostic method that can be easily integrated as a pluggable layer with distance-sensitive metrics.
arXiv Detail & Related papers (2024-02-20T02:26:48Z)
- Modeling Boundedly Rational Agents with Latent Inference Budgets [56.24971011281947]
We introduce a latent inference budget model (L-IBM) that models agents' computational constraints explicitly.
L-IBMs make it possible to learn agent models using data from diverse populations of suboptimal actors.
We show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty.
arXiv Detail & Related papers (2023-12-07T03:55:51Z)
- End-to-End Policy Gradient Method for POMDPs and Explainable Agents [2.1700203922407493]
We propose an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph.
Experimental results demonstrate that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
arXiv Detail & Related papers (2023-04-19T15:45:52Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter; a generic form of such bounds is sketched after this entry.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
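For reference, the regret quantity discussed in the entry above, and the generic shape of a logarithmic problem-specific lower bound, can be written as follows; the notation (gain g^*, constant C(M)) is an assumed convention, not taken from the cited paper.

```latex
% Cumulative regret after T steps: g^* is the optimal long-run average
% reward (gain) of the MDP M, and r_t is the reward collected at step t.
R_T \;=\; \sum_{t=1}^{T} \bigl( g^{*} - r_t \bigr),
\qquad
\liminf_{T \to \infty} \frac{\mathbb{E}\,[R_T]}{\log T} \;\ge\; C(M).
```

Here C(M) collects the problem-specific dependence on the parameters of the MDP M.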
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses; a generic online change-point statistic is sketched after this entry for illustration.
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
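As an illustration of the entry above, a generic online change-point statistic (a textbook CUSUM update, not the cited paper's high-confidence procedure) can be maintained in constant time per observation; the threshold and toy data below are arbitrary assumptions.

```python
def cusum_update(stat, observation, reference_mean, drift=0.0):
    """One online CUSUM step: accumulate evidence that the observation
    stream has shifted above its pre-change reference mean."""
    return max(0.0, stat + (observation - reference_mean - drift))

# Toy usage: flag a change once the statistic crosses a threshold chosen to
# trade off detection delay against false alarms.
stat, threshold = 0.0, 3.0
for x in [0.1, -0.2, 0.0, 1.3, 1.1, 1.4, 1.2]:  # e.g. a stream of TD errors
    stat = cusum_update(stat, x, reference_mean=0.0)
    if stat > threshold:
        print("change detected")
        break
```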
- Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.