Towards Generalized Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2402.07246v1
- Date: Sun, 11 Feb 2024 17:10:31 GMT
- Title: Towards Generalized Inverse Reinforcement Learning
- Authors: Chaosheng Dong, Yijia Wang
- Abstract summary: This paper studies the problem of learning the basic components of an MDP given observed behavior (policy) that might not be optimal.
We address two key challenges in GIRL: first, the need to quantify the discrepancy between the observed policy and the underlying optimal policy; second, the difficulty of mathematically characterizing the underlying optimal policy.
- Score: 11.880139160394958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies generalized inverse reinforcement learning (GIRL) in
Markov decision processes (MDPs), that is, the problem of learning the basic
components of an MDP given observed behavior (policy) that might not be
optimal. These components include not only the reward function and transition
probability matrices, but also the action space and state space that are not
exactly known but are known to belong to given uncertainty sets. We address two
key challenges in GIRL: first, the need to quantify the discrepancy between the
observed policy and the underlying optimal policy; second, the difficulty of
mathematically characterizing the underlying optimal policy when the basic
components of an MDP are unobservable or partially observable. Then, we propose
the mathematical formulation for GIRL and develop a fast heuristic algorithm.
Numerical results on both finite and infinite state problems show the merit of
our formulation and algorithm.
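The paper's full formulation learns the MDP components themselves, but the first challenge it names, quantifying the discrepancy between an observed policy and the underlying optimal policy, can be illustrated in the simpler setting where the MDP is fully known. The sketch below is a hypothetical simplification, not the paper's algorithm: the function names and the gap-based discrepancy measure are assumptions. It computes optimal Q-values by value iteration and scores an observed deterministic policy by its total per-state optimality gap.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Optimal values via value iteration.

    P: (A, S, S) tensor with P[a, s, s'] = P(s' | s, a).
    R: (S, A) reward matrix.
    Returns the optimal value function V* and Q-values Q*.
    """
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * (P @ V).T          # Q[s, a] = R[s, a] + gamma * E[V]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q
        V = V_new

def policy_discrepancy(P, R, pi_obs, gamma=0.9):
    """One possible discrepancy measure (an assumption, not the paper's):
    the optimality gap Q*(s, a*) - Q*(s, pi_obs(s)), summed over states.
    pi_obs: (S,) array of observed deterministic actions."""
    _, Q = value_iteration(P, R, gamma)
    gaps = Q.max(axis=1) - Q[np.arange(len(pi_obs)), pi_obs]
    return gaps.sum()
```

A zero discrepancy means the observed policy is greedy with respect to Q* in every state; a positive value quantifies how far the demonstrated behavior is from optimal under the assumed reward.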
Related papers
- Learning Algorithms for Verification of Markov Decision Processes [20.5951492453299]
We present a general framework for applying learning algorithms to the verification of Markov decision processes (MDPs).
The presented framework focuses on probabilistic reachability, which is a core problem in verification.
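When the MDP is fully known, probabilistic reachability, the maximum probability of eventually reaching a set of target states, reduces to a Bellman fixed-point computation. The sketch below is illustrative only (the function name and the naive fixed iteration count are assumptions; verification tools use sound stopping criteria such as interval iteration):

```python
import numpy as np

def max_reach_prob(P, target, n_iter=1000):
    """Maximum probability of eventually reaching `target`.

    P: (A, S, S) tensor with P[a, s, s'] = P(s' | s, a).
    Fixed point: x[s] = 1 on target states, otherwise
    x[s] = max_a sum_{s'} P[a, s, s'] * x[s'].
    """
    _, S, _ = P.shape
    x = np.zeros(S)
    x[list(target)] = 1.0
    for _ in range(n_iter):
        x = (P @ x).max(axis=0)      # best action in each state
        x[list(target)] = 1.0        # target states stay at probability 1
    return x
```

Starting the iteration from zero converges to the least fixed point, which is exactly the maximum reachability probability.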
arXiv Detail & Related papers (2024-03-14T08:54:19Z)
- Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making [48.87943416098096]
This paper introduces a simple and efficient learning algorithm, Optimistic MLE (OMLE), for general sequential decision making.
We prove that OMLE learns near-optimal policies for an enormously rich class of sequential decision-making problems.
arXiv Detail & Related papers (2022-09-29T17:56:25Z)
- Learning in Observable POMDPs, without Computationally Intractable Oracles [23.636033995089587]
We develop the first oracle-free learning algorithm for POMDPs under reasonable assumptions.
Specifically, we give a quasipolynomial-time end-to-end algorithm for learning in "observable" POMDPs, where observability is the assumption that well-separated distributions over states induce well-separated distributions over observations.
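The observability condition can be probed numerically. The check below is only a hedged illustration: the paper's assumption concerns separation of distributions over states (a belief-level condition), whereas this sketch computes just the weaker pairwise statistic, the minimum total-variation distance between the observation distributions of any two distinct states.

```python
import numpy as np

def min_pairwise_tv(O):
    """Minimum total-variation distance between the observation
    distributions of any two distinct states.

    O: (S, Z) matrix with O[s, z] = P(observation z | state s).
    """
    S = O.shape[0]
    best = np.inf
    for i in range(S):
        for j in range(i + 1, S):
            best = min(best, 0.5 * np.abs(O[i] - O[j]).sum())
    return best
```

A margin of 0 means two states are observationally indistinguishable, so no algorithm can separate the beliefs they induce; a large margin is a necessary ingredient for observability in the paper's sense, not the full definition.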
arXiv Detail & Related papers (2022-06-07T17:05:27Z)
- Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs [24.256960622176305]
We propose the first (nearly) matching upper and lower bounds on the sample complexity of PAC RL in episodic Markov decision processes.
Our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap.
Their design and analyses employ novel ideas, including graph-theoretical concepts such as minimum flows and maximum cuts.
arXiv Detail & Related papers (2022-03-17T11:19:41Z)
- Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z)
- Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning [50.44564503645015]
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes.
We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs.
arXiv Detail & Related papers (2021-07-02T20:36:05Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameters.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
- Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
- Robust Finite-State Controllers for Uncertain POMDPs [25.377873201375515]
Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to an uncertainty set.
We develop an algorithm to compute finite-memory policies for uPOMDPs.
arXiv Detail & Related papers (2020-09-24T02:58:50Z)
- Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes [53.72166325215299]
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs) and propose two algorithms.
The first achieves minimax optimal regret guarantees for a rich class of factored structures.
The second enjoys better computational complexity with a slightly worse regret.
arXiv Detail & Related papers (2020-06-24T00:50:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.