Towards Generalized Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2402.07246v1
- Date: Sun, 11 Feb 2024 17:10:31 GMT
- Title: Towards Generalized Inverse Reinforcement Learning
- Authors: Chaosheng Dong, Yijia Wang
- Abstract summary: This paper studies the problem of learning the basic components of an MDP given observed behavior (policy) that might not be optimal.
We address two key challenges in GIRL: first, the need to quantify the discrepancy between the observed policy and the underlying optimal policy; second, the difficulty of mathematically characterizing the underlying optimal policy.
- Score: 11.880139160394958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies generalized inverse reinforcement learning (GIRL) in
Markov decision processes (MDPs), that is, the problem of learning the basic
components of an MDP given observed behavior (policy) that might not be
optimal. These components include not only the reward function and transition
probability matrices, but also the action space and state space that are not
exactly known but are known to belong to given uncertainty sets. We address two
key challenges in GIRL: first, the need to quantify the discrepancy between the
observed policy and the underlying optimal policy; second, the difficulty of
mathematically characterizing the underlying optimal policy when the basic
components of an MDP are unobservable or partially observable. Then, we propose
the mathematical formulation for GIRL and develop a fast heuristic algorithm.
Numerical results on both finite and infinite state problems show the merit of
our formulation and algorithm.
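The paper's full formulation learns the MDP components themselves, but the first challenge it names, quantifying the discrepancy between an observed policy and the underlying optimal policy, can be illustrated in the simpler setting where the MDP is fully known. The sketch below is a hypothetical simplification, not the paper's algorithm: the function names and the gap-based discrepancy measure are assumptions. It computes optimal Q-values by value iteration and scores an observed deterministic policy by its total per-state optimality gap.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Optimal values via value iteration.

    P: (A, S, S) tensor with P[a, s, s'] = P(s' | s, a).
    R: (S, A) reward matrix.
    Returns the optimal value function V* and Q-values Q*.
    """
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * (P @ V).T          # Q[s, a] = R[s, a] + gamma * E[V]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q
        V = V_new

def policy_discrepancy(P, R, pi_obs, gamma=0.9):
    """One possible discrepancy measure (an assumption, not the paper's):
    the optimality gap Q*(s, a*) - Q*(s, pi_obs(s)), summed over states.
    pi_obs: (S,) array of observed deterministic actions."""
    _, Q = value_iteration(P, R, gamma)
    gaps = Q.max(axis=1) - Q[np.arange(len(pi_obs)), pi_obs]
    return gaps.sum()
```

A zero discrepancy means the observed policy is greedy with respect to Q* in every state; a positive value quantifies how far the demonstrated behavior is from optimal under the assumed reward.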
Related papers
- Learning Algorithms for Verification of Markov Decision Processes [20.5951492453299]
We present a general framework for applying learning algorithms to the verification of Markov decision processes (MDPs).
The presented framework focuses on probabilistic reachability, which is a core problem in verification.
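When the MDP is fully known, probabilistic reachability, the maximum probability of eventually reaching a set of target states, reduces to a Bellman fixed-point computation. The sketch below is illustrative only (the function name and the naive fixed iteration count are assumptions; verification tools use sound stopping criteria such as interval iteration):

```python
import numpy as np

def max_reach_prob(P, target, n_iter=1000):
    """Maximum probability of eventually reaching `target`.

    P: (A, S, S) tensor with P[a, s, s'] = P(s' | s, a).
    Fixed point: x[s] = 1 on target states, otherwise
    x[s] = max_a sum_{s'} P[a, s, s'] * x[s'].
    """
    _, S, _ = P.shape
    x = np.zeros(S)
    x[list(target)] = 1.0
    for _ in range(n_iter):
        x = (P @ x).max(axis=0)      # best action in each state
        x[list(target)] = 1.0        # target states stay at probability 1
    return x
```

Starting the iteration from zero converges to the least fixed point, which is exactly the maximum reachability probability.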
arXiv Detail & Related papers (2024-03-14T08:54:19Z)
- Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making [48.87943416098096]
This paper introduces a simple and efficient learning algorithm, Optimistic MLE (OMLE), for general sequential decision making.
We prove that OMLE learns near-optimal policies for an enormously rich class of sequential decision-making problems.
arXiv Detail & Related papers (2022-09-29T17:56:25Z)
- Learning in Observable POMDPs, without Computationally Intractable Oracles [23.636033995089587]
We develop the first oracle-free learning algorithm for POMDPs under reasonable assumptions.
Specifically, we give a quasipolynomial-time end-to-end algorithm for learning in "observable" POMDPs, where observability is the assumption that well-separated distributions over states induce well-separated distributions over observations.
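The observability condition can be probed numerically. The check below is only a hedged illustration: the paper's assumption concerns separation of distributions over states (a belief-level condition), whereas this sketch computes just the weaker pairwise statistic, the minimum total-variation distance between the observation distributions of any two distinct states.

```python
import numpy as np

def min_pairwise_tv(O):
    """Minimum total-variation distance between the observation
    distributions of any two distinct states.

    O: (S, Z) matrix with O[s, z] = P(observation z | state s).
    """
    S = O.shape[0]
    best = np.inf
    for i in range(S):
        for j in range(i + 1, S):
            best = min(best, 0.5 * np.abs(O[i] - O[j]).sum())
    return best
```

A margin of 0 means two states are observationally indistinguishable, so no algorithm can separate the beliefs they induce; a large margin is a necessary ingredient for observability in the paper's sense, not the full definition.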
arXiv Detail & Related papers (2022-06-07T17:05:27Z)
- Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs [24.256960622176305]
We propose the first (nearly) matching upper and lower bounds on the sample complexity of PAC RL in episodic Markov decision processes.
Our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap.
Their design and analyses employ novel ideas, including graph-theoretical concepts such as minimum flows and maximum cuts.
arXiv Detail & Related papers (2022-03-17T11:19:41Z)
- Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z)
- Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning [50.44564503645015]
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes.
We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs.
arXiv Detail & Related papers (2021-07-02T20:36:05Z)
- Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameters.
arXiv Detail & Related papers (2021-06-27T23:41:57Z)
- Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
- Robust Finite-State Controllers for Uncertain POMDPs [25.377873201375515]
Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to an uncertainty set.
We develop an algorithm to compute finite-memory policies for uPOMDPs.
arXiv Detail & Related papers (2020-09-24T02:58:50Z)
- Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes [53.72166325215299]
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs) and propose two algorithms.
The first achieves minimax optimal regret guarantees for a rich class of factored structures.
The second enjoys better computational complexity with a slightly worse regret.
arXiv Detail & Related papers (2020-06-24T00:50:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.