Inverse Reinforcement Learning with Explicit Policy Estimates
- URL: http://arxiv.org/abs/2103.02863v1
- Date: Thu, 4 Mar 2021 07:00:58 GMT
- Title: Inverse Reinforcement Learning with Explicit Policy Estimates
- Authors: Navyata Sanghvi, Shinnosuke Usami, Mohit Sharma, Joachim Groeger, Kris Kitani
- Abstract summary: Various methods for solving the inverse reinforcement learning problem have been developed independently in machine learning and economics.
We show that they all belong to a class of optimization problems, characterized by a common form of the objective, the associated policy and the objective gradient.
Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.
- Score: 19.159290496678004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various methods for solving the inverse reinforcement learning (IRL) problem
have been developed independently in machine learning and economics. In
particular, the method of Maximum Causal Entropy IRL is based on the
perspective of entropy maximization, while related advances in the field of
economics instead assume the existence of unobserved action shocks to explain
expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability
method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously
unknown connections between these related methods from both fields. We achieve
this by showing that they all belong to a class of optimization problems,
characterized by a common form of the objective, the associated policy and the
objective gradient. We demonstrate key computational and algorithmic
differences which arise between the methods due to an approximation of the
optimal soft value function, and describe how this leads to more efficient
algorithms. Using insights which emerge from our study of this class of
optimization problems, we identify various problem scenarios and investigate
each method's suitability for these problems.
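To make the shared structure concrete, below is a minimal tabular sketch of the inner computation in Maximum Causal Entropy IRL, written under assumed notation rather than taken from the paper: the soft (log-sum-exp) value function, the Boltzmann policy it induces, and an objective gradient of feature-expectation-matching form. All names here (P, r, phi, gamma, mu_expert, mu_policy) are illustrative assumptions, not symbols from the paper.

```python
# A minimal tabular sketch, assuming a small finite MDP; an illustration of the
# general structure described in the abstract, not the paper's exact formulation.
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, iters=200):
    """Return soft Q-values and the induced Boltzmann policy.

    P: (S, A, S) transition probabilities; r: (S,) state rewards.
    Soft backup: V(s) = log sum_a exp Q(s, a); Q(s, a) = r(s) + gamma * E[V(s')].
    """
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        m = Q.max(axis=1, keepdims=True)                      # stabilize log-sum-exp
        V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
        Q = r[:, None] + gamma * (P @ V)                      # soft Bellman backup
    expQ = np.exp(Q - Q.max(axis=1, keepdims=True))
    return Q, expQ / expQ.sum(axis=1, keepdims=True)          # pi(a|s) prop. to exp Q(s, a)

def objective_gradient(phi, mu_expert, mu_policy):
    """Gradient of the expert log-likelihood w.r.t. reward weights theta, assuming
    a linear reward r = phi @ theta: expert minus current-policy feature
    expectations (mu_* are hypothetical state-occupancy measures)."""
    return phi.T @ (mu_expert - mu_policy)
```

The `iters` knob loosely mirrors the computational distinction the abstract draws: solving the soft value function to near convergence before every gradient step, versus reusing an approximate estimate, is what separates the methods and yields the more efficient algorithms.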
Related papers
- Deterministic Trajectory Optimization through Probabilistic Optimal Control [3.2771631221674333]
We propose two new algorithms for discrete-time deterministic finite-horizon nonlinear optimal control problems.
Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control.
We show that these algorithms produce a fixed-point iteration over probabilistic policies that converges to the deterministic optimal policy.
arXiv Detail & Related papers (2024-07-18T09:17:47Z)
- Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective [6.199818486385127]
We use the trial-and-error paradigm of Reinforcement Learning for discovering better decision-making strategies.
This work focuses on non-canonical graph problems for which performant algorithms are typically not known.
arXiv Detail & Related papers (2024-04-09T17:45:25Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Multivariate Systemic Risk Measures and Computation by Deep Learning Algorithms [63.03966552670014]
We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations.
The algorithms we provide allow for learning primal optima, optima for the dual representation, and the corresponding fair risk allocations.
arXiv Detail & Related papers (2023-02-02T22:16:49Z)
- Lexicographic Multi-Objective Reinforcement Learning [65.90380946224869]
We present a family of both action-value and policy gradient algorithms that can be used to solve such problems.
We show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
arXiv Detail & Related papers (2022-12-28T10:22:36Z)
- Accelerating numerical methods by gradient-based meta-solving [15.90188271828615]
In science and engineering applications, it is often required to solve similar computational problems repeatedly.
We propose a gradient-based algorithm to solve them in a unified way.
We demonstrate the performance and versatility of our method through theoretical analysis and numerical experiments.
arXiv Detail & Related papers (2022-06-17T07:31:18Z)
- Instance-Dependent Confidence and Early Stopping for Reinforcement Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain, ex post, the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z)
- Stochastic convex optimization for provably efficient apprenticeship learning [1.0609815608017066]
We consider large-scale Markov decision processes (MDPs) with an unknown cost function.
We employ convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations.
arXiv Detail & Related papers (2021-12-31T19:47:57Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Differentiable Causal Discovery from Interventional Data [141.41931444927184]
We propose a theoretically-grounded method based on neural networks that can leverage interventional data.
We show that our approach compares favorably to the state of the art in a variety of settings.
arXiv Detail & Related papers (2020-07-03T15:19:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.