Challenging Common Assumptions in Convex Reinforcement Learning
- URL: http://arxiv.org/abs/2202.01511v1
- Date: Thu, 3 Feb 2022 10:47:10 GMT
- Title: Challenging Common Assumptions in Convex Reinforcement Learning
- Authors: Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello
Restelli
- Abstract summary: We show that erroneously optimizing the infinite trials objective in place of the actual finite trials one, as it is usually done, can lead to a significant approximation error.
We believe shedding light on this issue will lead to better approaches and methodologies for convex RL.
- Score: 34.739021482682176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classic Reinforcement Learning (RL) formulation concerns the maximization
of a scalar reward function. More recently, convex RL has been introduced to
extend the RL formulation to all the objectives that are convex functions of
the state distribution induced by a policy. Notably, convex RL covers several
relevant applications that do not fall into the scalar formulation, including
imitation learning, risk-averse RL, and pure exploration. In classic RL, it is
common to optimize an infinite trials objective, which accounts for the state
distribution instead of the empirical state visitation frequencies, even though
the actual number of trajectories is always finite in practice. This is
theoretically sound since the infinite trials and finite trials objectives can
be proved to coincide and thus lead to the same optimal policy. In this paper,
we show that this hidden assumption does not hold in the convex RL setting. In
particular, we show that erroneously optimizing the infinite trials objective
in place of the actual finite trials one, as it is usually done, can lead to a
significant approximation error. Since the finite trials setting is the default
in both simulated and real-world RL, we believe shedding light on this issue
will lead to better approaches and methodologies for convex RL, impacting
relevant research areas such as imitation learning, risk-averse RL, and pure
exploration among others.
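To make the abstract's point concrete, below is a minimal numerical sketch (not taken from the paper) of how the infinite trials and finite trials objectives can diverge for a convex RL utility. It uses state entropy, the pure exploration objective mentioned above: since entropy is concave, Jensen's inequality gives E[f(d_hat)] <= f(d), so the finite trials value lies below the infinite trials one whenever the empirical visitation frequencies fluctuate around the true state distribution. The two-state distribution, the trajectory length T, and the i.i.d. sampling of visited states are illustrative assumptions, not the paper's setup.
```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: some policy induces this state distribution over 2 states.
d = np.array([0.5, 0.5])

def entropy(p):
    """Shannon entropy, the (concave) utility used in pure exploration."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Infinite trials objective: the utility applied to the true state distribution.
infinite_trials = entropy(d)

# Finite trials objective: the expected utility of the empirical state visitation
# frequencies from a single trajectory of length T, estimated by Monte Carlo
# (visited states are drawn i.i.d. from d here for simplicity).
T, n_sims = 10, 10_000
finite_trials = np.mean([
    entropy(np.bincount(rng.choice(2, size=T, p=d), minlength=2) / T)
    for _ in range(n_sims)
])

print(f"infinite trials  f(d)        = {infinite_trials:.3f}")  # ln(2) ~ 0.693
print(f"finite trials    E[f(d_hat)] = {finite_trials:.3f}")    # strictly smaller for finite T
```
The gap shrinks as the number of sampled states grows, which matches the naming: the two objectives coincide only in the infinite trials limit, while in the finite trials regime that practice actually operates in they can differ noticeably.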
Related papers
- More Benefits of Being Distributional: Second-Order Bounds for
Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z) - The Effective Horizon Explains Deep RL Performance in Stochastic Environments [21.148001945560075]
Reinforcement learning (RL) theory has largely focused on proving minimax sample complexity bounds.
We introduce a new RL algorithm, SQIRL, that iteratively learns a near-optimal policy by exploring randomly to collect rollouts.
We leverage SQIRL to derive instance-dependent sample complexity bounds for RL that are exponential only in an "effective horizon" look-ahead and in the complexity of the class used for function approximation.
arXiv Detail & Related papers (2023-12-13T18:58:56Z) - Leveraging Reward Consistency for Interpretable Feature Discovery in
Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents.
We propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z) - Policy Evaluation in Distributional LQR [70.63903506291383]
We provide a closed-form expression of the distribution of the random return.
We show that this distribution can be approximated by a finite number of random variables.
Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR.
arXiv Detail & Related papers (2023-03-23T20:27:40Z) - ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for
Last-Iterate Convergence in Constrained MDPs [31.663072540757643]
Reinforcement Learning (RL) has been applied to real-world problems with increasing success.
We introduce Reinforcement Learning with Optimistic Ascent-Descent (ReLOAD)
arXiv Detail & Related papers (2023-02-02T18:05:27Z) - LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement
Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs)
We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z) - RL with KL penalties is better viewed as Bayesian inference [4.473139775790299]
We analyze challenges associated with treating a language model as a Reinforcement Learning (RL) policy.
We show how avoiding those challenges requires moving beyond the RL paradigm.
arXiv Detail & Related papers (2022-05-23T12:47:13Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL)
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - Decoupling Exploration and Exploitation in Reinforcement Learning [8.946655323517092]
We propose Decoupled RL (DeRL) which trains separate policies for exploration and exploitation.
We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards.
arXiv Detail & Related papers (2021-07-19T15:31:02Z)