Investigating the Impact of Action Representations in Policy Gradient Algorithms
- URL: http://arxiv.org/abs/2309.06921v1
- Date: Wed, 13 Sep 2023 12:41:45 GMT
- Title: Investigating the Impact of Action Representations in Policy Gradient Algorithms
- Authors: Jan Schneider, Pierre Schumacher, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler
- Abstract summary: Reinforcement learning is a versatile framework for learning to solve complex real-world tasks. However, influences on the learning performance of RL algorithms are often poorly understood in practice.
- Score: 11.383263522013868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is a versatile framework for learning to solve
complex real-world tasks. However, influences on the learning performance of RL
algorithms are often poorly understood in practice. We discuss different
analysis techniques and assess their effectiveness for investigating the impact
of action representations in RL. Our experiments demonstrate that the action
representation can significantly influence the learning performance on popular
RL benchmark tasks. The analysis results indicate that some of the performance
differences can be attributed to changes in the complexity of the optimization
landscape. Finally, we discuss open challenges of analysis techniques for RL
algorithms.
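To make the notion of an action representation concrete, here is a minimal sketch assuming a Gymnasium-style MuJoCo task: the same torque-controlled environment is exposed under a second action interface (PD-controller position targets instead of raw torques), so an identical policy gradient algorithm can be trained on both and its learning curves compared. The wrapper, gains, and the `Hopper-v4` choice are illustrative, not the paper's exact setup.

```python
# Minimal sketch, assuming a Gymnasium MuJoCo task: expose the same
# torque-controlled environment under a second action representation
# (PD position targets). Gains and the qpos/qvel slicing are illustrative.
import numpy as np
import gymnasium as gym


class PDTargetActions(gym.ActionWrapper):
    """Reinterpret policy actions as joint-position targets for a PD
    controller instead of raw torques."""

    def __init__(self, env, kp=10.0, kd=1.0):
        super().__init__(env)
        self.kp, self.kd = kp, kd

    def action(self, target_qpos):
        # Assumption: the actuated joints are the last entries of qpos/qvel
        # (true for Hopper-v4; other tasks lay out qpos differently).
        n = len(target_qpos)
        qpos = self.env.unwrapped.data.qpos[-n:]
        qvel = self.env.unwrapped.data.qvel[-n:]
        torque = self.kp * (target_qpos - qpos) - self.kd * qvel
        return np.clip(torque, self.action_space.low, self.action_space.high)


# Train the same policy gradient algorithm on both interfaces and compare
# learning curves; only the action representation differs.
env_torque = gym.make("Hopper-v4")
env_pd_targets = PDTargetActions(gym.make("Hopper-v4"))
```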
Related papers
- Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? [1.9116784879310031]
In deep Reinforcement Learning (RL), value functions are approximated using deep neural networks and trained via mean squared error regression objectives.
Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective.
Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup.
arXiv Detail & Related papers (2024-06-10T14:25:11Z)
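As a concrete picture of the replacement this entry studies, here is a minimal sketch assuming PyTorch: the scalar regression target is projected onto a "two-hot" distribution over fixed value bins, and the critic is trained with cross-entropy instead of mean squared error. The bin count and value range are illustrative choices.

```python
# Minimal sketch, assuming PyTorch: replace MSE value regression with
# cross-entropy against a "two-hot" discretization of the return.
import torch
import torch.nn.functional as F

N_BINS, V_MIN, V_MAX = 51, -10.0, 10.0  # illustrative discretization
bin_centers = torch.linspace(V_MIN, V_MAX, N_BINS)

def two_hot(returns):
    """Project scalar returns onto the two nearest value bins."""
    pos = (returns.clamp(V_MIN, V_MAX) - V_MIN) / (V_MAX - V_MIN) * (N_BINS - 1)
    lo = pos.floor().long().clamp(max=N_BINS - 2)
    w_hi = (pos - lo.float()).unsqueeze(1)
    dist = torch.zeros(returns.shape[0], N_BINS)
    dist.scatter_(1, lo.unsqueeze(1), 1.0 - w_hi)
    dist.scatter_(1, lo.unsqueeze(1) + 1, w_hi)
    return dist

def value_loss(logits, returns):
    # Cross-entropy classification objective in place of F.mse_loss.
    return -(two_hot(returns) * F.log_softmax(logits, dim=-1)).sum(-1).mean()

def value_estimate(logits):
    # Read the scalar value back out as the expected bin center.
    return (F.softmax(logits, dim=-1) * bin_centers).sum(-1)
```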
- Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning [74.67655210734338]
In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption.
We develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations.
We empirically demonstrate the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks.
arXiv Detail & Related papers (2023-11-20T23:56:58Z)
- Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component of evolutionary algorithms has demonstrated superior performance in recent years.
We discuss the RL-EA integration method, the RL-assisted strategy adopted by RL-EA, and its applications according to the existing literature.
In the applications of RL-EA section, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which enables both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
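A hedged sketch with NumPy of the optimism principle this entry implements: a UCB bonus computed from a regularized feature covariance. The features `phi_sa` stand in for the paper's kernel embeddings of latent variable models and are hypothetical.

```python
# Hedged sketch of UCB exploration via a feature-covariance bonus.
import numpy as np

def ucb_values(q_values, phi_sa, cov_inv, beta=1.0):
    """Optimistic action values: Q(s,a) + beta * ||phi(s,a)||_{cov^-1}."""
    bonus = np.sqrt(np.einsum("ad,de,ae->a", phi_sa, cov_inv, phi_sa))
    return q_values + beta * bonus

# Toy usage: pick the optimistic action among 4 candidates with 8-dim features.
rng = np.random.default_rng(0)
phi_sa = rng.standard_normal((4, 8))
cov_inv = np.linalg.inv(np.eye(8) + phi_sa.T @ phi_sa)  # ridge-regularized
best_action = int(np.argmax(ucb_values(np.zeros(4), phi_sa, cov_inv)))
```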
- Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning [1.3106063755117399]
We improve upon one of the SOTA OMRL algorithms, FOCAL, by incorporating an intra-task attention mechanism and inter-task contrastive learning objectives.
Theoretical analysis and experiments are presented to demonstrate the superior performance, efficiency, and robustness of our end-to-end and model-free method.
arXiv Detail & Related papers (2021-02-22T05:05:16Z)
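The inter-task contrastive objective in the entry above can be pictured with a short PyTorch sketch: transition embeddings that share a task ID are treated as positives in an InfoNCE-style loss. This is a generic supervised-contrastive form, not FOCAL's exact objective.

```python
# Sketch of an inter-task contrastive loss over task embeddings.
import torch
import torch.nn.functional as F

def inter_task_contrastive_loss(z, task_ids, temperature=0.1):
    """z: (batch, dim) task embeddings; task_ids: (batch,) integer labels."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature                  # pairwise similarities
    not_self = ~torch.eye(len(z), dtype=torch.bool)
    # Log-probability of each pair under a softmax over non-self pairs.
    log_p = sim - torch.logsumexp(
        sim.masked_fill(~not_self, float("-inf")), dim=1, keepdim=True)
    positives = (task_ids.unsqueeze(0) == task_ids.unsqueeze(1)) & not_self
    return -(log_p * positives).sum() / positives.sum().clamp(min=1)
```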
- How to Make Deep RL Work in Practice [15.740760669623876]
Reported results of state-of-the-art algorithms are often difficult to reproduce.
We make suggestions which of those techniques to use by default and highlight areas that could benefit from a solution specifically tailored to RL.
arXiv Detail & Related papers (2020-10-25T10:37:54Z)
- What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study [50.79125250286453]
On-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks.
But state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents.
These choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations.
We implement >50 such "choices" in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study.
arXiv Detail & Related papers (2020-06-10T17:59:03Z)
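One of the low-level choices such a study toggles, advantage normalization, fits in a few lines. A sketch with NumPy, not the study's actual framework:

```python
# Per-batch advantage normalization, a common on-policy design choice.
import numpy as np

def advantages(returns, values, normalize=True, eps=1e-8):
    adv = returns - values
    if normalize:  # standardize within the batch before the policy update
        adv = (adv - adv.mean()) / (adv.std() + eps)
    return adv
```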
- Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO [90.90009491366273]
We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms.
Specifically, we investigate the consequences of "code-level optimizations": algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm.
Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function.
arXiv Detail & Related papers (2020-05-25T16:24:59Z)
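For a concrete example of a "code-level optimization" of the kind this case study isolates, here is PPO's clipped value loss in PyTorch; the 0.2 clip range mirrors common implementations rather than any single codebase.

```python
# PPO's clipped value loss, an implementation detail absent from the
# original PPO paper but present in reference codebases.
import torch

def clipped_value_loss(v_new, v_old, returns, clip_eps=0.2):
    """Pessimistic max of unclipped and clipped squared value errors."""
    v_clipped = v_old + (v_new - v_old).clamp(-clip_eps, clip_eps)
    return torch.max((v_new - returns) ** 2, (v_clipped - returns) ** 2).mean()
```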
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
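The variance-control idea in the final entry can be sketched as an "all-action" policy gradient, assuming PyTorch: because the action set is discrete, the expectation over actions is computed exactly from critic estimates instead of being sampled, removing that source of gradient variance. The estimator below is a simplified stand-in for the paper's method.

```python
# Simplified all-action policy gradient over a discrete action set.
import torch

def all_action_pg_loss(logits, q_estimates):
    """logits: (batch, num_actions); q_estimates: critic values per action.
    The gradient of -sum_a pi(a|s) * Q(s,a) is the exact policy gradient
    over the action set, avoiding single-sample variance."""
    probs = torch.softmax(logits, dim=-1)
    return -(probs * q_estimates.detach()).sum(-1).mean()
```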