Frustratingly Easy Regularization on Representation Can Boost Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2205.14557v2
- Date: Sun, 23 Apr 2023 08:43:38 GMT
- Title: Frustratingly Easy Regularization on Representation Can Boost Deep
Reinforcement Learning
- Authors: Qiang He, Huangyuan Su, Jieyu Zhang, Xinwen Hou
- Abstract summary: In this work, we demonstrate that the learned representation of the $Q$-network and its target $Q$-network should, in theory, satisfy a favorable distinguishable representation property.
We propose Policy Evaluation with Easy Regularization on Representation (PEER), which aims to maintain the distinguishable representation property via explicit regularization on internal representations.
PEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9 out of 12 tasks on DMControl, and 19 out of 26 games on Atari.
- Score: 9.072416458330268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) promises that an agent can learn a good
policy from high-dimensional information, while representation learning
removes irrelevant and redundant information and retains pertinent information.
In this work, we demonstrate that the learned representation of the $Q$-network
and its target $Q$-network should, in theory, satisfy a favorable
distinguishable representation property. Specifically, there exists an upper
bound on the representation similarity of the value functions of two adjacent
time steps in a typical DRL setting. However, through illustrative experiments,
we show that a learned DRL agent may violate this property, leading to a
sub-optimal policy. Therefore, we propose a simple yet effective regularizer
called Policy Evaluation with Easy Regularization on Representation (PEER),
which aims to maintain the distinguishable representation property via explicit
regularization on internal representations. We also provide a convergence-rate
guarantee for PEER. Implementing PEER requires only one line of code. Our
experiments demonstrate that incorporating PEER into DRL can significantly
improve performance and sample efficiency. Comprehensive experiments show that
PEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9
out of 12 tasks on DMControl, and 19 out of 26 games on Atari. To the best of
our knowledge, PEER is the first work to study the inherent representation
property of the Q-network and its target. Our code is available at
https://sites.google.com/view/peer-cvpr2023/.
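A minimal sketch of the regularization idea described in the abstract, written as a PyTorch-style continuous-control critic update. The network layout, the use of the penultimate layer as the "internal representation", the inner-product similarity measure, and the coefficient beta are illustrative assumptions rather than the authors' exact implementation; the linked project page hosts the official code.

```python
# Hedged sketch of a PEER-style regularized critic loss (assumed form, not the
# official implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 1)

    def feature(self, s, a):
        # Penultimate-layer (internal) representation of the state-action pair.
        return self.body(torch.cat([s, a], dim=-1))

    def forward(self, s, a):
        return self.head(self.feature(s, a))

def peer_critic_loss(q, q_target, s, a, r, s_next, a_next, done,
                     gamma=0.99, beta=5e-4):
    # Standard TD loss, with the bootstrap target from the target network.
    with torch.no_grad():
        td_target = r + gamma * (1.0 - done) * q_target(s_next, a_next)
    td_loss = F.mse_loss(q(s, a), td_target)

    # The "one line": penalize the similarity between the Q-network's
    # representation at (s, a) and the target network's representation at
    # (s', a'), keeping the two representations distinguishable.
    similarity = (q.feature(s, a) * q_target.feature(s_next, a_next).detach()).sum(-1).mean()
    return td_loss + beta * similarity
```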
Related papers
- Is Inverse Reinforcement Learning Harder than Standard Reinforcement
Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results on efficient IRL in vanilla offline and online settings, using polynomially many samples and polynomial runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z) - Learning Bellman Complete Representations for Offline Policy Evaluation [51.96704525783913]
Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage.
We show our representation enables better OPE compared to previous representation learning methods developed for off-policy RL.
arXiv Detail & Related papers (2022-07-12T21:02:02Z) - Provable Benefit of Multitask Representation Learning in Reinforcement
Learning [46.11628795660159]
This paper theoretically characterizes the benefit of representation learning under the low-rank Markov decision process (MDP) model.
To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask reinforcement learning.
arXiv Detail & Related papers (2022-06-13T04:29:02Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - Learning Temporally-Consistent Representations for Data-Efficient
Reinforcement Learning [3.308743964406687]
$k$-Step Latent (KSL) is a representation learning method that enforces temporal consistency of representations.
KSL produces encoders that generalize better to new tasks unseen during training.
arXiv Detail & Related papers (2021-10-11T00:16:43Z) - Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z) - PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
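As background for the successor-feature component mentioned above, the standard identity is $Q^\pi(s,a) = \psi^\pi(s,a) \cdot w$ whenever rewards are linear in features, $r = \phi \cdot w$. The tabular, deterministic-policy sketch below is a simplifying assumption for illustration, not the paper's construction.

```python
# Toy successor-feature computation by fixed-point iteration (assumed tabular,
# deterministic setting for illustration only).
import numpy as np

def successor_features(phi, policy_next, gamma=0.99, n_iters=500):
    # phi: array of shape (n_states, n_actions, d) with one-step features.
    # policy_next: maps (s, a) -> (s', a') followed under the policy pi.
    n_s, n_a, d = phi.shape
    psi = np.zeros((n_s, n_a, d))
    for _ in range(n_iters):  # psi = phi + gamma * psi(next state-action)
        new_psi = phi.copy()
        for s in range(n_s):
            for a in range(n_a):
                s2, a2 = policy_next(s, a)
                new_psi[s, a] += gamma * psi[s2, a2]
        psi = new_psi
    return psi

def q_from_successor_features(psi, w):
    # Q^pi(s, a) = psi^pi(s, a) . w  when r(s, a) = phi(s, a) . w
    return psi @ w
```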
arXiv Detail & Related papers (2021-02-24T21:12:09Z) - Useful Policy Invariant Shaping from Arbitrary Advice [24.59807772487328]
A major challenge of RL research is to discover how to learn with less data.
Potential-based reward shaping (PBRS) holds promise, but it is limited by the need for a well-defined potential function.
The recently introduced dynamic potential based advice (DPBA) method tackles this challenge by admitting arbitrary advice from a human or other agent.
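For reference, the potential-based shaping rule that PBRS and DPBA build on (Ng et al., 1999) adds $F(s, s') = \gamma \Phi(s') - \Phi(s)$ to the environment reward, which leaves the optimal policy unchanged. A minimal sketch, where `phi` is whichever potential function is available (hand-crafted, or learned from advice as in DPBA):

```python
# Potential-based reward shaping: r'(s, a, s') = r + gamma * Phi(s') - Phi(s).
# `phi` is a hypothetical potential function supplied by the designer or learned.
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    return r + gamma * phi(s_next) - phi(s)
```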
arXiv Detail & Related papers (2020-11-02T20:29:09Z) - Learn to Interpret Atari Agents [106.21468537372995]
Region-sensitive Rainbow (RS-Rainbow) is an end-to-end trainable network based on the original Rainbow, a powerful deep Q-network agent.
arXiv Detail & Related papers (2018-12-29T03:35:32Z)