Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective
- URL: http://arxiv.org/abs/2209.08466v3
- Date: Sat, 24 Jun 2023 19:05:46 GMT
- Title: Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective
- Authors: Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine and
Ruslan Salakhutdinov
- Abstract summary: We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
- Score: 142.36200080384145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While reinforcement learning (RL) methods that learn an internal model of the
environment have the potential to be more sample efficient than their
model-free counterparts, learning to model raw observations from high
dimensional sensors can be challenging. Prior work has addressed this challenge
by learning low-dimensional representations of observations through auxiliary
objectives, such as reconstruction or value prediction. However, the alignment
between these auxiliary objectives and the RL objective is often unclear. In
this work, we propose a single objective which jointly optimizes a latent-space
model and policy to achieve high returns while remaining self-consistent. This
objective is a lower bound on expected returns. Unlike prior bounds for
model-based RL that concern policy exploration or model guarantees, our bound is directly
on the overall RL objective. We demonstrate that the resulting algorithm
matches or improves the sample-efficiency of the best prior model-based and
model-free RL methods. While sample-efficient methods are typically
computationally demanding, our method attains the performance of SAC in about
50% less wall-clock time.
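To make the "one objective" idea concrete, the sketch below couples an encoder, a latent dynamics model, a reward head, and a policy through a single scalar loss: latents must predict observed rewards, the latent model must stay consistent with encoded next observations, and the policy is credited with a critic's value at the end of the latent rollout. This is a minimal illustration under assumed architectures, a squared-error consistency term standing in for the paper's KL-based formulation, and a critic assumed to be trained separately with TD targets; it is not the authors' exact algorithm.

```python
# Minimal sketch (not the authors' exact method) of training a representation,
# a latent-space model, and a policy with one scalar objective. Architectures,
# the squared-error consistency term, and the loss weights are assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim, z_dim = 24, 6, 32

encoder  = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
dynamics = nn.Sequential(nn.Linear(z_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
reward   = nn.Sequential(nn.Linear(z_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor    = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, act_dim), nn.Tanh())
# Assumed to be trained separately with standard TD targets (not shown);
# here it only scores the policy at the end of the latent rollout.
critic   = nn.Sequential(nn.Linear(z_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, 1))

params = [p for m in (encoder, dynamics, reward, actor) for p in m.parameters()]
opt = torch.optim.Adam(params, lr=3e-4)

def joint_loss(obs_seq, act_seq, rew_seq, consistency_coef=1.0):
    """One scalar loss for a K-step segment from the replay buffer.
    obs_seq: (K+1, B, obs); act_seq: (K, B, act); rew_seq: (K, B, 1)."""
    z = encoder(obs_seq[0])
    reward_fit, consistency = 0.0, 0.0
    for k in range(act_seq.shape[0]):
        za = torch.cat([z, act_seq[k]], dim=-1)
        # Latents should be predictive of the observed reward.
        reward_fit = reward_fit + ((reward(za) - rew_seq[k]) ** 2).mean()
        # Self-consistency: the latent model's prediction should match the
        # encoding of the next real observation (stop-gradient target).
        z_next = dynamics(za)
        consistency = consistency + ((z_next - encoder(obs_seq[k + 1]).detach()) ** 2).sum(-1).mean()
        z = z_next
    # The policy (and, through the rollout, the encoder and model) is credited
    # with the critic's value of its own action at the end of the rollout.
    policy_value = critic(torch.cat([z, actor(z)], dim=-1)).mean()
    return reward_fit + consistency_coef * consistency - policy_value

# One gradient step on a random segment (stand-in for real replay data).
K, B = 3, 8
obs_seq = torch.randn(K + 1, B, obs_dim)
act_seq = torch.rand(K, B, act_dim) * 2 - 1
rew_seq = torch.randn(K, B, 1)
loss = joint_loss(obs_seq, act_seq, rew_seq)
opt.zero_grad()
loss.backward()
opt.step()
```

Because every component receives gradients from the same scalar, the representation and latent model are shaped directly by what the policy needs, rather than by a separate reconstruction or value-prediction objective whose alignment with returns is unclear.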
Related papers
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
arXiv Detail & Related papers (2024-03-31T11:39:11Z) - Simplified Temporal Consistency Reinforcement Learning [19.814047499837084]
We show that a simple representation learning approach relying on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL.
Our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
arXiv Detail & Related papers (2023-06-15T19:37:43Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - The Virtues of Laziness in Model-based RL: A Unified Objective and
Algorithms [37.025378882978714]
We propose a novel approach that addresses two fundamental challenges in Model-based Reinforcement Learning (MBRL).
Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and expert policy.
We present two no-regret algorithms to optimize the proposed objective, and demonstrate their statistical and computational gains.
arXiv Detail & Related papers (2023-03-01T17:42:26Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample efficient technique to obtain control policies.
VaGraM is a novel method for value-aware model learning; a generic sketch of this idea appears after this list.
arXiv Detail & Related papers (2022-04-04T13:28:31Z) - Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z) - Model-Advantage Optimization for Model-Based Reinforcement Learning [41.13567626667456]
Model-based Reinforcement Learning (MBRL) algorithms have been traditionally designed with the goal of learning accurate dynamics of the environment.
Value-aware model learning, an alternative model-learning paradigm to maximum likelihood, proposes to inform model-learning through the value function of the learnt policy.
We propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models.
arXiv Detail & Related papers (2021-06-26T20:01:28Z) - Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent that minimizes a novel objective, the free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual
Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
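As promised above, here is a hedged, generic sketch of value-gradient-weighted model learning in the spirit of the VaGraM entry: instead of a plain mean-squared error, the one-step model's prediction error is projected onto the gradient of a learned value function, so the model is penalized most where errors would change the value estimate. The network sizes and the exact weighting are illustrative assumptions, not the paper's precise loss.

```python
# Generic value-gradient-weighted model loss (illustrative, not VaGraM's exact
# formulation): weight model errors by the gradient of a learned value function.
import torch
import torch.nn as nn

state_dim, act_dim = 17, 6
model = nn.Sequential(nn.Linear(state_dim + act_dim, 256), nn.ReLU(),
                      nn.Linear(256, state_dim))   # predicts the next state
value = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, 1))           # learned value function

def value_gradient_weighted_loss(s, a, s_next):
    # Gradient of V at the true next state, used as a fixed weighting.
    s_next_req = s_next.detach().requires_grad_(True)
    (grad_v,) = torch.autograd.grad(value(s_next_req).sum(), s_next_req)
    grad_v = grad_v.detach()
    # Penalize the component of the model error that changes the value estimate.
    err = model(torch.cat([s, a], dim=-1)) - s_next
    return ((grad_v * err).sum(dim=-1) ** 2).mean()

# Example with random tensors (stand-ins for replay-buffer batches).
s = torch.randn(32, state_dim)
a = torch.randn(32, act_dim)
s_next = torch.randn(32, state_dim)
loss = value_gradient_weighted_loss(s, a, s_next)
loss.backward()
```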