CostNet: An End-to-End Framework for Goal-Directed Reinforcement
Learning
- URL: http://arxiv.org/abs/2210.01805v1
- Date: Mon, 3 Oct 2022 21:16:14 GMT
- Title: CostNet: An End-to-End Framework for Goal-Directed Reinforcement
Learning
- Authors: Per-Arne Andersen and Morten Goodwin and Ole-Christoffer Granmo
- Abstract summary: Reinforcement Learning (RL) is a general framework concerned with an agent that seeks to maximize rewards in an environment.
There are two approaches, model-based and model-free reinforcement learning, that show concrete results in several disciplines.
This paper introduces a novel reinforcement learning algorithm for predicting the distance between two states in a Markov Decision Process.
- Score: 9.432068833600884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) is a general framework concerned with an agent
that seeks to maximize rewards in an environment. The learning typically
happens through trial and error using explorative methods, such as
epsilon-greedy. There are two approaches, model-based and model-free
reinforcement learning, that show concrete results in several disciplines.
Model-based RL learns a model of the environment and uses it to learn the
policy, while model-free approaches explore and exploit directly, without
considering the underlying environment dynamics. Model-free RL works
conceptually well in simulated environments, and empirical evidence suggests
that trial and error leads to near-optimal behavior given enough training. On
the other hand, model-based RL aims to be sample efficient, and studies show
that it requires far less interaction with the real environment to learn a good
policy.
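(A minimal, illustrative sketch of the epsilon-greedy exploration rule mentioned above; the names and values here are assumptions, not taken from the paper.)
```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit

# Example: four actions, 10% exploration.
rng = np.random.default_rng(seed=0)
action = epsilon_greedy(np.array([0.1, 0.5, 0.2, 0.0]), epsilon=0.1, rng=rng)
```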
A significant challenge with RL is that it relies on a well-defined reward
function to work well in complex environments, and such a reward function is
challenging to define. Goal-Directed RL is an alternative method that learns an
intrinsic reward function, with emphasis on the few explored trajectories that
reveal the path to the goal state.
This paper introduces a novel reinforcement learning algorithm for predicting
the distance between two states in a Markov Decision Process. The learned
distance function works as an intrinsic reward that fuels the agent's learning.
Using the distance metric as a reward, we show that the algorithm performs
comparably to model-free RL while achieving significantly better sample
efficiency in several test environments.
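(The paper provides no source listing here; the following PyTorch-style sketch only illustrates the general idea of a learned state-distance network used as an intrinsic reward. The architecture and the supervision signal, the step gap between two states from the same trajectory, are assumptions, not the CostNet design.)
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceNet(nn.Module):
    """Predicts a non-negative 'distance' between two states (assumed architecture)."""
    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # keep predicted distances >= 0
        )

    def forward(self, s_a: torch.Tensor, s_b: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s_a, s_b], dim=-1)).squeeze(-1)

def distance_loss(dist_net: DistanceNet, s_i, s_j, step_gap):
    # One plausible training target (an assumption): the number of environment
    # steps separating the two states on the trajectory they were sampled from.
    return F.mse_loss(dist_net(s_i, s_j), step_gap.float())

def intrinsic_reward(dist_net: DistanceNet, state, goal):
    # Closer to the goal under the learned metric => larger intrinsic reward.
    with torch.no_grad():
        return -dist_net(state, goal)
```
In a full agent, such an intrinsic reward would be fed to a standard RL algorithm in place of, or in addition to, the environment reward.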
Related papers
- Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
We propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model.
We show that the proposed algorithms converge to the stationary solutions of the IRL problem.
Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
arXiv Detail & Related papers (2024-05-28T07:11:05Z) - Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMControl and Meta-world.
It shows consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z) - Model-based Safe Deep Reinforcement Learning via a Constrained Proximal
Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z) - Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and a policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method designed specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms (see the sketch after this list).
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - Multitask Adaptation by Retrospective Exploration with Learned World
Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides the MBRL agent with training samples taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solved prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z) - PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided
Exploration [15.173628100049129]
This work studies a model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs).
For both models, our algorithm provides sample-complexity guarantees and only requires access to a planning oracle.
Our method can also perform reward-free exploration efficiently.
arXiv Detail & Related papers (2021-07-15T15:49:30Z) - Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
arXiv Detail & Related papers (2020-02-07T09:57:53Z)
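(Companion sketch for the "Reward Uncertainty for Exploration in Preference-based Reinforcement Learning" entry above: a hedged illustration of using disagreement across an ensemble of learned reward models as an exploration bonus. The architecture, the beta coefficient, and all names are assumptions, not the authors' code.)
```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """A small reward predictor; the architecture is assumed for illustration."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def exploration_bonus(ensemble, obs, act, beta: float = 1.0) -> torch.Tensor:
    # Disagreement (standard deviation) across the ensemble serves as the
    # uncertainty-based exploration bonus.
    with torch.no_grad():
        preds = torch.stack([model(obs, act) for model in ensemble], dim=0)
    return beta * preds.std(dim=0)

# Usage: the bonus would be added to the mean predicted reward during policy
# training, so uncertain state-action pairs are visited more often.
ensemble = [RewardModel(obs_dim=8, act_dim=2) for _ in range(5)]
obs, act = torch.randn(32, 8), torch.randn(32, 2)
bonus = exploration_bonus(ensemble, obs, act, beta=0.5)
```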