Auxiliary Reward Generation with Transition Distance Representation Learning
- URL: http://arxiv.org/abs/2402.07412v1
- Date: Mon, 12 Feb 2024 05:13:44 GMT
- Title: Auxiliary Reward Generation with Transition Distance Representation Learning
- Authors: Siyuan Li and Shijie Han and Yingnan Zhao and By Liang and Peng Liu
- Abstract summary: Reinforcement learning (RL) has shown its strength in challenging sequential decision-making problems.
The reward function in RL is crucial to the learning performance, as it serves as a measure of the task completion degree.
We propose a novel representation learning approach that can measure the "transition distance" between states.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Reinforcement learning (RL) has shown its strength in challenging sequential decision-making problems. The reward function in RL is crucial to the learning performance, as it serves as a measure of the task completion degree. In real-world problems, the rewards are predominantly human-designed, which requires laborious tuning and is easily affected by human cognitive biases. To achieve automatic auxiliary reward generation, we propose a novel representation learning approach that can measure the "transition distance" between states. Building upon these representations, we introduce an auxiliary reward generation technique for both single-task and skill-chaining scenarios without the need for human knowledge. The proposed approach is evaluated on a wide range of manipulation tasks. The experimental results demonstrate the effectiveness of measuring the transition distance between states and the improvement induced by the auxiliary rewards, which not only promotes better learning efficiency but also increases convergence stability.
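To make the abstract's recipe concrete, here is a minimal sketch of one plausible reading: learn an embedding whose Euclidean distances regress onto the number of environment steps between states, then reward per-step reductions in the embedded distance to a goal. The encoder architecture, pair-sampling scheme, and shaping rule below are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch only: the encoder, loss, and shaping rule are
# assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn

class TransitionDistanceEncoder(nn.Module):
    """Embeds states so that Euclidean distance tracks step counts."""
    def __init__(self, state_dim: int, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def transition_distance_loss(enc, s_t, s_tk, k):
    # Regress embedding distance onto the number of steps k separating
    # the two states within a stored trajectory.
    d = torch.norm(enc(s_t) - enc(s_tk), dim=-1)
    return ((d - k.float()) ** 2).mean()

def auxiliary_reward(enc, s, s_next, goal):
    # Reward progress: the decrease in predicted transition distance
    # to the goal after one environment step.
    with torch.no_grad():
        before = torch.norm(enc(s) - enc(goal), dim=-1)
        after = torch.norm(enc(s_next) - enc(goal), dim=-1)
    return before - after
```

Because the distance is learned from observed transitions rather than hand-tuned, the same shaping rule applies unchanged to single tasks and to chaining skills toward successive subgoals.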
Related papers
- Learning to Assist Humans without Inferring Rewards
We build upon prior work that studies assistance through the lens of empowerment.
An assistive agent aims to maximize the influence of the human's actions.
We prove that the learned representations estimate a similar notion of empowerment to that studied by prior work.
arXiv Detail & Related papers (2024-11-04T21:31:04Z)
- Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills
We propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC).
We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum.
We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup.
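As a rough illustration of a value-uncertainty-driven curriculum, the sketch below scores candidate goals by the disagreement of a value-function ensemble and samples goals proportionally to that score; the ensemble-disagreement measure and sampling rule are assumptions for the sketch, not necessarily VUVC's objective.

```python
# Hedged sketch: prefer goals where an ensemble of value functions disagrees.
import numpy as np

def value_uncertainty(ensemble_values: np.ndarray) -> np.ndarray:
    """ensemble_values: (n_models, n_goals) array of V(goal) estimates."""
    return ensemble_values.std(axis=0)

def sample_goal(candidate_goals, ensemble_values, rng):
    # Higher-uncertainty goals are sampled more often, steering practice
    # toward the frontier of what the agent can currently evaluate.
    u = value_uncertainty(ensemble_values)
    probs = u / u.sum()
    idx = rng.choice(len(candidate_goals), p=probs)
    return candidate_goals[idx]

rng = np.random.default_rng(0)
goals = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = rng.normal(size=(5, 3))  # 5 ensemble members, 3 candidate goals
print(sample_goal(goals, values, rng))
```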
arXiv Detail & Related papers (2023-10-30T10:34:25Z)
- Primitive Skill-based Robot Learning from Human Evaluative Feedback
Reinforcement learning algorithms face challenges when dealing with long-horizon robot manipulation tasks in real-world environments.
We propose a novel framework, SEED, which leverages two approaches: reinforcement learning from human feedback (RLHF) and primitive skill-based reinforcement learning.
Our results show that SEED significantly outperforms state-of-the-art RL algorithms in sample efficiency and safety.
arXiv Detail & Related papers (2023-07-28T20:48:30Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL
In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting.
We propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation.
We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments.
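A hedged sketch of a temporally contrastive objective of the general kind this entry describes: embeddings of states a few steps apart are pulled together, while randomly sampled buffer states are pushed apart. The margin loss below is a generic stand-in, not the paper's exact method.

```python
# Generic temporally contrastive loss (illustrative stand-in).
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(z_t, z_near, z_rand, margin=1.0):
    # Pull together embeddings of temporally close states; push random
    # buffer states at least `margin` apart (hinge on squared distance).
    pos = ((z_t - z_near) ** 2).sum(dim=-1)
    neg = ((z_t - z_rand) ** 2).sum(dim=-1)
    return (pos + F.relu(margin - neg)).mean()

z = torch.randn(32, 16)
loss = temporal_contrastive_loss(z, z + 0.1 * torch.randn_like(z),
                                 torch.randn(32, 16))
print(loss.item())
```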
arXiv Detail & Related papers (2022-03-21T22:07:48Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
- Persistent Reinforcement Learning via Subgoal Curricula
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
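One plausible reading of an initial-state curriculum is sketched below: reset the agent to a previously visited state whose estimated value sits closest to the agent's current competence level. The value-gap selection rule is a hypothetical stand-in for VaPRL's actual criterion.

```python
# Hypothetical value-based initial-state curriculum (illustrative only).
import numpy as np

def pick_initial_state(visited_states, values, competence):
    # Choose the stored state whose value estimate is nearest the agent's
    # current competence, keeping episodes at the frontier between
    # solved and unsolved regions of the task.
    gap = np.abs(values - competence)
    return visited_states[int(np.argmin(gap))]

states = np.array([[0.0], [0.5], [0.9]])   # states seen in past episodes
values = np.array([0.2, 0.6, 0.95])        # value estimates at those states
print(pick_initial_state(states, values, competence=0.55))
```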
arXiv Detail & Related papers (2021-07-27T16:39:45Z)
- Intrinsically Motivated Self-supervised Learning in Reinforcement Learning
In vision-based reinforcement learning (RL) tasks, it is common to assign an auxiliary task with a surrogate self-supervised loss.
We present a simple yet effective idea that employs the self-supervised loss as an intrinsic reward, called Intrinsically Motivated Self-Supervised learning in Reinforcement learning (IM-SSR).
We show that the self-supervised loss can be decomposed into exploration of novel states and robustness improvement from nuisance elimination.
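The core recipe reduces to adding the self-supervised loss to the environment reward; a minimal sketch follows, where the scaling coefficient beta is an assumed hyperparameter rather than a value from the paper.

```python
# Sketch: reuse an auxiliary self-supervised loss as an intrinsic reward.
def shaped_reward(env_reward: float, ssl_loss: float, beta: float = 0.1) -> float:
    # A high self-supervised loss flags an unfamiliar observation, so the
    # loss value itself can serve as an exploration bonus; beta scales it.
    return env_reward + beta * ssl_loss

print(shaped_reward(env_reward=1.0, ssl_loss=0.42))
```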
arXiv Detail & Related papers (2021-06-26T08:43:28Z)
- Bridging the Imitation Gap by Adaptive Insubordination
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning, resulting in an "imitation gap".
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
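A small sketch of the adaptive weighting idea: a per-state weight interpolates between the imitation loss and the RL loss. The gating rule below, based on how much probability an auxiliary imitation-only policy assigns to the expert's action, is an illustrative stand-in rather than ADVISOR's exact formulation.

```python
# Hedged sketch of adaptive imitation/RL loss weighting.
import torch

def advisor_style_loss(imitation_loss, rl_loss, w):
    # w in [0, 1] per state: 1 -> trust the expert, 0 -> fall back to RL.
    return (w * imitation_loss + (1.0 - w) * rl_loss).mean()

def gate_weight(aux_logprob_expert_action):
    # Down-weight imitation where even a dedicated imitation-only policy
    # cannot reproduce the (privileged) expert's action: an imitation gap.
    return torch.exp(aux_logprob_expert_action).clamp(0.0, 1.0)

im = torch.tensor([0.8, 0.2])
rl = torch.tensor([0.5, 0.9])
w = gate_weight(torch.log(torch.tensor([0.9, 0.1])))
print(advisor_style_loss(im, rl, w))
```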
arXiv Detail & Related papers (2020-07-23T17:59:57Z)