Learning Dense Reward with Temporal Variant Self-Supervision
- URL: http://arxiv.org/abs/2205.10431v1
- Date: Fri, 20 May 2022 20:30:57 GMT
- Title: Learning Dense Reward with Temporal Variant Self-Supervision
- Authors: Yuning Wu, Jieliang Luo, Hui Li
- Abstract summary: Complex real-world robotic applications lack explicit and informative descriptions that can directly be used as rewards.
Previous effort has shown that it is possible to algorithmically extract dense rewards directly from multimodal observations.
This paper proposes a more efficient and robust way of sampling and learning.
- Score: 5.131840233837565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rewards play an essential role in reinforcement learning. In contrast to
rule-based game environments with well-defined reward functions, complex
real-world robotic applications, such as contact-rich manipulation, lack
explicit and informative descriptions that can directly be used as a reward.
Previous effort has shown that it is possible to algorithmically extract dense
rewards directly from multimodal observations. In this paper, we aim to extend
this effort by proposing a more efficient and robust way of sampling and
learning. In particular, our sampling approach utilizes temporal variance to
simulate the fluctuating state and action distribution of a manipulation task.
We then propose a network architecture for self-supervised learning to better
incorporate temporal information in latent representations. We tested our
approach in two experimental setups, namely joint-assembly and door-opening.
Preliminary results show that our approach is effective and efficient in
learning dense rewards, and the learned rewards lead to faster convergence than
baselines.
Related papers
- Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors [10.454194186065195]
Reinforcement learning has achieved promising results on robotic control tasks but struggles to leverage information effectively.
Recent works construct auxiliary losses based on reconstruction or mutual information to extract joint representations from multiple sensory inputs.
We argue that compressing information in the learned joint representations about raw multimodal observations is helpful.
arXiv Detail & Related papers (2024-10-23T04:32:37Z)
- Sharing Knowledge in Multi-Task Deep Reinforcement Learning [57.38874587065694]
We study the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning.
We prove this by providing theoretical guarantees that highlight the conditions under which it is convenient to share representations among tasks.
arXiv Detail & Related papers (2024-01-17T19:31:21Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Reward-Predictive Clustering [20.82575016038573]
We provide a clustering algorithm that enables the application of reward-predictive state abstractions to deep learning settings.
A convergence theorem and simulations show that the resulting reward-predictive deep network maximally compresses the agent's inputs.
arXiv Detail & Related papers (2022-11-07T03:13:26Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of IRL -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
- Sample Efficient Imitation Learning via Reward Function Trained in Advance [2.66512000865131]
Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations.
In this article, we make an effort to improve sample efficiency by introducing a novel scheme of inverse reinforcement learning.
arXiv Detail & Related papers (2021-11-23T08:06:09Z)
- TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi \Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.