Learning from Ambiguous Demonstrations with Self-Explanation Guided
Reinforcement Learning
- URL: http://arxiv.org/abs/2110.05286v4
- Date: Wed, 7 Feb 2024 23:45:53 GMT
- Title: Learning from Ambiguous Demonstrations with Self-Explanation Guided
Reinforcement Learning
- Authors: Yantian Zha, Lin Guan, and Subbarao Kambhampati
- Abstract summary: Our work aims at efficiently leveraging ambiguous demonstrations for the training of a reinforcement learning (RL) agent.
Inspired by how humans handle such situations, we propose to use self-explanation to recognize valuable high-level relational features.
Our main contribution is to propose the Self-Explanation for RL from Demonstrations (SERLfD) framework, which can overcome the limitations of traditional RLfD works.
- Score: 20.263419567168388
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Our work aims at efficiently leveraging ambiguous demonstrations for the
training of a reinforcement learning (RL) agent. An ambiguous demonstration can
usually be interpreted in multiple ways, which severely hinders the RL agent
from learning stably and efficiently. Since even an optimal demonstration can be
ambiguous, previous works that combine RL and learning from demonstrations
(RLfD) may not work well. Inspired by how humans handle
such situations, we propose to use self-explanation (an agent generates
explanations for itself) to recognize valuable high-level relational features
as an interpretation of why a successful trajectory is successful. In this way,
the agent can guide its own RL learning. Our main contribution
is to propose the Self-Explanation for RL from Demonstrations (SERLfD)
framework, which can overcome the limitations of traditional RLfD works. Our
experimental results show that an RLfD model can be improved by using our
SERLfD framework in terms of training stability and performance.
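The abstract describes self-explanations only at a high level: relational features that indicate why a successful trajectory succeeded, used to guide RL. Below is a minimal sketch of one way such a self-explainer could be realized and turned into a shaping signal; the predicate set, the logistic-regression explainer, and the shaping term are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: learn which high-level predicates explain trajectory success,
# then use them to shape the RL reward. All names here are illustrative.
import numpy as np

class SelfExplainer:
    """Scores candidate relational predicates by how well they predict success."""

    def __init__(self, n_predicates, lr=0.1):
        self.w = np.zeros(n_predicates)  # one weight per candidate predicate
        self.lr = lr

    def update(self, predicates, success):
        # One logistic-regression step on (predicates -> trajectory succeeded?).
        p = 1.0 / (1.0 + np.exp(-predicates @ self.w))
        self.w += self.lr * (float(success) - p) * predicates

    def shaping_reward(self, predicates):
        # Reward states that satisfy predicates associated with success.
        return float(predicates @ self.w)

# Schematic use inside an RLfD loop (beta is a shaping coefficient):
# predicates = np.array([holding_block, above_target, gripper_aligned], dtype=float)
# r_total = env_reward + beta * explainer.shaping_reward(predicates)
```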
Related papers
- Accelerating Reinforcement Learning of Robotic Manipulations via
Feedback from Large Language Models [21.052532074815765]
We introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework.
It enables RL agents to learn robotic tasks efficiently by taking advantage of Large Language Models' timely feedback.
It outperforms the baseline in terms of both learning efficiency and success rate.
arXiv Detail & Related papers (2023-11-04T11:21:38Z)
- RACCER: Towards Reachable and Certain Counterfactual Explanations for
Reinforcement Learning [2.0341936392563063]
We propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents.
We use a tree search to find the most suitable counterfactuals based on the defined properties.
We evaluate RACCER in two tasks as well as conduct a user study to show that RL-specific counterfactuals help users better understand agents' behavior.
arXiv Detail & Related papers (2023-03-08T09:47:00Z)
- D-Shape: Demonstration-Shaped Reinforcement Learning via Goal
Conditioning [48.57484755946714]
D-Shape is a new method for combining imitation learning (IL) and reinforcement learning (RL).
It uses ideas from reward shaping and goal-conditioned RL to resolve the conflict that can arise between imitating the demonstrations and optimizing the task reward.
We experimentally validate D-Shape in sparse-reward gridworld domains, showing that it both improves over RL in terms of sample efficiency and converges consistently to the optimal policy.
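For context, the snippet below sketches plain potential-based reward shaping toward a demonstrated goal state, one standard way to fold demonstration information into RL. The potential function and distance metric are assumptions for illustration, not D-Shape's exact formulation.

```python
# Illustrative potential-based shaping toward a demonstration goal (not D-Shape's exact method).
import numpy as np

def potential(state, demo_goal):
    # Phi(s): negative distance to the demonstrated goal state.
    return -np.linalg.norm(np.asarray(state, float) - np.asarray(demo_goal, float))

def shaped_reward(env_reward, state, next_state, demo_goal, gamma=0.99):
    # r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s); shaping of this
    # potential-based form leaves the optimal policy of the original MDP unchanged.
    return env_reward + gamma * potential(next_state, demo_goal) - potential(state, demo_goal)
```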
arXiv Detail & Related papers (2022-10-26T02:28:32Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observations of its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- Demonstration-Guided Reinforcement Learning with Learned Skills [23.376115889936628]
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors.
In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL.
We propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations.
arXiv Detail & Related papers (2021-07-21T17:59:34Z)
- Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
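The core composition in residual RL is simple enough to sketch: the executed action is the base controller's action plus a learned correction. The proportional controller and zero-initialized residual below are placeholders, not the paper's implementation.

```python
# Schematic residual-RL action composition: a = pi_base(s) + pi_residual(s).
import numpy as np

def base_controller(state, target, gain=0.5):
    # Hand-designed feedback controller (here: simple proportional control toward a target).
    return gain * (np.asarray(target, float) - np.asarray(state, float))

def residual_policy(state):
    # Stand-in for the learned residual policy; typically initialized near zero
    # so the initial behavior stays close to the base controller.
    return np.zeros(len(state))

def act(state, target):
    # The RL agent only has to learn the correction on top of the controller.
    return base_controller(state, target) + residual_policy(state)
```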
arXiv Detail & Related papers (2021-06-15T11:16:49Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
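The feedback in this line of work takes the form of pairwise preferences over behavior segments. As a rough illustration only, the snippet below shows a Bradley-Terry-style loss for training a reward model from such preferences; the network size and training details are assumptions, not the paper's configuration.

```python
# Rough sketch of reward learning from pairwise segment preferences (Bradley-Terry style).
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))

def preference_loss(segment_a, segment_b, prefer_a):
    # segment_*: tensors of shape (T, obs_dim); prefer_a: 1 if the teacher prefers A, else 0.
    r_a = reward_model(segment_a).sum()
    r_b = reward_model(segment_b).sum()
    logits = torch.stack([r_a, r_b]).unsqueeze(0)   # shape (1, 2)
    target = torch.tensor([0 if prefer_a else 1])   # index of the preferred segment
    return nn.functional.cross_entropy(logits, target)
```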
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- GAN-Based Interactive Reinforcement Learning from Demonstration and
Human Evaluative Feedback [6.367592686247906]
We propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback.
We tested our proposed method in six physics-based control tasks.
arXiv Detail & Related papers (2021-04-14T02:58:51Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using
Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
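For readers unfamiliar with successor features, the value decomposition they rely on is Q(s, a) = psi(s, a) . w, where psi accumulates discounted state features and w are task-reward weights (in ITD, recovered from demonstrations). The numbers below are purely illustrative.

```python
# Successor-feature value composition: Q(s, a) = psi(s, a) . w (illustrative values only).
import numpy as np

psi_sa = np.array([0.8, 1.5, 0.2])  # expected discounted feature counts under the policy
w = np.array([1.0, -0.5, 2.0])      # reward weights, e.g. inferred from demonstrations
q_value = float(psi_sa @ w)         # 0.8 - 0.75 + 0.4 = 0.45
print(q_value)
```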
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- ACNMP: Skill Transfer and Task Extrapolation through Learning from
Demonstration and Reinforcement Learning via Representation Sharing [5.06461227260756]
ACNMPs can be used to implement skill transfer between robots with different morphologies.
We show the real-world suitability of ACNMPs through real robot experiments.
arXiv Detail & Related papers (2020-03-25T11:28:12Z)