Opinion-Guided Reinforcement Learning
- URL: http://arxiv.org/abs/2405.17287v2
- Date: Sat, 3 Aug 2024 17:05:46 GMT
- Title: Opinion-Guided Reinforcement Learning
- Authors: Kyanna Dagenais, Istvan David
- Abstract summary: We present a method to guide reinforcement learning agents through opinions.
We evaluate it with synthetic (oracle) and human advisors, at different levels of uncertainty.
Our results indicate that opinions, even if uncertain, improve the performance of reinforcement learning agents.
- Score: 0.46040036610482665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human guidance is often desired in reinforcement learning to improve the performance of the learning agent. However, human insights are often mere opinions and educated guesses rather than well-formulated arguments. While opinions are subject to uncertainty, e.g., due to partial informedness or ignorance about a problem, they also emerge earlier than hard evidence can be produced. Thus, guiding reinforcement learning agents by way of opinions offers the potential for more performant learning processes, but comes with the challenge of modeling and managing opinions in a formal way. In this article, we present a method to guide reinforcement learning agents through opinions. To this end, we provide an end-to-end method to model and manage advisors' opinions. To assess the utility of the approach, we evaluate it with synthetic (oracle) and human advisors, at different levels of uncertainty, and under multiple advice strategies. Our results indicate that opinions, even if uncertain, improve the performance of reinforcement learning agents, resulting in higher rewards, more efficient exploration, and a better reinforced policy. Although we demonstrate our approach through a two-dimensional topological running example, our approach is applicable to complex problems with higher dimensions as well.
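The abstract does not spell out the formal machinery, but the general idea can be sketched: an advisor's opinion carries an explicit uncertainty mass (in the style of subjective logic), and confident opinions bias the agent's action selection while fully ignorant ones leave it unchanged. The following is a minimal, hypothetical sketch assuming a tabular Q-learning agent; the class, the blending rule, and all names are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch, not the paper's implementation.
import random

class Opinion:
    """An advisor's opinion about taking an action: belief, disbelief,
    and uncertainty mass summing to 1, in the style of subjective logic."""
    def __init__(self, belief, disbelief, uncertainty, base_rate=0.5):
        assert abs(belief + disbelief + uncertainty - 1.0) < 1e-9
        self.belief = belief
        self.disbelief = disbelief
        self.uncertainty = uncertainty
        self.base_rate = base_rate

    def projected_probability(self):
        # Expected probability that the action is good: belief plus the
        # uncertainty mass weighted by the prior base rate.
        return self.belief + self.uncertainty * self.base_rate

def select_action(actions, q_values, opinions, epsilon=0.1):
    """Epsilon-greedy action selection biased by advisor opinions.
    Highly uncertain opinions have little influence; confident ones
    shift the agent toward (or away from) the advised actions."""
    if random.random() < epsilon:
        return random.choice(actions)
    def score(a):
        bias = 0.0
        if a in opinions:
            op = opinions[a]
            confidence = 1.0 - op.uncertainty
            bias = confidence * (op.projected_probability() - 0.5)
        return q_values[a] + bias
    return max(actions, key=score)
```

For instance, Opinion(0.7, 0.1, 0.2) nudges the agent toward the advised action, while a fully uncertain Opinion(0.0, 0.0, 1.0) leaves the Q-values untouched.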
Related papers
- Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning [59.98430756337374]
Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks.
Our work introduces a novel technique aimed at cultivating a deeper understanding of the training problems at hand.
We propose reflective augmentation, a method that embeds problem reflection into each training instance.
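As a rough, hypothetical illustration of embedding reflection into a training instance (the paper's actual template may differ), the augmentation could look like:

```python
# Illustrative sketch, not the paper's actual template.
def reflective_augment(problem, answer, reflection):
    """Append a reflective section (e.g., an abstraction of the problem
    or an alternative solution path) after the answer, so the model is
    trained to reason about the problem, not only to produce the answer."""
    return (
        f"Problem: {problem}\n"
        f"Answer: {answer}\n"
        f"Reflection: {reflection}\n"
    )
```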
arXiv Detail & Related papers (2024-06-17T19:42:22Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
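The reward construction is simple enough to sketch; the gym-style step interface and function names below are assumptions for illustration.

```python
# Illustrative sketch of the intervention-as-reward idea.
def rlif_reward(intervened: bool) -> float:
    """The only supervision is whether the human intervened: an
    intervention marks the preceding behavior as undesirable and is
    penalized; all other steps are neutral."""
    return -1.0 if intervened else 0.0

def step_with_intervention(env, agent_action, human_action=None):
    """Execute the human's action if they intervene, and log the
    intervention itself as the (negative) reward for off-policy RL."""
    intervened = human_action is not None
    action = human_action if intervened else agent_action
    next_obs, _, done, info = env.step(action)  # env reward is discarded
    return next_obs, rlif_reward(intervened), done, info
```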
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Explainable Action Advising for Multi-Agent Reinforcement Learning [32.49380192781649]
Action advising is a knowledge transfer technique for reinforcement learning based on the teacher-student paradigm.
We introduce Explainable Action Advising, in which the teacher provides action advice as well as associated explanations indicating why the action was chosen.
This allows the student to self-reflect on what it has learned, enabling it to generalize the advice and leading to improved sample efficiency and learning performance.
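A minimal sketch of advice-with-explanation, assuming the teacher's policy has been distilled into a decision tree so that a decision path can serve as the explanation; the sklearn-style calls are illustrative, not the paper's code.

```python
# Illustrative sketch, assuming a fitted sklearn tree as the teacher.
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # hypothetical teacher policy

def teacher_advise(tree: DecisionTreeClassifier, state):
    """Return the advised action together with an explanation: the
    decision path through the tree that justifies the choice, which the
    student can store and reuse to generalize the advice."""
    x = np.asarray(state).reshape(1, -1)
    action = tree.predict(x)[0]
    explanation = tree.decision_path(x)  # sparse indicator of rules fired
    return action, explanation
```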
arXiv Detail & Related papers (2022-11-15T04:15:03Z)
- Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
arXiv Detail & Related papers (2022-03-19T03:22:57Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- Improving Human Sequential Decision-Making with Reinforcement Learning [29.334511328067777]
We design a novel machine learning algorithm that is capable of extracting "best practices" from trace data.
Our algorithm selects the tip that best bridges the gap between the actions taken by human workers and those taken by the optimal policy.
Experiments show that the tips generated by our algorithm can significantly improve human performance.
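One plausible reading of the selection rule is to surface the state where the workers' usual action loses the most value relative to the optimal policy; the sketch below assumes a learned optimal Q-function and logged human behavior, with all interfaces illustrative.

```python
# Illustrative sketch of gap-maximizing tip selection.
def select_tip(q, human_actions, states, actions):
    """Pick the (state, recommended action) pair where switching from
    the humans' usual action to the optimal one gains the most value.

    q             : dict (state, action) -> optimal-policy value estimate
    human_actions : dict state -> action humans most commonly take
    """
    best_tip, best_gap = None, float("-inf")
    for s in states:
        optimal_a = max(actions, key=lambda a: q[(s, a)])
        gap = q[(s, optimal_a)] - q[(s, human_actions[s])]
        if gap > best_gap:
            best_gap, best_tip = gap, (s, optimal_a)
    return best_tip
```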
arXiv Detail & Related papers (2021-08-19T02:57:58Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
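The "relabeling experience" half of the title can be sketched directly; the buffer and reward-model interfaces below are assumptions for illustration.

```python
# Illustrative sketch of experience relabeling.
def relabel_replay(replay_buffer, reward_model):
    """Whenever the reward model learned from human feedback is updated,
    recompute the reward of every stored transition so that off-policy
    learning stays consistent with the current reward estimate."""
    for t in replay_buffer:
        t.reward = reward_model(t.observation, t.action)
```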
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Action Advising with Advice Imitation in Deep Reinforcement Learning [0.5185131234265025]
Action advising is a peer-to-peer knowledge exchange technique built on the teacher-student paradigm.
We present an approach that enables the student agent to imitate previously acquired advice and reuse it directly in its exploration policy.
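A minimal sketch of reusing imitated advice during exploration, replacing uniform random actions with the output of a model trained on previously collected teacher advice; all interfaces are illustrative assumptions.

```python
# Illustrative sketch, not the paper's implementation.
import random

def exploration_action(state, actions, q_values, advice_model, epsilon=0.1):
    """Instead of exploring uniformly at random, reuse imitated advice:
    with probability epsilon, follow the model trained to imitate past
    teacher advice; otherwise act greedily on the Q-values."""
    if random.random() < epsilon:
        return advice_model(state)  # imitated advice replaces a random action
    return max(actions, key=lambda a: q_values[(state, a)])
```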
arXiv Detail & Related papers (2021-04-17T04:24:04Z)
- Tracking the Race Between Deep Reinforcement Learning and Imitation Learning -- Extended Version [0.0]
We consider a benchmark planning problem from the reinforcement learning domain, the Racetrack.
We compare the performance of deep supervised learning, in particular imitation learning, to reinforcement learning for the Racetrack model.
arXiv Detail & Related papers (2020-08-03T10:31:44Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
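The idea can be sketched as supervised learning on the agent's own experience, conditioned on the return actually achieved; at test time the policy is simply asked for a high return. The architecture, shapes, and names below are illustrative assumptions.

```python
# Illustrative sketch of a reward-conditioned policy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardConditionedPolicy(nn.Module):
    """Maps (state, target return) to action logits; trained with
    ordinary supervised learning on (state, achieved return, action)
    tuples from the agent's own rollouts -- no demonstrations needed."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, target_return):
        # Condition the policy on the return it is asked to achieve.
        x = torch.cat([state, target_return.unsqueeze(-1)], dim=-1)
        return self.net(x)

def supervised_policy_loss(policy, states, returns, actions):
    """Imitate whatever action was taken, conditioned on the return it got."""
    return F.cross_entropy(policy(states, returns), actions)
```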
arXiv Detail & Related papers (2019-12-31T18:07:43Z)