Improving Behavioural Cloning with Human-Driven Dynamic Dataset
Augmentation
- URL: http://arxiv.org/abs/2201.07719v1
- Date: Wed, 19 Jan 2022 16:57:17 GMT
- Title: Improving Behavioural Cloning with Human-Driven Dynamic Dataset
Augmentation
- Authors: Federico Malato, Joona Jehkonen, Ville Hautam\"aki
- Abstract summary: We show how combining behavioural cloning with human-in-the-loop training solves some of its flaws.
We introduce a novel approach that allows an expert to take control of the agent at any moment during a simulation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Behavioural cloning has been extensively used to train agents and is
recognized as a fast and solid approach to teach general behaviours based on
expert trajectories. Such method follows the supervised learning paradigm and
it strongly depends on the distribution of the data. In our paper, we show how
combining behavioural cloning with human-in-the-loop training solves some of
its flaws and provides an agent task-specific corrections to overcome tricky
situations while speeding up the training time and lowering the required
resources. To do this, we introduce a novel approach that allows an expert to
take control of the agent at any moment during a simulation and provide optimal
solutions to its problematic situations. Our experiments show that this
approach leads to better policies both in terms of quantitative evaluation and
in human-likeliness.
Related papers
- Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling [10.931466852026663]
We investigate the optimal use of trained deep reinforcement learning (DRL) agents during inference.
Our work is based on the hypothesis that, similar to search algorithms, the utilization of trained DRL agents should be dependent on the acceptable computational budget.
We propose an algorithm for obtaining the optimal parameterization for such a given number of solutions and any given trained agent.
arXiv Detail & Related papers (2024-06-11T14:59:18Z) - Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement [1.1510009152620668]
Current methods for constructive neural optimization usually train a policy using behavior cloning from expert solutions or policy gradient methods from reinforcement learning.
We bridge the two by sampling multiple solutions for random instances using the current model in each epoch and then selecting the best solution as an expert trajectory for supervised imitation learning.
We evaluate our approach on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. The models trained with our method achieve comparable performance and generalization to those trained with expert data.
arXiv Detail & Related papers (2024-03-22T13:09:10Z) - Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for
Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - Robust Reinforcement Learning via Genetic Curriculum [5.421464476555662]
Genetic curriculum is an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum.
Our empirical studies show improvement in robustness over the existing state of the art algorithms, providing training curricula that result in agents being 2 - 8x times less likely to fail.
arXiv Detail & Related papers (2022-02-17T01:14:20Z) - Learning Complex Spatial Behaviours in ABM: An Experimental
Observational Study [0.0]
This paper explores how Reinforcement Learning can be applied to create emergent agent behaviours.
Running a series of simulations, we demonstrate that agents trained using the novel Proximal Policy optimisation algorithm behave in ways that exhibit properties of real-world intelligent adaptive behaviours.
arXiv Detail & Related papers (2022-01-04T11:56:11Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Adversarial Imitation Learning with Trajectorial Augmentation and
Correction [61.924411952657756]
We introduce a novel augmentation method which preserves the success of the augmented trajectories.
We develop an adversarial data augmented imitation architecture to train an imitation agent using synthetic experts.
Experiments show that our data augmentation strategy can improve accuracy and convergence time of adversarial imitation.
arXiv Detail & Related papers (2021-03-25T14:49:32Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.