Evolutionary Selective Imitation: Interpretable Agents by Imitation
Learning Without a Demonstrator
- URL: http://arxiv.org/abs/2009.08403v1
- Date: Thu, 17 Sep 2020 16:25:31 GMT
- Title: Evolutionary Selective Imitation: Interpretable Agents by Imitation
Learning Without a Demonstrator
- Authors: Roy Eliya, J. Michael Herrmann
- Abstract summary: We propose a new method for training an agent via an evolutionary strategy (ES).
In every iteration we replace a subset of the samples with samples from the best trajectories discovered so far.
The evaluation procedure for this set is to train, via supervised learning, a randomly initialised neural network (NN) to imitate the set.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new method for training an agent via an evolutionary strategy
(ES), in which we iteratively improve a set of samples to imitate: Starting
with a random set, in every iteration we replace a subset of the samples with
samples from the best trajectories discovered so far. The evaluation procedure
for this set is to train, via supervised learning, a randomly initialised
neural network (NN) to imitate the set and then execute the acquired policy
against the environment. Our method is thus an ES based on a fitness function
that expresses the effectiveness of imitating an evolving data subset. This is
in contrast to other ES techniques that iterate over the weights of the policy
directly. By observing the samples that the agent selects for learning, it is
possible to interpret and evaluate the evolving strategy of the agent more
explicitly than in NN learning. In our experiments, we trained an agent to
solve the OpenAI Gym environment BipedalWalker-v3 by imitating an
evolutionarily selected set of only 25 samples with an NN of only a few
thousand parameters. We further test our method on the Procgen game Plunder
and show that, here as well, the proposed method is an interpretable, small,
robust and effective alternative to other ES or policy-gradient methods.
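The loop described in the abstract can be sketched in a few lines. The following is a toy, dependency-free illustration, not the paper's implementation: a hypothetical 2-D point-control environment stands in for BipedalWalker-v3, and a least-squares linear policy stands in for the small randomly initialised NN; all function names and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(policy_w, horizon=20):
    """Run the linear policy a = W @ s (plus exploration noise) in a toy
    environment where the agent tries to keep a 2-D state near the origin.
    Returns the total reward and the (state, action) trajectory."""
    s = rng.normal(size=2)
    total, traj = 0.0, []
    for _ in range(horizon):
        a = policy_w @ s + 0.1 * rng.normal(size=2)  # noisy action for exploration
        traj.append((s.copy(), a.copy()))
        s = s + a
        total -= np.linalg.norm(s)  # reward: stay close to the origin
    return total, traj

def fit_policy(samples):
    """Supervised imitation step: least-squares fit of a linear policy to
    the (state, action) sample set.  The paper trains a small randomly
    initialised NN here; a linear model keeps the sketch dependency-free."""
    states = np.array([s for s, _ in samples])
    actions = np.array([a for _, a in samples])
    coeffs, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return coeffs.T  # W such that a ~= W @ s

def esi(n_samples=25, iterations=40, replace_k=5):
    """Evolutionary Selective Imitation sketch: evolve the sample set, not
    the policy weights.  The fitness of a set is the reward earned by a
    policy trained to imitate it."""
    # Start from a random sample set, as in the paper.
    samples = [(rng.normal(size=2), rng.normal(size=2))
               for _ in range(n_samples)]
    best_reward, best_traj = -np.inf, None
    for _ in range(iterations):
        # Evaluate the set: imitate it, then act in the environment.
        reward, traj = rollout(fit_policy(samples))
        if reward > best_reward:
            best_reward, best_traj = reward, traj
        # Mutate: replace a random subset of the samples with samples
        # drawn from the best trajectory discovered so far.
        for i in rng.choice(n_samples, size=replace_k, replace=False):
            samples[i] = best_traj[rng.integers(len(best_traj))]
    return best_reward, samples

best_reward, final_set = esi()
print(f"best reward: {best_reward:.2f}, sample set size: {len(final_set)}")
```

Note the contrast the abstract draws: the mutation operator acts on the 25-sample imitation set, while the policy weights are re-derived from scratch (here by least squares, in the paper by retraining an NN) at every fitness evaluation.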
Related papers
- Inverse Reinforcement Learning from Non-Stationary Learning Agents [11.203097744443898]
We study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy.
We propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function.
arXiv Detail & Related papers (2024-10-18T03:02:44Z)
- Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable.
Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration.
We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions.
arXiv Detail & Related papers (2024-10-02T11:33:13Z) - READ: Improving Relation Extraction from an ADversarial Perspective [33.44949503459933]
We propose an adversarial training method specifically designed for relation extraction (RE)
Our approach introduces both sequence- and token-level perturbations to the sample and uses a separate perturbation vocabulary to improve the search for entity and context perturbations.
arXiv Detail & Related papers (2024-04-02T16:42:44Z) - Fast Propagation is Better: Accelerating Single-Step Adversarial
Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples.
We propose to exploit the interior building blocks of the model to improve efficiency.
Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
arXiv Detail & Related papers (2023-10-24T01:36:20Z) - Continual Test-time Domain Adaptation via Dynamic Sample Selection [38.82346845855512]
This paper proposes a Dynamic Sample Selection (DSS) method for Continual Test-time Domain Adaptation (CTDA)
We apply joint positive and negative learning on both high- and low-quality samples to reduce the risk of using wrong information.
Our approach is also evaluated in the 3D point cloud domain, showcasing its versatility and potential for broader applicability.
arXiv Detail & Related papers (2023-10-05T06:35:21Z) - Learning Transferable Reward for Query Object Localization with Policy
Adaptation [49.994989590997655]
We learn a transferable reward signal formulated using the exemplary set by ordinal metric learning.
Our proposed method enables test-time policy adaptation to new environments where the reward signals are not readily available.
arXiv Detail & Related papers (2022-02-24T22:52:14Z) - Active Learning for Deep Visual Tracking [51.5063680734122]
Convolutional neural networks (CNNs) have been successfully applied to the single target tracking task in recent years.
In this paper, we propose an active learning method for deep visual tracking, which selects and annotates the unlabeled samples to train the deep CNNs model.
Under the guidance of active learning, the tracker based on the trained deep CNNs model can achieve competitive tracking performance while reducing the labeling cost.
arXiv Detail & Related papers (2021-10-17T11:47:56Z) - Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency)
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z) - Improving speech recognition models with small samples for air traffic
control systems [9.322392779428505]
In this work, a novel training approach based on pretraining and transfer learning is proposed to address the issue of small training samples.
Three real ATC datasets are used to validate the proposed ASR model and training strategies.
The experimental results demonstrate that the ASR performance is significantly improved on all three datasets.
arXiv Detail & Related papers (2021-02-16T08:28:52Z) - Evolutionary Stochastic Policy Distillation [139.54121001226451]
We propose a new method called Evolutionary Policy Distillation (ESPD) to solve GCRS tasks.
ESPD enables a target policy to learn from a series of its variants through the technique of policy distillation (PD)
The experiments based on the MuJoCo control suite show the high learning efficiency of the proposed method.
arXiv Detail & Related papers (2020-04-27T16:19:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.