Goal-Conditioned Generators of Deep Policies
- URL: http://arxiv.org/abs/2207.01570v1
- Date: Mon, 4 Jul 2022 16:41:48 GMT
- Title: Goal-Conditioned Generators of Deep Policies
- Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
- Abstract summary: We study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices.
Our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies.
Experiments show how a single learned policy generator can produce policies that achieve any return seen during training.
- Score: 14.946533606788758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Goal-conditioned Reinforcement Learning (RL) aims at learning optimal
policies, given goals encoded in special command inputs. Here we study
goal-conditioned neural nets (NNs) that learn to generate deep NN policies in
the form of context-specific weight matrices, similar to Fast Weight Programmers
and other methods from the 1990s. Using context commands of the form "generate
a policy that achieves a desired expected return," our NN generators combine
powerful exploration of parameter space with generalization across commands to
iteratively find better and better policies. A form of weight-sharing
HyperNetworks and policy embeddings scales our method to generate deep NNs.
Experiments show how a single learned policy generator can produce policies
that achieve any return seen during training. Finally, we evaluate our
algorithm on a set of continuous control tasks where it exhibits competitive
performance. Our code is public.
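To make the setup concrete, here is a minimal PyTorch-style sketch of a generator that maps a scalar command (a desired return) to the weights of a small policy network. The sizes and names are illustrative assumptions; the paper's actual method additionally relies on weight-sharing HyperNetworks and policy embeddings to scale to deep policies.

```python
import torch
import torch.nn as nn

class PolicyGenerator(nn.Module):
    """Sketch: map a desired-return command to the weights of a two-layer policy."""

    def __init__(self, obs_dim, act_dim, hidden=32):
        super().__init__()
        self.obs_dim, self.act_dim, self.hidden = obs_dim, act_dim, hidden
        # total parameter count of the generated policy (two linear layers)
        self.n_params = obs_dim * hidden + hidden + hidden * act_dim + act_dim
        self.net = nn.Sequential(nn.Linear(1, 256), nn.Tanh(),
                                 nn.Linear(256, self.n_params))

    def forward(self, desired_return, obs):
        # Generate a flat weight vector from the command, slice it into layers,
        # and run the observation through the generated policy.
        w = self.net(desired_return.view(1, 1)).squeeze(0)
        i = 0
        W1 = w[i:i + self.obs_dim * self.hidden].view(self.hidden, self.obs_dim)
        i += self.obs_dim * self.hidden
        b1 = w[i:i + self.hidden]; i += self.hidden
        W2 = w[i:i + self.hidden * self.act_dim].view(self.act_dim, self.hidden)
        i += self.hidden * self.act_dim
        b2 = w[i:i + self.act_dim]
        h = torch.tanh(obs @ W1.t() + b1)
        return torch.tanh(h @ W2.t() + b2)  # action in [-1, 1]

gen = PolicyGenerator(obs_dim=4, act_dim=1)
action = gen(torch.tensor(100.0), torch.zeros(1, 4))  # "achieve a return of 100"
```

Training would then, as the abstract describes, repeatedly command returns observed so far and update the generator on the returns its generated policies actually achieve; that loop is omitted here.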
Related papers
- Upside Down Reinforcement Learning with Policy Generators [26.883212329754848]
Upside Down Reinforcement Learning (UDRL) is a promising framework for solving reinforcement learning problems.
We extend UDRL to the task of learning a command-conditioned generator of deep neural network policies.
Our method, dubbed Upside Down Reinforcement Learning with Policy Generators (UDRLPG), streamlines comparable techniques.
arXiv Detail & Related papers (2025-01-27T18:25:04Z)
- Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning [27.868175900131313]
Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state.
This paper postulates multi-linear mappings to efficiently estimate the parameters of the RL policy.
We leverage the PARAFAC decomposition to design tensor low-rank policies.
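As a rough, hypothetical illustration of the tensor low-rank idea in this entry, the sketch below rebuilds a 3-way parameter tensor from PARAFAC (CP) factor matrices; the mode sizes and rank are made up, not the authors' configuration.

```python
import numpy as np

rank = 3
dims = (8, 6, 4)  # illustrative mode sizes for a 3-way policy parameter tensor
factors = [0.1 * np.random.randn(d, rank) for d in dims]  # one factor matrix per mode

def cp_reconstruct(factors):
    """W[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r] (PARAFAC / CP form)."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)

W = cp_reconstruct(factors)
# Only the factor matrices are stored and learned:
# rank * sum(dims) numbers instead of prod(dims).
print(W.shape, sum(f.size for f in factors), int(np.prod(dims)))
```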
arXiv Detail & Related papers (2025-01-08T23:22:08Z)
- AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation [65.01527698201956]
Non-autoregressive Transformers (NATs) are able to synthesize images with decent quality in a small number of steps.
We propose AdaNAT, a learnable approach that automatically configures a suitable policy tailored for every sample to be generated.
arXiv Detail & Related papers (2024-08-31T03:53:57Z)
- Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with K baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
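A minimal sketch of the jump-start rollout idea from this entry, assuming a gymnasium-style environment API; the curriculum that gradually hands more of the episode to the learner is omitted, and all names here are hypothetical.

```python
def jump_start_rollout(env, guide_policy, exploration_policy, switch_step, max_steps=1000):
    """Guide policy acts for the first `switch_step` steps, then the learner takes over."""
    obs, _ = env.reset()
    episode_return, transitions = 0.0, []
    for t in range(max_steps):
        policy = guide_policy if t < switch_step else exploration_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs))
        episode_return += reward
        obs = next_obs
        if terminated or truncated:
            break
    return episode_return, transitions
```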
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher performing agents on the environments we tested.
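A bare-bones sketch of the direct random search idea described above, under the assumption of a `rollout_return` callback that evaluates a flattened parameter vector with one deterministic episode; this simple hill-climbing variant is only in the spirit of the paper, not its exact procedure.

```python
import numpy as np

def random_search_finetune(theta, rollout_return, iters=200, sigma=0.02, seed=0):
    """theta: flattened policy parameters; rollout_return: params -> return of one
    deterministic episode. A perturbed candidate is kept only if it scores higher."""
    rng = np.random.default_rng(seed)
    best = rollout_return(theta)
    for _ in range(iters):
        candidate = theta + sigma * rng.standard_normal(theta.shape)
        score = rollout_return(candidate)
        if score > best:
            theta, best = candidate, score
    return theta, best
```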
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network [0.9137554315375919]
We propose a framework that considers a particle-based action policy as a substitute for Gaussian policies.
We demonstrate the applicability of our approach on various motion capture imitation tasks.
arXiv Detail & Related papers (2020-03-16T00:35:36Z)
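For the particle-based policy in the last entry (PFPN), a rough sketch of the core idea: each action dimension keeps a small set of learnable particles and the network outputs a categorical distribution over them instead of a Gaussian. The sizes and the omitted particle resampling/update scheme are assumptions.

```python
import torch
import torch.nn as nn

class ParticleActionHead(nn.Module):
    """Sketch: categorical distribution over per-dimension action particles."""

    def __init__(self, feat_dim, act_dim, n_particles=8):
        super().__init__()
        # learnable particles, initialized on a grid in [-1, 1] per action dimension
        self.particles = nn.Parameter(torch.linspace(-1.0, 1.0, n_particles).repeat(act_dim, 1))
        self.logits = nn.Linear(feat_dim, act_dim * n_particles)
        self.act_dim, self.n_particles = act_dim, n_particles

    def forward(self, features):
        logits = self.logits(features).view(-1, self.act_dim, self.n_particles)
        dist = torch.distributions.Categorical(logits=logits)
        idx = dist.sample()                                    # (batch, act_dim)
        particles = self.particles.expand(idx.shape[0], -1, -1)
        action = torch.gather(particles, 2, idx.unsqueeze(-1)).squeeze(-1)
        return action, dist.log_prob(idx)

head = ParticleActionHead(feat_dim=16, act_dim=2)
action, logp = head(torch.zeros(5, 16))   # 5 states -> 5 two-dimensional actions
```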
This list is automatically generated from the titles and abstracts of the papers on this site.