Goal-Conditioned Generators of Deep Policies
- URL: http://arxiv.org/abs/2207.01570v1
- Date: Mon, 4 Jul 2022 16:41:48 GMT
- Title: Goal-Conditioned Generators of Deep Policies
- Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber
- Abstract summary: We study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices.
Our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies.
Experiments show how a single learned policy generator can produce policies that achieve any return seen during training.
- Score: 14.946533606788758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Goal-conditioned Reinforcement Learning (RL) aims at learning optimal
policies, given goals encoded in special command inputs. Here we study
goal-conditioned neural nets (NNs) that learn to generate deep NN policies in
the form of context-specific weight matrices, similar to Fast Weight Programmers
and other methods from the 1990s. Using context commands of the form "generate
a policy that achieves a desired expected return," our NN generators combine
powerful exploration of parameter space with generalization across commands to
iteratively find better and better policies. A form of weight-sharing
HyperNetworks and policy embeddings scales our method to generate deep NNs.
Experiments show how a single learned policy generator can produce policies
that achieve any return seen during training. Finally, we evaluate our
algorithm on a set of continuous control tasks where it exhibits competitive
performance. Our code is public.
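To make the setup concrete, here is a minimal PyTorch-style sketch of a generator that maps a scalar command (a desired return) to the weights of a small policy network. The sizes and names are illustrative assumptions; the paper's actual method additionally relies on weight-sharing HyperNetworks and policy embeddings to scale to deep policies.

```python
import torch
import torch.nn as nn

class PolicyGenerator(nn.Module):
    """Sketch: map a desired-return command to the weights of a two-layer policy."""

    def __init__(self, obs_dim, act_dim, hidden=32):
        super().__init__()
        self.obs_dim, self.act_dim, self.hidden = obs_dim, act_dim, hidden
        # total parameter count of the generated policy (two linear layers)
        self.n_params = obs_dim * hidden + hidden + hidden * act_dim + act_dim
        self.net = nn.Sequential(nn.Linear(1, 256), nn.Tanh(),
                                 nn.Linear(256, self.n_params))

    def forward(self, desired_return, obs):
        # Generate a flat weight vector from the command, slice it into layers,
        # and run the observation through the generated policy.
        w = self.net(desired_return.view(1, 1)).squeeze(0)
        i = 0
        W1 = w[i:i + self.obs_dim * self.hidden].view(self.hidden, self.obs_dim)
        i += self.obs_dim * self.hidden
        b1 = w[i:i + self.hidden]; i += self.hidden
        W2 = w[i:i + self.hidden * self.act_dim].view(self.act_dim, self.hidden)
        i += self.hidden * self.act_dim
        b2 = w[i:i + self.act_dim]
        h = torch.tanh(obs @ W1.t() + b1)
        return torch.tanh(h @ W2.t() + b2)  # action in [-1, 1]

gen = PolicyGenerator(obs_dim=4, act_dim=1)
action = gen(torch.tensor(100.0), torch.zeros(1, 4))  # "achieve a return of 100"
```

Training would then, as the abstract describes, repeatedly command returns observed so far and update the generator on the returns its generated policies actually achieve; that loop is omitted here.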
Related papers
- Upside Down Reinforcement Learning with Policy Generators [26.883212329754848]
Upside Down Reinforcement Learning (UDRL) is a promising framework for solving reinforcement learning problems.
We extend UDRL to the task of learning a command-conditioned generator of deep neural network policies.
Our method, dubbed Upside Down Reinforcement Learning with Policy Generators (UDRLPG), streamlines comparable techniques.
arXiv Detail & Related papers (2025-01-27T18:25:04Z)
- Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning [27.868175900131313]
Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state.
This paper postulates multi-linear mappings to efficiently estimate the parameters of the RL policy.
We leverage the PARAFAC decomposition to design tensor low-rank policies.
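As a rough, hypothetical illustration of the tensor low-rank idea in this entry, the sketch below rebuilds a 3-way parameter tensor from PARAFAC (CP) factor matrices; the mode sizes and rank are made up, not the authors' configuration.

```python
import numpy as np

rank = 3
dims = (8, 6, 4)  # illustrative mode sizes for a 3-way policy parameter tensor
factors = [0.1 * np.random.randn(d, rank) for d in dims]  # one factor matrix per mode

def cp_reconstruct(factors):
    """W[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r] (PARAFAC / CP form)."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)

W = cp_reconstruct(factors)
# Only the factor matrices are stored and learned:
# rank * sum(dims) numbers instead of prod(dims).
print(W.shape, sum(f.size for f in factors), int(np.prod(dims)))
```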
arXiv Detail & Related papers (2025-01-08T23:22:08Z)
- AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation [65.01527698201956]
Non-autoregressive Transformers (NATs) are able to synthesize images with decent quality in a small number of steps.
We propose AdaNAT, a learnable approach that automatically configures a suitable policy tailored for every sample to be generated.
arXiv Detail & Related papers (2024-08-31T03:53:57Z)
- Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with K baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
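A minimal sketch of the jump-start rollout idea from this entry, assuming a gymnasium-style environment API; the curriculum that gradually hands more of the episode to the learner is omitted, and all names here are hypothetical.

```python
def jump_start_rollout(env, guide_policy, exploration_policy, switch_step, max_steps=1000):
    """Guide policy acts for the first `switch_step` steps, then the learner takes over."""
    obs, _ = env.reset()
    episode_return, transitions = 0.0, []
    for t in range(max_steps):
        policy = guide_policy if t < switch_step else exploration_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs))
        episode_return += reward
        obs = next_obs
        if terminated or truncated:
            break
    return episode_return, transitions
```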
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher performing agents on the environments we tested.
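A bare-bones sketch of the direct random search idea described above, under the assumption of a `rollout_return` callback that evaluates a flattened parameter vector with one deterministic episode; this simple hill-climbing variant is only in the spirit of the paper, not its exact procedure.

```python
import numpy as np

def random_search_finetune(theta, rollout_return, iters=200, sigma=0.02, seed=0):
    """theta: flattened policy parameters; rollout_return: params -> return of one
    deterministic episode. A perturbed candidate is kept only if it scores higher."""
    rng = np.random.default_rng(seed)
    best = rollout_return(theta)
    for _ in range(iters):
        candidate = theta + sigma * rng.standard_normal(theta.shape)
        score = rollout_return(candidate)
        if score > best:
            theta, best = candidate, score
    return theta, best
```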
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- PFPN: Continuous Control of Physically Simulated Characters using Particle Filtering Policy Network [0.9137554315375919]
We propose a framework that considers a particle-based action policy as a substitute for Gaussian policies.
We demonstrate the applicability of our approach on various motion capture imitation tasks.
arXiv Detail & Related papers (2020-03-16T00:35:36Z)
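For the particle-based policy in the last entry (PFPN), a rough sketch of the core idea: each action dimension keeps a small set of learnable particles and the network outputs a categorical distribution over them instead of a Gaussian. The sizes and the omitted particle resampling/update scheme are assumptions.

```python
import torch
import torch.nn as nn

class ParticleActionHead(nn.Module):
    """Sketch: categorical distribution over per-dimension action particles."""

    def __init__(self, feat_dim, act_dim, n_particles=8):
        super().__init__()
        # learnable particles, initialized on a grid in [-1, 1] per action dimension
        self.particles = nn.Parameter(torch.linspace(-1.0, 1.0, n_particles).repeat(act_dim, 1))
        self.logits = nn.Linear(feat_dim, act_dim * n_particles)
        self.act_dim, self.n_particles = act_dim, n_particles

    def forward(self, features):
        logits = self.logits(features).view(-1, self.act_dim, self.n_particles)
        dist = torch.distributions.Categorical(logits=logits)
        idx = dist.sample()                                    # (batch, act_dim)
        particles = self.particles.expand(idx.shape[0], -1, -1)
        action = torch.gather(particles, 2, idx.unsqueeze(-1)).squeeze(-1)
        return action, dist.log_prob(idx)

head = ParticleActionHead(feat_dim=16, act_dim=2)
action, logp = head(torch.zeros(5, 16))   # 5 states -> 5 two-dimensional actions
```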
This list is automatically generated from the titles and abstracts of the papers on this site.