Recomposing the Reinforcement Learning Building Blocks with Hypernetworks
- URL: http://arxiv.org/abs/2106.06842v1
- Date: Sat, 12 Jun 2021 19:43:12 GMT
- Title: Recomposing the Reinforcement Learning Building Blocks with Hypernetworks
- Authors: Shai Keynan, Elad Sarafian and Sarit Kraus
- Abstract summary: We suggest using a Hypernetwork architecture in which a primary network determines the weights of a conditional dynamic network.
This approach improves the gradient approximation and reduces the learning step variance.
We demonstrate a consistent improvement across different locomotion tasks and different algorithms both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).
- Score: 19.523737925041278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Reinforcement Learning (RL) building blocks, i.e., Q-functions and policy
networks, usually take elements from the Cartesian product of two domains as
input. In particular, the input of the Q-function is both the state and the
action, and in multi-task problems (Meta-RL) the policy can take a state and a
context. Standard architectures tend to ignore these variables' underlying
interpretations and simply concatenate their features into a single vector. In
this work, we argue that this choice may lead to poor gradient estimation in
actor-critic algorithms and high variance learning steps in Meta-RL algorithms.
To consider the interaction between the input variables, we suggest using a
Hypernetwork architecture where a primary network determines the weights of a
conditional dynamic network. We show that this approach improves the gradient
approximation and reduces the learning step variance, which both accelerates
learning and improves the final performance. We demonstrate a consistent
improvement across different locomotion tasks and different algorithms both in
RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).
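The abstract's central idea, a primary network that outputs the weights of a conditional dynamic network, can be illustrated with a minimal forward-pass sketch. It assumes the state feeds the primary network and the action feeds the dynamic network of a Q-function (the abstract itself does not fix this split); the class name `HyperQ`, its methods, the layer sizes, and the single-hidden-layer dynamic network are hypothetical choices, and no training loop is shown.

```python
import math
import random


def linear(x, W, b):
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, u) for u in v]


class HyperQ:
    """Hypothetical hypernetwork Q-function (forward pass only).

    Primary network: state -> flat parameter vector theta(s).
    Dynamic network: action -> Q(s, a), computed with the weights theta(s).
    """

    def __init__(self, state_dim, action_dim, primary_hidden=32, dynamic_hidden=16, seed=0):
        rng = random.Random(seed)
        self.action_dim = action_dim
        self.dynamic_hidden = dynamic_hidden
        # Dynamic net parameters: W1 (h x a), b1 (h), w2 (h), b2 (scalar).
        self.n_dyn = dynamic_hidden * action_dim + 2 * dynamic_hidden + 1

        def init(rows, cols):
            scale = 1.0 / math.sqrt(cols)
            return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

        # Primary net: one hidden layer whose output size equals the number of dynamic parameters.
        self.P1, self.c1 = init(primary_hidden, state_dim), [0.0] * primary_hidden
        self.P2, self.c2 = init(self.n_dyn, primary_hidden), [0.0] * self.n_dyn

    def dynamic_weights(self, state):
        """Run the primary net on the state and unpack theta(s) into dynamic-net layers."""
        theta = linear(relu(linear(state, self.P1, self.c1)), self.P2, self.c2)
        h, a = self.dynamic_hidden, self.action_dim
        W1 = [theta[i * a:(i + 1) * a] for i in range(h)]
        b1 = theta[h * a:h * a + h]
        w2 = theta[h * a + h:h * a + 2 * h]
        b2 = theta[h * a + 2 * h]
        return W1, b1, w2, b2

    def q_value(self, state, action):
        """Q(s, a): the action only passes through weights generated from the state."""
        W1, b1, w2, b2 = self.dynamic_weights(state)
        hidden = relu(linear(action, W1, b1))
        return sum(w * z for w, z in zip(w2, hidden)) + b2


if __name__ == "__main__":
    q = HyperQ(state_dim=3, action_dim=2)
    print(q.q_value([0.1, -0.4, 0.7], [0.5, -0.5]))
```

A concatenation baseline would instead feed `state + action` into a single MLP. For the Meta-RL setting described in the abstract, the same pattern would apply with the task context as the primary input and the state as the dynamic (policy) input.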
Related papers
- Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning [62.984693936073974]
Value-based reinforcement learning can learn effective policies for a wide range of multi-turn problems.
Current value-based RL methods have proven particularly challenging to scale to the setting of large language models.
We propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning problem.
arXiv Detail & Related papers (2024-11-07T21:36:52Z)
- Step-size Optimization for Continual Learning [5.834516080130717]
In continual learning, a learner has to keep learning from the data over its whole lifetime.
In a neural network, this can be implemented by using a step-size vector to scale how much samples change network weights.
Common algorithms, like RMSProp and Adam, use gradients, specifically normalization, to adapt this step-size vector.
arXiv Detail & Related papers (2024-01-30T19:35:43Z)
- VQC-Based Reinforcement Learning with Data Re-uploading: Performance and Trainability [0.8192907805418583]
Reinforcement Learning (RL) consists of designing agents that make intelligent decisions without human supervision.
Deep Q-Learning, an RL algorithm that uses deep NNs, achieved superhuman performance in some specific tasks.
It is also possible to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms.
arXiv Detail & Related papers (2024-01-21T18:00:15Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition [63.67574523750839]
We propose a generic reinforcement learning (RL) algorithm that performs better than baseline deep Q-learning algorithms in environments with multiple variably-valued niches.
We show that agents trained this way can escape poor-but-attractive local optima to instead converge to harder-to-discover higher value strategies.
arXiv Detail & Related papers (2023-02-02T16:00:19Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve the learning performance on the other tasks.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- GRAC: Self-Guided and Self-Regularized Actor-Critic [24.268453994605512]
We propose a self-regularized TD-learning method to address divergence without requiring a target network.
We also propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization.
This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network.
We evaluate GRAC on the suite of OpenAI Gym tasks, matching or outperforming the state of the art in every environment tested.
arXiv Detail & Related papers (2020-09-18T17:58:29Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
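The two SUNRISE ingredients summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the confidence weight `sigmoid(-std * temperature) + 0.5` and the UCB score `mean + lam * std` reflect our reading of the SUNRISE paper and are assumptions here, and the function names, `temperature`, and `lam` are hypothetical.

```python
import math
import statistics


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bellman_weight(next_target_qs, temperature=10.0):
    """Assumed confidence weight between 0.5 and 1.0: more ensemble disagreement -> closer to 0.5."""
    std = statistics.pstdev(next_target_qs)
    return sigmoid(-std * temperature) + 0.5


def weighted_td_losses(q_preds, reward, next_target_qs, gamma=0.99, done=False, temperature=10.0):
    """Per-member squared TD errors, each scaled by the shared uncertainty weight."""
    w = bellman_weight(next_target_qs, temperature)
    losses = []
    for q, q_next in zip(q_preds, next_target_qs):
        target = reward + (0.0 if done else gamma * q_next)
        losses.append(w * (q - target) ** 2)
    return losses, w


def ucb_action(candidate_actions, ensemble_qs, lam=1.0):
    """Pick the candidate maximizing mean + lam * std of its ensemble Q estimates."""
    scores = [statistics.mean(qs) + lam * statistics.pstdev(qs) for qs in ensemble_qs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidate_actions[best]
```

Targets on which the ensemble disagrees are down-weighted in the backup, while action selection deliberately favors candidates with high disagreement, which is what drives exploration.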