Goal-Conditioned Q-Learning as Knowledge Distillation
- URL: http://arxiv.org/abs/2208.13298v1
- Date: Sun, 28 Aug 2022 22:01:10 GMT
- Title: Goal-Conditioned Q-Learning as Knowledge Distillation
- Authors: Alexander Levine, Soheil Feizi
- Abstract summary: We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
- Score: 136.79415677706612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many applications of reinforcement learning can be formalized as
goal-conditioned environments, where, in each episode, there is a "goal" that
affects the rewards obtained during that episode but does not affect the
dynamics. Various techniques have been proposed to improve performance in
goal-conditioned environments, such as automatic curriculum generation and goal
relabeling. In this work, we explore a connection between off-policy
reinforcement learning in goal-conditioned settings and knowledge distillation.
In particular: the current Q-value function and the target Q-value estimate are
both functions of the goal, and we would like to train the Q-value function to
match its target for all goals. We therefore apply Gradient-Based Attention
Transfer (Zagoruyko and Komodakis 2017), a knowledge distillation technique, to
the Q-function update. We empirically show that this can improve the
performance of goal-conditioned off-policy reinforcement learning when the
space of goals is high-dimensional. We also show that this technique can be
adapted to allow for efficient learning in the case of multiple simultaneous
sparse goals, where the agent can attain a reward by achieving any one of a
large set of objectives, all specified at test time. Finally, to provide
theoretical support, we give examples of classes of environments where (under
some assumptions) standard off-policy algorithms require at least O(d^2)
observed transitions to learn an optimal policy, while our proposed technique
requires only O(d) transitions, where d is the dimensionality of the goal and
state space.
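As a rough illustration of the update described above, the sketch below augments an ordinary goal-conditioned TD loss with a gradient-matching penalty in the spirit of Gradient-Based Attention Transfer: in addition to matching the Q-value to its bootstrapped target, it matches the gradient of the Q-value with respect to the goal to the gradient of the target estimate. The architecture, loss weight, and all names are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch: a goal-conditioned TD update augmented with a gradient-matching
# term in the spirit of Gradient-Based Attention Transfer (Zagoruyko and
# Komodakis 2017). Architecture, hyperparameters, and the exact form of the
# penalty are assumptions for illustration only.
import torch
import torch.nn as nn

class GoalConditionedQ(nn.Module):
    """Q(s, a, g): a plain MLP over the concatenated state, action, and goal."""
    def __init__(self, state_dim, action_dim, goal_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a, g):
        return self.net(torch.cat([s, a, g], dim=-1)).squeeze(-1)

def td_loss_with_goal_gradient_matching(q_net, target_net, batch,
                                        gamma=0.99, beta=1.0):
    """TD loss plus a penalty on the mismatch between dQ/dgoal and
    d(target)/dgoal. The goal-dependence of the reward is omitted here; `r` is
    treated as a fixed tensor from the replay buffer, and `a_next` is assumed
    to come from a target policy (DDPG-style)."""
    s, a, r, s_next, a_next, g = batch
    g = g.clone().requires_grad_(True)   # differentiate w.r.t. the goal

    q = q_net(s, a, g)
    target = r + gamma * target_net(s_next, a_next, g)

    # Gradient of the (frozen) bootstrapped target w.r.t. the goal.
    grad_target = torch.autograd.grad(target.sum(), g, retain_graph=True)[0]
    # Gradient of the current Q w.r.t. the goal; keep the graph so the penalty
    # can backpropagate into q_net's parameters.
    grad_q = torch.autograd.grad(q.sum(), g, create_graph=True)[0]

    td = ((q - target.detach()) ** 2).mean()
    gat = ((grad_q - grad_target.detach()) ** 2).mean()
    return td + beta * gat
```

In a full agent, a loss of this form would stand in for the usual TD loss of whichever off-policy algorithm is in use; in a DQN-style discrete-action setting the target would instead use a max over next actions.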
Related papers
- CQM: Curriculum Reinforcement Learning with a Quantized World Model [30.21954044028645]
We propose a novel curriculum method that automatically defines the semantic goal space which contains vital information for the curriculum process.
Our method proposes uncertainty- and temporal-distance-aware curriculum goals that converge to the final goals over the automatically composed goal space.
It also outperforms state-of-the-art curriculum RL methods in data efficiency and performance across various goal-reaching tasks, even with ego-centric visual inputs.
arXiv Detail & Related papers (2023-10-26T11:50:58Z) - HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
arXiv Detail & Related papers (2023-07-22T00:17:36Z) - Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
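As an illustrative sketch of what a discretizing bottleneck on a goal representation can look like, the snippet below shows a generic straight-through vector-quantization layer; it is a stand-in for the general idea, not the paper's factorial scheme, and all names and sizes are assumptions.

```python
# Hedged sketch: a generic straight-through vector-quantization bottleneck that
# could be applied to a learned goal embedding. A standard VQ layer for
# illustration, not the paper's exact factorial representation.
import torch
import torch.nn as nn

class VQBottleneck(nn.Module):
    def __init__(self, num_codes=64, code_dim=32):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z):
        # z: (batch, code_dim) continuous goal embedding.
        d = torch.cdist(z, self.codebook.weight)        # distances to all codes
        idx = d.argmin(dim=-1)                          # nearest code index
        z_q = self.codebook(idx)                        # quantized embedding
        # Straight-through estimator: forward pass uses z_q, gradients flow to z.
        z_st = z + (z_q - z).detach()
        commit_loss = ((z_q.detach() - z) ** 2).mean() + ((z_q - z.detach()) ** 2).mean()
        return z_st, commit_loss
```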
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Bilinear value networks [16.479582509493756]
We show that our bilinear decomposition scheme substantially improves data efficiency and has superior transfer to out-of-distribution goals.
Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.
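A minimal sketch of the kind of bilinear decomposition the entry above refers to, roughly Q(s, a, g) ≈ f(s, a) · φ(s, g); layer sizes and names are assumptions, not the paper's exact architecture.

```python
# Hedged sketch: a bilinear (dot-product) decomposition of Q(s, a, g) into two
# learned embeddings. Layer sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class BilinearQ(nn.Module):
    def __init__(self, state_dim, action_dim, goal_dim, embed_dim=64, hidden=256):
        super().__init__()
        self.f_sa = nn.Sequential(                  # embedding of (state, action)
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )
        self.phi_sg = nn.Sequential(                # embedding of (state, goal)
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, s, a, g):
        u = self.f_sa(torch.cat([s, a], dim=-1))
        v = self.phi_sg(torch.cat([s, g], dim=-1))
        return (u * v).sum(dim=-1)                  # Q(s, a, g) = u . v
```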
arXiv Detail & Related papers (2022-04-28T17:58:48Z) - C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate a curriculum of intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
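An illustrative sketch of the E-step described above, which picks an intermediate waypoint by graph search over previously visited states; the graph construction, distance threshold, and function names are assumptions, and the M-step is reduced to a placeholder comment.

```python
# Hedged sketch: choosing an intermediate waypoint via graph search over
# replay-buffer states (an illustrative stand-in for the E-step). All details,
# including the Euclidean edge threshold, are assumptions.
import numpy as np
from scipy.sparse.csgraph import shortest_path

def next_waypoint(states, start_idx, goal_idx, edge_threshold=1.0):
    """states: (N, d) array of previously visited states. Connect pairs whose
    Euclidean distance is below edge_threshold, then return the first state on
    the shortest path from start to goal (or the goal itself if adjacent or
    unreachable)."""
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    adjacency = np.where(dists < edge_threshold, dists, 0.0)  # 0 = no edge
    _, predecessors = shortest_path(adjacency, directed=False,
                                    return_predecessors=True,
                                    indices=start_idx)
    # Walk back from the goal to find the first step out of the start state.
    node = goal_idx
    while predecessors[node] != start_idx:
        if predecessors[node] < 0:       # goal unreachable in the graph
            return goal_idx
        node = predecessors[node]
    return node

# M-step (placeholder): train a goal-conditioned policy to reach the waypoint,
# e.g. agent.update(state=states[start_idx], goal=states[waypoint_idx]).
```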
arXiv Detail & Related papers (2021-10-22T22:05:31Z) - Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
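A rough sketch of the dual-objective idea: a potential network is trained to separate target states from the policy's visited states under an approximate Lipschitz penalty, and the learned potential then defines a supplemental shaping reward. The exact objective and reward form in AIM differ in details; everything below is an assumption for illustration.

```python
# Hedged sketch: estimating a Wasserstein-1-style distance through its dual
# (a learned potential with an approximate Lipschitz penalty) and turning the
# potential into a supplemental reward. Illustrative assumptions, not AIM's
# exact formulation.
import torch
import torch.nn as nn

class Potential(nn.Module):
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def potential_loss(potential, target_states, policy_states, penalty_weight=10.0):
    """Maximize E_target[f] - E_policy[f]; written as a loss to minimize, with
    a simple gradient penalty standing in for the Lipschitz constraint."""
    gap = potential(target_states).mean() - potential(policy_states).mean()
    x = policy_states.clone().requires_grad_(True)
    grad = torch.autograd.grad(potential(x).sum(), x, create_graph=True)[0]
    penalty = ((grad.norm(dim=-1) - 1.0) ** 2).mean()
    return -gap + penalty_weight * penalty

def supplemental_reward(potential, s, s_next):
    """Potential-difference shaping reward added to the environment reward."""
    with torch.no_grad():
        return potential(s_next) - potential(s)
```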
arXiv Detail & Related papers (2021-05-27T17:51:34Z) - Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning [9.014110264448371]
We propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM).
GPIM jointly learns both an abstract-level policy and a goal-conditioned policy.
Experiments on various robotic tasks demonstrate the effectiveness and efficiency of our proposed GPIM method.
arXiv Detail & Related papers (2021-04-11T16:26:10Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
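A minimal sketch of a disagreement-driven goal curriculum of the kind the title refers to: candidate goals are scored by the spread of an ensemble of value estimates and sampled in proportion to that disagreement. The ensemble interface and sampling rule below are illustrative assumptions.

```python
# Hedged sketch: sampling training goals in proportion to the disagreement
# (standard deviation) of an ensemble of goal-conditioned value estimates.
# The ensemble interface and sampling rule are assumptions for illustration.
import numpy as np

def sample_goal(candidate_goals, state, value_ensemble, rng=None):
    """candidate_goals: (N, goal_dim) array; value_ensemble: list of callables
    mapping (state, goal) -> scalar value estimate."""
    if rng is None:
        rng = np.random.default_rng()
    values = np.array([[v(state, g) for v in value_ensemble]
                       for g in candidate_goals])      # (N, ensemble_size)
    disagreement = values.std(axis=1)                  # per-goal spread
    probs = disagreement + 1e-8
    probs /= probs.sum()
    # Goals with higher disagreement are sampled more often, focusing training
    # on goals at the frontier of the agent's current competence.
    idx = rng.choice(len(candidate_goals), p=probs)
    return candidate_goals[idx]
```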
arXiv Detail & Related papers (2020-06-17T03:58:25Z)