Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning
- URL: http://arxiv.org/abs/2110.12985v2
- Date: Tue, 26 Oct 2021 16:11:10 GMT
- Title: Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning
- Authors: Kibeom Kim, Min Whoo Lee, Yoonsung Kim, Je-Hwan Ryu, Minsu Lee,
Byoung-Tak Zhang
- Abstract summary: We propose a goal-aware cross-entropy (GACE) loss that can be utilized in a self-supervised way.
We then devise goal-discriminative attention networks (GDAN) which utilize the goal-relevant information to focus on the given instruction.
- Score: 15.33496710690063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning in a multi-target environment without prior knowledge about the
targets requires a large number of samples and makes generalization difficult.
To solve this problem, it is important to be able to discriminate targets
through semantic understanding. In this paper, we propose a goal-aware
cross-entropy (GACE) loss that can be utilized in a self-supervised way using
auto-labeled goal states alongside reinforcement learning. Based on the loss,
we then devise goal-discriminative attention networks (GDAN) which utilize the
goal-relevant information to focus on the given instruction. We evaluate the
proposed methods on visual navigation and robot arm manipulation tasks with
multi-target environments and show that GDAN outperforms the state-of-the-art
methods in terms of task success ratio, sample efficiency, and generalization.
Additionally, qualitative analyses demonstrate that our proposed method can
help the agent become aware of and focus on the given instruction clearly,
promoting goal-directed behavior.
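The abstract frames GACE as an auxiliary, self-supervised cross-entropy objective over goal states that are labeled automatically when the agent reaches a target, optimized alongside the usual RL loss. The snippet below is a minimal sketch of that idea under stated assumptions: it uses PyTorch, and the module name GoalEncoder, the function gace_loss, and the weighting term lambda_gace are illustrative placeholders rather than the authors' implementation or the GDAN architecture.

```python
# Hedged sketch of a goal-aware cross-entropy auxiliary loss, based only on the
# abstract: goal observations are auto-labeled with the reached target's index,
# and a classifier head is trained with cross-entropy alongside the RL objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GoalEncoder(nn.Module):
    """Shared encoder with a goal-classification head (illustrative)."""
    def __init__(self, obs_dim: int, feat_dim: int, num_targets: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.goal_head = nn.Linear(feat_dim, num_targets)

    def forward(self, obs):
        feat = self.backbone(obs)
        return feat, self.goal_head(feat)

def gace_loss(encoder, goal_obs, goal_labels):
    """Cross-entropy over auto-labeled goal states.

    goal_obs:    observations collected at the moment a target is reached
    goal_labels: index of the reached target, assigned automatically
    """
    _, logits = encoder(goal_obs)
    return F.cross_entropy(logits, goal_labels)

# Usage: add the auxiliary term to whatever RL loss the agent already optimizes.
encoder = GoalEncoder(obs_dim=64, feat_dim=128, num_targets=4)
goal_obs = torch.randn(32, 64)              # auto-collected goal observations
goal_labels = torch.randint(0, 4, (32,))    # auto-assigned target indices
rl_loss = torch.tensor(0.0)                 # placeholder for the RL objective
lambda_gace = 0.5                           # illustrative weighting (assumption)
total_loss = rl_loss + lambda_gace * gace_loss(encoder, goal_obs, goal_labels)
total_loss.backward()
```

The intent, as the abstract describes it, is that such an auxiliary term shapes goal-discriminative features that the attention networks can then use to focus on the given instruction.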
Related papers
- CQM: Curriculum Reinforcement Learning with a Quantized World Model [30.21954044028645]
We propose a novel curriculum method that automatically defines the semantic goal space which contains vital information for the curriculum process.
Our method suggests uncertainty- and temporal-distance-aware curriculum goals that converge to the final goals over the automatically composed goal space.
It also outperforms state-of-the-art curriculum RL methods in data efficiency and performance on various goal-reaching tasks, even with ego-centric visual inputs.
arXiv Detail & Related papers (2023-10-26T11:50:58Z) - Discrete Factorial Representations as an Abstraction for Goal
Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Generative multitask learning mitigates target-causing confounding [61.21582323566118]
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
arXiv Detail & Related papers (2022-02-08T20:42:14Z) - C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
Cast as an expectation-maximization algorithm, the E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z) - Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function (see the sketch after this list of related papers).
arXiv Detail & Related papers (2021-05-27T17:51:34Z) - Understanding the origin of information-seeking exploration in
probabilistic objectives for control [62.997667081978825]
An exploration-exploitation trade-off is central to the description of adaptive behaviour.
One approach to resolving this trade-off has been to propose that agents possess an intrinsic 'exploratory drive'.
We show that this combination of utility-maximizing and information-seeking behaviour arises from the minimization of an entirely different class of objectives.
arXiv Detail & Related papers (2021-03-11T18:42:39Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
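The AIM entry above states that the Wasserstein-1 distance between the policy's state-visitation distribution and a target distribution is estimated through its dual objective and turned into a supplemental reward. The sketch below illustrates that mechanism in PyTorch under stated assumptions: the Potential network, the soft Lipschitz penalty and its weight, and the f(s') - f(s) reward shaping are illustrative choices, not a reproduction of the authors' method.

```python
# Hedged sketch: estimate W1 via the Kantorovich dual and use the learned
# potential as a supplemental reward. All sizes and hyperparameters are assumptions.
import torch
import torch.nn as nn

class Potential(nn.Module):
    """Scalar potential f(s); its dual objective approximates W1 (illustrative)."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def dual_loss(f, target_states, visited_states, penalty_weight=10.0):
    """Negated dual gap E_target[f] - E_policy[f] plus a soft Lipschitz penalty."""
    gap = f(target_states).mean() - f(visited_states).mean()
    # Soft penalty keeping f roughly 1-Lipschitz on visited states.
    visited = visited_states.detach().clone().requires_grad_(True)
    grad = torch.autograd.grad(f(visited).sum(), visited, create_graph=True)[0]
    penalty = ((grad.norm(dim=-1) - 1.0).clamp(min=0.0) ** 2).mean()
    return -gap + penalty_weight * penalty   # minimized by the optimizer

def supplemental_reward(f, s, s_next):
    """Reward progress 'uphill' on the learned potential toward the target."""
    with torch.no_grad():
        return f(s_next) - f(s)

# Usage with random placeholder batches.
f = Potential(state_dim=8)
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
target_states = torch.randn(64, 8)     # states drawn near the goal
visited_states = torch.randn(64, 8)    # states visited by the current policy
loss = dual_loss(f, target_states, visited_states)
opt.zero_grad()
loss.backward()
opt.step()
r_bonus = supplemental_reward(f, visited_states[:-1], visited_states[1:])
```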