Look where you look! Saliency-guided Q-networks for visual RL tasks
- URL: http://arxiv.org/abs/2209.09203v1
- Date: Fri, 16 Sep 2022 08:28:38 GMT
- Title: Look where you look! Saliency-guided Q-networks for visual RL tasks
- Authors: David Bertoin (ISAE-SUPAERO, IMT, ANITI), Adil Zouitine
(ISAE-SUPAERO), Mehdi Zouitine (IMT), Emmanuel Rachelson (ISAE-SUPAERO,
ANITI)
- Abstract summary: Changes in image statistics or distracting background elements are pitfalls that prevent generalization.
Saliency-guided Q-networks (SGQN) is a generic method for visual reinforcement learning compatible with any value function learning method.
SGQN vastly improves the generalization capability of Soft Actor-Critic agents and outperforms existing state-of-the-art methods on the DeepMind Control Generalization benchmark.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning policies, despite their outstanding efficiency in
simulated visual control tasks, have shown disappointing ability to generalize
across disturbances in the input training images. Changes in image statistics
or distracting background elements are pitfalls that prevent generalization and
real-world applicability of such control policies. We elaborate on the
intuition that a good visual policy should be able to identify which pixels are
important for its decision, and preserve this identification of important
sources of information across images. This implies that training of a policy
with small generalization gap should focus on such important pixels and ignore
the others. This leads to the introduction of saliency-guided Q-networks
(SGQN), a generic method for visual reinforcement learning, that is compatible
with any value function learning method. SGQN vastly improves the
generalization capability of Soft Actor-Critic agents and outperforms existing
state-of-the-art methods on the DeepMind Control Generalization benchmark,
setting a new reference in terms of training efficiency, generalization gap,
and policy interpretability.
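The abstract describes SGQN only at a high level: the value function should focus on the pixels that matter for its decision and keep attending to them across perturbed observations. As a rough illustration only, the PyTorch sketch below shows one way a saliency-guided auxiliary objective of this kind could be written. The gradient-based saliency mask, the top-quantile threshold, the weighting factor `lam`, and the `critic(obs, action)` interface are assumptions made for the example, not the paper's exact formulation.

```python
# Illustrative sketch only: a saliency-guided auxiliary loss for a visual critic.
# Assumptions (not from the abstract): saliency = |dQ/d(pixels)|, binarised by
# keeping the top fraction of pixels, plus a consistency term asking the critic
# to agree with itself when non-salient pixels are masked out.
import torch
import torch.nn.functional as F


def saliency_mask(critic, obs, action, quantile=0.95):
    """Binary (B, 1, H, W) mask of the pixels the critic's Q-value depends on most."""
    obs = obs.clone().requires_grad_(True)
    q = critic(obs, action).sum()                  # scalar; per-sample grads stay separate
    (grad,) = torch.autograd.grad(q, obs)
    sal = grad.abs().amax(dim=1, keepdim=True)     # collapse channels -> (B, 1, H, W)
    thresh = torch.quantile(sal.flatten(1), quantile, dim=1).view(-1, 1, 1, 1)
    return (sal >= thresh).float()


def saliency_guided_critic_loss(critic, obs, action, target_q, lam=0.5):
    """TD regression plus a saliency-consistency term (illustrative, not the paper's loss)."""
    q = critic(obs, action)
    td_loss = F.mse_loss(q, target_q)
    mask = saliency_mask(critic, obs, action)      # mask is detached from the graph
    q_masked = critic(obs * mask, action)          # value computed from salient pixels only
    consistency = F.mse_loss(q_masked, q.detach())
    return td_loss + lam * consistency
```

In this sketch the auxiliary term simply rewards the critic for producing the same value when only its most salient pixels are kept, which captures the "focus on important pixels, ignore the others" intuition stated in the abstract.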
Related papers
- Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs.
We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts.
We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z)
- Vision-Language Models Provide Promptable Representations for Reinforcement Learning [67.40524195671479]
We propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied reinforcement learning (RL).
We show that our approach can use chain-of-thought prompting to produce representations of common-sense semantic reasoning, improving policy performance in novel scenes by 1.5 times.
arXiv Detail & Related papers (2024-02-05T00:48:56Z)
- Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning [49.43362803584032]
We propose weakly-supervised image manipulation detection.
Such a setting can leverage more training images and has the potential to adapt quickly to new manipulation techniques.
Two consistency properties are learned: multi-source consistency (MSC) and inter-patch consistency (IPC).
arXiv Detail & Related papers (2023-09-03T19:19:56Z)
- SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks [52.766795949716986]
We present a study of the generalization capabilities of pre-trained visual representations at the categorical level.
We propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy.
arXiv Detail & Related papers (2023-07-07T13:01:29Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning [7.972204774778987]
In real-world robotics applications, Reinforcement Learning (RL) agents are often unable to generalise to environment variations that were not observed during training.
We introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled representations using the sequential nature of RL observations.
We find empirically that RL algorithms with TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods.
arXiv Detail & Related papers (2022-07-12T11:46:49Z)
- Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning [27.205521177841568]
We propose Task-aware Lipschitz Data Augmentation (TLDA) for visual Reinforcement Learning (RL).
TLDA explicitly identifies the task-correlated pixels with large Lipschitz constants, and only augments the task-irrelevant pixels.
It outperforms previous state-of-the-art methods across three different visual control benchmarks.
arXiv Detail & Related papers (2022-02-21T04:22:07Z)
- Unlocking Pixels for Reinforcement Learning via Implicit Attention [61.666538764049854]
We make use of new efficient attention algorithms, recently shown to be highly effective for Transformers.
This allows our attention-based controllers to scale to larger visual inputs, and facilitate the use of smaller patches.
In addition, we propose a new efficient algorithm approximating softmax attention with what we call hybrid random features.
arXiv Detail & Related papers (2021-02-08T17:00:26Z)
- Measuring Visual Generalization in Continuous Control from Pixels [12.598584313005407]
Self-supervised learning and data augmentation have significantly reduced the performance gap between state- and image-based reinforcement learning agents.
We propose a benchmark that tests agents' visual generalization by adding graphical variety to existing continuous control domains.
We find that data augmentation techniques outperform self-supervised learning approaches and that more significant image transformations provide better visual generalization.
arXiv Detail & Related papers (2020-10-13T23:42:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.