Look where you look! Saliency-guided Q-networks for visual RL tasks
- URL: http://arxiv.org/abs/2209.09203v1
- Date: Fri, 16 Sep 2022 08:28:38 GMT
- Title: Look where you look! Saliency-guided Q-networks for visual RL tasks
- Authors: David Bertoin (ISAE-SUPAERO, IMT, ANITI), Adil Zouitine
(ISAE-SUPAERO), Mehdi Zouitine (IMT), Emmanuel Rachelson (ISAE-SUPAERO,
ANITI)
- Abstract summary: Changes in image statistics or distracting background elements are pitfalls that prevent generalization.
Saliency-guided Q-networks (SGQN) is a generic method for visual reinforcement learning compatible with any value function learning method.
SGQN vastly improves the generalization capability of Soft Actor-Critic agents and outperforms existing state-of-the-art methods on the DeepMind Control Generalization benchmark.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning policies, despite their outstanding efficiency in
simulated visual control tasks, have shown disappointing ability to generalize
across disturbances in the input training images. Changes in image statistics
or distracting background elements are pitfalls that prevent generalization and
real-world applicability of such control policies. We elaborate on the
intuition that a good visual policy should be able to identify which pixels are
important for its decision, and preserve this identification of important
sources of information across images. This implies that training of a policy
with small generalization gap should focus on such important pixels and ignore
the others. This leads to the introduction of saliency-guided Q-networks
(SGQN), a generic method for visual reinforcement learning, that is compatible
with any value function learning method. SGQN vastly improves the
generalization capability of Soft Actor-Critic agents and outperforms existing
state-of-the-art methods on the DeepMind Control Generalization benchmark,
setting a new reference in terms of training efficiency, generalization gap,
and policy interpretability.
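The abstract describes SGQN only at a high level: the value function should focus on the pixels that matter for its decision and keep attending to them across perturbed observations. As a rough illustration only, the PyTorch sketch below shows one way a saliency-guided auxiliary objective of this kind could be written. The gradient-based saliency mask, the top-quantile threshold, the weighting factor `lam`, and the `critic(obs, action)` interface are assumptions made for the example, not the paper's exact formulation.

```python
# Illustrative sketch only: a saliency-guided auxiliary loss for a visual critic.
# Assumptions (not from the abstract): saliency = |dQ/d(pixels)|, binarised by
# keeping the top fraction of pixels, plus a consistency term asking the critic
# to agree with itself when non-salient pixels are masked out.
import torch
import torch.nn.functional as F


def saliency_mask(critic, obs, action, quantile=0.95):
    """Binary (B, 1, H, W) mask of the pixels the critic's Q-value depends on most."""
    obs = obs.clone().requires_grad_(True)
    q = critic(obs, action).sum()                  # scalar; per-sample grads stay separate
    (grad,) = torch.autograd.grad(q, obs)
    sal = grad.abs().amax(dim=1, keepdim=True)     # collapse channels -> (B, 1, H, W)
    thresh = torch.quantile(sal.flatten(1), quantile, dim=1).view(-1, 1, 1, 1)
    return (sal >= thresh).float()


def saliency_guided_critic_loss(critic, obs, action, target_q, lam=0.5):
    """TD regression plus a saliency-consistency term (illustrative, not the paper's loss)."""
    q = critic(obs, action)
    td_loss = F.mse_loss(q, target_q)
    mask = saliency_mask(critic, obs, action)      # mask is detached from the graph
    q_masked = critic(obs * mask, action)          # value computed from salient pixels only
    consistency = F.mse_loss(q_masked, q.detach())
    return td_loss + lam * consistency
```

In this sketch the auxiliary term simply rewards the critic for producing the same value when only its most salient pixels are kept, which captures the "focus on important pixels, ignore the others" intuition stated in the abstract.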
Related papers
- Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control [73.6361029556484]
Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs.
We consider pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts.
We show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.
arXiv Detail & Related papers (2024-05-09T15:39:54Z)
- Vision-Language Models Provide Promptable Representations for Reinforcement Learning [67.40524195671479]
We propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied reinforcement learning (RL).
We show that our approach can use chain-of-thought prompting to produce representations of common-sense semantic reasoning, improving policy performance in novel scenes by 1.5 times.
arXiv Detail & Related papers (2024-02-05T00:48:56Z)
- Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning [49.43362803584032]
We propose weakly-supervised image manipulation detection.
Such a setting can leverage more training images and has the potential to adapt quickly to new manipulation techniques.
Two consistency properties are learned: multi-source consistency (MSC) and inter-patch consistency (IPC).
arXiv Detail & Related papers (2023-09-03T19:19:56Z)
- SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks [52.766795949716986]
We present a study of the generalization capabilities of pre-trained visual representations at the categorical level.
We propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy.
arXiv Detail & Related papers (2023-07-07T13:01:29Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning [7.972204774778987]
In real-world robotics applications, Reinforcement Learning (RL) agents are often unable to generalise to environment variations that were not observed during training.
We introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled representations using the sequential nature of RL observations.
We find empirically that RL algorithms with TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods.
arXiv Detail & Related papers (2022-07-12T11:46:49Z)
- Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning [27.205521177841568]
We propose Task-aware Lipschitz Data Augmentation (TLDA) for visual Reinforcement Learning (RL).
TLDA explicitly identifies the task-correlated pixels with large Lipschitz constants, and only augments the task-irrelevant pixels.
It outperforms previous state-of-the-art methods across three different visual control benchmarks.
arXiv Detail & Related papers (2022-02-21T04:22:07Z)
- Unlocking Pixels for Reinforcement Learning via Implicit Attention [61.666538764049854]
We make use of new efficient attention algorithms, recently shown to be highly effective for Transformers.
This allows our attention-based controllers to scale to larger visual inputs, and facilitate the use of smaller patches.
In addition, we propose a new efficient algorithm approximating softmax attention with what we call hybrid random features.
arXiv Detail & Related papers (2021-02-08T17:00:26Z)
- Measuring Visual Generalization in Continuous Control from Pixels [12.598584313005407]
Self-supervised learning and data augmentation have significantly reduced the performance gap between state- and image-based reinforcement learning agents.
We propose a benchmark that tests agents' visual generalization by adding graphical variety to existing continuous control domains.
We find that data augmentation techniques outperform self-supervised learning approaches and that more significant image transformations provide better visual generalization.
arXiv Detail & Related papers (2020-10-13T23:42:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.