Sparse Attention Guided Dynamic Value Estimation for Single-Task
Multi-Scene Reinforcement Learning
- URL: http://arxiv.org/abs/2102.07266v1
- Date: Sun, 14 Feb 2021 23:30:13 GMT
- Title: Sparse Attention Guided Dynamic Value Estimation for Single-Task
Multi-Scene Reinforcement Learning
- Authors: Jaskirat Singh, Liang Zheng
- Abstract summary: Training deep reinforcement learning agents on environments with multiple levels / scenes from the same task has become essential for many applications.
We argue that the sample variance for a multi-scene environment is best minimized by treating each scene as a distinct MDP.
We also demonstrate that the true joint value function for a multi-scene environment follows a multi-modal distribution which is not captured by traditional CNN / LSTM based critic networks.
- Score: 16.910911657616005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training deep reinforcement learning agents on environments with
multiple levels / scenes from the same task has become essential for many
applications aiming to achieve generalization and domain transfer from
simulation to the real world. While such a strategy is helpful with
generalization, the use of multiple scenes significantly increases the
variance of samples collected for policy gradient computations. Current
methods effectively continue to view this collection of scenes as a single
Markov decision process (MDP), and thus learn a scene-generic value function
V(s). However, we argue that the sample variance for a multi-scene
environment is best minimized by treating each scene as a distinct MDP, and
then learning a joint value function V(s,M) dependent on both state s and
MDP M. We further demonstrate that the true joint value function for a
multi-scene environment follows a multi-modal distribution which is not
captured by traditional CNN / LSTM based critic networks. To this end, we
propose a dynamic value estimation (DVE) technique, which approximates the
true joint value function through a sparse attention mechanism over multiple
value function hypotheses / modes. The resulting agent not only shows
significant improvements in the final reward score across a range of OpenAI
ProcGen environments, but also exhibits enhanced navigation efficiency and
provides an implicit mechanism for unsupervised state-space skill
decomposition.
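For intuition, the sparse attention over value modes can be pictured as a small critic head: the network proposes K candidate value hypotheses per state and mixes them with sparsemax weights, so only a few modes are active at a time. The sketch below is illustrative only; module names, the feature size, and the number of modes are assumptions rather than the paper's released code, and a shared CNN / LSTM encoder is presumed to produce the "features" tensor.

import torch
import torch.nn as nn


def sparsemax(logits):
    # Sparsemax (Martins & Astudillo, 2016): a softmax-like projection onto the
    # simplex that can assign exactly zero weight to unused value modes.
    z, _ = torch.sort(logits, descending=True, dim=-1)
    k = torch.arange(1, logits.size(-1) + 1, device=logits.device, dtype=logits.dtype)
    z_cumsum = z.cumsum(dim=-1) - 1.0
    support = k * z > z_cumsum                               # modes kept in the support
    k_z = support.sum(dim=-1, keepdim=True)
    tau = z_cumsum.gather(-1, k_z - 1) / k_z.to(logits.dtype)
    return torch.clamp(logits - tau, min=0.0)


class DVECriticHead(nn.Module):
    # Hypothetical dynamic value estimation head: K value hypotheses per state,
    # combined via sparse attention into a single value estimate.
    def __init__(self, feat_dim=256, num_modes=8):
        super().__init__()
        self.mode_values = nn.Linear(feat_dim, num_modes)    # one value hypothesis per mode
        self.mode_logits = nn.Linear(feat_dim, num_modes)    # attention logits over modes

    def forward(self, features):                             # features: (batch, feat_dim)
        v_modes = self.mode_values(features)                 # (batch, num_modes) candidate values
        weights = sparsemax(self.mode_logits(features))      # sparse mixture weights
        return (weights * v_modes).sum(dim=-1)               # (batch,) value estimates

Under these assumptions, usage would look like value = DVECriticHead()(encoder(obs)); the actor-critic training loop is unchanged, only the critic's output head differs from a standard scalar value head.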
Related papers
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representation for dense prediction tasks.
Our method is motivated by the three key factors in detection: localization, scale consistency and recognition.
Our method consistently outperforms the recent state-of-the-art methods on various datasets with significant margins.
arXiv Detail & Related papers (2023-04-04T17:59:04Z)
- Addressing the issue of stochastic environments and local
decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field which builds on conventional Reinforcement Learning (RL).
This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z)
- Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma
Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Enhanced Scene Specificity with Sparse Dynamic Value Estimation [22.889059874754242]
Multi-scene reinforcement learning has become essential for many applications.
One strategy for variance reduction is to consider each scene as a distinct Markov decision process (MDP).
In this paper, we argue that the error between the true scene-specific value function and the predicted dynamic estimate can be further reduced by progressively enforcing sparse cluster assignments.
arXiv Detail & Related papers (2020-11-25T08:35:16Z)
- Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs.
We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)
- Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement
Learning [22.889059874754242]
Training deep reinforcement learning agents on environments with multiple levels / scenes / conditions from the same task has become essential for many applications.
We propose a dynamic value estimation (DVE) technique for these multiple-MDP environments, motivated by the clustering effect observed in the value function distribution across different scenes.
arXiv Detail & Related papers (2020-05-25T17:56:08Z)
- Multi-Modal Domain Adaptation for Fine-Grained Action Recognition [35.22906271819216]
We exploit the correspondence of modalities as a self-supervised alignment approach for UDA.
We show that multi-modal self-supervision alone improves the performance over source-only training by 2.4% on average.
We then combine adversarial training with multi-modal self-supervision, showing that our approach outperforms other UDA methods by 3%.
arXiv Detail & Related papers (2020-01-27T11:06:06Z)