Enhanced Scene Specificity with Sparse Dynamic Value Estimation
- URL: http://arxiv.org/abs/2011.12574v1
- Date: Wed, 25 Nov 2020 08:35:16 GMT
- Title: Enhanced Scene Specificity with Sparse Dynamic Value Estimation
- Authors: Jaskirat Singh and Liang Zheng
- Abstract summary: Multi-scene reinforcement learning has become essential for many applications.
One strategy for variance reduction is to consider each scene as a distinct Markov decision process (MDP).
In this paper, we argue that the error between the true scene-specific value function and the predicted dynamic estimate can be further reduced by progressively enforcing sparse cluster assignments.
- Score: 22.889059874754242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-scene reinforcement learning involves training the RL agent across
multiple scenes / levels from the same task, and has become essential for many
generalization applications. However, the inclusion of multiple scenes leads to
an increase in sample variance for policy gradient computations, often
resulting in suboptimal performance with the direct application of traditional
methods (e.g. PPO, A3C). One strategy for variance reduction is to consider
each scene as a distinct Markov decision process (MDP) and learn a joint value
function dependent on both state (s) and MDP (M). However, this is non-trivial
as the agent is usually unaware of the underlying level at train / test times
in multi-scene RL. Recently, Singh et al. [1] tried to address this by
proposing a dynamic value estimation approach that models the true joint value
function distribution as a Gaussian mixture model (GMM). In this paper, we
argue that the error between the true scene-specific value function and the
predicted dynamic estimate can be further reduced by progressively enforcing
sparse cluster assignments once the agent has explored most of the state space.
The resulting agents not only show significant improvements in the final reward
score across a range of OpenAI ProcGen environments, but also exhibit increased
navigation efficiency while completing a game level.
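To make the clustering-plus-sparsification idea concrete, here is a minimal Python sketch, not the authors' implementation: each mixture component gets one value head (a linear critic here, for brevity; the paper uses a deep network), soft GMM-style responsibilities are computed from each head's error against an observed return, and a temperature is annealed so assignments become progressively sparse (near one-hot) as training proceeds. The class name, `anneal` schedule, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of dynamic value estimation with progressively sparse
# cluster assignments. Illustrative only: each assumed GMM component is
# modeled as a linear value head over state features.

class SparseDynamicValueEstimator:
    def __init__(self, n_clusters, feature_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One value head per mixture component (assumed cluster of scenes).
        self.heads = rng.normal(scale=0.01, size=(n_clusters, feature_dim))
        self.temperature = 1.0  # high temperature -> soft, dense assignments

    def responsibilities(self, sq_errors):
        # Soft-min over per-head squared errors: heads that track the
        # observed return best receive the most responsibility.
        logits = -sq_errors / max(self.temperature, 1e-8)
        w = np.exp(logits - logits.max())
        return w / w.sum()

    def estimate(self, phi, observed_return):
        # Blend per-head predictions with the (possibly sparse) weights.
        preds = self.heads @ phi
        w = self.responsibilities((preds - observed_return) ** 2)
        return float(w @ preds), w

    def anneal(self, decay=0.999):
        # Progressively enforce sparsity: as the temperature approaches 0,
        # the weights approach a one-hot (hard) cluster assignment, echoing
        # the paper's idea of sparsifying once most states are explored.
        self.temperature *= decay

# Usage illustration (hypothetical features and return):
est = SparseDynamicValueEstimator(n_clusters=4, feature_dim=8)
phi, ret = np.ones(8), 1.5
v, w = est.estimate(phi, ret)   # soft, blended estimate early in training
for _ in range(5000):
    est.anneal()                # afterwards, w is effectively one-hot
```

Under this reading, early training averages value estimates across clusters to stay robust, while late training routes each scene to a single scene-specific head, which is the error reduction the abstract argues for.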
Related papers
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL [19.236153474365747]
Existing MARL approaches often rely on the restrictive assumption that the number of entities remains constant between training and inference.
In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization.
We propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods.
arXiv Detail & Related papers (2024-10-21T10:57:45Z)
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable subset of actions, possibly as small as $\mathcal{O}(\log(n))$ (a minimal sketch appears after this list).
The presented value-based RL methods include, among others, Stochastic Q-learning, StochDQN, and StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
The second strategy leverages a novel Re-Ranking technique, which has a lower time upper-bound complexity and reduces the memory complexity from $O(n^2)$ to $O(kn)$ with $k \ll n$.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
- Optimizing Hyperparameters with Conformal Quantile Regression [7.316604052864345]
We propose to leverage conformalized quantile regression, which makes minimal assumptions about the observation noise.
This translates to quicker HPO convergence on empirical benchmarks.
arXiv Detail & Related papers (2023-05-05T15:33:39Z)
- Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representation for dense prediction tasks.
Our method is motivated by the three key factors in detection: localization, scale consistency and recognition.
Our method consistently outperforms the recent state-of-the-art methods on various datasets with significant margins.
arXiv Detail & Related papers (2023-04-04T17:59:04Z)
- Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition [63.67574523750839]
We propose a generic reinforcement learning (RL) algorithm that performs better than baseline deep Q-learning algorithms in environments with multiple variably-valued niches.
We show that agents trained this way can escape poor-but-attractive local optima to instead converge to harder-to-discover higher value strategies.
arXiv Detail & Related papers (2023-02-02T16:00:19Z)
- Sparse Attention Guided Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning [16.910911657616005]
Training deep reinforcement learning agents on environments with multiple levels / scenes from the same task has become essential for many applications.
We argue that the sample variance for a multi-scene environment is best minimized by treating each scene as a distinct MDP.
We also demonstrate that the true joint value function for a multi-scene environment follows a multi-modal distribution, which is not captured by traditional CNN / LSTM based critic networks.
arXiv Detail & Related papers (2021-02-14T23:30:13Z)
- MAGMA: Inference and Prediction with Multi-Task Gaussian Processes [4.368185344922342]
A novel multi-task Gaussian process (GP) framework is proposed, using a common mean process for sharing information across tasks.
Our overall algorithm is called Magma (standing for Multi tAsk Gaussian processes with common MeAn).
arXiv Detail & Related papers (2020-07-21T11:43:54Z)
- Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning [22.889059874754242]
Training deep reinforcement learning agents on environments with multiple levels / scenes / conditions from the same task has become essential for many applications.
We propose a dynamic value estimation (DVE) technique for these multiple-MDP environments, motivated by the clustering effect observed in the value function distribution across different scenes.
arXiv Detail & Related papers (2020-05-25T17:56:08Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
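As referenced in the Stochastic Q-learning entry above, the core trick of maximizing over a random subset of roughly $\mathcal{O}(\log(n))$ actions, instead of all $n$, fits in a few lines. The following tabular sketch is a hedged illustration of that idea under assumed names and hyperparameters, not the paper's code:

```python
import math
import random
from collections import defaultdict

# Tabular sketch of the stochastic-max idea: sample ~log2(n) candidate
# actions and maximize over the sample, for both action selection and
# the bootstrap target. Names and defaults are illustrative assumptions.

def stoch_max_action(Q, state, n_actions, extras=()):
    # Draw ~log2(n) candidates; extras (e.g. the previous action) can be
    # mixed in to stabilize the sampled maximum.
    k = max(1, math.ceil(math.log2(n_actions)))
    candidates = set(random.sample(range(n_actions), k)) | set(extras)
    return max(candidates, key=lambda a: Q[(state, a)])

def stoch_q_update(Q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.99):
    # Bootstrap with a max over a sampled action subset instead of all n.
    a_star = stoch_max_action(Q, s_next, n_actions)
    target = r + gamma * Q[(s_next, a_star)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Usage illustration with hypothetical states and a 1024-action space:
Q = defaultdict(float)          # Q-values keyed by (state, action)
s, n = 0, 1024
a = stoch_max_action(Q, s, n)   # maximizes over ~10 sampled actions
stoch_q_update(Q, s, a, r=1.0, s_next=1, n_actions=n)
```

The design point is that both the behavior policy and the update target tolerate a noisy maximum, so per-step cost drops from linear to logarithmic in the action-set size.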