Diversity Through Exclusion (DTE): Niche Identification for
Reinforcement Learning through Value-Decomposition
- URL: http://arxiv.org/abs/2302.01180v2
- Date: Fri, 3 Feb 2023 10:23:48 GMT
- Title: Diversity Through Exclusion (DTE): Niche Identification for
Reinforcement Learning through Value-Decomposition
- Authors: Peter Sunehag, Alexander Sasha Vezhnevets, Edgar Duéñez-Guzmán,
  Igor Mordatch, Joel Z. Leibo
- Abstract summary: We propose a generic reinforcement learning (RL) algorithm that performs better than baseline deep Q-learning algorithms in environments with multiple variably-valued niches.
We show that agents trained this way can escape poor-but-attractive local optima to instead converge to harder-to-discover higher value strategies.
- Score: 63.67574523750839
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many environments contain numerous available niches of variable value, each
associated with a different local optimum in the space of behaviors (policy
space). In such situations it is often difficult to design a learning process
capable of evading distraction by poor local optima long enough to stumble upon
the best available niche. In this work we propose a generic reinforcement
learning (RL) algorithm that performs better than baseline deep Q-learning
algorithms in such environments with multiple variably-valued niches. The
algorithm we propose consists of two parts: an agent architecture and a
learning rule. The agent architecture contains multiple sub-policies. The
learning rule is inspired by fitness sharing in evolutionary computation and
applied in reinforcement learning using Value-Decomposition-Networks in a novel
manner for a single agent's internal population. It can concretely be
understood as adding an extra loss term where one policy's experience is also
used to update all the other policies in a manner that decreases their value
estimates for the visited states. In particular, when one sub-policy visits a
particular state frequently this decreases the value predicted for other
sub-policies for going to that state. Further, we introduce an artificial
chemistry inspired platform where it is easy to create tasks with multiple
rewarding strategies utilizing different resources (i.e. multiple niches). We
show that agents trained this way can escape poor-but-attractive local optima
to instead converge to harder-to-discover higher value strategies in both the
artificial chemistry environments and in simpler illustrative environments.
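To make the extra loss term concrete, here is a minimal sketch, not the authors' implementation: it assumes a deep Q-learning agent with several sub-policy heads, and the names `MultiHeadQ`, `dte_loss`, and `exclusion_weight` are hypothetical. It shows one plausible way the acting head's TD update could be paired with a penalty that lowers the other heads' value estimates for the visited state-action pairs.

```python
# Hedged sketch of a DTE-style exclusion loss (not the authors' code).
# Assumes one-hot states and a small multi-head Q-network; all names here
# (MultiHeadQ, dte_loss, exclusion_weight) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadQ(nn.Module):
    def __init__(self, n_states, n_actions, n_heads, hidden=64):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(n_states, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_actions))
            for _ in range(n_heads)
        )

    def forward(self, state_onehot):
        # One Q-value tensor per sub-policy head.
        return [head(state_onehot) for head in self.heads]

def dte_loss(model, acting_head, s, a, r, s_next,
             gamma=0.99, exclusion_weight=0.1):
    q_all = model(s)
    q_next_all = model(s_next)

    # Standard TD loss for the sub-policy that generated the transition.
    q_sa = q_all[acting_head].gather(1, a)
    with torch.no_grad():
        target = r + gamma * q_next_all[acting_head].max(dim=1, keepdim=True).values
    td_loss = F.mse_loss(q_sa, target)

    # Exclusion term: the same experience pushes the *other* heads' value
    # estimates for the visited state-action pair downward, discouraging
    # them from crowding into the acting head's niche.
    exclusion = torch.zeros(())
    for k, q_k in enumerate(q_all):
        if k != acting_head:
            exclusion = exclusion + q_k.gather(1, a).mean()
    return td_loss + exclusion_weight * exclusion
```

In the paper the penalty arises from treating the agent's internal population with Value-Decomposition-Networks; the simple additive penalty above is only meant to convey the direction of the update, and the weighting and target construction are assumptions.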
Related papers
- Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties.
Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by solutions previously proposed for these difficulties.
arXiv Detail & Related papers (2024-07-09T17:55:23Z)
- OMPO: A Unified Framework for RL under Policy and Dynamics Shifts [42.57662196581823]
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge.
Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors.
In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching.
arXiv Detail & Related papers (2024-05-29T13:36:36Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
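As a rough illustration of how action quantization lets discrete-action offline RL methods operate on continuous-control data, here is a generic k-means codebook sketch. It is a stand-in under assumptions, not the adaptive scheme proposed in the paper above, and the names `build_action_codebook` and `quantize` are hypothetical.

```python
# Hedged sketch: discretize continuous actions into a learned codebook so that
# discrete-action offline RL losses apply. Generic k-means illustration only.
import numpy as np
from sklearn.cluster import KMeans

def build_action_codebook(actions, n_bins=32, seed=0):
    """Fit a codebook over the dataset's continuous actions."""
    km = KMeans(n_clusters=n_bins, random_state=seed, n_init=10).fit(actions)
    return km.cluster_centers_  # shape: (n_bins, action_dim)

def quantize(action, codebook):
    """Map a continuous action to the index of its nearest codebook entry."""
    return int(np.argmin(np.linalg.norm(codebook - action, axis=1)))

# Example: 1000 random 4-D actions discretized into 32 bins.
acts = np.random.uniform(-1.0, 1.0, size=(1000, 4))
codebook = build_action_codebook(acts)
idx = quantize(acts[0], codebook)      # discrete action index for training
reconstructed = codebook[idx]          # continuous action actually executed
```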
- Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization [5.590976834881065]
We argue that inventory management presents unique opportunities for reliably applying and evaluating deep reinforcement learning (DRL) algorithms.
The first of two proposed techniques is Hindsight Differentiable Policy Optimization (HDPO), which performs gradient descent to optimize policy performance.
The second technique involves aligning policy (neural) network structures with the structure of the inventory network.
arXiv Detail & Related papers (2023-06-20T02:58:25Z)
- Learning a subspace of policies for online adaptation in Reinforcement Learning [14.7945053644125]
In control systems, the robot on which a policy is learned might differ from the robot on which a policy will run.
There is a need to develop RL methods that generalize well to variations of the training conditions.
In this article, we consider the simplest yet hard to tackle generalization setting where the test environment is unknown at train time.
arXiv Detail & Related papers (2021-10-11T11:43:34Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- Recomposing the Reinforcement Learning Building Blocks with Hypernetworks [19.523737925041278]
We show that a primary network determines the weights of a conditional dynamic network.
This approach improves the gradient approximation and reduces the learning step variance.
We demonstrate a consistent improvement across different locomotion tasks and different algorithms both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).
arXiv Detail & Related papers (2021-06-12T19:43:12Z)
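The hypernetwork recomposition described in the entry above can be sketched as follows. This is a minimal illustrative module, not the paper's architecture; `HyperQ` and its layer sizes are assumptions. A primary network reads the state and emits the weights of a small dynamic network that maps the action to a value.

```python
# Hedged sketch of the hypernetwork idea (not the paper's architecture):
# a primary network produces the weights of a conditional dynamic network.
import torch
import torch.nn as nn

class HyperQ(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.action_dim, self.hidden = action_dim, hidden
        # Primary network: emits weights/biases for a one-hidden-layer
        # dynamic network of shape action_dim -> hidden -> 1.
        n_params = action_dim * hidden + hidden + hidden + 1
        self.primary = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_params))

    def forward(self, state, action):
        p = self.primary(state)                      # (B, n_params)
        B = state.shape[0]
        i = 0
        w1 = p[:, i:i + self.action_dim * self.hidden].view(B, self.hidden, self.action_dim)
        i += self.action_dim * self.hidden
        b1 = p[:, i:i + self.hidden]; i += self.hidden
        w2 = p[:, i:i + self.hidden].view(B, 1, self.hidden); i += self.hidden
        b2 = p[:, i:i + 1]
        # Dynamic network: weights above, applied to the action.
        h = torch.relu(torch.bmm(w1, action.unsqueeze(-1)).squeeze(-1) + b1)
        q = torch.bmm(w2, h.unsqueeze(-1)).squeeze(-1) + b2
        return q                                      # (B, 1) state-action value
```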
- Multi-agent navigation based on deep reinforcement learning and traditional pathfinding algorithm [0.0]
We develop a new framework for the multi-agent collision avoidance problem.
The framework combines a traditional pathfinding algorithm with reinforcement learning.
In our approach, the agents learn whether to be navigated or to take simple actions to avoid their partners.
arXiv Detail & Related papers (2020-12-05T08:56:58Z)
- Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z)
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.