Catastrophic Interference in Reinforcement Learning: A Solution Based on
Context Division and Knowledge Distillation
- URL: http://arxiv.org/abs/2109.00525v1
- Date: Wed, 1 Sep 2021 12:02:04 GMT
- Title: Catastrophic Interference in Reinforcement Learning: A Solution Based on
Context Division and Knowledge Distillation
- Authors: Tiantian Zhang, Xueqian Wang, Bin Liang, Bo Yuan
- Abstract summary: We introduce the concept of "context" into single-task reinforcement learning.
We develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL.
Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms.
- Score: 8.044847478961882
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The powerful learning ability of deep neural networks enables reinforcement
learning (RL) agents to learn competent control policies directly from
high-dimensional and continuous environments. In theory, to achieve stable
performance, neural networks assume i.i.d. inputs, which unfortunately does not
hold in the general RL paradigm where the training data is temporally
correlated and non-stationary. This issue may lead to the phenomenon of
"catastrophic interference" and the collapse in performance as later training
is likely to overwrite and interfere with previously learned policies. In this
paper, we introduce the concept of "context" into single-task RL and develop a
novel scheme, termed Context Division and Knowledge Distillation (CDaKD)
driven RL, to divide all states experienced during training into a series of
contexts. Its motivation is to mitigate the aforementioned challenge of
catastrophic interference in deep RL, thereby improving the stability and
plasticity of RL models. At the heart of CDaKD is a value function,
parameterized by a neural network feature extractor shared across all contexts,
and a set of output heads, each specializing in an individual context. In
CDaKD, we exploit online clustering to achieve context division, and
interference is further alleviated by a knowledge distillation regularization
term on the output layers for learned contexts. In addition, to effectively
obtain the context division in high-dimensional state spaces (e.g., image
inputs), we perform clustering in the lower-dimensional representation space of
a randomly initialized convolutional encoder, which is fixed throughout
training. Our results show that, with various replay memory capacities, CDaKD
can consistently improve the performance of existing RL algorithms on classic
OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring
only moderate computational overhead.
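
To make the described components concrete, below is a minimal sketch in PyTorch of the three ideas from the abstract: a fixed, randomly initialized convolutional encoder whose low-dimensional embeddings are used for context division, a value function with a shared feature extractor and one output head per context, and a knowledge-distillation penalty on the heads of previously learned contexts. All class names, dimensions, hyperparameters, and the nearest-centroid assignment are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RandomEncoder(nn.Module):
    """Randomly initialized convolutional encoder, frozen throughout training.

    Used only to embed high-dimensional (image) states into a low-dimensional
    space in which online clustering (context division) is performed."""

    def __init__(self, in_channels: int = 4, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        for p in self.parameters():
            p.requires_grad_(False)  # never updated

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def assign_context(embedding: torch.Tensor, centroids: torch.Tensor) -> int:
    """Context division: assign an embedded state to its nearest centroid.

    In a full online-clustering scheme the centroids themselves would also be
    updated incrementally (e.g. with mini-batch k-means)."""
    return int(torch.cdist(embedding.unsqueeze(0), centroids).argmin())


class MultiHeadQNetwork(nn.Module):
    """Value function: a shared feature extractor plus one head per context."""

    def __init__(self, obs_dim: int, n_actions: int, n_contexts: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_contexts)]
        )

    def forward(self, obs: torch.Tensor, context_id: int) -> torch.Tensor:
        return self.heads[context_id](self.body(obs))


def cdakd_style_loss(
    q_net: MultiHeadQNetwork,
    frozen_q_net: MultiHeadQNetwork,  # snapshot taken before the current context
    obs: torch.Tensor,
    actions: torch.Tensor,
    td_targets: torch.Tensor,
    current_ctx: int,
    learned_ctx_ids: list,
    beta: float = 1.0,
) -> torch.Tensor:
    """TD loss on the current context plus a knowledge-distillation term that
    keeps the output heads of already-learned contexts close to the frozen
    snapshot, which is what mitigates interference in this sketch."""
    q_sa = q_net(obs, current_ctx).gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q_sa, td_targets)

    distill = obs.new_zeros(())
    for ctx in learned_ctx_ids:
        with torch.no_grad():
            teacher = frozen_q_net(obs, ctx)
        # In practice the distillation term would be evaluated on replayed
        # states belonging to the learned contexts; the same batch is reused
        # here only to keep the sketch short.
        distill = distill + F.mse_loss(q_net(obs, ctx), teacher)

    return td_loss + beta * distill
```

Under these assumptions, a new head would be added whenever online clustering discovers a new context, and the frozen snapshot refreshed before training on it; only the current context's head receives TD gradients, while the distillation term restrains drift in the shared body and the older heads.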
Related papers
- The RL Perceptron: Generalisation Dynamics of Policy Learning in High
Dimensions [14.778024171498208]
Reinforcement learning algorithms have proven transformative in a range of domains.
Much theory of RL has focused on discrete state spaces or worst-case analysis.
We propose a solvable high-dimensional model of RL that can capture a variety of learning protocols.
arXiv Detail & Related papers (2023-06-17T18:16:51Z)
- CoDeC: Communication-Efficient Decentralized Continual Learning [6.663641564969944]
Training at the edge utilizes continuously evolving data generated at different locations.
Privacy concerns prohibit the co-location of this spatially as well as temporally distributed data.
We propose CoDeC, a novel communication-efficient decentralized continual learning algorithm.
arXiv Detail & Related papers (2023-03-27T16:52:17Z)
- Entropy Regularized Reinforcement Learning with Cascading Networks [9.973226671536041]
Deep RL uses neural networks as function approximators.
One of the major difficulties of RL is the absence of i.i.d. data.
In this work, we challenge the common practice of the (un)supervised learning community of using a fixed neural architecture.
arXiv Detail & Related papers (2022-10-16T10:28:59Z)
- Dynamics-Adaptive Continual Reinforcement Learning via Progressive
Contextualization [29.61829620717385]
A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime.
DaCoRL learns a context-conditioned policy using progressive contextualization.
DaCoRL features consistent superiority over existing methods in terms of stability, overall performance, and generalization ability.
arXiv Detail & Related papers (2022-09-01T10:26:58Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Single-Shot Pruning for Offline Reinforcement Learning [47.886329599997474]
Deep Reinforcement Learning (RL) is a powerful framework for solving complex real-world problems.
One way to tackle this problem is to prune neural networks leaving only the necessary parameters.
We close the gap between RL and single-shot pruning techniques and present a general pruning approach to Offline RL.
arXiv Detail & Related papers (2021-12-31T18:10:02Z)
- Federated Deep Reinforcement Learning for the Distributed Control of
NextG Wireless Networks [16.12495409295754]
Next Generation (NextG) networks are expected to support demanding internet tactile applications such as augmented reality and connected autonomous vehicles.
Data-driven approaches can improve the ability of the network to adapt to the current operating conditions.
Deep RL (DRL) has been shown to achieve good performance even in complex environments.
arXiv Detail & Related papers (2021-12-07T03:13:20Z)
- Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless
Cellular Networks [82.02891936174221]
Collaborative deep reinforcement learning (CDRL) algorithms, in which multiple agents can coordinate over a wireless network, are a promising approach.
In this paper, a novel semantic-aware CDRL method is proposed to enable a group of untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network.
arXiv Detail & Related papers (2021-11-23T18:24:47Z)
- Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
Until today, no such learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Transient Non-Stationarity and Generalisation in Deep Reinforcement
Learning [67.34810824996887]
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
We propose Iterated Relearning (ITER) to improve generalisation of deep RL agents.
arXiv Detail & Related papers (2020-06-10T13:26:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.