Catastrophic Interference in Reinforcement Learning: A Solution Based on
Context Division and Knowledge Distillation
- URL: http://arxiv.org/abs/2109.00525v1
- Date: Wed, 1 Sep 2021 12:02:04 GMT
- Title: Catastrophic Interference in Reinforcement Learning: A Solution Based on
Context Division and Knowledge Distillation
- Authors: Tiantian Zhang, Xueqian Wang, Bin Liang, Bo Yuan
- Abstract summary: We introduce the concept of "context" into single-task reinforcement learning.
We develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL.
Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms.
- Score: 8.044847478961882
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The powerful learning ability of deep neural networks enables reinforcement
learning (RL) agents to learn competent control policies directly from
high-dimensional and continuous environments. In theory, to achieve stable
performance, neural networks assume i.i.d. inputs, which unfortunately does not
hold in the general RL paradigm where the training data is temporally
correlated and non-stationary. This issue may lead to the phenomenon of
"catastrophic interference" and the collapse in performance as later training
is likely to overwrite and interfere with previously learned policies. In this
paper, we introduce the concept of "context" into single-task RL and develop a
novel scheme, termed Context Division and Knowledge Distillation (CDaKD)
driven RL, to divide all states experienced during training into a series of
contexts. Its motivation is to mitigate the aforementioned challenge of
catastrophic interference in deep RL, thereby improving the stability and
plasticity of RL models. At the heart of CDaKD is a value function,
parameterized by a neural network feature extractor shared across all contexts,
and a set of output heads, each specializing in an individual context. In
CDaKD, we exploit online clustering to achieve context division, and
interference is further alleviated by a knowledge distillation regularization
term on the output layers for learned contexts. In addition, to effectively
obtain the context division in high-dimensional state spaces (e.g., image
inputs), we perform clustering in the lower-dimensional representation space of
a randomly initialized convolutional encoder, which is fixed throughout
training. Our results show that, with various replay memory capacities, CDaKD
can consistently improve the performance of existing RL algorithms on classic
OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring
only moderate computational overhead.
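
To make the described components concrete, below is a minimal sketch in PyTorch of the three ideas from the abstract: a fixed, randomly initialized convolutional encoder whose low-dimensional embeddings are used for context division, a value function with a shared feature extractor and one output head per context, and a knowledge-distillation penalty on the heads of previously learned contexts. All class names, dimensions, hyperparameters, and the nearest-centroid assignment are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RandomEncoder(nn.Module):
    """Randomly initialized convolutional encoder, frozen throughout training.

    Used only to embed high-dimensional (image) states into a low-dimensional
    space in which online clustering (context division) is performed."""

    def __init__(self, in_channels: int = 4, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        for p in self.parameters():
            p.requires_grad_(False)  # never updated

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def assign_context(embedding: torch.Tensor, centroids: torch.Tensor) -> int:
    """Context division: assign an embedded state to its nearest centroid.

    In a full online-clustering scheme the centroids themselves would also be
    updated incrementally (e.g. with mini-batch k-means)."""
    return int(torch.cdist(embedding.unsqueeze(0), centroids).argmin())


class MultiHeadQNetwork(nn.Module):
    """Value function: a shared feature extractor plus one head per context."""

    def __init__(self, obs_dim: int, n_actions: int, n_contexts: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_contexts)]
        )

    def forward(self, obs: torch.Tensor, context_id: int) -> torch.Tensor:
        return self.heads[context_id](self.body(obs))


def cdakd_style_loss(
    q_net: MultiHeadQNetwork,
    frozen_q_net: MultiHeadQNetwork,  # snapshot taken before the current context
    obs: torch.Tensor,
    actions: torch.Tensor,
    td_targets: torch.Tensor,
    current_ctx: int,
    learned_ctx_ids: list,
    beta: float = 1.0,
) -> torch.Tensor:
    """TD loss on the current context plus a knowledge-distillation term that
    keeps the output heads of already-learned contexts close to the frozen
    snapshot, which is what mitigates interference in this sketch."""
    q_sa = q_net(obs, current_ctx).gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = F.mse_loss(q_sa, td_targets)

    distill = obs.new_zeros(())
    for ctx in learned_ctx_ids:
        with torch.no_grad():
            teacher = frozen_q_net(obs, ctx)
        # In practice the distillation term would be evaluated on replayed
        # states belonging to the learned contexts; the same batch is reused
        # here only to keep the sketch short.
        distill = distill + F.mse_loss(q_net(obs, ctx), teacher)

    return td_loss + beta * distill
```

Under these assumptions, a new head would be added whenever online clustering discovers a new context, and the frozen snapshot refreshed before training on it; only the current context's head receives TD gradients, while the distillation term restrains drift in the shared body and the older heads.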
Related papers
- The RL Perceptron: Generalisation Dynamics of Policy Learning in High
Dimensions [14.778024171498208]
Reinforcement learning algorithms have proven transformative in a range of domains.
Much theory of RL has focused on discrete state spaces or worst-case analysis.
We propose a solvable high-dimensional model of RL that can capture a variety of learning protocols.
arXiv Detail & Related papers (2023-06-17T18:16:51Z)
- CoDeC: Communication-Efficient Decentralized Continual Learning [6.663641564969944]
Training at the edge utilizes continuously evolving data generated at different locations.
Privacy concerns prohibit the co-location of this spatially as well as temporally distributed data.
We propose CoDeC, a novel communication-efficient decentralized continual learning algorithm.
arXiv Detail & Related papers (2023-03-27T16:52:17Z)
- Entropy Regularized Reinforcement Learning with Cascading Networks [9.973226671536041]
Deep RL uses neural networks as function approximators.
One of the major difficulties of RL is the absence of i.i.d. data.
In this work, we challenge the common practice of the (un)supervised learning community of using a fixed neural architecture.
arXiv Detail & Related papers (2022-10-16T10:28:59Z)
- Dynamics-Adaptive Continual Reinforcement Learning via Progressive
Contextualization [29.61829620717385]
A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime.
DaCoRL learns a context-conditioned policy using progressive contextualization.
DaCoRL features consistent superiority over existing methods in terms of stability, overall performance, and generalization ability.
arXiv Detail & Related papers (2022-09-01T10:26:58Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Single-Shot Pruning for Offline Reinforcement Learning [47.886329599997474]
Deep Reinforcement Learning (RL) is a powerful framework for solving complex real-world problems.
One way to tackle this problem is to prune neural networks leaving only the necessary parameters.
We close the gap between RL and single-shot pruning techniques and present a general pruning approach to Offline RL.
arXiv Detail & Related papers (2021-12-31T18:10:02Z)
- Federated Deep Reinforcement Learning for the Distributed Control of
NextG Wireless Networks [16.12495409295754]
Next Generation (NextG) networks are expected to support demanding internet tactile applications such as augmented reality and connected autonomous vehicles.
Data-driven approaches can improve the ability of the network to adapt to the current operating conditions.
Deep RL (DRL) has been shown to achieve good performance even in complex environments.
arXiv Detail & Related papers (2021-12-07T03:13:20Z)
- Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless
Cellular Networks [82.02891936174221]
Collaborative deep reinforcement learning (CDRL) algorithms, in which multiple agents can coordinate over a wireless network, are a promising approach.
In this paper, a novel semantic-aware CDRL method is proposed to enable a group of untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network.
arXiv Detail & Related papers (2021-11-23T18:24:47Z)
- Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput.
Until today, no such learning-based algorithms have shown practical potential in this domain.
We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks.
We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
- Transient Non-Stationarity and Generalisation in Deep Reinforcement
Learning [67.34810824996887]
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
We propose Iterated Relearning (ITER) to improve generalisation of deep RL agents.
arXiv Detail & Related papers (2020-06-10T13:26:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.