Mixture of Experts in a Mixture of RL settings
- URL: http://arxiv.org/abs/2406.18420v1
- Date: Wed, 26 Jun 2024 15:15:15 GMT
- Title: Mixture of Experts in a Mixture of RL settings
- Authors: Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro
- Abstract summary: We show that MoEs can boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons.
We shed more light on MoEs' ability to deal with non-stationarity and investigate MoEs in DRL settings with "amplified" non-stationarity via multi-task training.
- Score: 15.124698782503248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's learning capacity and ability to deal with non-stationarity. In this work, we shed more light on MoEs' ability to deal with non-stationarity and investigate MoEs in DRL settings with "amplified" non-stationarity via multi-task training, providing further evidence that MoEs improve learning capacity. In contrast to previous work, our multi-task results allow us to better understand the underlying causes for the beneficial effect of MoE in DRL training, the impact of the various MoE components, and insights into how best to incorporate them in actor-critic-based DRL networks. Finally, we also confirm results from previous work.
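To make the setup concrete, the sketch below shows one way an MoE module can replace the penultimate dense layer of an actor-critic network, together with a dormant-neuron score of the kind reported in this line of work. It is a minimal, hypothetical PyTorch rendering: the soft softmax gating, the expert count, the layer sizes, and the names (`SoftGatedMoE`, `MoEActorCritic`, `dormant_fraction`) are illustrative assumptions, not the paper's architecture or code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftGatedMoE(nn.Module):
    """Mixes several small expert MLPs with a learned softmax gate (dense/soft routing)."""

    def __init__(self, in_dim: int, hidden_dim: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(in_dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.gate(x), dim=-1)                    # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, num_experts, hidden)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)          # weighted mixture of experts


class MoEActorCritic(nn.Module):
    """Shared encoder, MoE torso, and separate policy/value heads."""

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.moe = SoftGatedMoE(hidden_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, n_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor):
        h = self.moe(self.encoder(obs))
        return self.policy_head(h), self.value_head(h)


def dormant_fraction(activations: torch.Tensor, tau: float = 0.025) -> float:
    """Fraction of units whose batch-averaged absolute activation, normalised by the
    layer mean, falls below tau -- the dormant-neuron score used in prior DRL work."""
    score = activations.abs().mean(dim=0)
    score = score / (score.mean() + 1e-8)
    return (score <= tau).float().mean().item()
```

For example, `policy_logits, value = MoEActorCritic(obs_dim=8, n_actions=4)(torch.randn(32, 8))` runs the torso on a batch of 32 observations, and passing intermediate activations to `dormant_fraction` tracks how many hidden units stay near zero over training.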
Related papers
- FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
- Theory on Mixture-of-Experts in Continual Learning [72.42497633220547]
Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time.
Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks.
The MoE model has recently been shown to effectively mitigate catastrophic forgetting in CL by employing a gating network.
arXiv Detail & Related papers (2024-06-24T08:29:58Z)
- Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales [13.818149654692863]
Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance.
In this work, we improve the stability of RL training by adapting reverse cross entropy (RCE), originally developed in supervised learning for noisy data, to define a symmetric RL loss (a generic sketch of this construction follows the related-papers list below).
arXiv Detail & Related papers (2024-05-27T19:28:33Z)
- Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning [26.393644289860084]
Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification.
We propose an adversarial complementary representation learning (ACoRL) framework that enables newly trained models to avoid previously acquired knowledge.
arXiv Detail & Related papers (2024-04-24T07:47:55Z)
- Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual learning and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce the multi-head Decision Transformer (MH-DT) and the low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Learning to Optimize for Reinforcement Learning [58.01132862590378]
Reinforcement learning (RL) is fundamentally different from supervised learning, and in practice these learned optimizers do not work well even in simple RL tasks.
The agent-gradient distribution is not independent and identically distributed, leading to inefficient meta-training.
We show that, although only trained on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
arXiv Detail & Related papers (2023-02-03T00:11:02Z)
- The State of Sparse Training in Deep Reinforcement Learning [23.034856834801346]
The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision.
Their appeal stems largely from the reduced number of parameters required to train and store, as well as an increase in learning efficiency.
We perform a systematic investigation into applying a number of existing sparse training techniques on a variety of Deep Reinforcement Learning agents and environments.
arXiv Detail & Related papers (2022-06-17T14:08:00Z)
- Uniform State Abstraction For Reinforcement Learning [6.624726878647541]
MultiGrid Reinforcement Learning (MRL) has shown that abstract knowledge in the form of a potential function can be learned almost solely from agent interaction with the environment.
In this paper we extend and improve MRL to take advantage of modern Deep Learning algorithms such as Deep Q-Networks (DQN).
arXiv Detail & Related papers (2020-04-06T18:13:08Z)
- Reinforcement Learning through Active Inference [62.997667081978825]
We show how ideas from active inference can augment traditional reinforcement learning approaches.
We develop and implement a novel objective for decision making, which we term the free energy of the expected future.
We demonstrate that the resulting algorithm successfully balances exploration and exploitation, simultaneously achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards.
arXiv Detail & Related papers (2020-02-28T10:28:21Z)
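As referenced in the Symmetric Reinforcement Learning Loss entry above, the symmetric loss combines the usual forward cross entropy with a reverse cross entropy term whose log of the one-hot target is clipped to a finite constant. The sketch below is a generic, hypothetical PyTorch rendering of that construction; the function name `symmetric_ce`, the weights `alpha`/`beta`, the clip value, and how the loss would be attached to a concrete RL objective are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F


def symmetric_ce(logits: torch.Tensor, targets: torch.Tensor,
                 alpha: float = 1.0, beta: float = 1.0,
                 log_clip: float = -4.0) -> torch.Tensor:
    """Symmetric cross entropy: forward CE plus reverse CE with clipped log targets.

    logits: (batch, n_classes); targets: (batch,) integer class labels.
    """
    probs = F.softmax(logits, dim=-1)
    onehot = F.one_hot(targets, num_classes=logits.shape[-1]).float()

    ce = F.cross_entropy(logits, targets)  # forward term: -E[q * log p]
    # Reverse term: roles of prediction and label swapped; log(0) is clipped to log_clip.
    rce = -(probs * torch.clamp(torch.log(onehot), min=log_clip)).sum(dim=-1).mean()

    return alpha * ce + beta * rce
```

A call such as `symmetric_ce(policy_logits, sampled_actions.long())` yields a scalar loss; in an RL pipeline such a term would replace or augment the cross-entropy component of a policy loss, with `alpha` and `beta` tuned per task.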
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.