Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration
- URL: http://arxiv.org/abs/2410.19450v1
- Date: Fri, 25 Oct 2024 10:24:19 GMT
- Title: Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration
- Authors: Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang
- Abstract summary: We propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE).
First, we introduce the Offline Value Function Memory (OVM) mechanism to compute target Q-values, preserving knowledge gained during offline training.
Second, we propose a decentralized Sequential Exploration (SE) strategy tailored for O2O MARL, which effectively utilizes the pre-trained offline policy for exploration.
- Score: 40.346958259814514
- License:
- Abstract: Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance. However, most existing research has focused on single-agent settings, with limited exploration of the multi-agent extension, i.e., Offline-to-Online Multi-Agent Reinforcement Learning (O2O MARL). In O2O MARL, two critical challenges become more prominent as the number of agents increases: (i) the risk of unlearning pre-trained Q-values due to distributional shifts during the transition from offline-to-online phases, and (ii) the difficulty of efficient exploration in the large joint state-action space. To tackle these challenges, we propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE). First, we introduce the Offline Value Function Memory (OVM) mechanism to compute target Q-values, preserving knowledge gained during offline training, ensuring smoother transitions, and enabling efficient fine-tuning. Second, we propose a decentralized Sequential Exploration (SE) strategy tailored for O2O MARL, which effectively utilizes the pre-trained offline policy for exploration, thereby significantly reducing the joint state-action space to be explored. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) demonstrate that OVMSE significantly outperforms existing baselines, achieving superior sample efficiency and overall performance.
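The abstract describes the two OVMSE components only at a high level, so the following is a minimal, hypothetical sketch of how they might be wired together in a discrete-action, value-based MARL setup. The names (OfflineValueMemory, sequential_exploration), the element-wise max target rule, and the epsilon-greedy schedule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code) of the two OVMSE ideas:
# (i) keep a frozen "memory" of the offline-trained Q-function and use it when
#     forming TD targets, so online updates do not unlearn offline values;
# (ii) let only some agents explore online while the rest follow the frozen
#     offline policy, shrinking the joint action space actually explored.
import copy
import random
import torch


class OfflineValueMemory:
    """Frozen copy of the offline-trained Q-network, consulted inside TD targets."""

    def __init__(self, offline_q_net, gamma=0.99):
        self.frozen_q = copy.deepcopy(offline_q_net).eval()
        for p in self.frozen_q.parameters():
            p.requires_grad_(False)
        self.gamma = gamma

    @torch.no_grad()
    def td_target(self, online_target_q, reward, next_obs, done):
        # Assumed rule: never let the bootstrap value fall below the remembered
        # offline estimate (element-wise max of the two greedy values).
        q_online = online_target_q(next_obs).max(dim=-1).values
        q_offline = self.frozen_q(next_obs).max(dim=-1).values
        q_next = torch.maximum(q_online, q_offline)
        return reward + self.gamma * (1.0 - done) * q_next


def sequential_exploration(online_qs, offline_policies, obs_per_agent,
                           num_exploring, epsilon=0.1):
    """Assumed form of decentralized sequential exploration: the first
    `num_exploring` agents act epsilon-greedily with their online Q-networks,
    while the remaining agents follow the pre-trained offline policy."""
    actions = []
    for i, obs in enumerate(obs_per_agent):  # obs: per-agent observation tensor
        if i >= num_exploring:
            actions.append(offline_policies[i](obs).argmax(dim=-1).item())
        elif random.random() < epsilon:
            n_actions = online_qs[i](obs).shape[-1]
            actions.append(random.randrange(n_actions))
        else:
            actions.append(online_qs[i](obs).argmax(dim=-1).item())
    return actions
```

In an actual O2O MARL pipeline, the number of exploring agents and the way offline values enter the target would follow the paper's schedule rather than the fixed choices shown here.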
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can guide itself with curiosity to collect higher-quality data.
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
- Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness [11.903893267037061]
Offline-to-online (O2O) RL provides a paradigm for improving an offline-trained agent within limited online interactions.
Most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement in O2O adaptation.
We propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness.
arXiv Detail & Related papers (2023-09-29T04:42:50Z)
- Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization [23.416448404647305]
OMIGA is a new offline multi-agent RL algorithm with implicit global-to-local value regularization.
We show that OMIGA achieves superior performance over the state-of-the-art offline MARL methods in almost all tasks.
arXiv Detail & Related papers (2023-07-21T14:37:54Z)
- A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning [25.123237633748193]
Offline-to-online reinforcement learning can be challenging due to constrained exploratory behavior and state-action distribution shift.
We propose a Simple Unified uNcertainty-Guided (SUNG) framework, which addresses both challenges in a unified way using uncertainty.
SUNG achieves state-of-the-art online fine-tuning performance when combined with different offline RL methods.
arXiv Detail & Related papers (2023-06-13T05:22:26Z)
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance (a minimal illustrative sketch of such an ensemble target follows this list).
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
- Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks [43.588686040547486]
Offline pre-training with online fine-tuning has never been studied, nor are datasets or benchmarks for offline MARL research available.
We propose the novel multi-agent decision transformer (MADT) architecture for effective offline learning.
When evaluated on the StarCraft II offline dataset, MADT demonstrates superior performance to state-of-the-art offline RL baselines.
arXiv Detail & Related papers (2021-12-06T08:11:05Z)
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [74.10976684469435]
One might expect that offline reinforcement learning (RL) algorithms can be transferred to multi-agent settings directly; in practice, this transfer proves challenging.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z)
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning [16.707045765042505]
Current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error.
We propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error.
Experimental results demonstrate that the extrapolation error is reduced to almost zero and insensitive to the number of agents.
arXiv Detail & Related papers (2021-06-07T08:02:31Z)
- JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035]
We propose JUMBO, an MBO algorithm that sidesteps limitations by querying additional data.
We show that it enjoys a no-regret guarantee under conditions analogous to those of GP-UCB.
Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems.
arXiv Detail & Related papers (2021-06-02T05:03:38Z)
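As referenced in the ENOTO entry above, that work bridges offline pre-training and online fine-tuning through a Q-ensemble. The snippet gives no implementation details, so the sketch below is only a hypothetical illustration of one common ensemble-target construction; the ensemble size, the min-based pessimism, and the network shapes are assumptions, not ENOTO's actual design.

```python
# Hypothetical Q-ensemble sketch: the same ensemble target can be used in both
# offline pre-training and online fine-tuning, so nothing needs to be swapped
# out at the offline-to-online switch. All specifics here are assumptions.
import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    def __init__(self, obs_dim, n_actions, n_members=10, hidden=256):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )
            for _ in range(n_members)
        ])

    def forward(self, obs):
        # Shape: (n_members, batch, n_actions)
        return torch.stack([m(obs) for m in self.members], dim=0)


@torch.no_grad()
def pessimistic_td_target(ensemble, reward, next_obs, done, gamma=0.99):
    # Taking the minimum of the greedy values over ensemble members gives a
    # conservative bootstrap value, one way such an ensemble can serve both
    # the offline and online phases.
    q_next = ensemble(next_obs).max(dim=-1).values.min(dim=0).values
    return reward + gamma * (1.0 - done) * q_next
```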
This list is automatically generated from the titles and abstracts of the papers on this site.