Federated Offline Reinforcement Learning: Collaborative Single-Policy
Coverage Suffices
- URL: http://arxiv.org/abs/2402.05876v1
- Date: Thu, 8 Feb 2024 18:09:17 GMT
- Title: Federated Offline Reinforcement Learning: Collaborative Single-Policy
Coverage Suffices
- Authors: Jiin Woo, Laixi Shi, Gauri Joshi, Yuejie Chi
- Abstract summary: offline reinforcement learning (RL) seeks to learn an optimal policy using offline data.
This work explores the benefit of federated learning for offline RL, aiming at collaboratively leveraging offline datasets at multiple agents.
FedLCB-Q is a variant of the popular model-free Q-learning algorithm tailored for federated offline RL.
- Score: 44.97418712091146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL), which seeks to learn an optimal policy
using offline data, has garnered significant interest due to its potential in
critical applications where online data collection is infeasible or expensive.
This work explores the benefit of federated learning for offline RL, aiming at
collaboratively leveraging offline datasets at multiple agents. Focusing on
finite-horizon episodic tabular Markov decision processes (MDPs), we design
FedLCB-Q, a variant of the popular model-free Q-learning algorithm tailored for
federated offline RL. FedLCB-Q updates local Q-functions at agents with novel
learning rate schedules and aggregates them at a central server using
importance averaging and a carefully designed pessimistic penalty term. Our
sample complexity analysis reveals that, with appropriately chosen parameters
and synchronization schedules, FedLCB-Q achieves linear speedup in terms of the
number of agents without requiring high-quality datasets at individual agents,
as long as the local datasets collectively cover the state-action space visited
by the optimal policy, highlighting the power of collaboration in the federated
setting. In fact, the sample complexity almost matches that of the single-agent
counterpart, as if all the data are stored at a central location, up to
polynomial factors of the horizon length. Furthermore, FedLCB-Q is
communication-efficient, where the number of communication rounds is only
linear with respect to the horizon length up to logarithmic factors.
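To make the aggregation step concrete, below is a minimal sketch of the kind of server-side round the abstract describes: local Q-estimates are importance-averaged using visit counts, and a count-based penalty enforces pessimism on poorly covered pairs. The weighting rule, penalty constant, and clipping are illustrative assumptions rather than FedLCB-Q's exact construction, which also involves specific local learning-rate schedules not shown here.

```python
import numpy as np

# Illustrative sketch of one aggregation round for federated offline
# Q-learning with importance averaging and a pessimistic (LCB-style) penalty.
# Q_locals[k] and N_locals[k] are (H, S, A) arrays for agent k, holding
# local Q-estimates and local visit counts of its offline dataset.

def aggregate_round(Q_locals, N_locals, H, c_b=1.0, delta=0.01):
    """Server side: importance-average local Q-estimates and subtract a
    count-based penalty so rarely covered (h, s, a) pairs are treated
    pessimistically. Constants and the penalty form are placeholders."""
    Q_locals = np.stack(Q_locals)          # (K, H, S, A)
    N_locals = np.stack(N_locals)          # (K, H, S, A)
    N_total = N_locals.sum(axis=0)         # collective visit counts

    # Importance averaging: weight each agent's estimate by its share of
    # the collective visits, so agents that actually cover a pair dominate.
    weights = N_locals / np.maximum(N_total, 1)
    Q_avg = (weights * Q_locals).sum(axis=0)

    # Pessimism shrinking with the *collective* count: a pair is trusted
    # as long as the agents' datasets jointly cover it.
    bonus = c_b * H * np.sqrt(np.log(1.0 / delta) / np.maximum(N_total, 1))
    return np.clip(Q_avg - bonus, 0.0, H)

# Toy usage: 3 agents, horizon 4, 5 states, 2 actions.
rng = np.random.default_rng(0)
K, H, S, A = 3, 4, 5, 2
Q_locals = [rng.uniform(0, H, size=(H, S, A)) for _ in range(K)]
N_locals = [rng.integers(0, 20, size=(H, S, A)) for _ in range(K)]
print(aggregate_round(Q_locals, N_locals, H).shape)  # (4, 5, 2)
```

The key point the sketch tries to convey is that the penalty depends on the combined counts across agents, which is why collective single-policy coverage, rather than per-agent coverage, is what matters.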
Related papers
- Federated Q-Learning: Linear Regret Speedup with Low Communication Cost [4.380110270510058]
We propose two federated Q-learning algorithms, termed FedQ-Hoeffding and FedQ-Bernstein.
We show that the corresponding total regrets achieve a linear speedup compared with their single-agent counterparts when the time horizon is sufficiently large.
Those results rely on an event-triggered synchronization mechanism between the agents and the server.
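The event-triggered idea can be illustrated with a short sketch: instead of synchronizing on a fixed schedule, an agent signals the server when a local, count-based condition fires. The doubling-style trigger below is an assumed stand-in for exposition, not the exact condition used by FedQ-Hoeffding or FedQ-Bernstein.

```python
# Illustrative sketch of event-triggered synchronization in federated
# Q-learning: an agent flags a sync when its visit count for a
# (step, state, action) triple has doubled since the last sync.
# The doubling rule is an assumption, not the papers' exact condition.

class EventTriggeredAgent:
    def __init__(self):
        self.visits = {}          # (h, s, a) -> visits so far
        self.visits_at_sync = {}  # (h, s, a) -> visits at last sync

    def observe(self, h, s, a):
        """Record one visit and report whether a sync should be triggered."""
        key = (h, s, a)
        self.visits[key] = self.visits.get(key, 0) + 1
        last = self.visits_at_sync.get(key, 1)
        return self.visits[key] >= 2 * last  # doubling-style trigger (assumed)

    def mark_synced(self):
        self.visits_at_sync = dict(self.visits)

# Toy usage: repeatedly visiting one triple triggers syncs at counts 2, 4, 8.
agent = EventTriggeredAgent()
for t in range(1, 9):
    if agent.observe(h=0, s=0, a=1):
        agent.mark_synced()
        print(f"sync requested at step {t}")
```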
arXiv Detail & Related papers (2023-12-22T19:14:09Z)
- The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond [44.43850105124659]
We consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone.
We provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning.
We propose a novel federated Q-learning algorithm with importance averaging, giving larger weights to more frequently visited state-action pairs.
arXiv Detail & Related papers (2023-05-18T04:18:59Z)
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z)
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation [24.577243536475233]
Offline reinforcement learning (RL) concerns pursuing an optimal policy for sequential decision-making from a pre-collected dataset.
Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators.
We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
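As a reference point, the classical linear program over occupancy measures that this framework revisits can be written as follows for a discounted MDP with initial distribution rho, reward r, and transition kernel P; the paper's actual formulation with offline data and general function approximation refines this basic template.

```latex
% Classical occupancy-measure LP for a discounted MDP (textbook form);
% the offline, function-approximation version builds on this template.
\begin{align*}
\max_{d \ge 0} \quad & \sum_{s,a} d(s,a)\, r(s,a) \\
\text{s.t.} \quad & \sum_{a} d(s,a)
  \;=\; (1-\gamma)\,\rho(s) \;+\; \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a')
  \qquad \forall s.
\end{align*}
```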
arXiv Detail & Related papers (2022-12-28T15:28:12Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
In practice the behavior policies that generated the data can vary in quality, e.g., one agent may act randomly while others perform reasonably well; an agent learned by offline MARL often inherits this random policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL [48.552287941528]
Off-policy reinforcement learning holds the promise of sample-efficient learning of decision-making policies.
In the offline RL setting, standard off-policy RL methods can significantly underperform.
We introduce Expected-Max Q-Learning (EMaQ), which is more closely related to the resulting practical algorithm.
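The core idea behind the Expected-Max backup can be sketched in a few lines: rather than maximizing the Q-function over all actions, the target maximizes over N actions sampled from (an estimate of) the behavior policy, keeping the backup within the support of the data. The sampler and Q-function below are placeholder stand-ins, not the authors' implementation.

```python
import numpy as np

# Sketch of an Expected-Max style backup target: r + gamma * max over N
# actions drawn from the behavior policy at the next state. The
# `behavior_sampler` and `q_fn` callables are illustrative placeholders.

def emaq_target(reward, next_state, behavior_sampler, q_fn,
                n_samples=10, gamma=0.99, done=False):
    """Compute one backup target using a sampled, in-support maximum."""
    if done:
        return reward
    sampled_actions = behavior_sampler(next_state, n_samples)  # (N, action_dim)
    q_values = np.array([q_fn(next_state, a) for a in sampled_actions])
    return reward + gamma * q_values.max()

# Toy usage with stand-in behavior policy and Q-function.
rng = np.random.default_rng(0)
behavior_sampler = lambda s, n: rng.normal(size=(n, 2))   # placeholder sampler
q_fn = lambda s, a: float(-np.sum((a - 0.5) ** 2))        # placeholder Q
print(emaq_target(1.0, np.zeros(3), behavior_sampler, q_fn))
```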
arXiv Detail & Related papers (2020-07-21T21:13:02Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
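For readers who want to try these benchmarks, a typical loading pattern looks like the snippet below; it assumes the `d4rl` package and its MuJoCo dependencies are installed, and the dataset name is just one example among the available tasks.

```python
# Typical D4RL usage (assumes `d4rl` and its MuJoCo dependencies are
# installed; the dataset name below is one example among many).
import gym
import d4rl  # registers the offline environments with gym on import

env = gym.make("halfcheetah-medium-v2")

# qlearning_dataset returns transition arrays ready for offline RL.
dataset = d4rl.qlearning_dataset(env)
print(dataset["observations"].shape,
      dataset["actions"].shape,
      dataset["rewards"].shape,
      dataset["terminals"].shape)
```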
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.