Settling the Communication Complexity for Distributed Offline
Reinforcement Learning
- URL: http://arxiv.org/abs/2202.04862v1
- Date: Thu, 10 Feb 2022 06:27:07 GMT
- Title: Settling the Communication Complexity for Distributed Offline
Reinforcement Learning
- Authors: Juliusz Krysztof Ziomek, Jun Wang, Yaodong Yang
- Abstract summary: We study a novel setting in offline reinforcement learning (RL) where a number of distributed machines jointly cooperate to solve the problem.
There is a budget constraint on the total amount of information (in bits) that each machine can send out.
For value function prediction in contextual bandits, and both episodic and non-episodic MDPs, we establish information-theoretic lower bounds on the minimax risk.
- Score: 10.315054389907031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a novel setting in offline reinforcement learning (RL) where a
number of distributed machines jointly cooperate to solve the problem, but only
a single round of communication is allowed and there is a budget constraint on
the total amount of information (in bits) that each machine can send out. For
value function prediction in contextual bandits, and for both episodic and
non-episodic MDPs, we establish information-theoretic lower bounds on the
minimax risk for distributed statistical estimators; this reveals the minimum
amount of communication required by any offline RL algorithm.
Specifically, for contextual bandits, we show that the number of bits must
scale at least as $\Omega(AC)$ to match the centralised minimax optimal rate,
where $A$ is the number of actions and $C$ is the context dimension; meanwhile,
we obtain similar results in the MDP settings. Furthermore, we develop learning
algorithms based on least-squares estimates and Monte-Carlo return estimates
and provide a sharp analysis showing that they achieve the optimal risk up to
logarithmic factors. Additionally, we show that temporal-difference learning is
unable to efficiently utilise information from all available devices under the
single-round communication setting, owing to the initial bias of this method.
To the best of our knowledge, this paper presents the first minimax lower
bounds for distributed offline RL problems.
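To make the single-round, bit-budgeted setting concrete, the following is a minimal illustrative sketch (my own construction with assumed names and a simple randomized uniform quantizer, not the paper's exact protocol) of distributed value prediction for a contextual bandit: each machine fits per-action least-squares estimates on its local offline data, quantizes the $A \times C$ parameter matrix to a fixed number of bits per coordinate, and sends a single message to a server, which decodes and averages. The message length scales with $AC$, in line with the $\Omega(AC)$ communication lower bound stated above.

```python
# Illustrative sketch only: a single-round, bit-budgeted distributed least-squares
# value estimator for a contextual bandit. All names and the uniform quantizer are
# assumptions for illustration, not the paper's protocol.
import numpy as np

def quantize(v, bits, lo=-1.0, hi=1.0, rng=None):
    """Randomized uniform quantizer: maps values in [lo, hi] to integer codes in
    {0, ..., 2^bits - 1}; stochastic rounding keeps the decoded value unbiased."""
    rng = rng or np.random.default_rng()
    levels = 2 ** bits - 1
    x = np.clip((v - lo) / (hi - lo), 0.0, 1.0) * levels
    floor = np.floor(x)
    codes = floor + (rng.random(v.shape) < (x - floor))
    return codes.astype(np.int64)

def dequantize(codes, bits, lo=-1.0, hi=1.0):
    return lo + (hi - lo) * codes / (2 ** bits - 1)

def local_message(contexts, actions, rewards, num_actions, bits, reg=1e-3):
    """One machine: fit a per-action (ridge-regularised) least-squares value model
    on local offline data, then quantize the A x C parameter matrix for sending."""
    dim = contexts.shape[1]
    theta = np.zeros((num_actions, dim))
    for a in range(num_actions):
        mask = actions == a
        if mask.any():
            X, y = contexts[mask], rewards[mask]
            theta[a] = np.linalg.solve(X.T @ X + reg * np.eye(dim), X.T @ y)
    return quantize(theta, bits)  # A * C coordinates, `bits` bits each

def server_aggregate(messages, bits):
    """Server: decode each machine's single message and average the estimates."""
    return np.mean([dequantize(m, bits) for m in messages], axis=0)

# Toy run with A actions, C-dimensional contexts, M machines, n samples each.
rng = np.random.default_rng(0)
A, C, M, n, bits = 3, 5, 4, 200, 8
true_theta = rng.uniform(-0.5, 0.5, size=(A, C))
messages = []
for _ in range(M):  # each machine works locally; only one round of communication
    ctx = rng.normal(size=(n, C)) / np.sqrt(C)
    act = rng.integers(0, A, size=n)
    rew = np.einsum('ij,ij->i', ctx, true_theta[act]) + 0.1 * rng.normal(size=n)
    messages.append(local_message(ctx, act, rew, A, bits))
theta_hat = server_aggregate(messages, bits)
print("max abs estimation error:", np.abs(theta_hat - true_theta).max())
```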
Related papers
- Minimax and Communication-Efficient Distributed Best Subset Selection with Oracle Property [0.358439716487063]
The explosion of large-scale data has outstripped the processing capabilities of single-machine systems.
Traditional approaches to distributed inference often struggle with achieving true sparsity in high-dimensional datasets.
We propose a novel, two-stage, distributed best subset selection algorithm to address these issues.
arXiv Detail & Related papers (2024-08-30T13:22:08Z)
- Provably Efficient CVaR RL in Low-rank MDPs [58.58570425202862]
We study risk-sensitive Reinforcement Learning (RL).
We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to balance the interplay between exploration, exploitation, and representation learning in CVaR RL.
We prove a sample-complexity bound for our algorithm to reach an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the size of the action space, and $d$ is the dimension of the representations.
arXiv Detail & Related papers (2023-11-20T17:44:40Z)
- Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem.
It is the first algorithm that tackles row-wise distributed data in sparse linear bandits.
It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
arXiv Detail & Related papers (2023-05-30T16:05:44Z)
- The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning [43.9624940128166]
This paper explains the benefits of distributional reinforcement learning (DistRL) through the lens of small-loss bounds.
In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation.
In offline RL, we show that pessimistic DistRL enjoys small-loss PAC bounds that are novel to the offline setting and are more robust to bad single-policy coverage.
arXiv Detail & Related papers (2023-05-25T04:19:43Z)
- Quantization for decentralized learning under subspace constraints [61.59416703323886]
We consider decentralized optimization problems where agents have individual cost functions to minimize subject to subspace constraints.
We propose and study an adaptive decentralized strategy where the agents employ differential randomized quantizers to compress their estimates.
The analysis shows that, under some general conditions on the quantization noise, the strategy is stable both in terms of mean-square error and average bit rate; a minimal sketch of the differential quantization idea appears after this list.
arXiv Detail & Related papers (2022-09-16T09:38:38Z)
- Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost [48.288452411283444]
We study distributed contextual linear bandits, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features.
We propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server.
We prove that over $T$ rounds ($NT$ actions in total) the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is at most $\tilde{\mathcal{O}}(\sqrt{dNT})$.
arXiv Detail & Related papers (2022-05-26T05:56:23Z)
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
- Sparse Normal Means Estimation with Sublinear Communication [13.257075075199781]
We consider the problem of normal means estimation in a distributed setting with communication constraints.
We show that once the signal-to-noise ratio (SNR) is slightly higher, the support of $\mu$ can be correctly recovered with much less communication.
arXiv Detail & Related papers (2021-02-05T08:52:25Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
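Several of the entries above, like the main paper, hinge on compressing real-valued estimates into a limited number of bits. As a concrete illustration of the differential randomized quantization idea mentioned in the quantization-for-decentralized-learning entry, here is a minimal sketch (my own construction with assumed names and step size, not the cited paper's scheme): the sender quantizes only the change in its estimate relative to what the receiver has already reconstructed, using randomized rounding so the reconstruction stays unbiased.

```python
# Illustrative sketch of differential randomized quantization; names, step size,
# and the toy dynamics are assumptions, not the cited paper's scheme.
import numpy as np

class DifferentialQuantizer:
    """Encoder: quantizes the innovation (new estimate minus the receiver's current
    reconstruction) with randomized rounding, and tracks the receiver's state."""
    def __init__(self, dim, step=0.01, rng=None):
        self.recon = np.zeros(dim)          # copy of what the receiver holds
        self.step = step                    # quantization step (bit-rate knob)
        self.rng = rng or np.random.default_rng()

    def encode(self, x):
        diff = (x - self.recon) / self.step
        floor = np.floor(diff)
        codes = (floor + (self.rng.random(diff.shape) < diff - floor)).astype(np.int64)
        self.recon = self.recon + codes * self.step
        return codes                        # small integers when x drifts slowly

class DifferentialDequantizer:
    """Decoder: rebuilds the estimate by accumulating the decoded innovations."""
    def __init__(self, dim, step=0.01):
        self.recon = np.zeros(dim)
        self.step = step

    def decode(self, codes):
        self.recon = self.recon + codes * self.step
        return self.recon.copy()

# Toy run: a slowly drifting local estimate needs only small integer codes per round.
rng = np.random.default_rng(1)
enc, dec = DifferentialQuantizer(3, rng=rng), DifferentialDequantizer(3)
x = np.zeros(3)
for t in range(5):
    x = x + 0.02 * rng.normal(size=3)       # slowly changing local estimate
    codes = enc.encode(x)
    print(t, codes, np.round(dec.decode(codes), 3))
```

Because only innovations are transmitted, slowly varying estimates require few bits per round, which is the effect the cited analysis quantifies under general conditions on the quantization noise.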
This list is automatically generated from the titles and abstracts of the papers on this site.