Distributed Value Function Approximation for Collaborative Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2006.10443v3
- Date: Sat, 17 Apr 2021 20:00:25 GMT
- Title: Distributed Value Function Approximation for Collaborative Multi-Agent
Reinforcement Learning
- Authors: Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic
- Abstract summary: We propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning.
The proposed algorithms differ by their form, definition of eligibility traces, selection of time scales and the way of incorporating consensus iterations.
It is demonstrated how the adopted methodology can be applied to temporal-difference algorithms under weaker information structure constraints.
- Score: 2.7071541526963805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose several novel distributed gradient-based temporal
difference algorithms for multi-agent off-policy learning of linear
approximation of the value function in Markov decision processes with strict
information structure constraints, limiting inter-agent communications to small
neighborhoods. The algorithms are composed of: 1) local parameter updates based
on single-agent off-policy gradient temporal difference learning algorithms,
including eligibility traces with state dependent parameters, and 2) linear
stochastic time varying consensus schemes, represented by directed graphs. The
proposed algorithms differ by their form, definition of eligibility traces,
selection of time scales and the way of incorporating consensus iterations. The
main contribution of the paper is a convergence analysis based on the general
properties of the underlying Feller-Markov processes and the stochastic time
varying consensus model. We prove, under general assumptions, that the
parameter estimates generated by all the proposed algorithms weakly converge to
the corresponding ordinary differential equations (ODE) with precisely defined
invariant sets. It is demonstrated how the adopted methodology can be applied
to temporal-difference algorithms under weaker information structure
constraints. The variance reduction effect of the proposed algorithms is
demonstrated by formulating and analyzing an asymptotic stochastic differential
equation. Specific guidelines for communication network design are provided.
The algorithms' superior properties are illustrated by characteristic
simulation results.
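As a rough illustration of the recipe described in the abstract (local off-policy TD parameter updates at each agent, followed by a linear consensus step over a directed communication graph), here is a minimal sketch. The environment, feature matrix, directed-ring topology, step size, and unit importance weight are all hypothetical placeholders for illustration; this is not the paper's algorithms, which use gradient TD with eligibility traces and time-varying consensus matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_states, n_feats = 4, 6, 3
gamma, alpha = 0.9, 0.05

# Shared linear features phi(s) for the value function (hypothetical).
Phi = rng.normal(size=(n_states, n_feats))

# Row-stochastic consensus weights over a directed ring: each agent
# mixes its own estimate with that of its single in-neighbor.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.5

theta = np.zeros((n_agents, n_feats))  # one parameter vector per agent

for t in range(2000):
    for i in range(n_agents):
        # Each agent observes its own transition (s, r, s') under its
        # behavior policy; here transitions and rewards are simulated
        # uniformly, and the importance weight rho is a placeholder.
        s = rng.integers(n_states)
        s_next = rng.integers(n_states)
        r = rng.normal()
        rho = 1.0
        # Local TD(0)-style update on the agent's own parameters.
        delta = r + gamma * Phi[s_next] @ theta[i] - Phi[s] @ theta[i]
        theta[i] += alpha * rho * delta * Phi[s]
    # Consensus iteration: mix parameter estimates along graph edges.
    theta = W @ theta

# Disagreement between agents after repeated mixing.
spread = np.max(np.abs(theta - theta.mean(axis=0)))
```

The consensus step is what couples the agents: each local update injects fresh stochastic noise, and the row-stochastic mixing contracts the disagreement between parameter vectors, which is the variance-reduction mechanism the abstract analyzes via an asymptotic SDE.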
Related papers
- RHiOTS: A Framework for Evaluating Hierarchical Time Series Forecasting Algorithms
RHiOTS is designed to assess the robustness of hierarchical time series forecasting models and algorithms on real-world datasets.
RHiOTS incorporates an innovative visualization component, turning complex, multidimensional robustness evaluation results into intuitive, easily interpretable visuals.
Our findings show that traditional statistical methods are more robust than state-of-the-art deep learning algorithms, except when the transformation effect is highly disruptive.
arXiv Detail & Related papers (2024-08-06T18:52:15Z)
- Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection
We propose the first provable guarantee for algorithm selection based on algorithm features.
We analyze the benefits and costs associated with algorithm features and investigate how the generalization error is affected by different factors.
arXiv Detail & Related papers (2024-05-18T17:38:25Z)
- Quantized Hierarchical Federated Learning: A Robust Approach to Statistical Heterogeneity
We present a novel hierarchical federated learning algorithm that incorporates quantization for communication-efficiency.
We offer a comprehensive analytical framework to evaluate its optimality gap and convergence rate.
Our findings reveal that our algorithm consistently achieves high learning accuracy over a range of parameters.
arXiv Detail & Related papers (2024-03-03T15:40:24Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Galerkin Neural Networks: A Framework for Approximating Variational Equations with Error Control
We present a new approach to using neural networks to approximate the solutions of variational equations.
We use a sequence of finite-dimensional subspaces whose basis functions are realizations of a sequence of neural networks.
arXiv Detail & Related papers (2021-05-28T20:25:40Z)
- Joint Network Topology Inference via Structured Fusion Regularization
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
We address the problem of policy evaluation in discounted Markov decision processes, and provide Markov-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
- Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives
The article rigorously establishes why symplectic discretization schemes are important for momentum-based optimization algorithms.
It provides a characterization of algorithms that exhibit accelerated convergence.
arXiv Detail & Related papers (2020-02-28T00:32:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.