Distributed Value Function Approximation for Collaborative Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2006.10443v3
- Date: Sat, 17 Apr 2021 20:00:25 GMT
- Title: Distributed Value Function Approximation for Collaborative Multi-Agent
Reinforcement Learning
- Authors: Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic
- Abstract summary: We propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning.
The proposed algorithms differ by their form, definition of eligibility traces, selection of time scales and the way of incorporating consensus iterations.
It is demonstrated how the adopted methodology can be applied to temporal-difference algorithms under weaker information structure constraints.
- Score: 2.7071541526963805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose several novel distributed gradient-based temporal
difference algorithms for multi-agent off-policy learning of linear
approximation of the value function in Markov decision processes with strict
information structure constraints, limiting inter-agent communications to small
neighborhoods. The algorithms are composed of: 1) local parameter updates based
on single-agent off-policy gradient temporal difference learning algorithms,
including eligibility traces with state dependent parameters, and 2) linear
stochastic time varying consensus schemes, represented by directed graphs. The
proposed algorithms differ by their form, definition of eligibility traces,
selection of time scales and the way of incorporating consensus iterations. The
main contribution of the paper is a convergence analysis based on the general
properties of the underlying Feller-Markov processes and the stochastic time
varying consensus model. We prove, under general assumptions, that the
parameter estimates generated by all the proposed algorithms weakly converge to
the corresponding ordinary differential equations (ODE) with precisely defined
invariant sets. It is demonstrated how the adopted methodology can be applied
to temporal-difference algorithms under weaker information structure
constraints. The variance reduction effect of the proposed algorithms is
demonstrated by formulating and analyzing an asymptotic stochastic differential
equation. Specific guidelines for communication network design are provided.
The algorithms' superior properties are illustrated by characteristic
simulation results.
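As a rough illustration of the recipe described in the abstract (local off-policy TD parameter updates at each agent, followed by a linear consensus step over a directed communication graph), here is a minimal sketch. The environment, feature matrix, directed-ring topology, step size, and unit importance weight are all hypothetical placeholders for illustration; this is not the paper's algorithms, which use gradient TD with eligibility traces and time-varying consensus matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_states, n_feats = 4, 6, 3
gamma, alpha = 0.9, 0.05

# Shared linear features phi(s) for the value function (hypothetical).
Phi = rng.normal(size=(n_states, n_feats))

# Row-stochastic consensus weights over a directed ring: each agent
# mixes its own estimate with that of its single in-neighbor.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.5

theta = np.zeros((n_agents, n_feats))  # one parameter vector per agent

for t in range(2000):
    for i in range(n_agents):
        # Each agent observes its own transition (s, r, s') under its
        # behavior policy; here transitions and rewards are simulated
        # uniformly, and the importance weight rho is a placeholder.
        s = rng.integers(n_states)
        s_next = rng.integers(n_states)
        r = rng.normal()
        rho = 1.0
        # Local TD(0)-style update on the agent's own parameters.
        delta = r + gamma * Phi[s_next] @ theta[i] - Phi[s] @ theta[i]
        theta[i] += alpha * rho * delta * Phi[s]
    # Consensus iteration: mix parameter estimates along graph edges.
    theta = W @ theta

# Disagreement between agents after repeated mixing.
spread = np.max(np.abs(theta - theta.mean(axis=0)))
```

The consensus step is what couples the agents: each local update injects fresh stochastic noise, and the row-stochastic mixing contracts the disagreement between parameter vectors, which is the variance-reduction mechanism the abstract analyzes via an asymptotic SDE.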
Related papers
- RHiOTS: A Framework for Evaluating Hierarchical Time Series Forecasting Algorithms
RHiOTS is designed to assess the robustness of hierarchical time series forecasting models and algorithms on real-world datasets.
RHiOTS incorporates an innovative visualization component, turning complex, multidimensional robustness evaluation results into intuitive, easily interpretable visuals.
Our findings show that traditional statistical methods are more robust than state-of-the-art deep learning algorithms, except when the transformation effect is highly disruptive.
arXiv Detail & Related papers (2024-08-06T18:52:15Z)
- Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection
We propose the first provable guarantee for algorithm selection based on algorithm features.
We analyze the benefits and costs associated with algorithm features and investigate how the generalization error is affected by different factors.
arXiv Detail & Related papers (2024-05-18T17:38:25Z)
- Quantized Hierarchical Federated Learning: A Robust Approach to Statistical Heterogeneity
We present a novel hierarchical federated learning algorithm that incorporates quantization for communication-efficiency.
We offer a comprehensive analytical framework to evaluate its optimality gap and convergence rate.
Our findings reveal that our algorithm consistently achieves high learning accuracy over a range of parameters.
arXiv Detail & Related papers (2024-03-03T15:40:24Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Galerkin Neural Networks: A Framework for Approximating Variational Equations with Error Control
We present a new approach to using neural networks to approximate the solutions of variational equations.
We use a sequence of finite-dimensional subspaces whose basis functions are realizations of a sequence of neural networks.
arXiv Detail & Related papers (2021-05-28T20:25:40Z)
- Joint Network Topology Inference via Structured Fusion Regularization
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
We address the problem of policy evaluation in discounted Markov decision processes, and provide Markov-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
- Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives
The article rigorously establishes why symplectic discretization schemes are important for momentum-based optimization algorithms.
It provides a characterization of algorithms that exhibit accelerated convergence.
arXiv Detail & Related papers (2020-02-28T00:32:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.