Multi-Agent congestion cost minimization with linear function
approximation
- URL: http://arxiv.org/abs/2301.10993v1
- Date: Thu, 26 Jan 2023 08:45:44 GMT
- Title: Multi-Agent congestion cost minimization with linear function
approximation
- Authors: Prashant Trivedi, Nandyala Hemachandra
- Abstract summary: This work considers multiple agents traversing a network from a source node to the goal node.
The agents' objective is to find a path to the goal node with minimum overall cost in a decentralized way.
We propose a novel multi-agent congestion cost minimization algorithm.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work considers multiple agents traversing a network from a source node
to the goal node. The cost to an agent for traveling a link has a private as
well as a congestion component. The agent's objective is to find a path to the
goal node with minimum overall cost in a decentralized way. We model this as a
fully decentralized multi-agent reinforcement learning problem and propose a
novel multi-agent congestion cost minimization (MACCM) algorithm. Our MACCM
algorithm uses linear function approximations of transition probabilities and
the global cost function. In the absence of a central controller and to
preserve privacy, agents communicate the cost function parameters to their
neighbors via a time-varying communication network. Moreover, each agent
maintains its estimate of the global state-action value, which is updated via a
multi-agent extended value iteration (MAEVI) sub-routine. We show that our
MACCM algorithm achieves a sub-linear regret. The proof requires the
convergence of cost function parameters, the MAEVI algorithm, and analysis of
the regret bounds induced by the MAEVI triggering condition for each agent. We
implement our algorithm on a two-node network with multiple links to validate
it. We first identify the optimal policy, i.e., the optimal number of agents
going to the goal node in each period. We observe that the average regret is close to
zero for 2 and 3 agents. The optimal policy captures the trade-off between the
minimum cost of staying at a node and the congestion cost of going to the goal
node. Our work is a generalization of learning the stochastic shortest path
problem.
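To make the mechanism above concrete, the following is a minimal sketch (not the authors' MACCM/MAEVI implementation) of the ingredients the abstract names: a toy two-node network in which each agent either stays at the source node or moves to the goal node, updates a local estimate of the global cost, and averages its cost parameters with neighbors over a time-varying communication graph. All names, constants, and the epsilon-greedy rule are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: decentralized agents on a two-node network share
# cost-function parameters with neighbors over a time-varying graph.

rng = np.random.default_rng(0)

N_AGENTS = 3          # agents traversing the network
STAY_COST = 0.4       # private cost of waiting at the source node
CONGESTION = 0.3      # per-agent congestion cost on the source->goal link
HORIZON = 200

def step_cost(actions):
    """Global cost for one period: a stay cost for waiting agents plus a
    congestion cost on the link that grows with how many agents use it."""
    movers = int(np.sum(actions))
    stay = (N_AGENTS - movers) * STAY_COST
    move = movers * (CONGESTION * movers)   # congestion grows with load
    return stay + move

def consensus_average(params, neighbours):
    """One round of parameter mixing over a (possibly time-varying) graph:
    each agent replaces its parameters with the mean of its own and its
    neighbours' parameters."""
    new = []
    for i in range(N_AGENTS):
        group = [i] + neighbours[i]
        new.append(np.mean([params[j] for j in group], axis=0))
    return new

# Each agent keeps a 2-dim parameter: estimated cost of (stay, move).
params = [rng.normal(size=2) for _ in range(N_AGENTS)]

for t in range(HORIZON):
    # epsilon-greedy, fully decentralized action choice from local estimates
    actions = np.array([
        rng.integers(2) if rng.random() < 0.1 else int(np.argmin(params[i]))
        for i in range(N_AGENTS)
    ])
    cost = step_cost(actions)
    # local stochastic update of each agent's estimate for its chosen action
    for i in range(N_AGENTS):
        a = actions[i]
        params[i][a] += 0.05 * (cost / N_AGENTS - params[i][a])
    # time-varying communication: each agent may or may not reach a neighbour
    neighbours = [[(i + 1) % N_AGENTS] if rng.random() < 0.8 else []
                  for i in range(N_AGENTS)]
    params = consensus_average(params, neighbours)

print("learned per-agent (stay, move) cost estimates:")
for i, p in enumerate(params):
    print(f"  agent {i}: {np.round(p, 3)}")
```

With two or three agents, the learned (stay, move) estimates should exhibit the trade-off described above: moving is cheap when few agents use the link, but the quadratic congestion term makes it costly when all of them do.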
Related papers
- Communication Efficient Decentralization for Smoothed Online Convex Optimization [9.449153668916098]
We study the multi-agent Smoothed Online Convex Optimization (SOCO) problem, where $N$ agents interact through a communication graph.
In each round, each agent $i$ receives a strongly convex hitting cost function $f^i_t$ in an online fashion.
Our results hold even when the communication graph changes arbitrarily and adaptively over time.
arXiv Detail & Related papers (2024-11-13T05:59:04Z) - Cooperative Multi-Agent Constrained Stochastic Linear Bandits [2.099922236065961]
A network of $N$ agents communicate locally to minimize their collective regret while keeping their expected cost under a specified threshold $\tau$.
We propose a safe distributed upper confidence bound algorithm, called MA-OPLB, and establish a high-probability bound on its $T$-round regret.
We show that our regret bound is of order $\mathcal{O}\left(\frac{d}{\tau - c_0}\,\frac{\log(NT)^2}{\sqrt{N}}\,\sqrt{T \log(1/|\lambda|)}\right)$.
arXiv Detail & Related papers (2024-10-22T19:34:53Z) - Decentralized Monte Carlo Tree Search for Partially Observable
Multi-agent Pathfinding [49.730902939565986]
The Multi-Agent Pathfinding (MAPF) problem involves finding a set of conflict-free paths for a group of agents confined to a graph.
In this study, we focus on the decentralized MAPF setting, where the agents may observe the other agents only locally.
We propose a decentralized multi-agent Monte Carlo Tree Search (MCTS) method for MAPF tasks.
arXiv Detail & Related papers (2023-12-26T06:57:22Z) - Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem.
It is the first algorithm that tackles row-wise distributed data in sparse linear bandits.
It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
arXiv Detail & Related papers (2023-05-30T16:05:44Z) - Cooperative Multi-Agent Reinforcement Learning: Asynchronous
Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value iteration that enables asynchronous communication.
We show that a minimal $\Omega(dM)$ communication complexity is required to improve the performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z) - Multi-Agent Neural Rewriter for Vehicle Routing with Limited Disclosure
of Costs [65.23158435596518]
We solve the multi-vehicle routing problem as a team Markov game with partially observable costs.
Our multi-agent reinforcement learning approach, the so-called multi-agent Neural Rewriter, builds on the single-agent Neural Rewriter to solve the problem by iteratively rewriting solutions.
arXiv Detail & Related papers (2022-06-13T09:17:40Z) - Decentralized Optimization Over the Stiefel Manifold by an Approximate
Augmented Lagrangian Function [7.768673476492471]
This paper focuses on the decentralized optimization problem over the Stiefel manifold.
Agents can only communicate with their neighbors in a collaborative effort to solve this problem.
In existing methods, multiple rounds of communication are required to guarantee convergence.
This paper proposes a decentralized algorithm, called DESTINY, which only invokes a single round of communications per iteration.
arXiv Detail & Related papers (2021-12-30T07:19:59Z) - Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z) - Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In a multi-agent system, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counterfactual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z) - Decentralized Multi-Agent Linear Bandits with Safety Constraints [31.67685495996986]
We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit-optimization problem.
We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
arXiv Detail & Related papers (2020-12-01T07:33:00Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model; a generic consensus-ADMM sketch is given after this list.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)