Large-Scale Traffic Signal Control by a Nash Deep Q-network Approach
- URL: http://arxiv.org/abs/2301.00637v1
- Date: Mon, 2 Jan 2023 12:58:51 GMT
- Title: Large-Scale Traffic Signal Control by a Nash Deep Q-network Approach
- Authors: Yuli Zhang, Shangbo Wang, Ruiyuan Jiang
- Abstract summary: We introduce an off-policy Nash deep Q-Network (OPNDQN) algorithm, which mitigates the weaknesses of both fully centralized and MARL approaches.
One of the main advantages of OPNDQN is that it mitigates the non-stationarity of the multi-agent Markov process.
We show that OPNDQN outperforms several existing MARL approaches in terms of average queue length, episode training reward and average waiting time.
- Score: 7.23135508361981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) is currently one of the most commonly used
techniques for traffic signal control (TSC); it can adaptively adjust
traffic signal phase and duration according to real-time traffic data. However,
a fully centralized RL approach is beset with difficulties in a multi-intersection
network because the state-action space grows exponentially with the number of
intersections. Multi-agent reinforcement learning (MARL) can overcome this
high-dimensionality problem by distributing the global control task among local RL agents,
but it brings new challenges of its own, such as failure to converge caused by
the non-stationary Markov Decision Process (MDP). In this paper, we introduce
an off-policy Nash deep Q-Network (OPNDQN) algorithm, which mitigates the
weaknesses of both fully centralized and MARL approaches. OPNDQN handles
large state-action-space traffic models, where traditional algorithms fail,
by using a fictitious-play approach at each iteration to find a Nash
equilibrium among neighboring intersections, from which no intersection has
an incentive to unilaterally deviate. One of the main advantages of OPNDQN is
that it mitigates the non-stationarity of the multi-agent Markov process,
because it accounts for the mutual influence among neighboring intersections
by sharing their actions. Moreover, when training a large traffic network,
OPNDQN converges faster than existing MARL approaches because it does not
incorporate the full state information of every agent. We conduct extensive
experiments using the Simulation of Urban MObility (SUMO) simulator and show
the superiority of OPNDQN over several existing MARL approaches in terms of
average queue length, episode training reward and average waiting time.
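
To make the equilibrium-seeking step concrete, here is a minimal sketch of a fictitious-play best-response loop among neighboring intersections, in the spirit of the abstract above. The per-intersection Q-networks `q_nets`, their call signature, the neighbor topology, and the all-zeros initial joint action are illustrative assumptions, not the authors' implementation:

```python
# A minimal sketch (not the paper's code) of the fictitious-play step in
# OPNDQN: each intersection repeatedly best-responds to the actions currently
# announced by its neighbors, until no intersection can improve by unilaterally
# deviating (an approximate Nash equilibrium of the one-stage game induced by
# the Q-functions).
import torch

def nash_actions(q_nets, states, neighbors, max_iters=50):
    """q_nets[i](state_i, neighbor_actions) -> 1-D tensor of Q-values over
    intersection i's signal phases; neighbors[i] lists agents adjacent to i."""
    n = len(q_nets)
    actions = [0] * n                      # arbitrary initial joint action
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            # Mutual influence: agent i conditions on neighbors' shared actions.
            a_nbr = torch.tensor([actions[j] for j in neighbors[i]])
            best = int(torch.argmax(q_nets[i](states[i], a_nbr)))
            if best != actions[i]:
                actions[i], changed = best, True
        if not changed:                    # no profitable unilateral deviation
            break                          # -> approximate Nash equilibrium
    return actions
```

Each call to `q_nets[i]` evaluates only that intersection's few phase choices against its neighbors' announced actions, which is what keeps the search tractable. In an off-policy scheme like the one described, the equilibrium actions found this way could serve both for acting in SUMO and as bootstrap targets for the Q-network updates; that outer training loop is omitted from this sketch.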
Related papers
- Improving Traffic Flow Predictions with SGCN-LSTM: A Hybrid Model for Spatial and Temporal Dependencies [55.2480439325792]
This paper introduces the Signal-Enhanced Graph Convolutional Network Long Short Term Memory (SGCN-LSTM) model for predicting traffic speeds across road networks.
Experiments on the PEMS-BAY road network traffic dataset demonstrate the SGCN-LSTM model's effectiveness.
arXiv Detail & Related papers (2024-11-01T00:37:00Z)
- Scalable spectral representations for network multiagent control [53.631272539560435]
A popular model for multi-agent control, Network Markov Decision Processes (MDPs) pose a significant challenge to efficient learning.
We first derive scalable spectral local representations for network MDPs, which induce a network linear subspace for the local $Q$-function of each agent.
We design a scalable algorithmic framework for continuous state-action network MDPs, and provide end-to-end guarantees for the convergence of our algorithm.
arXiv Detail & Related papers (2024-10-22T17:45:45Z)
- Adaptive Hierarchical SpatioTemporal Network for Traffic Forecasting [70.66710698485745]
We propose an Adaptive Hierarchical SpatioTemporal Network (AHSTN) to improve traffic forecasting.
AHSTN exploits the spatial hierarchy and models multi-scale spatial correlations.
Experiments on two real-world datasets show that AHSTN achieves better performance over several strong baselines.
arXiv Detail & Related papers (2023-06-15T14:50:27Z)
- A Novel Multi-Agent Deep RL Approach for Traffic Signal Control [13.927155702352131]
We propose a Friend-Deep Q-network (Friend-DQN) approach for multiple traffic signal control in urban networks.
In particular, the cooperation between multiple agents can reduce the state-action space and thus speed up convergence (see the dimensionality sketch after this list).
arXiv Detail & Related papers (2023-06-05T08:20:37Z)
- MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
- Decentralized Federated Reinforcement Learning for User-Centric Dynamic TFDD Control [37.54493447920386]
We propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme to meet asymmetric and heterogeneous traffic demands.
We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP).
To jointly optimize global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG).
arXiv Detail & Related papers (2022-11-04T07:39:21Z)
- A Deep Reinforcement Learning Approach for Traffic Signal Control Optimization [14.455497228170646]
Inefficient traffic signal control methods may cause numerous problems, such as traffic congestion and waste of energy.
This paper first proposes a multi-agent deep deterministic policy gradient (MADDPG) method by extending actor-critic policy gradient algorithms.
arXiv Detail & Related papers (2021-07-13T14:11:04Z)
- Low-Latency Federated Learning over Wireless Channels with Differential Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server.
In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z)
- MetaVIM: Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control [54.162449208797334]
Traffic signal control aims to coordinate traffic signals across intersections to improve the traffic efficiency of a district or a city.
Deep reinforcement learning (RL) has been applied to traffic signal control recently and demonstrated promising performance where each traffic signal is regarded as an agent.
We propose a novel Meta Variationally Intrinsic Motivated (MetaVIM) RL method to learn the decentralized policy for each intersection that considers neighbor information in a latent way.
arXiv Detail & Related papers (2021-01-04T03:06:08Z)
- Area-wide traffic signal control based on a deep graph Q-Network (DGQN) trained in an asynchronous manner [3.655021726150368]
Reinforcement learning (RL) algorithms have been widely applied in traffic signal studies.
There are, however, several problems in jointly controlling traffic lights for a large transportation network.
arXiv Detail & Related papers (2020-08-05T06:13:58Z)
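
The recurring dimensionality argument above, in the OPNDQN abstract and in the Friend-DQN entry, comes down to simple combinatorics: a centralized controller must rank every combination of all intersections' phases, while an agent that conditions on its neighbors' shared actions ranks only its own few phases at each best-response step. A toy calculation (the phase count and grid topology are illustrative assumptions, not taken from any paper above):

```python
# Toy combinatorics behind the dimensionality argument (numbers are
# illustrative assumptions).
N_PHASES = 4          # candidate signal phases per intersection

for n in (4, 16, 64):
    centralized = N_PHASES ** n   # joint actions a single controller must rank
    per_agent = N_PHASES          # choices one agent ranks once neighbors'
                                  # actions are shared and held fixed
    print(f"{n:3d} intersections: centralized = {centralized:.2e}, "
          f"per agent = {per_agent} (x {n} agents = {per_agent * n})")
```

Fictitious play then sweeps this small per-agent space repeatedly instead of searching the exponentially large joint space once.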