Graded-Q Reinforcement Learning with Information-Enhanced State Encoder
for Hierarchical Collaborative Multi-Vehicle Pursuit
- URL: http://arxiv.org/abs/2210.13470v1
- Date: Mon, 24 Oct 2022 16:35:34 GMT
- Title: Graded-Q Reinforcement Learning with Information-Enhanced State Encoder
for Hierarchical Collaborative Multi-Vehicle Pursuit
- Authors: Yiying Yang, Xinhang Li, Zheng Yuan, Qinwen Wang, Chen Xu, Lin Zhang
- Abstract summary: The multi-vehicle pursuit (MVP) problem is becoming a hot research topic in Intelligent Transportation Systems (ITS).
This paper proposes a graded-Q reinforcement learning with information-enhanced state encoder (GQRL-IESE) framework to address this hierarchical collaborative pursuit problem.
In the GQRL-IESE, a cooperative graded Q scheme is proposed to facilitate the decision-making of pursuing vehicles and improve pursuit efficiency.
- Score: 11.195170949292496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The multi-vehicle pursuit (MVP), as a problem abstracted from various
real-world scenarios, is becoming a hot research topic in Intelligent
Transportation System (ITS). The combination of Artificial Intelligence (AI)
and connected vehicles has greatly promoted the research development of MVP.
However, existing works on MVP pay little attention to the importance of
information exchange and cooperation among pursuing vehicles in complex
urban traffic environments. This paper proposes a graded-Q reinforcement
learning with information-enhanced state encoder (GQRL-IESE) framework to
address this hierarchical collaborative multi-vehicle pursuit (HCMVP) problem.
In the GQRL-IESE, a cooperative graded Q scheme is proposed to facilitate the
decision-making of pursuing vehicles and improve pursuit efficiency. Each
pursuing vehicle further uses a deep Q network (DQN) to make decisions based on
its encoded state. A coordinated Q optimizing network adjusts the individual
decisions based on the current environment traffic information to obtain the
global optimal action set. In addition, an information-enhanced state encoder
is designed to extract critical information from multiple perspectives and
applies an attention mechanism to help each pursuing vehicle effectively
determine its target. Extensive experimental results based on SUMO indicate
that the proposed GQRL-IESE requires, on average, 47.64% fewer total
timesteps than other methods, demonstrating its excellent pursuit
efficiency. Code is open-sourced at https://github.com/ANT-ITS/GQRL-IESE.
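The abstract's two-level scheme (each pursuer proposes actions with its own DQN, then a coordinating network adjusts the individual choices into a global action set) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: `PursuerDQN`, `coordinated_action_set`, and the global scoring function are all hypothetical stand-ins; consult the released code for the real architecture.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(0)

class PursuerDQN:
    """Stand-in for one pursuing vehicle's Q-network (here a random linear map)."""
    def __init__(self, state_dim, n_actions):
        self.W = rng.normal(size=(n_actions, state_dim))

    def q_values(self, encoded_state):
        return self.W @ encoded_state  # one Q-value per candidate action

def coordinated_action_set(agents, states, joint_score):
    """Each agent proposes its top-2 greedy actions; a global scoring
    function (standing in for the coordinated Q optimizing network)
    selects the joint action set with the best global value."""
    proposals = [np.argsort(a.q_values(s))[-2:] for a, s in zip(agents, states)]
    best, best_val = None, -np.inf
    for joint in product(*proposals):  # enumerate candidate joint actions
        v = joint_score(joint)
        if v > best_val:
            best, best_val = joint, v
    return best

agents = [PursuerDQN(state_dim=4, n_actions=3) for _ in range(2)]
states = [rng.normal(size=4) for _ in range(2)]
# hypothetical global criterion: prefer pursuers taking distinct actions
score = lambda joint: 1.0 if joint[0] != joint[1] else 0.0
joint = coordinated_action_set(agents, states, score)
```

With only two pursuers the joint space is enumerated exhaustively; a practical coordinator would instead learn the global value, as the coordinated Q optimizing network does in the paper.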
Related papers
- SPformer: A Transformer Based DRL Decision Making Method for Connected Automated Vehicles [9.840325772591024]
We propose a CAV decision-making architecture based on transformer and reinforcement learning algorithms.
A learnable policy token is used as the learning medium of the multi-vehicle joint policy.
Our model makes good use of all the state information of vehicles in the traffic scenario.
arXiv Detail & Related papers (2024-09-23T15:16:35Z) - Optimizing Age of Information in Vehicular Edge Computing with Federated Graph Neural Network Multi-Agent Reinforcement Learning [44.17644657738893]
This paper focuses on the Age of Information (AoI) as a key metric for data freshness and explores task offloading issues for vehicles under RSU communication resource constraints.
We propose an innovative distributed federated learning framework combining Graph Neural Networks (GNN), named Federated Graph Neural Network Multi-Agent Reinforcement Learning (FGNN-MADRL) to optimize AoI across the system.
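Age of Information, the metric FGNN-MADRL optimizes, measures the time elapsed since the generation of the freshest update received so far; a minimal illustration (generic definition, not the paper's system model):

```python
def age_of_information(now, last_received_generation_time):
    """AoI at time `now`: elapsed time since the generation timestamp
    of the most recently received update."""
    return now - last_received_generation_time

# updates generated at t=2 and t=5 have been received; at t=7
# the freshest carries generation time t=5, so AoI is 2
aoi = age_of_information(7.0, 5.0)
```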
arXiv Detail & Related papers (2024-07-01T15:37:38Z) - Progression Cognition Reinforcement Learning with Prioritized Experience
for Multi-Vehicle Pursuit [19.00359253910912]
This paper proposes Progression Cognition Reinforcement Learning with Prioritized Experience for MVP (PEPCRL-MVP) in urban traffic scenes.
PEPCRL-MVP uses a prioritization network to assess the transitions in the global experience replay buffer according to the parameters of each MARL agent.
PEPCRL-MVP improves pursuing efficiency by 3.95% over TD3-DMAP and its success rate is 34.78% higher than that of MADDPG.
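Prioritized experience replay, the mechanism PEPCRL-MVP builds on, can be illustrated with a minimal proportional-priority buffer. This is a generic sketch of prioritized replay (Schaul et al. style proportional sampling), not the paper's learned prioritization network:

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay: transitions with larger
    priority (e.g. TD-error magnitude) are sampled more often."""
    def __init__(self, capacity, eps=1e-2):
        self.capacity, self.eps = capacity, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(abs(td_error) + self.eps)

    def sample(self, k):
        # sampling probability proportional to stored priority
        idx = random.choices(range(len(self.data)), weights=self.priorities, k=k)
        return [self.data[i] for i in idx], idx

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(e) + self.eps

buf = PrioritizedReplayBuffer(capacity=100)
buf.add(("s0", "a0", 0.0, "s1"), td_error=5.0)   # surprising transition
buf.add(("s1", "a1", 1.0, "s2"), td_error=0.1)   # routine transition
random.seed(0)
batch, idx = buf.sample(1000)
```

The high-error transition dominates the sampled batch, which is the effect prioritization is after: replaying informative experience more often.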
arXiv Detail & Related papers (2023-06-08T08:10:46Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
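"Maximize both entropy and return" refers to the entropy-regularized objective of Soft Actor-Critic. A toy computation of the entropy-augmented return for a discrete policy (illustrative only; MARLIN itself operates on continuous congestion-control actions):

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_return(rewards, policy_probs, alpha=0.2, gamma=0.99):
    """Discounted return plus an alpha-weighted entropy bonus per step,
    i.e. the quantity Soft Actor-Critic maximizes in expectation."""
    g = 0.0
    for t, (r, probs) in enumerate(zip(rewards, policy_probs)):
        g += (gamma ** t) * (r + alpha * entropy(probs))
    return g

# a uniform policy over 2 actions earns the maximum entropy bonus log(2)
g = soft_return([1.0, 1.0], [[0.5, 0.5], [0.5, 0.5]])
```

The temperature `alpha` trades exploration (entropy) against exploitation (return); SAC typically tunes it automatically.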
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Utilizing Background Knowledge for Robust Reasoning over Traffic
Situations [63.45021731775964]
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding.
We scope our study to text-based methods and datasets given the abundant commonsense knowledge.
We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
arXiv Detail & Related papers (2022-12-04T09:17:24Z) - Integrated Decision and Control for High-Level Automated Vehicles by
Mixed Policy Gradient and Its Experiment Verification [10.393343763237452]
This paper presents a self-evolving decision-making system based on the Integrated Decision and Control (IDC) framework.
An RL algorithm called constrained mixed policy gradient (CMPG) is proposed to consistently upgrade the driving policy of the IDC.
Experimental results show that, boosted by data, the system achieves better driving ability than model-based methods.
arXiv Detail & Related papers (2022-10-19T14:58:41Z) - Scalable Vehicle Re-Identification via Self-Supervision [66.2562538902156]
Vehicle Re-Identification is one of the key elements in city-scale vehicle analytics systems.
Most state-of-the-art solutions for vehicle re-id focus on improving accuracy on existing re-id benchmarks and often ignore computational complexity.
We propose a simple yet effective hybrid solution empowered by self-supervised training which only uses a single network during inference time.
arXiv Detail & Related papers (2022-05-16T12:14:42Z) - AI-aided Traffic Control Scheme for M2M Communications in the Internet
of Vehicles [61.21359293642559]
The dynamics of traffic and the heterogeneous requirements of different IoV applications are not considered in most existing studies.
We consider a hybrid traffic control scheme and use proximal policy optimization (PPO) method to tackle it.
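The PPO method used here constrains each policy update by clipping the policy-probability ratio; the core clipped surrogate objective (generic PPO, not this paper's specific traffic controller):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate of PPO: take the more pessimistic of the
    unclipped and clipped ratio-weighted advantage, which removes the
    incentive to move the policy ratio outside [1-eps, 1+eps]."""
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)

# positive advantage: the ratio's contribution is capped at 1+eps
obj_pos = ppo_clip_objective(ratio=1.5, advantage=2.0)
# negative advantage: clipping keeps the penalty at least (1-eps)-sized
obj_neg = ppo_clip_objective(ratio=0.5, advantage=-1.0)
```

In practice this per-sample objective is averaged over a batch and maximized by gradient ascent on the policy parameters.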
arXiv Detail & Related papers (2022-03-05T10:54:05Z) - Deep Reinforcement Learning Based Multi-Access Edge Computing Schedule
for Internet of Vehicle [16.619839349229437]
We propose a UAV-assisted approach to provide better wireless network service while retaining the maximum Quality of Experience (QoE) of the Internet of Vehicles (IoVs) on the lane.
In the paper, we present a Multi-Agent Graph Convolutional Deep Reinforcement Learning (M-AGCDRL) algorithm which combines local observations of each agent with a low-resolution global map as input to learn a policy for each agent.
arXiv Detail & Related papers (2022-02-15T17:14:58Z) - Vehicular Cooperative Perception Through Action Branching and Federated
Reinforcement Learning [101.64598586454571]
A novel framework is proposed to allow reinforcement-learning-based vehicular association, resource block (RB) allocation, and content selection of cooperative perception messages (CPMs).
A federated RL approach is introduced in order to speed up the training process across vehicles.
Results show that federated RL improves the training process, where better policies can be achieved within the same amount of time compared to the non-federated approach.
arXiv Detail & Related papers (2020-12-07T02:09:15Z) - Data Freshness and Energy-Efficient UAV Navigation Optimization: A Deep
Reinforcement Learning Approach [88.45509934702913]
We design a navigation policy for multiple unmanned aerial vehicles (UAVs) where mobile base stations (BSs) are deployed.
We incorporate different contextual information such as energy and age of information (AoI) constraints to ensure the data freshness at the ground BS.
By applying the proposed trained model, an effective real-time trajectory policy for the UAV-BSs captures the observable network states over time.
arXiv Detail & Related papers (2020-02-21T07:29:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.