Hybrid Differential Reward: Combining Temporal Difference and Action Gradients for Efficient Multi-Agent Reinforcement Learning in Cooperative Driving
- URL: http://arxiv.org/abs/2511.16916v1
- Date: Fri, 21 Nov 2025 02:58:04 GMT
- Title: Hybrid Differential Reward: Combining Temporal Difference and Action Gradients for Efficient Multi-Agent Reinforcement Learning in Cooperative Driving
- Authors: Ye Han, Lijun Zhang, Dejian Meng, Zhuang Zhang
- Abstract summary: In multi-vehicle cooperative driving tasks, traditional state-based reward functions suffer from vanishing reward differences. This paper proposes a novel Hybrid Differential Reward mechanism to solve this problem. It guides agents to learn high-quality cooperative policies that effectively balance traffic efficiency and safety.
- Score: 15.387374116985605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-vehicle cooperative driving tasks involving high-frequency continuous control, traditional state-based reward functions suffer from the issue of vanishing reward differences. This phenomenon results in a low signal-to-noise ratio (SNR) for policy gradients, significantly hindering algorithm convergence and performance improvement. To address this challenge, this paper proposes a novel Hybrid Differential Reward (HDR) mechanism. We first theoretically elucidate how the temporal quasi-steady nature of traffic states and the physical proximity of actions lead to the failure of traditional reward signals. Building on this analysis, the HDR framework innovatively integrates two complementary components: (1) a Temporal Difference Reward (TDR) based on a global potential function, which utilizes the evolutionary trend of potential energy to ensure optimal policy invariance and consistency with long-term objectives; and (2) an Action Gradient Reward (AGR), which directly measures the marginal utility of actions to provide a local guidance signal with a high SNR. Furthermore, we formulate the cooperative driving problem as a Multi-Agent Partially Observable Markov Game (POMDPG) with a time-varying agent set and provide a complete instantiation scheme for HDR within this framework. Extensive experiments conducted using both online planning (MCTS) and Multi-Agent Reinforcement Learning (QMIX, MAPPO, MADDPG) algorithms demonstrate that the HDR mechanism significantly improves convergence speed and policy stability. The results confirm that HDR guides agents to learn high-quality cooperative policies that effectively balance traffic efficiency and safety.
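As a rough illustration of the two components described in the abstract, the sketch below combines a potential-based temporal difference term with a finite-difference action-gradient term. The potential function `phi`, the utility `u`, the weights `w_td`/`w_ag`, and the finite-difference step are illustrative assumptions, not the paper's exact instantiation.

```python
# Minimal sketch of the Hybrid Differential Reward (HDR) idea, assuming a
# user-supplied global potential phi(state) and a differentiable-in-action
# utility u(state, action). Names and weights are hypothetical.
import numpy as np

def temporal_difference_reward(phi, state, next_state, gamma=0.99):
    # Potential-based shaping term: rewards the trend of the global potential,
    # which is known to preserve optimal-policy invariance.
    return gamma * phi(next_state) - phi(state)

def action_gradient_reward(u, state, action, eps=1e-3):
    # Central finite-difference estimate of the marginal utility of the action,
    # projected onto the chosen action: a local, high-SNR guidance signal even
    # when successive states are nearly identical (quasi-steady traffic).
    action = np.asarray(action, dtype=float)
    grad = np.zeros_like(action)
    for i in range(action.size):
        step = np.zeros_like(action)
        step[i] = eps
        grad[i] = (u(state, action + step) - u(state, action - step)) / (2.0 * eps)
    return float(np.dot(grad, action))

def hybrid_differential_reward(phi, u, state, action, next_state,
                               gamma=0.99, w_td=1.0, w_ag=0.1):
    # Weighted combination of the global (TDR) and local (AGR) components.
    return (w_td * temporal_difference_reward(phi, state, next_state, gamma)
            + w_ag * action_gradient_reward(u, state, action))

# Toy usage with made-up potential and utility functions (for illustration only):
phi = lambda s: -abs(s["speed"] - 25.0)          # closeness to a 25 m/s target speed
u = lambda s, a: -float(np.sum(np.square(a)))    # penalize harsh control inputs
s, s_next, a = {"speed": 20.0}, {"speed": 21.0}, np.array([0.8])
r = hybrid_differential_reward(phi, u, s, a, s_next)
```

The design choice mirrors the abstract: the shaping term keeps the long-term objective intact (it cancels in the return up to the potential at the endpoints), while the action-gradient term supplies a per-step signal whose magnitude does not collapse when consecutive states barely change.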
Related papers
- Spatio-temporal dual-stage hypergraph MARL for human-centric multimodal corridor traffic signal control [5.728450793445691]
This paper proposes STDSH-MARL (Spatio-Temporal Dual-Stage Hypergraph based Multi-Agent Reinforcement Learning). The proposed method captures dependencies through a novel dual-stage hypergraph attention mechanism that models interactions across both spatial and temporal hyperedges. Experiments conducted on a corridor network under five traffic scenarios demonstrate that STDSH-MARL consistently improves multimodal performance and provides clear benefits for public transportation priority.
arXiv Detail & Related papers (2026-02-19T04:18:50Z) - Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration [49.9937230730202]
We propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention. Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories. We show that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales.
arXiv Detail & Related papers (2026-02-03T15:32:09Z) - RIS-Assisted Downlink Pinching-Antenna Systems: GNN-Enabled Optimization Approaches [51.56300276709421]
This paper investigates a reconfigurable intelligent surface (RIS)-assisted multi-waveguide pinching-antenna (PA) system (PASS) for multi-user downlink information transmission. By leveraging a graph-structured topology of the RIS-assisted PASS, a novel three-stage graph neural network (GNN) is proposed, which learns PA positions based on user locations.
arXiv Detail & Related papers (2025-11-25T13:43:44Z) - STAR-RIS-assisted Collaborative Beamforming for Low-altitude Wireless Networks [58.13757830013997]
Wireless networks based on uncrewed aerial vehicles (UAVs) offer high mobility, flexibility, and coverage for urban communications. They face severe signal attenuation in dense environments due to obstructions. To address this critical issue, we consider introducing collaborative beamforming among UAVs with STAR-RIS assistance for low-altitude wireless networks.
arXiv Detail & Related papers (2025-10-25T01:28:37Z) - HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking [80.07224739976911]
RGB cameras excel at capturing rich texture with high resolution, whereas event cameras offer exceptional temporal resolution and dynamic range.
arXiv Detail & Related papers (2025-10-22T13:15:13Z) - Cooperative Target Detection with AUVs: A Dual-Timescale Hierarchical MARDL Approach [59.81681228738068]
In adversarial environments, achieving efficient collaboration while ensuring covert operations is a key challenge for underwater cooperative missions. We propose a novel dual time-scale Hierarchical Multi-Agent Proximal Policy Optimization framework. We show that the proposed framework achieves rapid convergence, outperforms benchmark algorithms in terms of performance, and maximizes long-term cooperative efficiency while ensuring covert operations.
arXiv Detail & Related papers (2025-09-16T09:31:32Z) - Multi-Agent Trust Region Policy Optimisation: A Joint Constraint Approach [17.48210470289556]
Heterogeneous-Agent Trust Region Policy Optimization (HATRPO) enforces per-agent trust region constraints using Kullback-Leibler (KL) divergence to stabilize training. However, assigning each agent the same KL threshold can lead to slow and locally optimal updates, especially in heterogeneous settings. We propose two approaches for allocating the KL divergence threshold across agents: HATRPO-W, a Karush-Kuhn-Tucker-based (KKT-based) method that optimizes threshold assignment under global KL constraints, and HATRPO-G, a greedy algorithm that prioritizes agents based on improvement-to
arXiv Detail & Related papers (2025-08-14T04:48:46Z) - On the Effect of Regularization in Policy Mirror Descent [0.0]
Policy Mirror Descent (PMD) has emerged as a unifying framework in reinforcement learning (RL). PMD incorporates two key regularization components: (i) a distance term that enforces a trust region for stable policy updates and (ii) an MDP regularizer that augments the reward function to promote structure and robustness. This work provides a large-scale empirical analysis of the interplay between these two regularization techniques, running over 500k training seeds on small RL environments.
arXiv Detail & Related papers (2025-07-11T16:19:45Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - Intelligent Hybrid Resource Allocation in MEC-assisted RAN Slicing Network [72.2456220035229]
We aim to maximize the SSR for heterogeneous service demands in the cooperative MEC-assisted RAN slicing system.
We propose a recurrent graph reinforcement learning (RGRL) algorithm to intelligently learn the optimal hybrid RA policy.
arXiv Detail & Related papers (2024-05-02T01:36:13Z) - Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout [16.454305212398328]
We propose a goal-conditioned hierarchical reinforcement learning (HRL) framework named Guided Cooperation via Model-based Rollout (GCMR).
GCMR aims to bridge inter-layer information synchronization and cooperation by exploiting forward dynamics.
Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of HIGL, namely ACLG, yields more stable and robust policy improvement.
arXiv Detail & Related papers (2023-09-24T00:13:16Z) - The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication [20.891460617583302]
The paper considers independent reinforcement learning (IRL) for collaborative decision-making in the paradigm of federated learning (FL)
FL generates excessive communication overheads between agents and a remote central server.
This paper proposes two advanced optimization schemes to improve the system's utility value.
arXiv Detail & Related papers (2021-03-24T07:21:43Z) - Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver.
We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming.
We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)