CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space
- URL: http://arxiv.org/abs/2601.05675v1
- Date: Fri, 09 Jan 2026 09:50:47 GMT
- Title: CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space
- Authors: Bingyi Liu, Jinbo He, Haiyong Shi, Enshu Wang, Weizhen Han, Jingxiang Hao, Peixi Wang, Zhuangzhuang Zhang
- Abstract summary: We propose a Cooperative Hybrid Diffusion Policies (CHDP) framework to solve the hybrid action space problem. CHDP employs two cooperative agents that leverage a discrete and a continuous diffusion policy, respectively. On challenging hybrid action benchmarks, CHDP outperforms the state-of-the-art method by up to 19.3% in success rate.
- Score: 9.192754462575218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hybrid action spaces, which combine discrete choices and continuous parameters, are prevalent in domains such as robot control and game AI. However, efficiently modeling and optimizing hybrid discrete-continuous action spaces remains a fundamental challenge, mainly due to limited policy expressiveness and poor scalability in high-dimensional settings. To address this challenge, we view the hybrid action space problem as a fully cooperative game and propose a Cooperative Hybrid Diffusion Policies (CHDP) framework to solve it. CHDP employs two cooperative agents that leverage a discrete and a continuous diffusion policy, respectively. The continuous policy is conditioned on the discrete action's representation, explicitly modeling the dependency between them. This cooperative design allows the diffusion policies to leverage their expressiveness to capture complex distributions in their respective action spaces. To mitigate the update conflicts arising from simultaneous policy updates in this cooperative setting, we employ a sequential update scheme that fosters co-adaptation. Moreover, to improve scalability when learning in high-dimensional discrete action spaces, we construct a codebook that embeds the action space into a low-dimensional latent space. This mapping enables the discrete policy to learn in a compact, structured space. Finally, we design a Q-function-based guidance mechanism to align the codebook's embeddings with the discrete policy's representation during training. On challenging hybrid action benchmarks, CHDP outperforms the state-of-the-art method by up to 19.3% in success rate.
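To make the architecture concrete, below is a minimal PyTorch-style sketch of the cooperative sampling path: a discrete policy selects an action id against a low-dimensional codebook, and a continuous diffusion policy denoises that action's parameters conditioned on the state and the selected embedding. All module names, sizes, and the Euler-style update are illustrative assumptions, not the authors' implementation; for brevity the discrete agent is a one-shot categorical stand-in for the paper's discrete diffusion policy, and the sequential update scheme and Q-guided codebook alignment are omitted.

```python
import torch
import torch.nn as nn

STATE_DIM, N_DISCRETE, LATENT_DIM, PARAM_DIM, T = 8, 100, 4, 3, 10

class DiscretePolicy(nn.Module):
    """Scores codebook embeddings; a stand-in for the discrete diffusion policy."""
    def __init__(self):
        super().__init__()
        self.codebook = nn.Embedding(N_DISCRETE, LATENT_DIM)  # low-dim latent codebook
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, LATENT_DIM))

    def forward(self, s):
        query = self.net(s)                          # (B, LATENT_DIM)
        logits = query @ self.codebook.weight.T      # match state against codebook
        k = torch.distributions.Categorical(logits=logits).sample()
        return k, self.codebook(k)                   # discrete action id + embedding

class ContinuousDiffusionPolicy(nn.Module):
    """Denoises continuous parameters conditioned on the state and the discrete
    action's embedding, modeling the discrete-to-continuous dependency."""
    def __init__(self):
        super().__init__()
        self.eps_net = nn.Sequential(
            nn.Linear(PARAM_DIM + STATE_DIM + LATENT_DIM + 1, 128), nn.ReLU(),
            nn.Linear(128, PARAM_DIM))

    def sample(self, s, z):
        x = torch.randn(s.shape[0], PARAM_DIM)       # start from pure noise
        for t in reversed(range(T)):                 # reverse diffusion loop
            tt = torch.full((s.shape[0], 1), t / T)
            eps = self.eps_net(torch.cat([x, s, z, tt], dim=-1))
            x = x - eps / T                          # crude Euler-style denoising
        return torch.tanh(x)                         # bounded continuous parameters

disc, cont = DiscretePolicy(), ContinuousDiffusionPolicy()
s = torch.randn(2, STATE_DIM)
k, z = disc(s)                 # cooperative agent 1: discrete choice
params = cont.sample(s, z)     # cooperative agent 2: parameters conditioned on z
print(k.tolist(), params.shape)
```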
Related papers
- Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy [52.106797722292896]
We present DCDP, a Dynamic Closed-Loop Diffusion Policy framework that integrates chunk-based action generation with real-time correction.
In dynamic PushT simulations, DCDP improves adaptability by 19% without retraining while requiring only 5% additional computation.
arXiv Detail & Related papers (2026-03-02T15:04:18Z)
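As a rough illustration of the chunk-plus-correction pattern summarized above (not DCDP's actual algorithm; the toy dynamics, proportional correction rule, and gain are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
CHUNK, GAIN = 8, 0.5  # chunk length; hypothetical correction gain

def plan_chunk(state):
    """Stand-in for a diffusion policy that emits a whole chunk of actions."""
    return rng.normal(0.0, 0.1, size=(CHUNK, 2)) + state  # toy open-loop plan

def step(state, action):
    return state + 0.1 * action + rng.normal(0, 0.01, 2)  # toy dynamics

state, goal = np.zeros(2), np.ones(2)
for _ in range(5):                      # closed loop over chunks
    chunk = plan_chunk(state)
    for a in chunk:                     # execute the chunk step by step
        a = a + GAIN * (goal - state)   # real-time correction, no retraining
        state = step(state, a)
print(state)
```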
- Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies [51.24079409973799]
Diffusion-based generative models are well-positioned to meet the needs of online Multi-Agent Reinforcement Learning (MARL).
We propose one of the first online off-policy MARL frameworks using diffusion policies to orchestrate coordination.
Our key innovation is a relaxed policy objective that maximizes scaled joint entropy, facilitating effective exploration without relying on tractable likelihoods.
arXiv Detail & Related papers (2026-02-20T15:38:02Z)
- Breaking the Grid: Distance-Guided Reinforcement Learning in Large Discrete and Hybrid Action Spaces [4.395837214164745]
We propose Distance-Guided Reinforcement Learning (DGRL) to enable efficient RL in spaces with up to 10^20 actions.
We demonstrate performance improvements of up to 66% against state-of-the-art benchmarks across regularly and irregularly structured environments.
arXiv Detail & Related papers (2026-02-09T13:05:07Z)
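A toy sketch of the distance-guided idea named above: a continuous proxy action is mapped to nearby discrete actions by distance in an embedding space, and a critic picks among them. The embedding table, k-NN step, and toy critic are illustrative assumptions, not DGRL's actual components.

```python
import numpy as np

rng = np.random.default_rng(1)
N_ACTIONS, DIM, K = 1_000_000, 6, 16   # large discrete space, toy scale

action_embed = rng.normal(size=(N_ACTIONS, DIM))  # fixed action embeddings

def q_value(state, emb):
    return -np.sum((emb - state) ** 2, axis=-1)   # toy critic

def select_action(state, proxy):
    """Pick the best discrete action among the K nearest to a continuous proxy."""
    d = np.linalg.norm(action_embed - proxy, axis=1)
    cand = np.argpartition(d, K)[:K]              # K nearest by distance
    return cand[np.argmax(q_value(state, action_embed[cand]))]

state = rng.normal(size=DIM)
proxy = state + rng.normal(scale=0.1, size=DIM)   # actor's continuous output
print(select_action(state, proxy))
```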
- Learning Policy Representations for Steerable Behavior Synthesis [80.4542176039074]
Given a Markov decision process (MDP), we seek to learn representations for a range of policies to facilitate behavior steering at test time.
We show that these representations can be approximated uniformly for a range of policies using a set-based architecture.
We use a variational generative approach to induce a smooth latent space, and further shape it with contrastive learning so that latent distances align with differences in value functions.
arXiv Detail & Related papers (2026-01-29T21:52:06Z)
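A toy latent-shaping loss in the spirit of the entry above: push pairwise latent distances toward pairwise value differences. The exact objective is an assumption here (a regression-style simplification of the paper's contrastive loss).

```python
import torch

def value_alignment_loss(z, v):
    """Align pairwise latent distances with pairwise value-function gaps."""
    dz = torch.cdist(z, z)                    # pairwise latent distances
    dv = torch.cdist(v[:, None], v[:, None])  # pairwise |V_i - V_j|
    return ((dz - dv) ** 2).mean()

z = torch.randn(32, 8, requires_grad=True)    # latent codes for 32 policies
v = torch.randn(32)                           # value estimates per policy
loss = value_alignment_loss(z, v)
loss.backward()                               # shapes the latent space
print(loss.item())
```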
- Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions [31.697208397735395]
Existing approaches embed task-specific value functions into constrained optimization programs or learn deterministic structured policies, sacrificing generality and policy expressiveness.
We propose a solver-induced latent spherical flow policy that brings the expressiveness of modern generative policies to RL while guaranteeing feasibility by design.
Our approach outperforms state-of-the-art baselines by an average of 20.6% across a range of challenging RL tasks.
arXiv Detail & Related papers (2026-01-29T18:49:07Z)
- Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces [57.466101098183884]
Reinforcement learning (RL) struggles to scale to the large, combinatorial action spaces common in many real-world problems.
This paper introduces a novel framework for training discrete diffusion models as highly effective policies in such complex settings.
arXiv Detail & Related papers (2025-09-26T21:53:36Z)
- Distribution Parameter Actor-Critic: Shifting the Agent-Environment Boundary for Diverse Action Spaces [22.711839917754375]
We introduce a novel reinforcement learning (RL) framework that treats distribution parameters as actions.
This reparameterization makes the new action space continuous, regardless of the original action type.
We demonstrate competitive performance on the same environments with discretized action spaces.
arXiv Detail & Related papers (2025-06-19T21:19:19Z)
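A minimal sketch of the parameters-as-actions idea from the entry above: the agent emits distribution parameters, and sampling happens on the environment side of the boundary. The wrapper class and toy reward are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

class SamplingWrapper:
    """Wraps a discrete-action env so the agent's 'action' is a logit vector;
    the actual discrete action is sampled inside the environment boundary."""
    def __init__(self, n_actions):
        self.n = n_actions

    def step(self, logits):
        p = np.exp(logits - logits.max())
        p /= p.sum()                        # softmax over agent-supplied logits
        a = rng.choice(self.n, p=p)         # sampling moved into the env
        reward = 1.0 if a == 0 else 0.0     # toy reward
        return a, reward

env = SamplingWrapper(n_actions=4)
logits = np.zeros(4)                        # continuous action: the parameters
print(env.step(logits))
```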
- Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
Offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts.
This is the first work to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z)
- Multi-Agent Path Finding in Continuous Spaces with Projected Diffusion Models [57.45019514036948]
Multi-Agent Path Finding (MAPF) is a fundamental problem in robotics.
This work proposes a novel approach that integrates constrained optimization with diffusion models for MAPF in continuous spaces.
arXiv Detail & Related papers (2024-12-23T21:27:19Z)
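A toy sketch of the projected-diffusion pattern the entry above describes: each reverse-diffusion step is followed by a projection onto the feasible set. The minimum inter-agent distance constraint, the pairwise repair step, and the shrink-plus-noise "denoiser" are all stand-ins for the paper's learned model and constrained solver.

```python
import numpy as np

rng = np.random.default_rng(3)
N_AGENTS, T, H, MIN_DIST = 3, 20, 10, 0.5   # agents, diffusion steps, horizon

def project(paths):
    """Push agent pairs apart until every waypoint pair is MIN_DIST apart."""
    for t in range(paths.shape[1]):
        for i in range(N_AGENTS):
            for j in range(i + 1, N_AGENTS):
                d = paths[i, t] - paths[j, t]
                dist = np.linalg.norm(d) + 1e-8
                if dist < MIN_DIST:
                    shift = 0.5 * (MIN_DIST - dist) * d / dist
                    paths[i, t] += shift
                    paths[j, t] -= shift
    return paths

paths = rng.normal(size=(N_AGENTS, H, 2))    # start from noise
for _ in range(T):                           # stand-in for a learned denoiser
    paths = 0.9 * paths + 0.1 * rng.normal(scale=0.1, size=paths.shape)
    paths = project(paths)                   # projection after each step
print(paths.shape)
```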
- Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces [2.285821277711785]
Large discrete action spaces (LDAS) remain a central challenge in reinforcement learning.
Existing solution approaches can handle unstructured LDAS with up to a few million actions.
We propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for structured LDAS (SLDAS).
arXiv Detail & Related papers (2023-05-31T14:26:14Z)
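A rough sketch of the neighborhood-exploitation idea named above: decode a continuous actor output to a point in the structured action grid, then search a dynamically constructed local neighborhood for the highest-Q action. The grid structure, +/-1 neighborhood rule, and toy critic are assumptions, not DNC's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(4)
DIM, SIDE = 4, 50               # structured space: a 50^4 integer grid

def q_value(state, actions):
    return -np.sum((actions - state) ** 2, axis=-1)   # toy critic

def dnc_select(state, cont_action):
    """Round to the grid, then evaluate a +/-1 neighborhood per dimension."""
    center = np.clip(np.rint(cont_action), 0, SIDE - 1)
    offsets = np.stack(np.meshgrid(*[[-1, 0, 1]] * DIM), -1).reshape(-1, DIM)
    nbhd = np.clip(center + offsets, 0, SIDE - 1)     # dynamic neighborhood
    return nbhd[np.argmax(q_value(state, nbhd))]

state = rng.uniform(0, SIDE, DIM)
cont = state + rng.normal(scale=2.0, size=DIM)        # continuous actor output
print(dnc_select(state, cont))
```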
- HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation [30.621472051415857]
Most previous Reinforcement Learning (RL) works only demonstrate success in control with either a discrete or a continuous action space.
We propose Hybrid Action Representation (HyAR) to learn a compact and decodable latent representation space for the original hybrid action space.
We evaluate HyAR in a variety of environments with discrete-continuous action spaces.
arXiv Detail & Related papers (2021-09-12T11:26:27Z)
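A minimal sketch of the decodable latent-representation idea: embed the discrete component in a table and decode the continuous component conditioned on it, so the RL agent can act in a compact latent space. This is a simplified stand-in for HyAR's actual encoder-decoder; all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

N_DISCRETE, E_DIM, Z_DIM, PARAM_DIM = 10, 4, 3, 2

class HybridDecoder(nn.Module):
    """Decodes a (discrete-embedding, continuous-latent) pair to a hybrid action."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_DISCRETE, E_DIM)    # discrete action table
        self.dec = nn.Sequential(nn.Linear(E_DIM + Z_DIM, 32), nn.ReLU(),
                                 nn.Linear(32, PARAM_DIM))

    def decode(self, e, z):
        k = torch.argmax(e @ self.embed.weight.T, -1)   # nearest table entry
        x = self.dec(torch.cat([self.embed(k), z], -1)) # conditioned params
        return k, torch.tanh(x)

dec = HybridDecoder()
e = torch.randn(5, E_DIM)       # latent chosen by the RL policy
z = torch.randn(5, Z_DIM)
k, params = dec.decode(e, z)    # decoded hybrid action
print(k.shape, params.shape)
```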
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
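A compact sketch of the two ingredients named above: a semi-implicit actor that injects auxiliary noise at the input so its action distribution can be non-Gaussian, and a generator network that emits samples from the return distribution. Layer sizes are illustrative, and only one of the paper's two DGN critics is instantiated for brevity.

```python
import torch
import torch.nn as nn

S_DIM, A_DIM, NOISE = 6, 2, 4

class SemiImplicitActor(nn.Module):
    """Noise injected at the input makes the action distribution flexible."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(S_DIM + NOISE, 64), nn.ReLU(),
                                 nn.Linear(64, A_DIM), nn.Tanh())

    def forward(self, s):
        xi = torch.randn(s.shape[0], NOISE)             # auxiliary noise
        return self.net(torch.cat([s, xi], -1))

class GeneratorCritic(nn.Module):
    """Deep generator network: emits samples from the return distribution."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(S_DIM + A_DIM + NOISE, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        xi = torch.randn(s.shape[0], NOISE)
        return self.net(torch.cat([s, a, xi], -1))      # one return sample

actor, critic = SemiImplicitActor(), GeneratorCritic()
s = torch.randn(8, S_DIM)
a = actor(s)
print(critic(s, a).shape)   # (8, 1) sampled returns
```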
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms that rely on variance-control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
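A small sketch of the general variance-control pattern the entry above points at: subtract the critic's expected action value under the policy as a baseline in a discrete policy gradient. This is the standard Q-weighted baseline, shown as an illustration rather than the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(5)
N_ACTIONS = 4

logits = rng.normal(size=N_ACTIONS)           # toy policy parameters
pi = np.exp(logits) / np.exp(logits).sum()
q = rng.normal(size=N_ACTIONS)                # critic's action-value estimates

a = rng.choice(N_ACTIONS, p=pi)
baseline = pi @ q                             # E_pi[Q(s, .)] as the baseline
grad_logp = -pi.copy()
grad_logp[a] += 1.0                           # d log pi(a) / d logits
grad = (q[a] - baseline) * grad_logp          # variance-reduced REINFORCE step
print(grad)
```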