Clustering-Based Weight Orthogonalization for Stabilizing Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2511.11607v1
- Date: Sun, 02 Nov 2025 13:45:01 GMT
- Title: Clustering-Based Weight Orthogonalization for Stabilizing Deep Reinforcement Learning
- Authors: Guoqing Ma, Yuhan Zhang, Yuming Dai, Guangfu Hao, Yang Chen, Shan Yu
- Abstract summary: Reinforcement learning (RL) has made significant advancements, achieving superhuman performance in various tasks. However, many environments are inherently non-stationary. This non-stationarity requires millions of training iterations, leading to low sample efficiency. We introduce the Clustering Orthogonal Weight Modified (COWM) layer, which can be integrated into the policy network of any RL algorithm to mitigate non-stationarity effectively.
- Score: 22.966488300685484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has made significant advancements, achieving superhuman performance in various tasks. However, RL agents often operate under the assumption of environmental stationarity, which poses a great challenge to learning efficiency since many environments are inherently non-stationary. This non-stationarity results in the requirement of millions of iterations, leading to low sample efficiency. To address this issue, we introduce the Clustering Orthogonal Weight Modified (COWM) layer, which can be integrated into the policy network of any RL algorithm and mitigate non-stationarity effectively. The COWM layer stabilizes the learning process by employing clustering techniques and a projection matrix. Our approach not only improves learning speed but also reduces gradient interference, thereby enhancing the overall learning efficiency. Empirically, COWM outperforms state-of-the-art methods, achieving improvements of 9% and 12.6% on the vision-based and state-based DMControl benchmarks, respectively. It also shows robustness and generality across various algorithms and tasks.
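The abstract describes the COWM layer only at a high level: cluster the layer's inputs, and use a projection matrix to modify weight updates so they interfere less with what was already learned. Below is a minimal sketch of what such a layer could look like, assuming an online k-means-style cluster assignment and the recursive projector update from classic Orthogonal Weight Modification (OWM). The cluster count, learning rates, and exact update rule are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

class COWMStyleLayer:
    """Sketch of a clustering + orthogonal-projection linear layer.
    Hypothetical reconstruction; hyperparameters and update rules
    are assumptions based on OWM, not the paper's exact method."""

    def __init__(self, in_dim, out_dim, n_clusters=4, alpha=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(out_dim, in_dim))
        # Running centroids for online clustering of layer inputs.
        self.centroids = rng.normal(size=(n_clusters, in_dim))
        # One orthogonal projector per cluster, initialized to identity.
        self.P = np.stack([np.eye(in_dim) for _ in range(n_clusters)])
        self.alpha = alpha

    def forward(self, x):
        return x @ self.W.T

    def _assign_cluster(self, x):
        # Hard assignment to the nearest centroid.
        return int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))

    def apply_gradient(self, x, grad_W, lr=1e-2, centroid_lr=1e-2):
        k = self._assign_cluster(x)
        # Online k-means step: drift the winning centroid toward the input.
        self.centroids[k] += centroid_lr * (x - self.centroids[k])
        # OWM-style recursive projector update: shrink the projector
        # along the direction of the current input.
        Pk = self.P[k]
        Px = Pk @ x
        self.P[k] = Pk - np.outer(Px, Px) / (self.alpha + x @ Px)
        # Project the raw gradient so the weight update is approximately
        # orthogonal to inputs previously seen in this cluster,
        # reducing gradient interference across clusters.
        self.W -= lr * grad_W @ self.P[k]
```

Maintaining one projector per cluster, rather than a single global one, is what would let inputs from different regimes of a non-stationary environment constrain each other less.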
Related papers
- Orthogonal Weight Modification Enhances Learning Scalability and Convergence Efficiency without Gradient Backpropagation [12.600974823413381]
We propose a perturbation-based approach called LOw-rank Cluster Orthogonal (LOCO) weight modification. Through extensive evaluations on multiple datasets, LOCO demonstrates the capability to locally train the deepest spiking neural networks. This offers a promising direction for achieving high-performance, real-time, and lifelong learning on neuromorphic systems.
arXiv Detail & Related papers (2026-02-25T01:46:43Z)
- Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping [54.65536245955678]
We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome the challenge of sample inefficiency. We introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis. Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL.
arXiv Detail & Related papers (2025-07-22T05:51:07Z)
- Accelerating Spectral Clustering under Fairness Constraints [56.865810822418744]
We present a new efficient method for fair spectral clustering (Fair SC) by casting the Fair SC problem within the difference of convex functions (DC) framework. We show that each associated subproblem can be solved efficiently, resulting in higher computational efficiency compared to prior work.
arXiv Detail & Related papers (2025-06-09T18:46:27Z)
- SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning [51.10866035483686]
High update-to-data (UTD) ratio algorithms in reinforcement learning (RL) improve sample efficiency but incur high computational costs, limiting real-world scalability. We propose Offline Stabilization Phases for Efficient Q-Learning (SPEQ), an RL algorithm that combines low-UTD online training with periodic offline stabilization phases. During these phases, Q-functions are fine-tuned with high UTD ratios on a fixed replay buffer, reducing redundant updates on suboptimal data (a sketch of this schedule appears after this list).
arXiv Detail & Related papers (2025-01-15T09:04:19Z)
- Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can fully exploit the transmission and computing resources to improve the data rate. Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z)
- Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning [10.117626902557927]
Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments.
arXiv Detail & Related papers (2024-12-18T20:25:04Z)
- Neural-Kernel Conditional Mean Embeddings [26.862984140099837]
Kernel conditional mean embeddings (CMEs) offer a powerful framework for representing conditional distributions, but they often face scalability and expressiveness challenges.
We propose a new method that effectively combines the strengths of deep learning with CMEs in order to address these challenges.
In conditional density estimation tasks, our NN-CME hybrid achieves competitive performance and often surpasses existing deep learning-based methods.
arXiv Detail & Related papers (2024-03-16T08:51:02Z)
- MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning [1.3106063755117399]
We improve upon one of the state-of-the-art offline meta-RL (OMRL) algorithms, FOCAL, by incorporating an intra-task attention mechanism and inter-task contrastive learning objectives.
Theoretical analysis and experiments are presented to demonstrate the superior performance, efficiency and robustness of our end-to-end and model free method.
arXiv Detail & Related papers (2021-02-22T05:05:16Z)
- Maximum Mutation Reinforcement Learning for Scalable Control [25.935468948833073]
Reinforcement Learning (RL) has demonstrated data efficiency and optimal control over large state spaces at the cost of scalability.
We present the Evolution-based Soft Actor-Critic (ESAC), a scalable RL algorithm.
arXiv Detail & Related papers (2020-07-24T16:29:19Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
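As noted in the SPEQ entry above, its schedule alternates cheap low-UTD online training with periodic high-UTD offline stabilization phases that fine-tune the Q-functions on a frozen replay buffer. The sketch below shows that control flow; the `env`, `agent`, and `buffer` interfaces and all hyperparameter values are hypothetical placeholders, not the paper's implementation.

```python
def speq_style_training(env, agent, buffer,
                        online_steps=10_000, online_utd=1,
                        stabilize_every=2_000, stabilize_updates=50_000,
                        batch_size=256):
    """Sketch of a SPEQ-style schedule under assumed interfaces:
    env.reset()/env.step(a) -> (obs, reward, done),
    agent.act/update/update_q_only, buffer.add/sample."""
    obs = env.reset()
    for step in range(1, online_steps + 1):
        # Collect one environment transition.
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs

        # Online phase: low update-to-data ratio (e.g. 1 update per step),
        # keeping per-step compute cheap.
        for _ in range(online_utd):
            agent.update(buffer.sample(batch_size))

        # Periodic offline stabilization phase: many Q-function updates
        # on the fixed buffer, with no new data collected, concentrating
        # the high-UTD compute where it is most useful.
        if step % stabilize_every == 0:
            for _ in range(stabilize_updates):
                agent.update_q_only(buffer.sample(batch_size))
```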