Identifying Policy Gradient Subspaces
- URL: http://arxiv.org/abs/2401.06604v3
- Date: Mon, 18 Mar 2024 09:51:00 GMT
- Title: Identifying Policy Gradient Subspaces
- Authors: Jan Schneider, Pierre Schumacher, Simon Guist, Le Chen, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler
- Abstract summary: Policy gradient methods hold great potential for solving complex continuous control tasks.
Recent work indicates that supervised learning can be accelerated by leveraging the fact that gradients lie in a low-dimensional and slowly-changing subspace.
- Score: 42.75990181248372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised learning can be accelerated by leveraging the fact that gradients lie in a low-dimensional and slowly-changing subspace. In this paper, we conduct a thorough evaluation of this phenomenon for two popular deep policy gradient methods on various simulated benchmark tasks. Our results demonstrate the existence of such gradient subspaces despite the continuously changing data distribution inherent to reinforcement learning. These findings reveal promising directions for future work on more efficient reinforcement learning, e.g., through improving parameter-space exploration or enabling second-order optimization.
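The abstract's central claim, that gradients concentrate in a low-dimensional and slowly-changing subspace, can be checked with two quantities: the fraction of a gradient's squared norm captured by a k-dimensional subspace, and the overlap between subspaces estimated at two points in training. The following is a minimal, dependency-free sketch under stated assumptions. It estimates the subspace as the top-k directions of the uncentered second-moment matrix of a window of gradient samples, computed by power iteration with deflation. This is one simple estimator and not necessarily the one used in the paper. All function names are hypothetical, and the synthetic gradients only stand in for real flattened policy or critic gradients.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k_basis(grads: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[Vector]:
    """Orthonormal basis for the top-k directions of C = sum_j g_j g_j^T.

    Power iteration with deflation; C is never formed explicitly.
    (Assumed estimator for illustration, not necessarily the paper's.)
    """
    rng = random.Random(seed)
    dim = len(grads[0])
    basis: List[Vector] = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(iters):
            # Matrix-free product: C v = sum_j g_j (g_j . v).
            w = [0.0] * dim
            for g in grads:
                c = dot(g, v)
                for i in range(dim):
                    w[i] += c * g[i]
            # Deflation: keep the iterate orthogonal to directions already found.
            for b in basis:
                c = dot(w, b)
                w = [wi - c * bi for wi, bi in zip(w, b)]
            v = normalize(w)
        basis.append(v)
    return basis


def subspace_fraction(g: Vector, basis: List[Vector]) -> float:
    """||P g||^2 / ||g||^2, where P projects onto span(basis) (orthonormal)."""
    return sum(dot(g, b) ** 2 for b in basis) / dot(g, g)


def subspace_overlap(basis_a: List[Vector], basis_b: List[Vector]) -> float:
    """Mean of ||P_b a_i||^2 over the unit vectors a_i of basis_a.

    Equals 1 for identical subspaces and is k / dim in expectation
    for independent random subspaces.
    """
    return sum(subspace_fraction(a, basis_b) for a in basis_a) / len(basis_a)


def synthetic_grads(rng: random.Random, directions: List[Vector], n: int, noise: float) -> List[Vector]:
    """Hypothetical test data: random mixtures of fixed directions plus isotropic noise."""
    dim = len(directions[0])
    grads = []
    for _ in range(n):
        g = [rng.gauss(0.0, noise) for _ in range(dim)]
        for d in directions:
            c = rng.gauss(0.0, 1.0)
            g = [gi + c * di for gi, di in zip(g, d)]
        grads.append(g)
    return grads


if __name__ == "__main__":
    rng = random.Random(1)
    dim, k = 40, 3
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    early = synthetic_grads(rng, directions, 20, 0.01)  # stand-in for an early training window
    late = synthetic_grads(rng, directions, 20, 0.01)   # stand-in for a later training window
    basis_early = top_k_basis(early, k)
    basis_late = top_k_basis(late, k)
    mean_frac = sum(subspace_fraction(g, basis_early) for g in late) / len(late)
    print(f"mean gradient subspace fraction: {mean_frac:.3f}")
    print(f"subspace overlap early/late: {subspace_overlap(basis_early, basis_late):.3f}")
```

On real training runs, a mean fraction close to 1 for a small k would indicate a low-dimensional subspace, and an overlap that stays high across windows would indicate that it changes slowly; the synthetic data here is constructed so that both hold.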
Related papers
- Predictable Gradient Manifolds in Deep Learning: Temporal Path-Length and Intrinsic Rank as a Complexity Regime [0.0]
Empirically, gradients along training trajectories are often temporally predictable and evolve within a low-dimensional subspace.
We formalize this observation through a measurable framework for predictable, low-dimensional gradients.
We introduce new directions for adaptive gradient methods, rank-aware tracking, and prediction-based design grounded in measurable properties of real training runs.
(2026-01-07T11:23:55Z)
- Multimodal Continual Instruction Tuning with Dynamic Gradient Guidance [41.58239719458457]
Multimodal continual instruction tuning enables large language models to sequentially adapt to new tasks while building upon previously acquired knowledge.
However, this continual learning paradigm faces the significant challenge of catastrophic forgetting, where learning new tasks leads to performance degradation on previous ones.
We introduce a novel insight into catastrophic forgetting by conceptualizing it as a problem of missing gradients from old tasks during new task learning.
(2025-11-19T06:29:15Z)
- Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks [6.185603604308997]
Neuromorphic computing systems are set to revolutionize energy-constrained robotics by achieving orders-of-magnitude efficiency gains.
Spiking Neural Networks (SNNs) represent a promising algorithmic approach for these systems, yet their application to complex control tasks faces two critical challenges.
We propose a novel training approach that leverages a privileged guiding policy to bootstrap the learning process, while still exploiting online environment interactions with the spiking policy.
(2025-10-28T14:28:40Z)
- Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space [15.65017469378437]
Policy-gradient methods such as PPO update the policy along a single gradient direction, leaving the rich local structure of the parameter space unexplored.
Previous work has shown that the surrogate gradient is often poorly correlated with the true reward landscape.
We introduce ExploRLer, a pluggable pipeline that seamlessly integrates with on-policy algorithms such as PPO and TRPO.
(2025-09-30T07:13:55Z)
- SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting [68.00007494819798]
Continual learning requires a model to learn multiple tasks in sequence while maintaining both stability (preserving knowledge from previously learned tasks) and plasticity (effectively learning new tasks).
Gradient projection has emerged as an effective and popular paradigm in CL, in which the gradient space of previously learned tasks is partitioned into two subspaces.
New tasks are learned effectively within the minor subspace, thereby reducing interference with previously acquired knowledge.
Existing gradient projection methods struggle to achieve an optimal balance between plasticity and stability, as it is hard to partition the gradient space appropriately.
(2025-05-28T13:57:56Z)
- Behind the Myth of Exploration in Policy Gradients [1.9171404264679484]
Policy-gradient algorithms are effective reinforcement learning methods for solving control problems with continuous state and action spaces.
To compute near-optimal policies, it is essential in practice to include exploration terms in the learning objective.
(2024-01-31T20:37:09Z)
- Class Gradient Projection For Continual Learning [99.105266615448]
Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL).
We propose Class Gradient Projection (CGP), which calculates the gradient subspace from individual classes rather than tasks.
(2023-11-25T02:45:56Z)
- Continual Learning with Scaled Gradient Projection [8.847574864259391]
In neural networks, continual learning results in gradient interference among sequential tasks, leading to forgetting of old tasks while learning new ones.
We propose a Scaled Gradient Projection (SGP) method to improve new learning while minimizing forgetting.
We conduct experiments ranging from continual image classification to reinforcement learning tasks and report better performance with less training overhead than the state-of-the-art approaches.
(2023-02-02T19:46:39Z)
- Efficient Meta-Learning for Continual Learning with Taylor Expansion Approximation [2.28438857884398]
Continual learning aims to alleviate catastrophic forgetting when handling consecutive tasks under non-stationary distributions.
We propose a novel efficient meta-learning algorithm for solving the online continual learning problem.
Our method achieves better or on-par performance and much higher efficiency compared to the state-of-the-art approaches.
(2022-10-03T04:57:05Z)
- Bag of Tricks for Natural Policy Gradient Reinforcement Learning [87.54231228860495]
We have implemented and compared strategies that impact performance in natural policy gradient reinforcement learning.
The proposed collection of strategies for performance optimization can improve results by 86% to 181% across the MuJoCo control benchmark.
(2022-01-22T17:44:19Z)
- Inverse Reinforcement Learning from a Gradient-based Learner [41.8663538249537]
Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations.
In this paper, we propose a new algorithm for the setting in which the observed agent is itself a gradient-based learner, with the goal of recovering the reward function that this agent is optimizing.
(2020-07-15T16:41:00Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer models.
(2020-06-10T08:22:41Z)
- Disentangling Adaptive Gradient Methods from Learning Rates [65.0397050979662]
We take a deeper look at how adaptive gradient methods interact with the learning rate schedule.
We introduce a "grafting" experiment which decouples an update's magnitude from its direction.
We present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods.
(2020-02-26T21:42:49Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
(2020-02-10T04:23:09Z)
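Several of the continual-learning entries above (SplitLoRA, Class Gradient Projection, Scaled Gradient Projection) build on gradient projection: a new task's gradient is restricted to directions that interfere little with a subspace associated with earlier tasks. The following is a minimal sketch, assuming an orthonormal basis of the old tasks' "major" subspace is already available; how each method builds or partitions that subspace differs and is not reproduced here. Plain projection removes the gradient's whole component in that subspace. The optional per-direction scales, loosely inspired by the scaled variant named in the Scaled Gradient Projection entry, remove only a fraction of each component. The function name and the scaling rule are illustrative assumptions, not any of these papers' exact methods.

```python
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_out(g: Vector, basis: List[Vector], scales: Optional[List[float]] = None) -> Vector:
    """Return g - sum_i s_i (g . u_i) u_i for an orthonormal basis {u_i}.

    With all s_i = 1 (the default) this is the orthogonal projection of g onto
    the complement of span(basis), i.e. the "minor" subspace left for new tasks.
    Scales 0 <= s_i < 1 only partially suppress direction u_i
    (hypothetical importance weighting; the rule for choosing s_i is not shown).
    """
    if scales is None:
        scales = [1.0] * len(basis)
    if len(scales) != len(basis):
        raise ValueError("need one scale per basis vector")
    out = list(g)
    for u, s in zip(basis, scales):
        # For an orthonormal basis, each coefficient can be read off the original g.
        c = s * dot(g, u)
        out = [o - c * ui for o, ui in zip(out, u)]
    return out


if __name__ == "__main__":
    # Hypothetical "old task" subspace: the first two coordinate axes of R^4.
    old_basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    new_grad = [3.0, 4.0, 5.0, 6.0]
    print(project_out(new_grad, old_basis))              # [0.0, 0.0, 5.0, 6.0]
    print(project_out(new_grad, old_basis, [0.5, 0.0]))  # [1.5, 4.0, 5.0, 6.0]
```

With all scales equal to 1, the projected gradient is orthogonal to every basis direction, so a first-order expansion predicts no change in any loss whose gradient lies in that subspace. The stability/plasticity trade-off these entries describe then comes down to how large that protected subspace is, or how strongly each of its directions is suppressed.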