Related papers: Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks

Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks

URL: http://arxiv.org/abs/2601.21572v1
Date: Thu, 29 Jan 2026 11:34:49 GMT
Title: Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks
Authors: Jinhao Li, Yuhao Sun, Zhiyuan Ma, Hao He, Xinche Zhang, Xing Chen, Jin Li, Sen Song
Abstract summary: We propose a distributional update rule that constrains relative change by bounding KL divergence normalized by an estimated signal energy. We instantiate SATR for Bernoulli connectivity distributions, which have shown strong empirical performance for RSNN optimization. To make SATR practical at scale, we introduce a bitset implementation for binary spiking and binary weights.
Score: 16.009596458274952
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recurrent spiking neural networks (RSNNs) are a promising substrate for energy-efficient control policies, but training them for high-dimensional, long-horizon reinforcement learning remains challenging. Population-based, gradient-free optimization circumvents backpropagation through non-differentiable spike dynamics by estimating gradients. However, with finite populations, high variance of these estimates can induce harmful and overly aggressive update steps. Inspired by trust-region methods in reinforcement learning that constrain policy updates in distribution space, we propose Signal-Adaptive Trust Regions (SATR), a distributional update rule that constrains relative change by bounding KL divergence normalized by an estimated signal energy. SATR automatically expands the trust region under strong signals and contracts it when updates are noise-dominated. We instantiate SATR for Bernoulli connectivity distributions, which have shown strong empirical performance for RSNN optimization. Across a suite of high-dimensional continuous-control benchmarks, SATR improves stability under limited populations and reaches competitive returns against strong baselines including PPO-LSTM. In addition, to make SATR practical at scale, we introduce a bitset implementation for binary spiking and binary weights, substantially reducing wall-clock training time and enabling fast RSNN policy search.
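The abstract specifies SATR only through its constraint: the KL divergence of an update, normalized by an estimated signal energy, is bounded. The sketch below is one possible reading of that sentence, not the authors' algorithm. It assumes independent Bernoulli connection probabilities parameterized by logits, a population of per-member gradient estimates for those logits, and a hypothetical signal-energy estimate (squared norm of the mean estimate minus its expected sampling noise). It then steps along the mean direction, choosing the step length by bisection so that KL(new || old) <= delta * signal energy. The names `satr_like_step`, `signal_energy` and `delta` are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    # tanh form avoids overflow warnings for large |x|
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def bernoulli_kl(p_new, p_old, eps=1e-12):
    """Sum over coordinates of KL(Bern(p_new) || Bern(p_old))."""
    p_new = np.clip(p_new, eps, 1 - eps)
    p_old = np.clip(p_old, eps, 1 - eps)
    return float(np.sum(p_new * np.log(p_new / p_old)
                        + (1 - p_new) * np.log((1 - p_new) / (1 - p_old))))


def signal_energy(grads):
    """Hypothetical estimate of ||E[g]||^2 from an (n, d) array of estimates.

    ||mean||^2 overestimates ||E[g]||^2 by trace(Cov)/n on average, so that
    sampling-noise term is subtracted and the result is clipped at zero.
    """
    n = grads.shape[0]
    mean = grads.mean(axis=0)
    noise = grads.var(axis=0, ddof=1).sum() / n
    return max(float(mean @ mean) - noise, 0.0)


def satr_like_step(logits, grads, delta=0.05, max_alpha=10.0, iters=60):
    """Step along the mean estimate with KL(new || old) <= delta * signal energy."""
    p_old = sigmoid(logits)
    direction = grads.mean(axis=0)
    radius = delta * signal_energy(grads)  # grows with signal, shrinks with noise
    if radius <= 0.0:
        return logits.copy()  # noise-dominated: no update

    def kl_at(alpha):
        return bernoulli_kl(sigmoid(logits + alpha * direction), p_old)

    if kl_at(max_alpha) <= radius:
        return logits + max_alpha * direction
    lo, hi = 0.0, max_alpha  # KL is non-decreasing in alpha, so bisect
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl_at(mid) <= radius:
            lo = mid
        else:
            hi = mid
    return logits + lo * direction
```

Bisection is valid because, for an exponential family moved along a straight line in natural-parameter (logit) space, KL(new || old) is non-decreasing in the step length. With eight identical estimates `[1, -1]` the signal energy is 2, so the step stops where the KL reaches 0.1. With estimates that cancel to a zero mean, the estimated signal energy is 0 and the sketch leaves the logits unchanged, mirroring the described contraction under noise-dominated updates.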

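The abstract credits a bitset implementation for binary spiking and binary weights with a large wall-clock speedup but gives no details. The usual idea behind such implementations is to pack spikes and weights into machine words, so that a synaptic sum becomes a bitwise AND followed by a population count. The pure-Python sketch below illustrates only that idea, using Python integers as bitsets. The function names, the memoryless threshold neuron, and the +1/-1 encoding in `signed_input` are illustrative assumptions, not the paper's kernel.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into an int bitset (bit i = element i)."""
    word = 0
    for i, b in enumerate(bits):
        if b:
            word |= 1 << i
    return word


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def recurrent_step(spikes, weight_rows, threshold):
    """One update of a hypothetical binary recurrent layer.

    spikes: bitset of presynaptic neurons that fired on the previous step.
    weight_rows: weight_rows[j] has bit i set if connection i -> j exists.
    A neuron fires when its count of active incoming connections reaches
    `threshold` (a memoryless threshold unit, used only for illustration).
    """
    out = 0
    for j, row in enumerate(weight_rows):
        if popcount(spikes & row) >= threshold:  # AND + popcount = synaptic sum
            out |= 1 << j
    return out


def signed_input(spikes, sign_bits):
    """Synaptic sum for +1/-1 weights encoded as bit 1 -> +1, bit 0 -> -1."""
    active = popcount(spikes)
    positive = popcount(spikes & sign_bits)
    return positive - (active - positive)
```

For example, with spikes `pack_bits([1, 0, 1, 1])` (value 13) and rows `[0b0011, 0b1100, 0b1101]`, the input counts are 1, 2 and 3, so `recurrent_step(13, [0b0011, 0b1100, 0b1101], 2)` returns `0b110` (neurons 1 and 2 fire). A production kernel would typically store rows as arrays of 64-bit words and use hardware popcount instructions. The abstract does not say how the authors do this.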
Related papers

Rethinking the Trust Region in LLM Reinforcement Learning [72.25890308541334]
Proximal Policy Optimization (PPO) serves as the de facto standard algorithm for reinforcement learning (RL) fine-tuning of Large Language Models (LLMs). We propose Divergence Proximal Policy Optimization (DPPO), which substitutes clipping with a more principled constraint. DPPO trains better and more efficiently than existing methods, offering a more robust foundation for RL-based fine-tuning.
arXiv Detail & Related papers (2026-02-04T18:59:04Z)
In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior [53.21550098214227]
In-context reinforcement learning promises fast adaptation to unseen environments without parameter updates. We introduce SPICE, a Bayesian ICRL method that learns a prior over Q-values via deep ensemble and updates this prior at test-time. We prove that SPICE achieves regret-optimal behaviour in both bandits and finite-horizon MDPs, even when pretrained only on suboptimal trajectories.
arXiv Detail & Related papers (2026-01-06T13:41:31Z)
Scaling up Stability: Reinforcement Learning for Distributed Control of Networked Systems in the Space of Stabilizing Policies [0.49109372384514843]
We introduce a policy parameterization that embeds Graph Neural Networks (GNNs) into a Youla-like magnitude-direction parameterization. The magnitude is implemented as a stable operator consisting of a GNN acting on disturbance feedback, while the direction is a GNN acting on local observations. Experiments show that policies trained on small networks transfer directly to larger ones and to unseen network topologies, and achieve higher returns and lower variance than a state-of-the-art MARL baseline.
arXiv Detail & Related papers (2025-12-20T23:35:07Z)
Plug-and-Play Homeostatic Spark: Zero-Cost Acceleration for SNN Training Across Paradigms [40.57310813106791]
Spiking neural networks offer event-driven computation, sparse activation, and hardware efficiency, yet training often converges slowly and lacks stability. We present Adaptive Homeostatic Spiking Activity Regulation (AHSAR), an extremely simple plug-in method that applies across training paradigms. AHSAR stabilizes optimization and accelerates convergence without changing the model architecture, loss, or gradients.
arXiv Detail & Related papers (2025-12-04T17:26:46Z)
Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients [16.39606116102731]
The stochastic Polyak step size has proven to be a promising adaptive step-size choice for stochastic gradient descent (SGD) when training neural networks. In this work, we provide rigorous convergence guarantees for non-smooth optimization with no need for strong assumptions. Comprehensive experiments on deep networks corroborate the theory.
arXiv Detail & Related papers (2025-12-02T02:24:32Z)
GRPO-Guard: Mitigating Implicit Over-Optimization in Flow Matching via Regulated Clipping [63.33669214116784]
GRPO-Guard is a simple yet effective enhancement to existing GRPO frameworks. It restores a balanced and step-consistent importance ratio, ensuring that PPO clipping properly constrains harmful updates. It substantially mitigates implicit over-optimization without relying on heavy KL regularization.
arXiv Detail & Related papers (2025-10-25T14:51:17Z)
CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning [50.87795054453648]
Spiking Neural Networks (SNNs) offer low-latency and energy-efficient decision-making on neuromorphic hardware. Due to the discrete and non-differentiable nature of spikes, directly trained SNNs rely heavily on Batch Normalization (BN) to stabilize gradient updates. In online Reinforcement Learning (RL), BN statistics hinder exploitation, resulting in slower convergence and suboptimal policies.
arXiv Detail & Related papers (2025-09-28T10:21:17Z)
Local Pairwise Distance Matching for Backpropagation-Free Reinforcement Learning [0.9065034043031668]
Training neural networks with reinforcement learning (RL) typically relies on backpropagation (BP). BP requires storage of activations from the forward pass for subsequent backward updates. We propose a novel approach that trains each layer of the neural network using local signals during the forward pass in RL settings.
arXiv Detail & Related papers (2025-07-15T14:39:41Z)
Conformal Symplectic Optimization for Stable Reinforcement Learning [21.491621524500736]
By utilizing relativistic kinetic energy from special relativity, RAD limits parameter updates to below a finite speed, effectively mitigating abnormal influences. Notably, RAD achieves up to a 155.1% performance improvement, showcasing its efficacy in training agents on Atari games.
arXiv Detail & Related papers (2024-12-03T09:07:31Z)
Towards the Flatter Landscape and Better Generalization in Federated Learning under Client-level Differential Privacy [67.33715954653098]
We propose a novel DPFL algorithm named DP-FedSAM, which leverages gradient perturbation to mitigate the negative impact of DP. Specifically, DP-FedSAM integrates Sharpness-Aware Minimization (SAM) to generate locally flat models that are stable and robust to weight perturbations. To further reduce the magnitude of random noise while achieving better performance, we propose DP-FedSAM-$top_k$, which adopts a local update sparsification technique.
arXiv Detail & Related papers (2023-05-01T15:19:09Z)
Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks [44.37047471448793]
In this paper, we advocate the integration of the edge computing paradigm and parallel split learning (PSL). We propose an innovative PSL framework, namely efficient parallel split learning (EPSL), to accelerate model training. We show that the proposed EPSL framework significantly decreases the training latency needed to achieve a target accuracy.
arXiv Detail & Related papers (2023-03-26T16:09:48Z)
Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a smooth policy that behaves smoothly with respect to states. We develop a new framework, Smooth Regularized Reinforcement Learning (SR$^2$L), in which the policy is trained with smoothness-inducing regularization. Such regularization effectively constrains the search space and enforces smoothness in the learned policy.
arXiv Detail & Related papers (2020-03-21T00:10:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.