Handling Delay in Real-Time Reinforcement Learning
- URL: http://arxiv.org/abs/2503.23478v1
- Date: Sun, 30 Mar 2025 15:30:27 GMT
- Title: Handling Delay in Real-Time Reinforcement Learning
- Authors: Ivan Anokhin, Rishav Rishav, Matthew Riemer, Stephen Chung, Irina Rish, Samira Ebrahimi Kahou,
- Abstract summary: We present a theoretically motivated solution that leverages temporal skip connections combined with history-augmented observations.<n>We demonstrate parallel neuron computation can accelerate inference by 6-350% on standard hardware.
- Score: 24.869578892204792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-time reinforcement learning (RL) introduces several challenges. First, policies are constrained to a fixed number of actions per second due to hardware limitations. Second, the environment may change while the network is still computing an action, leading to observational delay. The first issue can partly be addressed with pipelining, leading to higher throughput and potentially better policies. However, the second issue remains: if each neuron operates in parallel with an execution time of $\tau$, an $N$-layer feed-forward network experiences observation delay of $\tau N$. Reducing the number of layers can decrease this delay, but at the cost of the network's expressivity. In this work, we explore the trade-off between minimizing delay and network's expressivity. We present a theoretically motivated solution that leverages temporal skip connections combined with history-augmented observations. We evaluate several architectures and show that those incorporating temporal skip connections achieve strong performance across various neuron execution times, reinforcement learning algorithms, and environments, including four Mujoco tasks and all MinAtar games. Moreover, we demonstrate parallel neuron computation can accelerate inference by 6-350% on standard hardware. Our investigation into temporal skip connections and parallel computations paves the way for more efficient RL agents in real-time setting.
Related papers
- Efficient Parallel Training Methods for Spiking Neural Networks with Constant Time Complexity [63.56009745101597]
Spiking Neural Networks (SNNs) often suffer from high time complexity $O(T)$ due to the sequential processing of $T$ spikes.<n>We propose a novel Fixed-point Parallel Training (FPT) method to accelerate SNN training without modifying the network architecture.
arXiv Detail & Related papers (2025-06-10T13:27:27Z) - Real-Time Execution of Action Chunking Flow Policies [49.1574468325115]
This paper presents a novel inference-time algorithm that enables asynchronous execution of action interacting systems.<n>It is applicable to any diffusion- or VLA-based systems executing out of the box with no re-training.<n>Results show that RTC is fast, performant, and uniquely robust to inference manipulation.
arXiv Detail & Related papers (2025-06-09T01:01:59Z) - Efficient Event-based Delay Learning in Spiking Neural Networks [0.1350479308585481]
Spiking Neural Networks (SNNs) are attracting increased attention as an energy-efficient alternative to traditional Neural Networks.<n>We propose a novel event-based training method for SNNs, grounded in the EventPropProp formalism.<n>We show that our approach uses less than half the memory of the current state-of-the-art delay-learning method and is up to 26x faster.
arXiv Detail & Related papers (2025-01-13T13:44:34Z) - PRF: Parallel Resonate and Fire Neuron for Long Sequence Learning in Spiking Neural Networks [6.545474731089018]
We address the efficiency and performance challenges of long sequence learning in Spiking Neural Networks (SNNs) simultaneously.
First, we propose a decoupled reset method for parallel spiking neuron training, reducing the typical Leaky Integrate-and-Fire (LIF) model's training time from $O(L2)$ to $O(Llog L)$.
Secondly, to capture long-range dependencies, we propose a Parallel Resonate and Fire (PRF) neuron, which leverages an oscillating membrane potential driven by a resonate mechanism from a differentiable reset function in the complex domain
arXiv Detail & Related papers (2024-10-04T15:51:56Z) - Decentralized Optimization in Time-Varying Networks with Arbitrary Delays [22.40154714677385]
We consider a decentralized optimization problem for networks affected by communication delays.
Examples of such networks include collaborative machine learning, sensor networks, and multi-agent systems.
To mimic communication delays, we add virtual non-computing nodes to the network, resulting in directed graphs.
arXiv Detail & Related papers (2024-05-29T20:51:38Z) - Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms-spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100,3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - Temporal Aggregation and Propagation Graph Neural Networks for Dynamic
Representation [67.26422477327179]
Temporal graphs exhibit dynamic interactions between nodes over continuous time.
We propose a novel method of temporal graph convolution with the whole neighborhood.
Our proposed TAP-GNN outperforms existing temporal graph methods by a large margin in terms of both predictive performance and online inference latency.
arXiv Detail & Related papers (2023-04-15T08:17:18Z) - PARTIME: Scalable and Parallel Processing Over Time with Deep Neural
Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample at the time in which it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z) - Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent
Reinforcement Learning [28.35473469490186]
Multi-user delay constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing.
We propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient ($mathttRSD4$)
$mathttRSD4$ guarantees resource and delay constraints by Lagrangian dual and delay-sensitive queues, respectively.
It also efficiently tackles partial observability with a memory mechanism enabled by the recurrent neural network (RNN) and introduces user-level decomposition and node-level
arXiv Detail & Related papers (2022-08-30T08:44:15Z) - Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs [64.26714148634228]
congestion control (CC) algorithms become extremely difficult to design.
It is currently not possible to deploy AI models on network devices due to their limited computational capabilities.
We build a computationally-light solution based on a recent reinforcement learning CC algorithm.
arXiv Detail & Related papers (2022-07-05T20:42:24Z) - AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving"-temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z) - LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and
Inter-Layer Gradient Pipelining and Multiprocessor Scheduling [6.549125450209931]
Training model parameters by backpropagation inherently create feedback loops.
The proposed system, referred to as LayerPipe, reduces the number of clock cycles required for training.
arXiv Detail & Related papers (2021-08-14T23:51:00Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for
Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.