Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
- URL: http://arxiv.org/abs/2412.14355v1
- Date: Wed, 18 Dec 2024 21:43:40 GMT
- Title: Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
- Authors: Matthew Riemer, Gopeshh Subbaraj, Glen Berseth, Irina Rish
- Abstract summary: Realtime environments change even as agents perform action inference and learning.
Recent advances in machine learning involve larger neural networks with longer inference times.
We present an analysis of lower bounds on regret in realtime reinforcement learning.
- Score: 22.106900089984318
- Abstract: Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectively minimize regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime reinforcement learning (RL) environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pokémon and Tetris.
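The staggering argument in the abstract admits a simple worked sketch. If a single inference takes tau seconds but actions are needed every delta seconds, then n = ceil(tau / delta) inference processes whose start times are offset by delta will complete one inference (and hence emit one action) every delta seconds — the linear scaling the abstract describes. The timing model below is an illustration of that scheduling idea, not the paper's implementation; all names are our own.

```python
import math

def staggered_schedule(tau, delta, horizon):
    """Timing model of staggered asynchronous inference.

    Process i starts its k-th inference at time i*delta + k*tau and
    emits the resulting action tau seconds later.  With n = ceil(tau/delta)
    staggered processes, actions land delta seconds apart even though
    each individual inference takes tau seconds.
    Returns (n, sorted action-emission times up to `horizon`).
    """
    n = math.ceil(tau / delta)  # process count grows linearly with tau
    times = sorted(i * delta + (k + 1) * tau
                   for i in range(n)
                   for k in range(int(horizon // tau) + 1))
    return n, [t for t in times if t <= horizon]
```

For example, with tau = 1.0 s and delta = 0.25 s, four staggered processes emit actions at 1.0, 1.25, 1.5, ... — consistent 0.25 s intervals despite each inference taking a full second.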
Related papers
- Optimizing Perturbations for Improved Training of Machine Learning Models [0.0]
We show that if the unperturbed learning process reaches a quasi-steady state, the response at a single perturbation frequency can predict the behavior at a wide range of frequencies.
Our work allows optimization of training protocols of machine learning models using a statistical mechanical approach.
arXiv Detail & Related papers (2025-02-06T14:53:21Z)
- BEAT: Balanced Frequency Adaptive Tuning for Long-Term Time-Series Forecasting [46.922741972636025]
Time-series forecasting is crucial for numerous real-world applications including weather prediction and financial market modeling.
We propose BEAT (Balanced frEquency Adaptive Tuning), a novel framework that monitors the training status for each frequency and adaptively adjusts their gradient updates.
BEAT consistently outperforms state-of-the-art approaches in experiments on seven real-world datasets.
arXiv Detail & Related papers (2025-01-31T11:52:35Z)
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple reparameterization which we call Normalize-and-Project.
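The norm-growth/learning-rate equivalence summarized above can be made concrete with a toy sketch (our reading of the idea; the function names and setup are assumptions, not the paper's code). A layer followed by normalization is invariant to the scale of its weights, so unchecked weight-norm growth silently shrinks the effective gradient step; projecting the weights back to a fixed norm after each update makes the learning rate explicit.

```python
import numpy as np

def normalize(x):
    # Normalize a pre-activation vector to zero mean and unit variance.
    return (x - x.mean()) / (x.std() + 1e-8)

# A normalized layer's output is invariant to the scale of its weights:
rng = np.random.default_rng(0)
w = rng.normal(size=8)
x = rng.normal(size=8)
assert np.allclose(normalize(w * x), normalize(3.0 * w * x))
# 3x larger weights, identical output -- but gradients w.r.t. w shrink,
# which acts as an implicit (hidden) learning-rate decay.

def normalize_and_project(w, grad, lr, target_norm=1.0):
    """One SGD step followed by projection onto a fixed-norm sphere,
    keeping the effective learning rate explicit (a sketch of the idea)."""
    w = w - lr * grad
    return target_norm * w / np.linalg.norm(w)
```

Because the projection never changes the layer's output (scale invariance), it only pins down the effective step size rather than altering what the network computes.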
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
- Learning Sequence Representations by Non-local Recurrent Neural Memory [61.65105481899744]
We propose a Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning.
Our model captures long-range dependencies and distills latent high-level features.
Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
arXiv Detail & Related papers (2022-07-20T07:26:15Z)
- Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
- Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots [2.3061446605472558]
We show that when learning updates are expensive, the performance of sequential learning diminishes and is outperformed by asynchronous learning by a substantial margin.
Our system learns in real-time to reach and track visual targets from pixels within two hours of experience and does so directly using real robots.
arXiv Detail & Related papers (2022-03-23T23:05:28Z)
- Deep Explicit Duration Switching Models for Time Series [84.33678003781908]
We propose a flexible model that is capable of identifying both state- and time-dependent switching dynamics.
State-dependent switching is enabled by a recurrent state-to-switch connection.
An explicit duration count variable is used to improve the time-dependent switching behavior.
arXiv Detail & Related papers (2021-10-26T17:35:21Z)
- STRODE: Stochastic Boundary Ordinary Differential Equation [30.237665903943963]
Most algorithms for time-series modeling fail to learn dynamics of random event timings directly from visual or audio inputs.
We present a probabilistic ordinary differential equation (ODE) that learns both the timings and the dynamics of time series data without requiring any timing annotations during training.
Our results show that our approach successfully infers event timings of time series data.
arXiv Detail & Related papers (2021-07-17T16:25:46Z)
- Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting [48.8617204809538]
We propose Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model.
To learn complex correlations across heterogeneous sequences, a tailored encoder is devised to combine the advances in deep point processes models and variational recurrent neural networks.
Our model can be trained effectively using variational inference and generates predictions with Monte-Carlo simulation.
arXiv Detail & Related papers (2021-01-31T11:00:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.