HiPPO: Recurrent Memory with Optimal Polynomial Projections
- URL: http://arxiv.org/abs/2008.07669v2
- Date: Fri, 23 Oct 2020 02:48:03 GMT
- Title: HiPPO: Recurrent Memory with Optimal Polynomial Projections
- Authors: Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Re
- Abstract summary: We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto polynomial bases.
Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal solution to a natural online function approximation problem.
This formal framework yields a new memory update mechanism (HiPPO-LegS) that scales through time to remember all history, avoiding priors on the timescale.
- Score: 93.3537706398653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central problem in learning from sequential data is representing cumulative
history in an incremental fashion as more data is processed. We introduce a
general framework (HiPPO) for the online compression of continuous signals and
discrete time series by projection onto polynomial bases. Given a measure that
specifies the importance of each time step in the past, HiPPO produces an
optimal solution to a natural online function approximation problem. As special
cases, our framework yields a short derivation of the recent Legendre Memory
Unit (LMU) from first principles, and generalizes the ubiquitous gating
mechanism of recurrent neural networks such as GRUs. This formal framework
yields a new memory update mechanism (HiPPO-LegS) that scales through time to
remember all history, avoiding priors on the timescale. HiPPO-LegS enjoys the
theoretical benefits of timescale robustness, fast updates, and bounded
gradients. By incorporating the memory dynamics into recurrent neural networks,
HiPPO RNNs can empirically capture complex temporal dependencies. On the
benchmark permuted MNIST dataset, HiPPO-LegS sets a new state-of-the-art
accuracy of 98.3%. Finally, on a novel trajectory classification task testing
robustness to out-of-distribution timescales and missing data, HiPPO-LegS
outperforms RNN and neural ODE baselines by 25-40% accuracy.
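
To make the memory update described in the abstract concrete, below is a minimal NumPy sketch of the HiPPO-LegS recurrence. It uses the standard LegS state matrix and a simple forward-Euler discretization of the scaled-Legendre dynamics rather than the generalized bilinear transform discretizations analyzed in the paper; the state size N=32 and the toy sine signal are illustrative choices, not values from the paper.

```python
import numpy as np

def hippo_legs_matrices(N):
    """Construct the HiPPO-LegS state matrix A and input vector B.

    A[n, k] = sqrt(2n+1)*sqrt(2k+1)  if n > k
            = n + 1                  if n == k
            = 0                      if n < k
    B[n]    = sqrt(2n+1)
    """
    q = np.sqrt(2 * np.arange(N) + 1)       # sqrt(2n+1) for n = 0..N-1
    A = np.tril(np.outer(q, q), k=-1)       # strictly lower-triangular part
    A += np.diag(np.arange(N) + 1)          # diagonal entries n + 1
    B = q.copy()
    return A, B

def hippo_legs_online(f, N=32):
    """Online compression of a discrete signal f[0..L-1] into N coefficients,
    via forward Euler on the LegS dynamics d/dt c(t) = -(A/t) c(t) + (B/t) f(t)."""
    A, B = hippo_legs_matrices(N)
    I = np.eye(N)
    c = np.zeros(N)
    for k, fk in enumerate(f, start=1):
        c = (I - A / k) @ c + (B / k) * fk  # step size 1/k: no fixed timescale
    return c

# Toy usage (hypothetical example): compress a sine wave into 32 coefficients.
t = np.linspace(0, 1, 1000)
coeffs = hippo_legs_online(np.sin(2 * np.pi * 3 * t))
print(coeffs[:5])
```

Because the effective step size 1/k shrinks as more data arrives, the update carries no fixed timescale hyperparameter, which is the source of the timescale robustness claimed in the abstract.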
Related papers
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Cumulative Distribution Function based General Temporal Point Processes [49.758080415846884]
The CuFun model represents a novel approach to TPPs that revolves around the Cumulative Distribution Function (CDF).
Our approach addresses several critical issues inherent in traditional TPP modeling.
Our contributions encompass the introduction of a pioneering CDF-based TPP model and the development of a methodology for incorporating past event information into future event prediction.
arXiv Detail & Related papers (2024-02-01T07:21:30Z)
- Brain-Inspired Spiking Neural Network for Online Unsupervised Time Series Prediction [13.521272923545409]
We present a novel Continuous Learning-based Unsupervised Recurrent Spiking Neural Network Model (CLURSNN).
CLURSNN makes online predictions by reconstructing the underlying dynamical system using Random Delay Embedding.
We show that the proposed online time series prediction methodology outperforms state-of-the-art DNN models when predicting an evolving Lorenz63 dynamical system.
arXiv Detail & Related papers (2023-04-10T16:18:37Z)
- How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer [2.1485350418225244]
We give an exact macroscopic characterization of the generalization behavior of randomized, shallow NNs with ReLU activation.
We show that RSNs correspond to a generalized additive model (GAM)-type regression in which infinitely many directions are considered.
arXiv Detail & Related papers (2023-03-20T21:05:47Z)
- Online Evolutionary Neural Architecture Search for Multivariate Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z)
- Selective Memory Recursive Least Squares: Recast Forgetting into Memory in RBF Neural Network Based Real-Time Learning [2.31120983784623]
In radial basis function neural network (RBFNN) based real-time learning tasks, forgetting mechanisms are widely used.
This paper proposes a real-time training method named selective memory recursive least squares (SMRLS) in which the classical forgetting mechanisms are recast into a memory mechanism.
With SMRLS, the input space of the RBFNN is evenly divided into a finite number of partitions and a synthesized objective function is developed using synthesized samples from each partition.
arXiv Detail & Related papers (2022-11-15T05:29:58Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection [40.21502451136054]
This work presents DGHL, a new family of generative models for time series anomaly detection.
A top-down Convolution Network maps a novel hierarchical latent space to time series windows, exploiting temporal dynamics to encode information efficiently.
Our method outperformed current state-of-the-art models on four popular benchmark datasets.
arXiv Detail & Related papers (2022-02-15T17:19:44Z)
- CARRNN: A Continuous Autoregressive Recurrent Neural Network for Deep Representation Learning from Sporadic Temporal Data [1.8352113484137622]
In this paper, a novel deep learning-based model is developed for modeling multiple temporal features in sporadic data.
The proposed model, called CARRNN, uses a generalized discrete-time autoregressive model that is trainable end-to-end using neural networks modulated by time lags.
It is applied to multivariate time-series regression tasks using data provided for Alzheimer's disease progression modeling and intensive care unit (ICU) mortality rate prediction.
arXiv Detail & Related papers (2021-04-08T12:43:44Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.