Training cascaded networks for speeded decisions using a
temporal-difference loss
- URL: http://arxiv.org/abs/2102.09808v1
- Date: Fri, 19 Feb 2021 08:40:19 GMT
- Title: Training cascaded networks for speeded decisions using a
temporal-difference loss
- Authors: Michael L. Iuzzolino, Michael C. Mozer, Samy Bengio
- Abstract summary: Deep feedforward neural networks operate in sequential stages.
In our work, we construct a cascaded ResNet by introducing a propagation delay into each residual block.
Because information transmitted through skip connections avoids delays, the functional depth of the architecture increases over time.
- Score: 39.79639377894641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep feedforward neural networks share some characteristics with the
primate visual system, a key distinction is their dynamics. Deep nets typically
operate in sequential stages wherein each layer fully completes its computation
before processing begins in subsequent layers. In contrast, biological systems
have cascaded dynamics: information propagates from neurons at all layers in
parallel but transmission is gradual over time. In our work, we construct a
cascaded ResNet by introducing a propagation delay into each residual block and
updating all layers in parallel in a stateful manner. Because information
transmitted through skip connections avoids delays, the functional depth of the
architecture increases over time and yields a trade-off between processing
speed and accuracy. We introduce a temporal-difference (TD) training loss that
achieves a strictly superior speed-accuracy profile over standard losses. The
CascadedTD model has intriguing properties, including: typical instances are
classified more rapidly than atypical instances; CascadedTD is more robust to
both persistent and transient noise than is a conventional ResNet; and the
time-varying output trace of CascadedTD provides a signal that can be used by
'meta-cognitive' models for OOD detection and to determine when to terminate
processing.
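The cascaded dynamics and TD targets described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the ReLU transform, the per-block weight matrices, and the TD(lambda)-style target recursion are all illustrative choices.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def cascaded_forward(x, blocks, T):
    """Unroll a cascaded residual stack for T parallel update steps.

    Each block's transform path carries a one-step propagation delay,
    while its skip (identity) path is instantaneous. Delayed paths start
    at zero, so the first output is just the skip chain, and the stack's
    functional depth grows with t.
    """
    L = len(blocks)
    prev = [np.zeros_like(x) for _ in range(L)]  # last step's block inputs
    trace = []
    for _ in range(T):
        cur = x
        new_prev = []
        for l, W in enumerate(blocks):
            new_prev.append(cur)            # becomes the delayed input at t+1
            cur = cur + relu(W @ prev[l])   # instant skip + delayed transform
        prev = new_prev
        trace.append(cur)
    return trace

def td_targets(probs, y_onehot, lam=0.5):
    """TD(lambda)-style targets for a time-varying output trace.

    The final step is trained toward the label; each earlier step
    bootstraps on a mixture of the next step's prediction and the
    propagated future target, in the spirit of the paper's TD loss.
    """
    T = len(probs)
    targets = [None] * T
    g = y_onehot
    for t in range(T - 1, -1, -1):
        targets[t] = g
        if t > 0:
            g = (1.0 - lam) * probs[t] + lam * g
    return targets
```

With all delayed paths initialized to zero, `cascaded_forward(x, blocks, T)[0]` equals `x` exactly: at the first step every block passes its input straight through the skip connection, matching the abstract's claim that functional depth increases over time.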
Related papers
- ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling [57.91760520589592]
Scaling network depth has been a central driver behind the success of modern foundation models.
This paper revisits the default mechanism for deepening neural networks, namely residual connections.
We introduce adaptive neural connection reassignment (ANCRe), a principled and lightweight framework that parameterizes and learns residual connectivities from the data.
arXiv Detail & Related papers (2026-02-09T18:54:18Z)
- LayerPipe2: Multistage Pipelining and Weight Recompute via Improved Exponential Moving Average for Training Neural Networks [6.69087470775851]
A principled understanding of how much gradient delay needs to be introduced at each layer to achieve a desired level of pipelining had not been addressed.
We identify where delays may be legally inserted and show that the required amount of delay follows directly from the network structure.
When pipelining is applied at every layer, the amount of delay depends only on the number of remaining downstream stages.
arXiv Detail & Related papers (2025-12-09T01:35:08Z)
- Fractional Spike Differential Equations Neural Network with Efficient Adjoint Parameters Training [63.3991315762955]
Spiking Neural Networks (SNNs) draw inspiration from biological neurons to create realistic models for brain-like computation.
Most existing SNNs assume a single time constant for neuronal membrane voltage dynamics, modeled by first-order ordinary differential equations (ODEs) with Markovian characteristics.
We propose the Fractional SPIKE Differential Equation neural network (fspikeDE), which captures long-term dependencies in membrane voltage and spike trains through fractional-order dynamics.
arXiv Detail & Related papers (2025-07-22T18:20:56Z)
- Efficient Event-based Delay Learning in Spiking Neural Networks [0.1350479308585481]
Spiking Neural Networks (SNNs) are attracting increased attention as an energy-efficient alternative to traditional Neural Networks.
We propose a novel event-based training method for SNNs, grounded in the EventProp formalism.
We show that our approach uses less than half the memory of the current state-of-the-art delay-learning method and is up to 26x faster.
arXiv Detail & Related papers (2025-01-13T13:44:34Z)
- Learning Delays Through Gradients and Structure: Emergence of Spatiotemporal Patterns in Spiking Neural Networks [0.06752396542927405]
We present a Spiking Neural Network (SNN) model that incorporates learnable synaptic delays through two approaches.
In the latter approach, the network selects and prunes connections, optimizing the delays in sparse connectivity settings.
Our results demonstrate the potential of combining delay learning with dynamic pruning to develop efficient SNN models for temporal data processing.
arXiv Detail & Related papers (2024-07-07T11:55:48Z)
- DelGrad: Exact gradients in spiking networks for learning transmission delays and weights [0.9411751957919126]
Spiking neural networks (SNNs) inherently rely on the timing of signals for representing and processing information.
Recent work has demonstrated the substantial advantages of learning these delays along with synaptic weights.
We propose an analytical approach for calculating exact loss gradients with respect to both synaptic weights and delays in an event-based fashion.
arXiv Detail & Related papers (2024-04-30T00:02:34Z)
- Correlating sparse sensing for large-scale traffic speed estimation: A Laplacian-enhanced low-rank tensor kriging approach [76.45949280328838]
We propose a Laplacian-enhanced low-rank tensor (LETC) framework featuring both low-rankness and multi-temporal correlations for large-scale traffic speed kriging.
We then design an efficient solution algorithm via several effective numeric techniques to scale up the proposed model to network-wide kriging.
arXiv Detail & Related papers (2022-10-21T07:25:57Z)
- Ultra-low Latency Spiking Neural Networks with Spatio-Temporal Compression and Synaptic Convolutional Block [4.081968050250324]
Spiking neural networks (SNNs) offer neuro-temporal information capability, low processing cost, and high biological plausibility.
Event-stream datasets such as N-MNIST, CIFAR10-DVS, and DVS128 Gesture require aggregating individual events into frames with a higher temporal resolution for event stream classification.
We propose a spatio-temporal compression method to aggregate individual events into a few time steps of synaptic current to reduce training and inference latency.
arXiv Detail & Related papers (2022-03-18T15:14:13Z)
- Learning Fast and Slow for Online Time Series Forecasting [76.50127663309604]
Fast and Slow learning Networks (FSNet) is a holistic framework for online time-series forecasting.
FSNet balances fast adaptation to recent changes and retrieving similar old knowledge.
Our code will be made publicly available.
arXiv Detail & Related papers (2022-02-23T18:23:07Z)
- Neural Network based on Automatic Differentiation Transformation of Numeric Iterate-to-Fixedpoint [1.1897857181479061]
This work proposes a Neural Network model that can control its depth using an iterate-to-fixed-point operator.
In contrast to the existing skip-connection concept, this proposed technique enables information to flow up and down in the network.
We evaluate models that use this novel mechanism on different long-term dependency tasks.
arXiv Detail & Related papers (2021-10-30T20:34:21Z)
- Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time, and that a likelihood-ratio loss with interarrival-time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
- Low-Rank Autoregressive Tensor Completion for Spatiotemporal Traffic Data Imputation [4.9831085918734805]
Missing data imputation has been a long-standing research topic and critical application for real-world intelligent transportation systems.
We propose a low-rank autoregressive tensor completion (LATC) framework by introducing temporal variation as a new regularization term.
We conduct extensive numerical experiments on several real-world traffic data sets, and our results demonstrate the effectiveness of LATC in diverse missing scenarios.
arXiv Detail & Related papers (2021-04-30T12:00:57Z)
- Unsupervised Monocular Depth Learning with Integrated Intrinsics and Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z)
- Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
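The "networks of linear first-order dynamical systems modulated via nonlinear gates" in the last entry can be illustrated with a single explicit-Euler update. This is a hedged sketch only: the gating form, parameter names, and constants below are illustrative assumptions, not the paper's exact cell.

```python
import numpy as np

def ltc_step(x, i, W, U, b, tau, A, dt=0.05):
    """One explicit-Euler step of a liquid-time-constant-style unit:

        dx/dt = -(1/tau + f) * x + f * A,   f = sigmoid(W x + U i + b)

    The nonlinear gate f adds to each unit's decay rate, so the effective
    time constant varies with the current state and input ("liquid").
    """
    f = 1.0 / (1.0 + np.exp(-(W @ x + U @ i + b)))
    dxdt = -(1.0 / tau + f) * x + f * A
    return x + dt * dxdt

rng = np.random.default_rng(0)
n, m = 4, 3
W = 0.1 * rng.normal(size=(n, n))   # recurrent gate weights (illustrative)
U = 0.1 * rng.normal(size=(n, m))   # input gate weights (illustrative)
b = np.zeros(n)
tau = np.ones(n)                    # base time constants
A = np.ones(n)                      # per-unit saturation level
x = np.zeros(n)
for _ in range(200):                # drive with a constant input
    x = ltc_step(x, np.ones(m), W, U, b, tau, A)
```

Because the gate is bounded in (0, 1) and the decay term dominates for large |x|, the state stays bounded below the saturation level A, consistent with the stability claim in the summary.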
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.