Why should we add early exits to neural networks?
- URL: http://arxiv.org/abs/2004.12814v2
- Date: Tue, 23 Jun 2020 07:42:10 GMT
- Title: Why should we add early exits to neural networks?
- Authors: Simone Scardapane, Michele Scarpiniti, Enzo Baccarelli, Aurelio Uncini
- Abstract summary: Deep neural networks are generally designed as a stack of differentiable layers, in which a prediction is obtained only after running the full stack.
Some contributions have proposed techniques to endow networks with early exits, allowing predictions to be obtained at intermediate points of the stack.
These multi-output networks have a number of advantages, including: (i) significant reductions of the inference time, (ii) reduced tendency to overfitting and vanishing gradients, and (iii) capability of being distributed over multi-tier platforms.
- Score: 16.793040797308105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are generally designed as a stack of differentiable
layers, in which a prediction is obtained only after running the full stack.
Recently, some contributions have proposed techniques to endow the networks
with early exits, allowing predictions to be obtained at intermediate points of the
stack. These multi-output networks have a number of advantages, including: (i)
significant reductions of the inference time, (ii) reduced tendency to
overfitting and vanishing gradients, and (iii) capability of being distributed
over multi-tier computation platforms. In addition, they connect to the wider
themes of biological plausibility and layered cognitive reasoning. In this
paper, we provide a comprehensive introduction to this family of neural
networks, by describing in a unified fashion the way these architectures can be
designed, trained, and actually deployed in time-constrained scenarios. We also
describe in-depth their application scenarios in 5G and Fog computing
environments, as well as some of the open research questions connected to them.
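To make the inference scheme concrete, the sketch below shows a small multi-output network in the spirit of the architectures surveyed here: each block of the backbone feeds an auxiliary classifier, training combines the losses of all exits, and inference stops at the first exit whose softmax confidence clears a threshold. The layer sizes, loss weights, and confidence threshold are illustrative assumptions, not the specific design of any method discussed in the paper.

```python
# Minimal sketch of an early-exit (multi-output) network, assuming PyTorch.
# All sizes, loss weights, and the threshold are illustrative, not prescriptive.
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, in_dim=784, hidden=256, num_classes=10):
        super().__init__()
        # Backbone split into blocks, each followed by an auxiliary classifier.
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()),
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()),
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()),
        ])
        self.exits = nn.ModuleList([nn.Linear(hidden, num_classes)
                                    for _ in self.blocks])

    def forward(self, x):
        # Training: return the logits of every exit so a joint loss can be applied.
        outputs = []
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            outputs.append(exit_head(x))
        return outputs

    @torch.no_grad()
    def predict(self, x, threshold=0.9):
        # Inference (single example): stop at the first exit whose softmax
        # confidence exceeds the threshold, skipping the remaining layers.
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            probs = torch.softmax(exit_head(x), dim=-1)
            conf, pred = probs.max(dim=-1)
            if conf.item() >= threshold:
                return pred, conf
        return pred, conf  # fall back to the final exit

def early_exit_loss(outputs, target, weights=(0.3, 0.3, 0.4)):
    # Joint training objective: weighted sum of per-exit cross-entropies.
    ce = nn.CrossEntropyLoss()
    return sum(w * ce(o, target) for w, o in zip(weights, outputs))
```

Other exit-selection rules (for example an entropy threshold) could replace the fixed softmax confidence check without changing the overall structure.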
Related papers
- Explaining Deep Neural Networks by Leveraging Intrinsic Methods [0.9790236766474201]
This thesis contributes to the field of eXplainable AI, focusing on enhancing the interpretability of deep neural networks.
The core contributions lie in introducing novel techniques aimed at making these networks more interpretable by leveraging an analysis of their inner workings.
The research also presents novel investigations of neurons within trained deep neural networks, shedding light on overlooked phenomena related to their activation values.
arXiv Detail & Related papers (2024-07-17T01:20:17Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Network Degeneracy as an Indicator of Training Performance: Comparing Finite and Infinite Width Angle Predictions [3.04585143845864]
We show that as networks get deeper, they become more susceptible to degeneracy.
We use a simple algorithm that can accurately predict the level of degeneracy for any given fully connected ReLU network architecture.
arXiv Detail & Related papers (2023-06-02T13:02:52Z)
- Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Predictive Coding: Towards a Future of Deep Learning beyond Backpropagation? [41.58529335439799]
The backpropagation of error algorithm used to train deep neural networks has been fundamental to the successes of deep learning.
Recent work has developed the idea into a general-purpose algorithm able to train neural networks using only local computations.
We show the substantially greater flexibility of predictive coding networks compared to equivalent deep neural networks.
arXiv Detail & Related papers (2022-02-18T22:57:03Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Prior knowledge distillation based on financial time series [0.8756822885568589]
We propose to use neural networks to represent indicators and train a large network constructed of smaller networks as feature layers.
In numerical experiments, we find that our algorithm is faster and more accurate than traditional methods on real financial datasets.
arXiv Detail & Related papers (2020-06-16T15:26:06Z)
- Binary Neural Networks: A Survey [126.67799882857656]
The binary neural network serves as a promising technique for deploying deep models on resource-limited devices.
The binarization inevitably causes severe information loss, and even worse, its discontinuity brings difficulty to the optimization of the deep network.
We present a survey of these algorithms, categorized into native solutions that directly conduct binarization and optimized ones that use techniques such as minimizing the quantization error, improving the network loss function, and reducing the gradient error (a minimal binarization sketch is given after this list).
arXiv Detail & Related papers (2020-03-31T16:47:20Z)
- NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks [2.398608007786179]
Long training times of deep neural networks are a bottleneck in machine learning research.
We provide a theoretical foundation for the choice of intra-layer topology.
We show that seemingly similar topologies can often have a large difference in attainable accuracy.
arXiv Detail & Related papers (2020-02-19T18:29:18Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
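Referring back to the Binary Neural Networks survey entry above, the following sketch illustrates the "native" binarization it categorizes: weights are mapped to {-1, +1} by sign(), and a straight-through estimator (STE) works around the zero gradient of sign() during training. The scaling factor and gradient clipping are common-practice assumptions for illustration, not a method taken from this paper or from the survey itself.

```python
# Hypothetical sketch of weight binarization with a straight-through
# estimator (STE), assuming PyTorch; it illustrates the "gradient error"
# problem mentioned in the survey entry above.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # sign() maps weights to {-1, +1}; scaling by the mean magnitude is a
        # common heuristic to limit the information loss of binarization.
        alpha = w.abs().mean()
        return alpha * torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # STE: pass the gradient where |w| <= 1 and zero it elsewhere,
        # sidestepping the zero gradient of sign().
        return grad_output * (w.abs() <= 1).to(grad_output.dtype)

binarize = BinarizeSTE.apply

# Usage: gradients flow through the STE despite the discontinuity.
w = torch.randn(4, 4, requires_grad=True)
binarize(w).sum().backward()
```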
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.