slimTrain -- A Stochastic Approximation Method for Training Separable
Deep Neural Networks
- URL: http://arxiv.org/abs/2109.14002v1
- Date: Tue, 28 Sep 2021 19:31:57 GMT
- Title: slimTrain -- A Stochastic Approximation Method for Training Separable
Deep Neural Networks
- Authors: Elizabeth Newman, Julianne Chung, Matthias Chung, Lars Ruthotto
- Abstract summary: Deep neural networks (DNNs) have shown their success as high-dimensional function approximators in many applications.
We propose slimTrain, a stochastic optimization method for training DNNs with reduced sensitivity to the choice of hyperparameters.
- Score: 2.4373900721120285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) have shown their success as high-dimensional
function approximators in many applications; however, training DNNs can be
challenging in general. DNN training is commonly phrased as a stochastic
optimization problem whose challenges include non-convexity, non-smoothness,
insufficient regularization, and complicated data distributions. Hence, the
performance of DNNs on a given task depends crucially on tuning
hyperparameters, especially learning rates and regularization parameters. In
the absence of theoretical guidelines or prior experience on similar tasks,
this requires solving many training problems, which can be time-consuming and
demanding on computational resources. This can limit the applicability of DNNs
to problems with non-standard, complex, and scarce datasets, e.g., those
arising in many scientific applications. To remedy the challenges of DNN
training, we propose slimTrain, a stochastic optimization method for training
DNNs with reduced sensitivity to the choice of hyperparameters and fast initial
convergence. The central idea of slimTrain is to exploit the separability
inherent in many DNN architectures; that is, we separate the DNN into a
nonlinear feature extractor followed by a linear model. This separability
allows us to leverage recent advances made for solving large-scale, linear,
ill-posed inverse problems. Crucially, for the linear weights, slimTrain does
not require a learning rate and automatically adapts the regularization
parameter. Since our method operates on mini-batches, its computational
overhead per iteration is modest. In our numerical experiments, slimTrain
outperforms existing DNN training methods with the recommended hyperparameter
settings and reduces the sensitivity of DNN training to the remaining
hyperparameters.
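To make the separability idea concrete, below is a minimal, illustrative sketch (not the authors' released implementation) of one training step for a network split into a nonlinear feature extractor and a linear output layer: the linear weights are obtained by solving a Tikhonov-regularized least-squares problem on the current mini-batch, so they need no learning rate, while the feature-extractor weights are updated with an ordinary stochastic optimizer. The names separable_training_step, feature_net, and lam are assumptions for illustration, and the regularization parameter lam is held fixed here, whereas slimTrain adapts it automatically using methods for large-scale, linear, ill-posed inverse problems.

# Minimal sketch of a separable training step (illustrative; assumes PyTorch).
# The names separable_training_step, feature_net, and lam are hypothetical.
import torch

def separable_training_step(feature_net, optimizer, x, y, lam=1e-2):
    # 1) Nonlinear feature extraction, plus a bias column.
    z = feature_net(x)                                                  # (batch, d-1)
    z = torch.cat([z, torch.ones(z.shape[0], 1, device=z.device)], 1)  # (batch, d)

    # 2) Linear weights via Tikhonov-regularized least squares on this batch:
    #    W = argmin_W ||Z W - Y||^2 + lam ||W||^2  (no learning rate needed).
    zd = z.detach()
    d = zd.shape[1]
    gram = zd.T @ zd + lam * torch.eye(d, device=zd.device, dtype=zd.dtype)
    w = torch.linalg.solve(gram, zd.T @ y)

    # 3) Update the feature extractor by backpropagating the loss with the
    #    linear weights held fixed for this step.
    loss = torch.mean((z @ w - y) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch lam is a fixed constant only to keep the example short; the paper's contribution is that the regularization parameter is selected automatically per mini-batch, removing one of the hyperparameters that ordinarily must be tuned.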
Related papers
- Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty
from Pre-trained Models [40.38541033389344]
Deep Neural Networks (DNNs) are powerful tools for various computer vision tasks, yet they often struggle with reliable uncertainty quantification.
We introduce the Adaptable Bayesian Neural Network (ABNN), a simple and scalable strategy to seamlessly transform DNNs into BNNs.
We conduct extensive experiments across multiple datasets for image classification and semantic segmentation tasks, and our results demonstrate that ABNN achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-12-23T16:39:24Z)
- SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE), which extends a recently proposed training method based on implicit differentiation.
arXiv Detail & Related papers (2023-02-01T04:22:59Z)
- Quantum-Inspired Tensor Neural Networks for Option Pricing [4.3942901219301564]
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions.
A subset of such approaches to addressing the COD has enabled solving high-dimensional PDEs.
This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to control for industrial applications.
arXiv Detail & Related papers (2022-12-28T19:39:55Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which can achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Rethinking Pretraining as a Bridge from ANNs to SNNs [13.984523794353477]
Spiking neural networks (SNNs) are a typical class of brain-inspired models with unique features.
How to obtain a high-accuracy model has always been the main challenge in the field of SNN.
arXiv Detail & Related papers (2022-03-02T14:59:57Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- TaxoNN: A Light-Weight Accelerator for Deep Neural Network Training [2.5025363034899732]
We present a novel approach to add training capability to a baseline, inference-only DNN accelerator by splitting the SGD algorithm into simple computational elements.
Based on this approach we propose TaxoNN, a light-weight accelerator for DNN training.
Our experimental results show that TaxoNN delivers, on average, a 0.97% higher misclassification rate compared to a full-precision implementation.
arXiv Detail & Related papers (2020-10-11T09:04:19Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems [3.1887081453726136]
Crossbar-based computations face a major challenge due to a variety of device- and circuit-level non-idealities.
We propose TxSim, a fast and customizable modeling framework to functionally evaluate DNN training on crossbar-based hardware.
arXiv Detail & Related papers (2020-02-25T19:29:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.