Multilevel-in-Layer Training for Deep Neural Network Regression
- URL: http://arxiv.org/abs/2211.06515v1
- Date: Fri, 11 Nov 2022 23:53:46 GMT
- Title: Multilevel-in-Layer Training for Deep Neural Network Regression
- Authors: Colin Ponce, Ruipeng Li, Christina Mao, Panayot Vassilevski
- Abstract summary: We present a multilevel regularization strategy that constructs and trains a hierarchy of neural networks.
We experimentally show with PDE regression problems that our multilevel training approach is an effective regularizer.
- Score: 1.6185544531149159
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common challenge in regression is that for many problems, the degrees of
freedom required for a high-quality solution also allow for overfitting.
Regularization is a class of strategies that seek to restrict the range of
possible solutions so as to discourage overfitting while still enabling good
solutions, and different regularization strategies impose different types of
restrictions. In this paper, we present a multilevel regularization strategy
that constructs and trains a hierarchy of neural networks, each of which has
layers that are wider versions of the previous network's layers. We draw
intuition and techniques from the field of Algebraic Multigrid (AMG),
traditionally used for solving linear and nonlinear systems of equations, and
specifically adapt the Full Approximation Scheme (FAS) for nonlinear systems of
equations to the problem of deep learning. Training through V-cycles then
encourages the neural networks to build a hierarchical understanding of the
problem. We refer to this approach as multilevel-in-width to distinguish it
from prior multilevel works which hierarchically alter the depth of neural
networks. The resulting approach is a highly flexible framework that can be
applied to a variety of layer types, which we demonstrate with both
fully-connected and convolutional layers. We experimentally show with PDE
regression problems that our multilevel training approach is an effective
regularizer, improving the generalization performance of the neural networks
studied.
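To make the approach concrete, below is a minimal two-level "multilevel-in-width" sketch in PyTorch. Everything in it is an illustrative assumption rather than the authors' implementation: the networks are small MLPs, prolongation simply repeats hidden units and halves their outgoing weights so the coarse function is preserved, and the FAS tau-correction and the full V-cycle schedule are omitted.

```python
# Minimal two-level "multilevel-in-width" sketch (illustrative only; not the
# authors' code). The FAS tau-correction and full V-cycle are omitted.
import torch
import torch.nn as nn


def make_mlp(width):
    return nn.Sequential(nn.Linear(1, width), nn.Tanh(), nn.Linear(width, 1))


def prolongate(coarse, fine, factor=2):
    """Copy coarse weights into a net whose hidden layer is `factor` times wider."""
    with torch.no_grad():
        # First layer: repeat each hidden unit `factor` times.
        fine[0].weight.copy_(coarse[0].weight.repeat_interleave(factor, dim=0))
        fine[0].bias.copy_(coarse[0].bias.repeat_interleave(factor, dim=0))
        # Output layer: split each outgoing weight so the function is preserved.
        fine[2].weight.copy_(coarse[2].weight.repeat_interleave(factor, dim=1) / factor)
        fine[2].bias.copy_(coarse[2].bias)


def train(net, x, y, steps, lr=1e-2):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(net(x), y).backward()
        opt.step()


# Toy 1-D regression problem.
x = torch.linspace(-1.0, 1.0, 64).unsqueeze(1)
y = torch.sin(3.0 * x)

coarse, fine = make_mlp(16), make_mlp(32)
train(coarse, x, y, steps=200)   # relax on the narrow (coarse) level
prolongate(coarse, fine)         # move the coarse solution to the wide level
train(fine, x, y, steps=200)     # continue training on the wide (fine) level
```

A full V-cycle would also restrict the fine-level parameters back to the coarse level (with a FAS correction to the coarse objective) and alternate between levels; the one-way pass above only shows the width hierarchy and the prolongation step.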
Related papers
- Efficient Implementation of a Multi-Layer Gradient-Free Online-Trainable
Spiking Neural Network on FPGA [0.31498833540989407]
ODESA is the first network to have end-to-end multi-layer online local supervised training without using gradients.
This research shows that the network architecture and the online training of weights and thresholds can be implemented efficiently on a large scale in hardware.
arXiv Detail & Related papers (2023-05-31T00:34:15Z) - WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure jointly trained with a gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
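As a rough illustration of a within-layer diversity penalty, the snippet below penalizes off-diagonal cosine similarities between the units of one layer over a batch; this particular form is only an assumption, not necessarily the WLD-Reg formulation.

```python
import torch
import torch.nn.functional as F


def within_layer_diversity_penalty(activations):
    """activations: (batch, units) tensor of one layer's outputs.

    Penalizes similarity between units; an illustrative stand-in for a
    within-layer diversity regularizer, not the exact WLD-Reg formula.
    """
    z = F.normalize(activations, dim=0)             # unit-normalize each column
    gram = z.t() @ z                                 # (units, units) similarities
    off_diag = gram - torch.diag(torch.diag(gram))   # drop the self-similarities
    return (off_diag ** 2).mean()

# Usage: total_loss = task_loss + lam * within_layer_diversity_penalty(hidden)
```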
arXiv Detail & Related papers (2023-01-03T20:57:22Z) - Imbedding Deep Neural Networks [0.0]
Continuous depth neural networks, such as Neural ODEs, have refashioned the understanding of residual neural networks in terms of non-linear vector-valued optimal control problems.
We propose a new approach which explicates the network's 'depth' as a fundamental variable, thus reducing the problem to a system of forward-facing initial value problems.
arXiv Detail & Related papers (2022-01-31T22:00:41Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via the Polyak-Łojasiewicz condition, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and
Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, termed SIRe, to reduce the vanishing gradient problem in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing the input image structure preservation through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
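A hedged sketch of the structure-preservation idea: intermediate features of the classifier are also decoded back to the input so that a reconstruction loss can be added to the classification loss. The layer sizes, decoder design, and loss weighting below are assumptions, not the SIRe architecture.

```python
import torch.nn as nn

# Illustrative only: a small CNN whose features are both classified and
# decoded back to the input image (auto-encoder branch).
class StructurePreservingCNN(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(   # reconstructs the input from features
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, num_classes))

    def forward(self, x):
        feats = self.encoder(x)
        return self.head(feats), self.decoder(feats)

# Usage (the weight 0.1 is an arbitrary choice):
#   logits, recon = model(images)
#   loss = cross_entropy(logits, labels) + 0.1 * mse_loss(recon, images)
```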
arXiv Detail & Related papers (2021-10-06T13:54:49Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
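A minimal sketch of the block-swapping idea: during a forward pass, each student block is randomly replaced by the corresponding (frozen) teacher block. The block granularity and swap probability are assumptions; the paper's exact schedule may differ.

```python
import random


def mixed_forward(student_blocks, teacher_blocks, x, swap_prob=0.5):
    """Forward pass in which each low-precision student block may be replaced
    by the matching high-precision teacher block (illustrative sketch only)."""
    for s_block, t_block in zip(student_blocks, teacher_blocks):
        block = t_block if random.random() < swap_prob else s_block
        x = block(x)
    return x

# Usage: student_blocks / teacher_blocks are module lists of equal length; the
# loss on the mixed output is backpropagated into the student blocks only,
# while the teacher parameters stay frozen.
```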
arXiv Detail & Related papers (2021-03-02T03:09:03Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep
Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
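A hedged sketch of depth that adapts per input at inference time: each layer emits a halting score, and the forward pass stops once the score crosses a threshold. The layer type, halting head, and threshold below are illustrative assumptions, not the paper's sparse-recovery architecture.

```python
import torch
import torch.nn as nn

class AdaptiveDepthNet(nn.Module):
    """Stops executing layers once a learned halting score exceeds a threshold."""

    def __init__(self, dim, max_layers=10, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(max_layers))
        self.halts = nn.ModuleList(nn.Linear(dim, 1) for _ in range(max_layers))
        self.threshold = threshold

    def forward(self, x):
        for layer, halt in zip(self.layers, self.halts):
            x = torch.relu(layer(x))
            if torch.sigmoid(halt(x)).mean() > self.threshold:  # learned stop signal
                break
        return x
```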
arXiv Detail & Related papers (2020-10-29T06:32:53Z) - Multilevel Minimization for Deep Residual Networks [0.0]
We present a new multilevel minimization framework for the training of deep residual networks (ResNets).
Our framework is based on the dynamical systems viewpoint, which formulates a ResNet as the discretization of an initial value problem.
By design, our framework is independent of the training strategy chosen on each level of the multilevel hierarchy.
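The dynamical-systems viewpoint can be written down directly: a residual block acts like one forward-Euler step x_{k+1} = x_k + h * f(x_k) of an initial value problem. The step size h and the form of f below are illustrative choices, not the paper's setup.

```python
import torch.nn as nn

class EulerResidualBlock(nn.Module):
    """One residual block read as a forward-Euler step of x' = f(x)."""

    def __init__(self, dim, h=1.0):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.h = h  # step size of the discretized initial value problem

    def forward(self, x):
        return x + self.h * self.f(x)
```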
arXiv Detail & Related papers (2020-04-13T20:52:26Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
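A minimal sketch of forking side branches: auxiliary heads are attached to intermediate stages of a backbone and trained jointly with the main head. The stage sizes, branch design, and loss weights are assumptions, not the paper's delicately designed branches.

```python
import torch.nn as nn

# Illustrative only: a two-stage backbone with one auxiliary head forked from
# the intermediate stage.
class BackboneWithSideBranch(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())

        def classifier(channels):
            return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(channels, num_classes))

        self.side_head = classifier(16)   # branch from the intermediate stage
        self.main_head = classifier(32)

    def forward(self, x):
        h1 = self.stage1(x)
        h2 = self.stage2(h1)
        return self.main_head(h2), self.side_head(h1)

# Usage (0.3 is an arbitrary auxiliary weight):
#   main, aux = model(images)
#   loss = cross_entropy(main, labels) + 0.3 * cross_entropy(aux, labels)
```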
arXiv Detail & Related papers (2020-03-24T09:56:13Z)