Globally Convergent Multilevel Training of Deep Residual Networks
- URL: http://arxiv.org/abs/2107.07572v1
- Date: Thu, 15 Jul 2021 19:08:58 GMT
- Title: Globally Convergent Multilevel Training of Deep Residual Networks
- Authors: Alena Kopaničáková and Rolf Krause
- Abstract summary: We propose a globally convergent multilevel training method for deep residual networks (ResNets)
The devised method operates in hybrid (stochastic-deterministic) settings by adaptively adjusting mini-batch sizes during the training.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a globally convergent multilevel training method for deep residual
networks (ResNets). The devised method can be seen as a novel variant of the
recursive multilevel trust-region (RMTR) method, which operates in hybrid
(stochastic-deterministic) settings by adaptively adjusting mini-batch sizes
during the training. The multilevel hierarchy and the transfer operators are
constructed by exploiting a dynamical system's viewpoint, which interprets
forward propagation through the ResNet as a forward Euler discretization of an
initial value problem. In contrast to traditional training approaches, our
novel RMTR method also incorporates curvature information on all levels of the
multilevel hierarchy by means of the limited-memory SR1 method. The overall
performance and the convergence properties of our multilevel training method
are numerically investigated using examples from the field of classification
and regression.
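For concreteness, the sketch below illustrates the dynamical systems viewpoint: forward propagation is written as forward Euler steps x_{k+1} = x_k + h * f(x_k, w_k), and a coarse level is built by keeping every second block with a doubled step size. This is an illustrative toy only, not the authors' code; the residual function, the injection-style coarsening, and all helper names are assumptions.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): a ResNet with N residual
# blocks viewed as N forward Euler steps of the ODE dx/dt = f(x, w(t)), x(0) = input,
# i.e. x_{k+1} = x_k + h * f(x_k, w_k) with step size h = T / N.

def f(x, w):
    """Hypothetical residual function of one block: a single tanh layer."""
    return np.tanh(w @ x)

def forward_euler_resnet(x0, weights, T=1.0):
    """Propagate x0 through len(weights) residual blocks (forward Euler steps)."""
    h = T / len(weights)
    x = x0
    for w in weights:
        x = x + h * f(x, w)          # one residual block == one Euler step
    return x

def coarsen(weights):
    """Coarse level for illustration: keep every second block, so the coarse
    network takes half as many Euler steps of twice the size on [0, T]."""
    return weights[::2]

# Usage: a fine network with 8 blocks and its 4-block coarse-level counterpart.
rng = np.random.default_rng(0)
weights_fine = [0.1 * rng.standard_normal((5, 5)) for _ in range(8)]
x0 = rng.standard_normal(5)
out_fine = forward_euler_resnet(x0, weights_fine)
out_coarse = forward_euler_resnet(x0, coarsen(weights_fine))
print(out_fine, out_coarse)
```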
Related papers
- LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance.
LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing.
arXiv Detail & Related papers (2024-10-22T16:26:05Z)
- Alternate Training of Shared and Task-Specific Parameters for Multi-Task Neural Networks [49.1574468325115]
This paper introduces novel alternate training procedures for hard-parameter-sharing Multi-Task Neural Networks (MTNNs)
The proposed alternate training method updates shared and task-specific weights alternately, exploiting the multi-head architecture of the model.
Empirical experiments demonstrate delayed overfitting, improved prediction, and reduced computational demands.
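For illustration, a minimal sketch of such alternating updates in a hard-parameter-sharing multi-head network (architecture and names are hypothetical; this is not the paper's code):

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the paper's code): a hard-parameter-sharing multi-task
# network with a shared trunk and one head per task, trained by alternating
# between updates of the task-specific weights and the shared weights.
class MultiHeadNet(nn.Module):
    def __init__(self, in_dim=10, hidden=32, num_tasks=3):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # shared
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(num_tasks))

    def forward(self, x):
        z = self.trunk(x)
        return [head(z) for head in self.heads]

def total_loss(model, x, targets):
    return sum(nn.functional.mse_loss(pred, t) for pred, t in zip(model(x), targets))

model = MultiHeadNet()
opt_shared = torch.optim.SGD(model.trunk.parameters(), lr=1e-2)
opt_heads = torch.optim.SGD(model.heads.parameters(), lr=1e-2)

x = torch.randn(64, 10)
targets = [torch.randn(64, 1) for _ in range(3)]

for epoch in range(10):
    # Phase 1: update task-specific heads only.
    opt_heads.zero_grad()
    total_loss(model, x, targets).backward()
    opt_heads.step()
    # Phase 2: update shared trunk only.
    opt_shared.zero_grad()
    total_loss(model, x, targets).backward()
    opt_shared.step()
```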
arXiv Detail & Related papers (2023-12-26T21:33:03Z)
- Parallel Trust-Region Approaches in Neural Network Training: Beyond Traditional Methods [0.0]
We propose to train neural networks (NNs) using a novel variant of the "Additively Preconditioned Trust-region Strategy" (APTS)
The proposed method is based on a parallelizable additive domain decomposition approach applied to the neural network's parameters.
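As a rough illustration of the additive idea, the sketch below splits a parameter vector into blocks, computes a correction on each block independently, and sums the corrections. This is a generic additive parameter-space decomposition with a plain gradient inner solver, not the APTS algorithm itself; the toy objective and names are assumptions.

```python
import numpy as np

# Generic additive parameter-space decomposition (illustration only, not APTS):
# the parameter vector is split into disjoint blocks ("subdomains"), a local
# correction is computed independently on each block (these solves are
# embarrassingly parallel), and the corrections are summed into a global update.

def loss(w):
    return 0.5 * np.sum((w - 1.0) ** 2)   # toy quadratic objective

def grad(w):
    return w - 1.0

def local_correction(w, idx, lr=0.5, steps=5):
    """Optimize only the coordinates in `idx`, keeping all others frozen."""
    w_local = w.copy()
    for _ in range(steps):
        g = grad(w_local)
        w_local[idx] -= lr * g[idx]
    correction = np.zeros_like(w)
    correction[idx] = w_local[idx] - w[idx]
    return correction

w = np.zeros(8)
subdomains = [np.arange(0, 4), np.arange(4, 8)]   # two parameter blocks

for it in range(10):
    # The local corrections could be computed in parallel, one per subdomain.
    corrections = [local_correction(w, idx) for idx in subdomains]
    w = w + sum(corrections)                      # additive combination
    print(it, loss(w))
```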
arXiv Detail & Related papers (2023-12-21T09:00:24Z)
- Online Multi-Task Learning with Recursive Least Squares and Recursive Kernel Methods [50.67996219968513]
We introduce two novel approaches for Online Multi-Task Learning (MTL) Regression Problems.
We achieve exact and approximate recursions with quadratic per-instance cost on the dimension of the input space.
We compare our online MTL methods to other contenders in a real-world wind speed forecasting case study.
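For background on such recursions, here is a textbook single-output recursive least-squares update with quadratic per-instance cost in the input dimension d. The paper's exact and approximate multi-task recursions are not reproduced here; all names are placeholders.

```python
import numpy as np

# Textbook recursive least squares (RLS) for a single linear model: each new
# sample (x_t, y_t) updates the weights in O(d^2) time and memory, where d is
# the input dimension. (Illustrates the per-instance quadratic cost only.)
class RecursiveLeastSquares:
    def __init__(self, d, lam=1.0, delta=1e3):
        self.w = np.zeros(d)            # weight estimate
        self.P = delta * np.eye(d)      # inverse (regularized) covariance
        self.lam = lam                  # forgetting factor

    def update(self, x, y):
        Px = self.P @ x                              # O(d^2)
        k = Px / (self.lam + x @ Px)                 # gain vector
        err = y - self.w @ x                         # a priori error
        self.w += k * err
        self.P = (self.P - np.outer(k, Px)) / self.lam   # O(d^2)
        return err

rls = RecursiveLeastSquares(d=4)
rng = np.random.default_rng(0)
w_true = rng.standard_normal(4)
for _ in range(200):
    x = rng.standard_normal(4)
    rls.update(x, w_true @ x + 0.01 * rng.standard_normal())
print(np.round(rls.w - w_true, 3))    # close to zero after enough samples
```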
arXiv Detail & Related papers (2023-08-03T01:41:34Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and trained jointly with gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
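As a generic illustration of within-layer feedback, the sketch below adds a penalty on the pairwise correlation of unit activations in one hidden layer to the task loss. This is not necessarily the exact WLD-Reg formulation; the 0.1 weight and layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Generic within-layer diversity penalty (illustration only; not necessarily the
# exact WLD-Reg formulation): units in the same hidden layer are pushed to
# produce decorrelated activations over the mini-batch by penalizing the
# off-diagonal entries of their activation correlation matrix.
def within_layer_diversity_penalty(activations):
    # activations: (batch, units) output of one hidden layer
    a = activations - activations.mean(dim=0, keepdim=True)
    a = a / (a.norm(dim=0, keepdim=True) + 1e-8)     # normalize each unit's column
    corr = a.T @ a                                   # (units, units) correlations
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).mean()

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(128, 10), torch.randn(128, 1)

for step in range(100):
    hidden = model[1](model[0](x))                   # hidden-layer activations
    pred = model[2](hidden)
    loss = nn.functional.mse_loss(pred, y) + 0.1 * within_layer_diversity_penalty(hidden)
    opt.zero_grad()
    loss.backward()
    opt.step()
```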
arXiv Detail & Related papers (2023-01-03T20:57:22Z)
- Multilevel-in-Layer Training for Deep Neural Network Regression [1.6185544531149159]
We present a multilevel regularization strategy that constructs and trains a hierarchy of neural networks.
We experimentally show with PDE regression problems that our multilevel training approach is an effective regularizer.
arXiv Detail & Related papers (2022-11-11T23:53:46Z)
- Multi-Modal Recurrent Fusion for Indoor Localization [24.138127040942127]
This paper considers indoor localization using multi-modal wireless signals including Wi-Fi, inertial measurement unit (IMU), and ultra-wideband (UWB)
A multi-stream recurrent fusion method is proposed to combine the current hidden state of each modality in the context of recurrent neural networks.
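A bare-bones sketch of this kind of fusion, with per-modality recurrent encoders whose current hidden states are concatenated at every step (a generic illustration with placeholder dimensions, not the paper's architecture):

```python
import torch
import torch.nn as nn

# Bare-bones sketch (generic illustration, not the paper's architecture): each
# modality (Wi-Fi, IMU, UWB) has its own recurrent encoder; at every time step
# the current hidden states are concatenated and mapped to a 2-D position.
class RecurrentFusionLocalizer(nn.Module):
    def __init__(self, dims, hidden=32):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {m: nn.GRU(d, hidden, batch_first=True) for m, d in dims.items()}
        )
        self.fuse = nn.Linear(hidden * len(dims), 2)   # fused state -> (x, y)

    def forward(self, inputs):
        # inputs: dict of modality -> tensor of shape (batch, time, feat_dim)
        states = []
        for m, enc in self.encoders.items():
            out, _ = enc(inputs[m])                    # (batch, time, hidden)
            states.append(out)
        fused = torch.cat(states, dim=-1)              # combine current hidden states
        return self.fuse(fused)                        # per-step position estimates

dims = {"wifi": 16, "imu": 6, "uwb": 4}                # placeholder feature sizes
model = RecurrentFusionLocalizer(dims)
batch = {m: torch.randn(8, 20, d) for m, d in dims.items()}
positions = model(batch)                               # shape (8, 20, 2)
```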
arXiv Detail & Related papers (2022-02-19T02:46:49Z)
- Credit Assignment with Meta-Policy Gradient for Multi-Agent Reinforcement Learning [29.895142928565228]
We propose a general meta-learning-based Mixing Network with Meta Policy Gradient (MNMPG) framework to distill the global hierarchy for delicate reward decomposition.
Experiments on the StarCraft II micromanagement benchmark demonstrate that our method just with a simple utility network is able to outperform the current state-of-the-art MARL algorithms.
arXiv Detail & Related papers (2021-02-24T12:03:37Z)
- A Transductive Multi-Head Model for Cross-Domain Few-Shot Learning [72.30054522048553]
We present a new method, Transductive Multi-Head Few-Shot learning (TMHFS), to address the Cross-Domain Few-Shot Learning challenge.
The proposed methods greatly outperform the strong baseline, fine-tuning, on four different target domains.
arXiv Detail & Related papers (2020-06-08T02:39:59Z)
- Multilevel Minimization for Deep Residual Networks [0.0]
We present a new multilevel minimization framework for the training of deep residual networks (ResNets)
Our framework is based on the dynamical system's viewpoint, which formulates a ResNet as the discretization of an initial value problem.
By design, our framework is independent of the training strategy chosen on each level of the multilevel hierarchy.
arXiv Detail & Related papers (2020-04-13T20:52:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.