Multiscale Stochastic Gradient Descent: Efficiently Training Convolutional Neural Networks
- URL: http://arxiv.org/abs/2501.12739v2
- Date: Wed, 12 Mar 2025 16:05:08 GMT
- Title: Multiscale Stochastic Gradient Descent: Efficiently Training Convolutional Neural Networks
- Authors: Niloufar Zakariaei, Shadab Ahamed, Eldad Haber, Moshe Eliasof,
- Abstract summary: Multiscale Gradient Descent (Multiscale-SGD) is a novel optimization approach that exploits coarse-to-fine training strategies to estimate the gradient at a fraction of the cost.<n>We introduce a new class of learnable, scale-independent Mesh-Free Convolutions (MFCs) that ensure consistent gradient behavior across resolutions.<n>Our results establish a new paradigm for the efficient training of deep networks, enabling practical scalability in high-resolution and multiscale learning tasks.
- Score: 6.805997961535213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic Gradient Descent (SGD) is the foundation of modern deep learning optimization but becomes increasingly inefficient when training convolutional neural networks (CNNs) on high-resolution data. This paper introduces Multiscale Stochastic Gradient Descent (Multiscale-SGD), a novel optimization approach that exploits coarse-to-fine training strategies to estimate the gradient at a fraction of the cost, improving the computational efficiency of SGD type methods while preserving model accuracy. We derive theoretical criteria for Multiscale-SGD to be effective, and show that while standard convolutions can be used, they can be suboptimal for noisy data. This leads us to introduce a new class of learnable, scale-independent Mesh-Free Convolutions (MFCs) that ensure consistent gradient behavior across resolutions, making them well-suited for multiscale training. Through extensive empirical validation, we demonstrate that in practice, (i) our Multiscale-SGD approach can be used to train various architectures for a variety of tasks, and (ii) when the noise is not significant, standard convolutions benefit from our multiscale training framework. Our results establish a new paradigm for the efficient training of deep networks, enabling practical scalability in high-resolution and multiscale learning tasks.
Related papers
- Optimizing ML Training with Metagradient Descent [69.89631748402377]
We introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale.
We then introduce a "smooth model training" framework that enables effective optimization using metagradients.
arXiv Detail & Related papers (2025-03-17T22:18:24Z) - Meta-Sparsity: Learning Optimal Sparse Structures in Multi-task Networks through Meta-learning [4.462334751640166]
meta-sparsity is a framework for learning model sparsity that allows deep neural networks (DNNs) to generate optimal sparse shared structures in multi-task learning setting.
Inspired by Model Agnostic Meta-Learning (MAML), the emphasis is on learning shared and optimally sparse parameters in multi-task scenarios.
The effectiveness of meta-sparsity is rigorously evaluated by extensive experiments on two datasets.
arXiv Detail & Related papers (2025-01-21T13:25:32Z) - Gradient-free variational learning with conditional mixture networks [39.827869318925494]
We introduce CAVI-CMN, a fast, gradient-free variational method for training conditional mixture networks (CMNs)
CAVI-CMN achieves competitive and often superior predictive accuracy compared to maximum likelihood estimation (MLE) with backpropagation.
As input size or the number of experts increases, computation time scales competitively with MLE.
arXiv Detail & Related papers (2024-08-29T10:43:55Z) - Training Artificial Neural Networks by Coordinate Search Algorithm [0.20971479389679332]
We propose an efficient version of the gradient-free Coordinate Search (CS) algorithm for training neural networks.
The proposed algorithm can be used with non-differentiable activation functions and tailored to multi-objective/multi-loss problems.
Finding the optimal values for weights of ANNs is a large-scale optimization problem.
arXiv Detail & Related papers (2024-02-20T01:47:25Z) - conv_einsum: A Framework for Representation and Fast Evaluation of
Multilinear Operations in Convolutional Tensorial Neural Networks [28.416123889998243]
We develop a framework for representing tensorial convolution layers as einsum-like strings and a meta-algorithm conv_einsum which is able to evaluate these strings in a FLOPs-minimizing manner.
arXiv Detail & Related papers (2024-01-07T04:30:12Z) - An NMF-Based Building Block for Interpretable Neural Networks With
Continual Learning [0.8158530638728501]
Existing learning methods often struggle to balance interpretability and predictive performance.
Our approach aims to strike a better balance between these two aspects through the use of a building block based on NMF.
arXiv Detail & Related papers (2023-11-20T02:00:33Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Multi-Objective Optimization for Sparse Deep Multi-Task Learning [0.0]
We present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs)
Our work aims to address the (economical and also ecological) sustainability issue of DNN models, with particular focus on Deep Multi-Task models.
arXiv Detail & Related papers (2023-08-23T16:42:27Z) - Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNN) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Unifying Synergies between Self-supervised Learning and Dynamic
Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting.
The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z) - Quantization-aware Interval Bound Propagation for Training Certifiably
Robust Quantized Neural Networks [58.195261590442406]
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs)
Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization.
We present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs.
arXiv Detail & Related papers (2022-11-29T13:32:38Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Efficient Feature Transformations for Discriminative and Generative
Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
Theses provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Regularized Adaptation for Stable and Efficient Continuous-Level
Learning on Image Processing Networks [7.730087303035803]
We propose a novel continuous-level learning framework using a Filter Transition Network (FTN)
FTN is a non-linear module that easily adapt to new levels, and is regularized to prevent undesirable side-effects.
Extensive results for various image processing indicate that the performance of FTN is stable in terms of adaptation and adaptation.
arXiv Detail & Related papers (2020-03-11T07:46:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.