Related papers: Training Neural Networks at Any Scale

Training Neural Networks at Any Scale

URL: http://arxiv.org/abs/2511.11163v1
Date: Fri, 14 Nov 2025 10:58:07 GMT
Title: Training Neural Networks at Any Scale
Authors: Thomas Pethick, Kimon Antonakopoulos, Antonio Silveti-Falls, Leena Chennuru Vankadara, Volkan Cevher,
Abstract summary: This article reviews modern optimization methods for training neural networks with an emphasis on efficiency and scale.<n>We present state-of-the-art optimization algorithms under a unified algorithmic template that highlights the importance of adapting to the structures in the problem.
Score: 57.048948400182354
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This article reviews modern optimization methods for training neural networks with an emphasis on efficiency and scale. We present state-of-the-art optimization algorithms under a unified algorithmic template that highlights the importance of adapting to the structures in the problem. We then cover how to make these algorithms agnostic to the scale of the problem. Our exposition is intended as an introduction for both practitioners and researchers who wish to be involved in these exciting new developments.

Related papers

Faster Predictive Coding Networks via Better Initialization [52.419343840654186]
We propose a new technique for predictive coding networks that aims to preserve the iterative progress made on previous training samples.<n>Our experiments demonstrate substantial improvements in convergence speed and final test loss in both supervised and unsupervised settings.
arXiv Detail & Related papers (2026-01-28T08:52:19Z)
Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale [0.0]
This thesis investigates the evolution of optimization algorithms from classical first-order methods to modern principled higher-order techniques.<n>The analysis uncovers the limitations of these conventional approaches when confronted with anisotropy that is representative of real-world data.<n>Next, the interplay between these optimization algorithms and the broader neural network training toolkit emerges as equally essential to empirical success.
arXiv Detail & Related papers (2025-12-20T14:20:46Z)
Enhancing CNN Classification with Lamarckian Memetic Algorithms and Local Search [0.0]
We propose a novel approach integrating a two-stage training technique with population-based optimization algorithms incorporating local search capabilities. Our experiments demonstrate that the proposed method outperforms state-of-the-art gradient-based techniques.
arXiv Detail & Related papers (2024-10-26T17:31:15Z)
Learning-Augmented Algorithms with Explicit Predictors [67.02156211760415]
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data. Prior research in this context was focused on a paradigm where the predictor is pre-trained on past data and then used as a black box. In this work, we unpack the predictor and integrate the learning problem it gives rise for within the algorithmic challenge.
arXiv Detail & Related papers (2024-03-12T08:40:21Z)
Neural Algorithmic Reasoning Without Intermediate Supervision [21.852775399735005]
We focus on learning neural algorithmic reasoning only from the input-output pairs without appealing to the intermediate supervision. We build a self-supervised objective that can regularise intermediate computations of the model without access to the algorithm trajectory. We demonstrate that our approach is competitive to its trajectory-supervised counterpart on tasks from the CLRSic Algorithmic Reasoning Benchmark.
arXiv Detail & Related papers (2023-06-23T09:57:44Z)
Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models. Most existing algorithms restrict their single-machine setting so that they are incapable of handling distributed data. We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
Neural Combinatorial Optimization: a New Player in the Field [69.23334811890919]
This paper presents a critical analysis on the incorporation of algorithms based on neural networks into the classical optimization framework. A comprehensive study is carried out to analyse the fundamental aspects of such algorithms, including performance, transferability, computational cost and to larger-sized instances.
arXiv Detail & Related papers (2022-05-03T07:54:56Z)
Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles. We present the state of the art of hardware implementations of spiking neural networks. We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.