NOVAK: Unified adaptive optimizer for deep neural networks
- URL: http://arxiv.org/abs/2601.07876v1
- Date: Sun, 11 Jan 2026 13:03:57 GMT
- Title: NOVAK: Unified adaptive optimizer for deep neural networks
- Authors: Sergii Kavun
- Abstract summary: NOVAK is a gradient-based optimization algorithm that integrates adaptive moment estimation, rectified learning-rate scheduling, decoupled weight regularization, multiple variants of Nesterov momentum, and lookahead synchronization into a unified, performance-oriented framework.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work introduces NOVAK, a modular gradient-based optimization algorithm that integrates adaptive moment estimation, rectified learning-rate scheduling, decoupled weight regularization, multiple variants of Nesterov momentum, and lookahead synchronization into a unified, performance-oriented framework. NOVAK adopts a dual-mode architecture that includes a streamlined fast path designed for production. The optimizer employs custom CUDA kernels that deliver substantial speedups (3-5x for critical operations) while preserving numerical stability under standard stochastic-optimization assumptions. We provide fully developed mathematical formulations for rectified adaptive learning rates, a memory-efficient lookahead mechanism that reduces overhead from O(2p) to O(p + p/k), and the synergistic coupling of complementary optimization components. Theoretical analysis establishes convergence guarantees and elucidates the stability and variance-reduction properties of the method. Extensive empirical evaluation on CIFAR-10, CIFAR-100, ImageNet, and ImageNette demonstrates NOVAK's superiority over 14 contemporary optimizers, including Adam, AdamW, RAdam, Lion, and Adan. Across architectures such as ResNet-50, VGG-16, and ViT, NOVAK consistently achieves state-of-the-art accuracy and exceptional robustness, attaining very high accuracy on VGG-16/ImageNette and demonstrating superior architectural robustness compared to contemporary optimizers. The results highlight that NOVAK's architectural contributions (particularly rectification, decoupled decay, and hybrid momentum) are crucial for reliable training of deep plain networks lacking skip connections, addressing a long-standing limitation of existing adaptive optimization methods.
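The abstract gives no pseudocode, so the following NumPy fragment is only a rough, non-authoritative sketch of how rectification, decoupled decay, and Nesterov-style momentum can be composed in one step: it couples RAdam-style variance rectification with AdamW-style decoupled weight decay and a NAdam-style Nesterov correction. All hyperparameter names, defaults, and the exact composition are assumptions for illustration, not NOVAK's published update rule.

```python
import numpy as np

def novak_like_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                    eps=1e-8, weight_decay=1e-2):
    """Hedged sketch of one NOVAK-style update (t counts from 1)."""
    # Adaptive moment estimation: Adam-style exponential moving averages.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    # Nesterov-style correction: blend the bias-corrected momentum with
    # the current gradient (NAdam-style approximation).
    m_nes = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
    # RAdam-style rectification: only trust the adaptive step once the
    # variance estimate has enough effective samples.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
    if rho_t > 4.0:
        v_hat = np.sqrt(v / (1 - beta2 ** t))
        r_t = np.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf /
                      ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        update = r_t * m_nes / (v_hat + eps)
    else:
        # Early in training the adaptive denominator is unreliable, so
        # fall back to a plain momentum step.
        update = m_nes
    # Decoupled weight decay (AdamW-style): applied to the weights
    # directly, not folded into the gradient.
    p = p - lr * update - lr * weight_decay * p
    return p, m, v
```

The rectification branch is what lets such an optimizer behave like momentum SGD during the first steps, when the second-moment estimate is still high-variance, and switch to the adaptive step only once the rectification term is well defined.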
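On the lookahead side, the O(2p) baseline the abstract refers to is the full slow-weight copy kept by standard lookahead (Zhang et al., 2019), sketched below for context. Note the arithmetic of the claimed saving: with k = 5, reducing the extra state from p to p/k would cut total optimizer-state overhead from 2p to 1.2p. The paper's memory-efficient mechanism is not described in this summary and is not reproduced here; the sketch shows only the classic scheme.

```python
import numpy as np

def lookahead_sync(fast, slow, alpha=0.5):
    """Classic lookahead synchronization: pull the slow weights toward
    the fast weights, then restart the fast weights from the slow point.
    The persistent `slow` buffer is the O(p) extra state (O(2p) total)
    that the abstract says NOVAK's variant reduces to O(p + p/k)."""
    slow += alpha * (fast - slow)  # interpolate slow toward fast
    fast[:] = slow                 # reset fast weights to the slow point
    return fast, slow

# Usage: run k inner-optimizer steps on `fast`, then synchronize.
k, alpha = 5, 0.5
fast = np.random.randn(10)
slow = fast.copy()
for step in range(1, 101):
    fast -= 0.01 * np.random.randn(10)  # stand-in for an inner optimizer step
    if step % k == 0:
        fast, slow = lookahead_sync(fast, slow, alpha)
```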
Related papers
- HQP: Sensitivity-Aware Hybrid Quantization and Pruning for Ultra-Low-Latency Edge AI Inference [0.0]
A Hybrid Quantization and Pruning (HQP) framework designed to achieve synergistic model acceleration. The HQP framework achieves a peak performance gain of 3.12 times inference speedup and a 55 percent model size reduction.
arXiv Detail & Related papers (2026-02-02T18:17:45Z) - ROOT: Robust Orthogonalized Optimizer for Neural Network Training [47.05662448082334]
Optimizing large language models (LLMs) remains a critical challenge, particularly as model scaling exacerbates sensitivity to numerical imprecision and training instability. We first develop a dimension-robust orthogonalization scheme that enhances robustness through iterations tailored to specific matrix sizes. Second, we introduce a framework that suppresses outlier noise while preserving meaningful update directions.
arXiv Detail & Related papers (2025-11-25T18:48:05Z) - Slice-Wise Initial State Optimization to Improve Cost and Accuracy of the VQE on Lattice Models [0.0]
We propose an optimization method for the Variational Quantum Eigensolver (VQE) that combines adaptive and physics-inspired ansatz design. This quasi-dynamical approach preserves expressivity and hardware efficiency while avoiding the overhead of operator selection. Benchmarks on one- and two-dimensional Heisenberg and Hubbard models with up to 20 qubits show improved fidelities, reduced function evaluations, or both, compared to fixed-layer VQE.
arXiv Detail & Related papers (2025-09-16T12:52:23Z) - FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off [12.900580256269155]
We propose Fast Multi-Attention Dynamic Convolution (FMDConv), which integrates input attention, temperature-degraded kernel attention, and output attention to optimize the speed-accuracy trade-off. Experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate that FMDConv reduces the computational cost by up to 49.8% on ResNet-18 and 42.2% on ResNet-50.
arXiv Detail & Related papers (2025-03-21T20:23:32Z) - Regularized Adaptive Momentum Dual Averaging with an Efficient Inexact Subproblem Solver for Training Structured Neural Network [9.48424754175943]
We propose Regularized Adaptive Momentum Dual Averaging (RAMDA), an algorithm for training structured neural networks. We show that RAMDA attains the ideal structure induced by the regularizer at its stationary point of convergence. This structure is locally optimal near the point of convergence, so RAMDA is guaranteed to obtain the best structure possible.
arXiv Detail & Related papers (2024-03-21T13:43:49Z) - Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE To Adaptive and non-adaptive momentum) framework. We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM. (A generic double-EMA sketch follows this list.)
arXiv Detail & Related papers (2023-07-02T18:16:06Z) - Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning [64.05646120624287]
We derive the expression of the joint Q-value function of LVD and MVD.
To ensure optimal consistency, the optimal node is required to be the unique STN.
Our method outperforms state-of-the-art baselines in experiments on various benchmarks.
arXiv Detail & Related papers (2022-11-22T08:14:50Z) - Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization [108.35402316802765]
We propose a new first-order optimization algorithm, Accelerated Gradient-Optimistic Gradient (AG-OG) Ascent.
We show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings.
We further extend our algorithm and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings.
arXiv Detail & Related papers (2022-10-31T17:59:29Z) - Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping training data private on the clients.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, AdaGrad, etc., to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs).
arXiv Detail & Related papers (2022-07-14T22:46:43Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting [60.97465664419395]
We develop a significant and surprising extension of the splitting descent framework that addresses the local optimality issue.
By simply allowing both positive and negative weights during splitting, we can eliminate the appearance of splitting stability in S2D.
We verify our method on various challenging benchmarks such as CIFAR-100, ImageNet and ModelNet40, on which we outperform S2D and other advanced methods on learning accurate and energy-efficient neural networks.
arXiv Detail & Related papers (2020-03-23T17:09:27Z)
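The Admeta entry above is built around a double exponential moving average. Its exact update is not given in the summary; purely as a generic illustration, the fragment below applies the classic DEMA estimator (2*EMA - EMA(EMA)) to a gradient stream, where the second-level EMA cancels part of the lag of a single EMA, which is the usual motivation for double-EMA momentum. Variable names and the use as a momentum buffer are assumptions.

```python
import numpy as np

def dema_momentum(grads, beta=0.9):
    """Generic double exponential moving average (DEMA) over a gradient
    stream: DEMA = 2*EMA - EMA(EMA). Illustrative only; the Admeta
    framework's actual update rule differs in its details."""
    ema = np.zeros_like(grads[0])
    ema_of_ema = np.zeros_like(grads[0])
    history = []
    for g in grads:
        ema = beta * ema + (1 - beta) * g                  # first-level EMA
        ema_of_ema = beta * ema_of_ema + (1 - beta) * ema  # EMA of the EMA
        history.append(2 * ema - ema_of_ema)               # lag-reduced estimate
    return history
```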
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.