ZetA: A Riemann Zeta-Scaled Extension of Adam for Deep Learning
- URL: http://arxiv.org/abs/2508.02719v1
- Date: Fri, 01 Aug 2025 02:53:29 GMT
- Title: ZetA: A Riemann Zeta-Scaled Extension of Adam for Deep Learning
- Authors: Samiksha BC
- Abstract summary: ZetA is a novel deep learning optimizer that extends Adam by incorporating dynamic scaling based on the Riemann zeta function. We show that ZetA is a computationally efficient and robust alternative to Adam in noisy or high-granularity classification tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces ZetA, a novel deep learning optimizer that extends Adam by incorporating dynamic scaling based on the Riemann zeta function. To the best of our knowledge, ZetA is the first optimizer to apply zeta-based gradient scaling within deep learning optimization. The method improves generalization and robustness through a hybrid update mechanism that integrates adaptive damping, cosine similarity-based momentum boosting, entropy-regularized loss, and Sharpness-Aware Minimization (SAM)-style perturbations. Empirical evaluations on SVHN, CIFAR10, CIFAR100, STL10, and noisy CIFAR10 consistently show test accuracy improvements over Adam. All experiments employ a lightweight fully connected network trained for five epochs under mixed-precision settings. The results demonstrate that ZetA is a computationally efficient and robust alternative to Adam, particularly effective in noisy or high-granularity classification tasks.
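The abstract names the ingredients of ZetA's update (zeta-based gradient scaling, adaptive damping, cosine-similarity momentum boosting, entropy-regularized loss, SAM-style perturbations) but not their formulas. The sketch below is therefore only a guess at what a zeta-scaled Adam step could look like: the truncated-zeta damping term, the boost factor, and the class name `ZetaAdamSketch` are assumptions, and the loss-level components (entropy regularization, SAM perturbation) are omitted entirely.

```python
import torch
from torch.optim import Optimizer


class ZetaAdamSketch(Optimizer):
    """Illustrative zeta-scaled Adam variant; not the authors' implementation."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, s=2.0):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps, s=s))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if not state:
                    state["t"] = 0
                    state["m"] = torch.zeros_like(p)
                    state["v"] = torch.zeros_like(p)
                state["t"] += 1
                t, m, v = state["t"], state["m"], state["v"]

                # Standard Adam moment estimates with bias correction.
                m.mul_(beta1).add_(g, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(g, g, value=1 - beta2)
                m_hat = m / (1 - beta1 ** t)
                v_hat = v / (1 - beta2 ** t)

                # Assumed zeta-based damping: a truncated zeta(s) sum whose
                # exponent grows with the gradient norm, so the multiplier
                # shrinks for large, noisy gradients. Purely illustrative.
                s_eff = group["s"] + g.norm().item()
                zeta_scale = sum(k ** (-s_eff) for k in range(1, 50))

                # Assumed cosine-similarity momentum boost: enlarge the step
                # when the current gradient agrees with the momentum direction.
                cos = torch.nn.functional.cosine_similarity(
                    m_hat.flatten(), g.flatten(), dim=0)
                boost = 1.0 + 0.5 * float(cos.clamp(min=0.0))

                step_size = group["lr"] * zeta_scale * boost
                p.addcdiv_(m_hat, v_hat.sqrt().add_(group["eps"]), value=-step_size)
```

Under these assumptions the zeta sum simply acts as a data-dependent multiplier on the usual Adam step; the actual paper should be consulted for the authors' definitions.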
Related papers
- Artificial Liver Classifier: A New Alternative to Conventional Machine Learning Models [4.395397502990339]
This paper introduces the Artificial Liver Classifier (ALC), a novel supervised learning classifier inspired by the human liver's detoxification function. The ALC is characterized by its simplicity, speed, hyperparameter-free design, ability to reduce overfitting, and effectiveness in addressing multi-class classification problems. It was evaluated on five benchmark machine learning datasets: Iris Flower, Breast Cancer Wisconsin, Wine, Voice Gender, and MNIST.
arXiv Detail & Related papers (2025-01-14T12:42:01Z) - EXAdam: The Power of Adaptive Cross-Moments [0.0]
This paper introduces EXAdam, a novel optimization algorithm that builds upon the widely-used Adam algorithm. EXAdam incorporates two key enhancements: (1) new debiasing terms for improved moment estimation and (2) a gradient-based acceleration mechanism. Empirical evaluations demonstrate EXAdam's superiority over Adam, achieving 38.46% faster convergence and yielding improvements of 1.96%, 2.17%, and 1.17% in training, validation, and testing accuracies.
arXiv Detail & Related papers (2024-12-29T00:11:54Z) - Towards Simple and Provable Parameter-Free Adaptive Gradient Methods [56.060918447252625]
We present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees. We prove that AdaGrad++ achieves comparable convergence rates to AdaGrad in convex optimization without predefined learning rate assumptions. Similarly, Adam++ matches the convergence rate of Adam without relying on any conditions on the learning rates.
arXiv Detail & Related papers (2024-12-27T04:22:02Z) - Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function [0.0]
We introduce sigSignGrad and tanhSignGrad, two novel optimizers that integrate adaptive friction coefficients based on the Sigmoid and Tanh functions.
Our theoretical analysis demonstrates the wide-ranging adjustment capability of the friction coefficient S.
Experiments on CIFAR-10 and Mini-ImageNet using ResNet50 and ViT architectures confirm the superior performance of our proposed optimizers.
arXiv Detail & Related papers (2024-08-07T03:20:46Z) - Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling [27.058009599819012]
We study the connection between optimal learning rates and batch sizes for Adam-style optimizers.
We prove that the optimal learning rate first rises and then falls as the batch size increases.
arXiv Detail & Related papers (2024-05-23T13:52:36Z) - MADA: Meta-Adaptive Optimizers through hyper-gradient Descent [73.1383658672682]
We introduce Meta-Adaptive Optimizers (MADA), a unified framework that can generalize several known optimizers and dynamically learn the most suitable one during training.
We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers.
We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization.
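The AVGrad change is concrete enough to illustrate: AMSGrad keeps the elementwise maximum of past second-moment estimates, while AVGrad (as described above) replaces that maximum with an average. A minimal sketch, assuming a simple running-average form and made-up function names:

```python
import numpy as np

def second_moment_amsgrad(v_hat_prev, v_t):
    # AMSGrad: keep the elementwise maximum of past second-moment estimates.
    return np.maximum(v_hat_prev, v_t)

def second_moment_avgrad(v_bar_prev, v_t, t):
    # AVGrad (as summarized): replace the max with a running average, which
    # keeps the estimate smooth -- presumably friendlier to hyper-gradients.
    return v_bar_prev + (v_t - v_bar_prev) / t
```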
arXiv Detail & Related papers (2024-01-17T00:16:46Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - Read the Signs: Towards Invariance to Gradient Descent's Hyperparameter Initialization [3.1153758106426603]
We propose ActiveLR, an optimization meta algorithm that localizes the learning rate, $\alpha$, and adapts it at each epoch according to whether the gradient changes sign or not.
We implement the Active version (ours) of widely used and recently published gradient descent optimizers, namely SGD with momentum, AdamW, RAdam, and AdaBelief.
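A minimal sketch of the sign-based rule described above, assuming a per-parameter learning rate that grows while the epoch-level gradient keeps its sign and shrinks when it flips; the factors 1.1 and 0.5 and the function name are illustrative, not ActiveLR's published constants:

```python
import numpy as np

def update_learning_rates(lr, grad_prev, grad_curr, grow=1.1, shrink=0.5):
    # Grow the per-parameter rate where the gradient sign is unchanged
    # across epochs, shrink it where the sign has flipped.
    same_sign = np.sign(grad_prev) == np.sign(grad_curr)
    return np.where(same_sign, lr * grow, lr * shrink)

# Usage (once per epoch): lr = update_learning_rates(lr, g_prev_epoch, g_curr_epoch)
```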
arXiv Detail & Related papers (2023-01-24T16:57:00Z) - Meta-Learning with Adaptive Hyperparameters [55.182841228303225]
We focus on a complementary factor of the MAML framework: inner-loop optimization (or fast adaptation).
We propose a new weight update rule that greatly enhances the fast adaptation process.
arXiv Detail & Related papers (2020-10-31T08:05:34Z) - MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients [112.00379151834242]
We propose an adaptive learning rate principle in which the running mean of squared gradients in Adam is replaced by a weighted mean, with weights chosen to maximize the estimated variance of each coordinate.
This results in faster adaptation, which leads to more desirable empirical convergence behaviors.
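A speculative sketch of the "weights chosen to maximize the estimated variance" idea: per coordinate, pick the averaging weight for the squared gradient from a small candidate grid so that a crude variance estimate v - m^2 is maximized. The candidate grid, the estimator, and the function name are assumptions, not the paper's derivation:

```python
import numpy as np

def weighted_second_moment(m, v, g, betas=(0.5, 0.9, 0.99, 0.999)):
    # m: first-moment estimate, v: previous second-moment estimate, g: gradient.
    best_v = v.copy()
    best_var = np.full_like(v, -np.inf)
    for b in betas:
        v_new = b * v + (1 - b) * g ** 2
        var = v_new - m ** 2          # crude per-coordinate variance estimate
        pick = var > best_var
        best_v = np.where(pick, v_new, best_v)
        best_var = np.where(pick, var, best_var)
    return best_v
```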
arXiv Detail & Related papers (2020-06-21T21:47:43Z) - ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [91.13797346047984]
We introduce ADAHESSIAN, a second order optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian.
We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods.
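The curvature signal here is a diagonal Hessian approximation; a common way to obtain one using only Hessian-vector products is Hutchinson's randomized estimator, sketched below under that assumption (the full optimizer's moving averages and spatial averaging are omitted):

```python
import torch

def hutchinson_diag_hessian(loss, params, n_samples=1):
    """Estimate diag(H) of `loss` w.r.t. `params` as E[z * (H z)],
    with z drawn from a Rademacher (+/-1) distribution."""
    params = list(params)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]  # +/-1 probes
        # Hessian-vector products via a second backward pass.
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for d, z, hvp in zip(diag, zs, hvps):
            d.add_(z * hvp / n_samples)
    return diag

# Usage: diag_h = hutchinson_diag_hessian(loss, model.parameters())
```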
arXiv Detail & Related papers (2020-06-01T05:00:51Z)