Comparative Analysis of Novel NIRMAL Optimizer Against Adam and SGD with Momentum
- URL: http://arxiv.org/abs/2508.04293v1
- Date: Wed, 06 Aug 2025 10:30:22 GMT
- Title: Comparative Analysis of Novel NIRMAL Optimizer Against Adam and SGD with Momentum
- Authors: Nirmal Gaud, Surej Mouli, Preeti Katiyar, Vaduguru Venkata Ramya
- Abstract summary: NIRMAL (Novel Integrated Robust Multi-Adaptation Learning) is a novel optimization algorithm that combines multiple strategies inspired by the movements of a chess piece. NIRMAL achieves competitive performance, particularly on the more challenging CIFAR-100 dataset. These findings underscore NIRMAL's potential as a versatile and effective optimizer for various deep learning tasks.
- Score: 0.8437187555622164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study proposes NIRMAL (Novel Integrated Robust Multi-Adaptation Learning), a novel optimization algorithm that combines multiple strategies inspired by the movements of a chess piece. These strategies include gradient descent, momentum, stochastic perturbations, adaptive learning rates, and non-linear transformations. We carefully evaluated NIRMAL against two widely used and successful optimizers, Adam and SGD with Momentum, on four benchmark image classification datasets: MNIST, FashionMNIST, CIFAR-10, and CIFAR-100, using a custom convolutional neural network (CNN) architecture for each dataset. The experimental results show that NIRMAL achieves competitive performance, particularly on the more challenging CIFAR-100 dataset, where it reached a test accuracy of 45.32% and a weighted F1-score of 0.4328. This performance surpasses Adam (41.79% accuracy, 0.3964 F1-score) and closely matches SGD with Momentum (46.97% accuracy, 0.4531 F1-score). NIRMAL also exhibits robust convergence and strong generalization, especially on complex datasets, as evidenced by stable loss and accuracy curves during training. These findings underscore NIRMAL's potential as a versatile and effective optimizer for various deep learning tasks.
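The abstract names the ingredients (gradient descent, momentum, stochastic perturbations, adaptive learning rates, non-linear transformations) but not the exact update rule. As a rough, hypothetical illustration of how such ingredients can be combined in a single optimizer, here is a minimal PyTorch sketch; the class name `NIRMALSketch`, the hyperparameter defaults, and the particular combination used (tanh-bounded, RMS-scaled momentum plus Gaussian noise) are all assumptions and do not reproduce the paper's actual NIRMAL update.

```python
# Illustrative sketch only -- NOT the authors' official NIRMAL implementation.
# It combines the ingredients named in the abstract (momentum, adaptive learning
# rates, stochastic perturbations, non-linear transformations) in one plausible way.
import torch
from torch.optim import Optimizer


class NIRMALSketch(Optimizer):
    """Hypothetical optimizer sketch; name and defaults are assumptions."""

    def __init__(self, params, lr=1e-3, momentum=0.9, beta=0.999,
                 eps=1e-8, noise_std=1e-4):
        defaults = dict(lr=lr, momentum=momentum, beta=beta,
                        eps=eps, noise_std=noise_std)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                if not state:
                    state["m"] = torch.zeros_like(p)  # momentum buffer
                    state["v"] = torch.zeros_like(p)  # second-moment buffer
                m, v = state["m"], state["v"]

                # Momentum accumulation on the raw gradient.
                m.mul_(group["momentum"]).add_(grad)
                # Adaptive scale from an EMA of squared gradients (RMSProp-style).
                v.mul_(group["beta"]).addcmul_(grad, grad, value=1 - group["beta"])
                denom = v.sqrt().add_(group["eps"])

                # Non-linear transformation (tanh) bounds the step, and a small
                # Gaussian perturbation adds stochastic exploration.
                update = torch.tanh(m / denom)
                update.add_(torch.randn_like(p), alpha=group["noise_std"])

                p.add_(update, alpha=-group["lr"])
        return loss
```

Usage under the same assumptions follows the standard PyTorch loop: `opt = NIRMALSketch(model.parameters(), lr=1e-3)`, then `loss.backward(); opt.step(); opt.zero_grad()` per batch.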
Related papers
- ZetA: A Riemann Zeta-Scaled Extension of Adam for Deep Learning [0.0]
ZetA is a novel deep learning system that extends Adam by incorporating dynamic scaling based on the zeta function. We show that ZetA is a computationally efficient and robust alternative to Adam in noisy or high-granularity classification tasks.
arXiv Detail & Related papers (2025-08-01T02:53:29Z) - Taming LLMs by Scaling Learning Rates with Gradient Grouping [49.91587150497186]
Training large language models (LLMs) poses challenges due to their massive scale and heterogeneous architectures. This work introduces Scaling with Gradient Grouping (SGG), a gradient wrapper that improves adaptive learning rate estimation through dynamic grouping and group-specific scaling.
arXiv Detail & Related papers (2025-06-01T15:30:37Z) - Decorrelating Structure via Adapters Makes Ensemble Learning Practical for Semi-supervised Learning [50.868594148443215]
In computer vision, traditional ensemble learning methods exhibit either low training efficiency or limited performance.
We propose a lightweight, loss-function-free, and architecture-agnostic ensemble learning method, Decorrelating Structure via Adapters (DSA), for various visual tasks.
arXiv Detail & Related papers (2024-08-08T01:31:38Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers [41.56951365163419]
"MixedNUTS" is a training-free method where the output logits of a robust classifier are processed by nonlinear transformations with only three parameters.
MixedNUTS then converts the transformed logits into probabilities and mixes them to form the overall output.
On CIFAR-10, CIFAR-100, and ImageNet datasets, experimental results with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness.
arXiv Detail & Related papers (2024-02-03T21:12:36Z) - Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability [67.8426046908398]
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world.
This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions.
arXiv Detail & Related papers (2022-04-08T20:46:16Z) - ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without relying on the gradient approximation used in DARTS.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z) - Precision-Weighted Federated Learning [1.8160945635344528]
We propose a novel algorithm that takes into account the variance of the gradients when computing the weighted average of the parameters of models trained in a Federated Learning setting (a minimal precision-weighting sketch appears after this list).
Our method was evaluated using standard image classification datasets with two different data partitioning strategies (IID/non-IID) to measure the performance and speed of our method in resource-constrained environments.
arXiv Detail & Related papers (2021-07-20T17:17:10Z) - ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [91.13797346047984]
We introduce ADAHESSIAN, a second order optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian.
We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods.
arXiv Detail & Related papers (2020-06-01T05:00:51Z) - SASL: Saliency-Adaptive Sparsity Learning for Neural Network Acceleration [20.92912642901645]
We propose a Saliency-Adaptive Sparsity Learning (SASL) approach for further optimization.
Our method reduces the FLOPs of ResNet-50 by 49.7% with a negligible 0.39% top-1 and 0.05% top-5 accuracy degradation.
arXiv Detail & Related papers (2020-03-12T16:49:37Z)
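The Precision-Weighted Federated Learning entry above describes weighting the parameter average by gradient variance. Below is a hedged, generic sketch of inverse-variance (precision) weighted aggregation, not that paper's exact algorithm; the function name `precision_weighted_average`, the per-client scalar variances, and the toy data are assumptions for illustration.

```python
# Hedged illustration of precision (inverse-variance) weighted averaging for
# federated model aggregation; names and shapes are assumptions, not the exact
# algorithm from the Precision-Weighted Federated Learning paper above.
import numpy as np


def precision_weighted_average(client_params, client_grad_vars, eps=1e-12):
    """Aggregate client parameter vectors, weighting each client by the
    inverse of its observed gradient variance (higher variance -> lower weight).

    client_params:    list of 1-D numpy arrays, one parameter vector per client.
    client_grad_vars: list of per-client scalar gradient variances.
    """
    precisions = np.array([1.0 / (v + eps) for v in client_grad_vars])
    weights = precisions / precisions.sum()           # normalize to sum to 1
    stacked = np.stack(client_params)                 # (num_clients, num_params)
    return (weights[:, None] * stacked).sum(axis=0)   # weighted average


# Toy usage: three clients; client 1 has the noisiest gradients, so it
# contributes the least to the aggregated parameters.
params = [np.array([1.0, 2.0]), np.array([1.5, 2.5]), np.array([0.9, 1.9])]
grad_vars = [0.01, 0.5, 0.02]
print(precision_weighted_average(params, grad_vars))
```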