ADAM Optimization with Adaptive Batch Selection
- URL: http://arxiv.org/abs/2512.06795v1
- Date: Sun, 07 Dec 2025 11:15:14 GMT
- Title: ADAM Optimization with Adaptive Batch Selection
- Authors: Gyu Yeol Kim, Min-hwan Oh,
- Abstract summary: We introduce Adam with Combinatorial Bandit Sampling (AdamCB), which integrates bandit techniques into Adam.<n>Our regret analysis shows that AdamCB achieves faster convergence than Adam-based methods including the previous bandit-based variant.
- Score: 35.61554920994471
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Adam is a widely used optimizer in neural network training due to its adaptive learning rate. However, because different data samples influence model updates to varying degrees, treating them equally can lead to inefficient convergence. To address this, a prior work proposed adapting the sampling distribution using a bandit framework to select samples adaptively. While promising, the bandit-based variant of Adam suffers from limited theoretical guarantees. In this paper, we introduce Adam with Combinatorial Bandit Sampling (AdamCB), which integrates combinatorial bandit techniques into Adam to resolve these issues. AdamCB is able to fully utilize feedback from multiple samples at once, enhancing both theoretical guarantees and practical performance. Our regret analysis shows that AdamCB achieves faster convergence than Adam-based methods including the previous bandit-based variant. Numerical experiments demonstrate that AdamCB consistently outperforms existing methods.
Related papers
- Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks [38.11287525994738]
We present the first theoretical characterization of how affects Adam's generalization.<n>Our results reveal that while both Adam and AdamW with proper weight decay converge to poor test error solutions, their mini-batch variants can achieve near-zero test error.
arXiv Detail & Related papers (2025-10-13T12:48:22Z) - In Search of Adam's Secret Sauce [11.215133680044005]
We train over 1,300 language models across different data configurations and scales.<n>We find that signed momentum methods are faster than SGD, but consistently underperform relative to Adam.<n>We show that Adam in this setting implements a natural online algorithm for estimating the mean and variance of gradients.
arXiv Detail & Related papers (2025-05-27T23:30:18Z) - Towards Communication-efficient Federated Learning via Sparse and Aligned Adaptive Optimization [90.08459757321405]
Federated Adam (FedAdam) algorithms suffer from a threefold increase in uplink communication overhead.<n>We propose a novel sparse FedAdam algorithm called FedAdam-SSM, wherein distributed devices sparsify the updates local model parameters and moment estimates.<n>By minimizing the divergence bound between the model trained by FedAdam-SSM and centralized Adam, we optimize the SSM to mitigate the learning performance degradation caused by sparsification error.
arXiv Detail & Related papers (2024-05-28T07:56:49Z) - Making Substitute Models More Bayesian Can Enhance Transferability of
Adversarial Examples [89.85593878754571]
transferability of adversarial examples across deep neural networks is the crux of many black-box attacks.
We advocate to attack a Bayesian model for achieving desirable transferability.
Our method outperforms recent state-of-the-arts by large margins.
arXiv Detail & Related papers (2023-02-10T07:08:13Z) - TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z) - Understanding the Generalization of Adam in Learning Neural Networks
with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay globalization.
We show that if convex, and the weight decay regularization is employed, any optimization algorithms including Adam will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z) - Towards Practical Adam: Non-Convexity, Convergence Theory, and
Mini-Batch Acceleration [12.744658958445024]
Adam is one of the most influential adaptive algorithms for training deep neural networks.
Existing approaches, such as decreasing an adaptive learning rate, adopting a big batch size, have tried to promote Adam-type algorithms to converge.
We introduce an alternative easy-to-check sufficient condition, which merely depends on the parameters of historical base learning rate.
arXiv Detail & Related papers (2021-01-14T06:42:29Z) - Adam$^+$: A Stochastic Method with Adaptive Variance Reduction [56.051001950733315]
Adam is a widely used optimization method for deep learning applications.
We propose a new method named Adam$+$ (pronounced as Adam-plus)
Our empirical studies on various deep learning tasks, including image classification, language modeling, and automatic speech recognition, demonstrate that Adam$+$ significantly outperforms Adam.
arXiv Detail & Related papers (2020-11-24T09:28:53Z) - Adam with Bandit Sampling for Deep Learning [18.033149110113378]
We propose a generalization of Adam, called Adambs, that allows us to adapt to different training examples.
Experiments on various models and datasets demonstrate Adambs's fast convergence in practice.
arXiv Detail & Related papers (2020-10-24T21:01:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.